Thursday, August 22, 2013

Hadoop multiple mapper | reducer

A single Hadoop job can have multiple mappers but only one reducer class. That one class can still run as many parallel reduce tasks as you configure.
From: http://stackoverflow.com/questions/11122832/hadoop-mapreduce-possible-to-define-two-mappers-and-reducers-in-one-hadoop-job
You can have multiple mappers, but in one job, you can only have one reducer. The features you need are MultipleInputs, MultipleOutputs, and GenericWritable.
Using MultipleInputs, you can set the mapper and the corresponding InputFormat for each input.
Using GenericWritable, you can separate the different input classes in the reducer.
Using MultipleOutputs, you can output different classes from the same reducer.
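To make the three pieces concrete, here is a minimal local simulation in plain Python (not the Hadoop API) of one job with two mappers and a single reducer doing a reduce-side join. The sample records, the tag strings, and the function names (`user_mapper`, `order_mapper`, `join_reducer`, `run_job`) are hypothetical. In Java Hadoop, each input path and its mapper would be registered with `MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass)`, the differing value types would be wrapped in a `GenericWritable` subclass, and the extra output would go through `MultipleOutputs`. Here a tuple tag and a dict of named lists stand in for those.

```python
from collections import defaultdict

# Hypothetical sample data: each "input path" has its own record format.
USERS = ["1,alice", "2,bob"]                          # user_id,name
ORDERS = ["1\tbook", "1\tpen", "2\tlamp", "3\tdesk"]  # user_id<TAB>item

def user_mapper(line):
    # Mapper for the users input: emit (user_id, tagged value).
    user_id, name = line.split(",")
    yield user_id, ("user", name)

def order_mapper(line):
    # Mapper for the orders input: different format, same key space.
    user_id, item = line.split("\t")
    yield user_id, ("order", item)

def join_reducer(key, tagged_values, outputs):
    # The single reducer sees values from both mappers; the tag plays the
    # role GenericWritable plays in Java (telling the value types apart).
    names = [v for tag, v in tagged_values if tag == "user"]
    items = [v for tag, v in tagged_values if tag == "order"]
    for name in names:
        outputs["joined"].append((key, name, len(items)))
    if not names:
        # Like MultipleOutputs: route unmatched orders to a separate named output.
        outputs["orphans"].extend((key, item) for item in items)

def run_job(inputs, reducer):
    # inputs: list of (records, mapper) pairs, one per input path,
    # analogous to one MultipleInputs.addInputPath(...) call each.
    shuffled = defaultdict(list)
    for records, mapper in inputs:
        for line in records:
            for key, value in mapper(line):
                shuffled[key].append(value)
    outputs = defaultdict(list)
    for key in sorted(shuffled):  # keys reach the reducer in sorted order
        reducer(key, shuffled[key], outputs)
    return dict(outputs)

result = run_job([(USERS, user_mapper), (ORDERS, order_mapper)], join_reducer)
print(result)
# {'joined': [('1', 'alice', 2), ('2', 'bob', 1)], 'orphans': [('3', 'desk')]}
```

The point of the tag is that both mappers emit into the same key space, so one shuffle brings a user and their orders to the same reduce call.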

If you need more than one reducer, you can chain jobs together (there may be other ways as well).
First, here is a good breakdown of what makes up a Hadoop job:
From: http://stackoverflow.com/questions/12872590/hadoop-streaming-chaining-jobs
                              /  \
                             /    \
                            /      \
                           /        \
                Configuration       Execution
                     /\                 |
                    /  \                |
                   /    \   executable or script files
                  /      \
                 /        \
                /          \
  hadoopEnvironment     userEnvironment
           |                   /\
           |                  /  \
           |                 /    \ 
    $HADOOP_HOME/conf       /      \
                           /        \   
                genericOptions   streamingOptions
                      |                 |
                      |                 | 
            GenericOptionsParser StreamJob 
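The diagram's split between generic options (parsed by GenericOptionsParser) and streaming options (parsed by StreamJob) shows up directly on the streaming command line: generic options such as `-D` and `-files` must come before streaming options such as `-input`, `-output`, `-mapper`, and `-reducer`. Below is a minimal Python helper, a sketch under stated assumptions, that builds two chained streaming commands where job 2 reads job 1's output directory. The jar path, directory names, and script names are hypothetical placeholders. `/bin/cat` as the second job's mapper plays the identity-mapper role, and `mapreduce.job.reduces` is the Hadoop 2+ property name (Hadoop 1.x used `mapred.reduce.tasks`).

```python
import shlex
import subprocess

STREAMING_JAR = "/path/to/hadoop-streaming.jar"  # hypothetical placeholder path

def streaming_command(input_dir, output_dir, mapper, reducer,
                      generic_options=None, files=()):
    """Build one Hadoop streaming command as an argument list.

    Generic options (-D, -files) go first, then the streaming options.
    """
    cmd = ["hadoop", "jar", STREAMING_JAR]
    for key, value in (generic_options or {}).items():
        cmd += ["-D", f"{key}={value}"]
    if files:
        cmd += ["-files", ",".join(files)]
    cmd += ["-input", input_dir, "-output", output_dir,
            "-mapper", mapper, "-reducer", reducer]
    return cmd

def run_chain(commands):
    # Run jobs strictly in order; check=True raises on a non-zero exit,
    # so a later job never starts after an earlier one fails.
    for cmd in commands:
        subprocess.run(cmd, check=True)

# Two chained jobs (hypothetical scripts): job 2 reads the directory job 1 wrote.
job1 = streaming_command("in/", "tmp/phase1", "mapper1.py", "reducer1.py",
                         files=["mapper1.py", "reducer1.py"])
job2 = streaming_command("tmp/phase1", "out/", "/bin/cat", "reducer2.py",
                         generic_options={"mapreduce.job.reduces": "1"},
                         files=["reducer2.py"])
for cmd in (job1, job2):
    print(shlex.join(cmd))
```

Printing the commands first is a cheap way to check option order before calling `run_chain([job1, job2])` on a real cluster.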

From: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/IdentityMapper.html
From: http://stackoverflow.com/questions/9749655/map-map-reduce-reduce-final-output
Send the result of this job to another job, and set the mapper to IdentityMapper and the reducer to the second phase reducer that you have.
In other words, when chaining two jobs, you can apparently use IdentityMapper in the second job to pass the first job's output through to the second reducer unchanged.
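Here is a small local simulation of that two-job chain in plain Python (not Hadoop code). The task (word count, then grouping words by their count), the sample lines, and all function names are hypothetical choices for illustration. `identity_mapper` mimics IdentityMapper by passing each record of job 1's output through unchanged, so job 2's reducer does the real work. Job 1's reducer emits the count as a new key, which is what gives the second shuffle something new to group on.

```python
from collections import defaultdict

def run_phase(records, mapper, reducer):
    # One MapReduce job: map every (key, value), group by key, reduce in key order.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

def tokenize_mapper(_offset, line):
    # Phase-1 mapper: emit (word, 1) for each word in the line.
    for word in line.split():
        yield word, 1

def count_reducer(word, ones):
    # Phase-1 reducer: emit the count as the NEW key so phase 2 groups by it.
    yield sum(ones), word

def identity_mapper(key, value):
    # Same idea as IdentityMapper: pass each record through unchanged.
    yield key, value

def words_by_count_reducer(count, words):
    # Phase-2 reducer: collect all words that share a count.
    yield count, sorted(words)

lines = [(0, "a b a"), (6, "c b d")]  # (byte offset, line), like TextInputFormat
phase1 = run_phase(lines, tokenize_mapper, count_reducer)
phase2 = run_phase(phase1, identity_mapper, words_by_count_reducer)
print(phase2)
# [(1, ['c', 'd']), (2, ['a', 'b'])]
```

In the newer `org.apache.hadoop.mapreduce` API there is no separate IdentityMapper class to use: the base `Mapper` class's default `map()` already writes each key/value pair through unchanged.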

References: Hadoop introduction - http://www.slideshare.net/KeeyongHan/hadoop-introduction-10
MapReduce chaining - http://gandhigeet.dinstudio.com/blog_1_7.html
Yahoo MapReduce tutorial examples - http://developer.yahoo.com/hadoop/tutorial/module4.html

