From: http://stackoverflow.com/questions/11122832/hadoop-mapreduce-possible-to-define-two-mappers-and-reducers-in-one-hadoop-job
You can have multiple mappers, but in one job you can only have one reducer (class). The features you need are MultipleInputs, MultipleOutputs and GenericWritable. Using MultipleInputs, you can set a mapper and the corresponding InputFormat for each input path. Using GenericWritable, you can tell the different input classes apart in the reducer. Using MultipleOutputs, you can write different output classes from the same reducer. (The original answer links to the author's own posts on how to use MultipleInputs and GenericWritable.)
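To make that data flow concrete, here is a minimal in-memory simulation in Python. It is not the Hadoop API: each input source gets its own mapper (the role MultipleInputs plays), values are tagged with their record type (standing in for a GenericWritable subclass), and a single reducer routes its results to named outputs (the role MultipleOutputs plays). The sample data, the tag strings and the output names `totals` and `no_orders` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: two input sources with different formats.
USERS = ["u1,Alice", "u2,Bob", "u3,Carol"]          # user_id,name
ORDERS = ["o1\tu1\t30", "o2\tu1\t12", "o3\tu2\t5"]  # order_id<TAB>user_id<TAB>amount


def user_mapper(line):
    """Mapper for the users input (one mapper per input, as with MultipleInputs)."""
    user_id, name = line.split(",")
    # The ("user", ...) tag is a hypothetical stand-in for a GenericWritable wrapper.
    yield user_id, ("user", name)


def order_mapper(line):
    """Mapper for the orders input; it emits the same key (user_id) as user_mapper."""
    _order_id, user_id, amount = line.split("\t")
    yield user_id, ("order", int(amount))


def join_reducer(user_id, values):
    """Single reducer: separates record types by tag and picks a named output."""
    names = [v for tag, v in values if tag == "user"]
    amounts = [v for tag, v in values if tag == "order"]
    name = names[0] if names else None
    if amounts:
        yield "totals", (user_id, name, sum(amounts))
    else:
        yield "no_orders", (user_id, name)


def run_job(inputs, reducer):
    """inputs: list of (lines, mapper) pairs; returns {output_name: [records]}."""
    groups = defaultdict(list)
    for lines, mapper in inputs:           # each input has its own mapper
        for line in lines:
            for key, value in mapper(line):
                groups[key].append(value)  # shuffle: group values by key
    outputs = defaultdict(list)            # named outputs, like MultipleOutputs
    for key in sorted(groups):             # reducers see keys in sorted order
        for output_name, record in reducer(key, groups[key]):
            outputs[output_name].append(record)
    return dict(outputs)
```

Calling `run_job([(USERS, user_mapper), (ORDERS, order_mapper)], join_reducer)` joins the two sources on `user_id` inside one reducer and splits the result into two outputs, which is the one-job, two-mapper, one-reducer shape the answer describes.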
If you want to use multiple reducers, you can chain jobs together (there may be other ways as well).
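Chaining can be sketched as another in-memory Python simulation (again, not Hadoop code): each job is map, then group-by-key, then reduce, and the list returned by the first job is fed in as the input of the second. On a cluster the equivalent step is pointing the second job's input path at the first job's output directory. The word-count and group-by-count jobs are hypothetical examples.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate one job in memory: map, group values by key, reduce keys in sorted order."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


# Hypothetical job 1: word count.
def wc_mapper(line):
    for word in line.split():
        yield word, 1


def sum_reducer(word, counts):
    yield word, sum(counts)


# Hypothetical job 2: group words by how often they occur (a second, different reducer).
def swap_mapper(pair):
    word, count = pair
    yield count, word


def collect_reducer(count, words):
    yield count, sorted(words)


def run_chain(lines):
    job1_output = run_mapreduce(lines, wc_mapper, sum_reducer)
    # Job 1's output becomes job 2's input, like reading job 1's output directory.
    return run_mapreduce(job1_output, swap_mapper, collect_reducer)
```

For example, `run_chain(["a b a", "c b d"])` first counts each word and then groups the words by count, so two reduce phases run even though each simulated job has only one reducer.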
First, I found a good explanation of how a Hadoop job is structured, so I have reproduced it here.
From: http://stackoverflow.com/questions/12872590/hadoop-streaming-chaining-jobs
hadoopJob
|-- Configuration
|   |-- hadoopEnvironment
|   |   `-- $HADOOP_HOME/conf
|   `-- userEnvironment
|       |-- genericOptions
|       |   `-- GenericOptionsParser
|       `-- streamingOptions
|           `-- StreamJob
`-- Execution
    `-- executable or script files
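The "executable or script files" leaf is where Hadoop Streaming comes in: the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines on stdout, and the framework sorts the mapper output by key before the reducer sees it. A minimal word-count pair in Python might look like the sketch below; the script name `wc.py` and its `map`/`reduce` command-line argument are assumptions for illustration, and the input is assumed to be well-formed.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Streaming mapper: for each input line, emit "word<TAB>1" lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Streaming reducer: input arrives sorted by key, so equal keys are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical entry point: `python wc.py map` or `python wc.py reduce`.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Locally, `cat input.txt | python wc.py map | sort | python wc.py reduce` mimics what the framework does, with `sort` playing the role of the shuffle. On a cluster the same commands would be passed to the streaming jar through `-mapper` and `-reducer`, and chaining two streaming jobs means giving the second job the first job's `-output` directory as its `-input`.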
From: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/IdentityMapper.html
From: http://stackoverflow.com/questions/9749655/map-map-reduce-reduce-final-output
Send the result of this job to another job, and set the mapper to IdentityMapper and the reducer to the second-phase reducer that you have.
So, reportedly, when chaining two jobs you can use IdentityMapper in the second job to pass the first job's results on to the second reducer.
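Here is a small in-memory Python simulation of that Map, Reduce, (identity) Map, Reduce pipeline. It assumes the first reducer already emits the key the second reducer should group on, so the second job's mapper only has to pass records through. The store/product sales data and the "best-selling product per store" task are hypothetical. (In the newer `org.apache.hadoop.mapreduce` API, the base `Mapper` class's default `map()` already writes its input unchanged, so it can serve as the identity mapper.)

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate one job in memory: map, group values by key, reduce keys in sorted order."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


def identity_mapper(pair):
    """Plays the role of IdentityMapper: emit each (key, value) unchanged."""
    yield pair


# Hypothetical job 1: total sales per (store, product) from "store,product,amount" lines.
def sales_mapper(line):
    store, product, amount = line.split(",")
    yield (store, product), int(amount)


def phase1_reducer(store_product, amounts):
    store, product = store_product
    # Re-keying the output by store is what lets job 2 use an identity mapper.
    yield store, (product, sum(amounts))


# Hypothetical job 2 (the "second-phase" reducer): best-selling product per store.
def phase2_reducer(store, product_totals):
    yield store, max(product_totals, key=lambda pt: pt[1])


def map_reduce_map_reduce(lines):
    job1_output = run_mapreduce(lines, sales_mapper, phase1_reducer)
    return run_mapreduce(job1_output, identity_mapper, phase2_reducer)
```

The design choice worth noticing is in `phase1_reducer`: if it kept `(store, product)` as its output key, an identity mapper in job 2 would regroup nothing, and job 2 would need a real mapper to re-key the records.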
References:
Introduction to Hadoop: http://www.slideshare.net/KeeyongHan/hadoop-introduction-10
MapReduce chaining: http://gandhigeet.dinstudio.com/blog_1_7.html
Yahoo MapReduce examples: http://developer.yahoo.com/hadoop/tutorial/module4.html