programming repository: Hadoop mode - notes on the Hadoop modes and each component

Hadoop can be installed in three modes: Standalone, Pseudo-distributed, and Fully-distributed.
(Which mode you end up with depends on how the configuration files are set after installation.)

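What Standalone mode and Pseudo-distributed mode have in common is that both run on a single machine.
The difference is that in Standalone mode everything runs inside one JVM: no separate NameNode, DataNode, JobTracker, or TaskTracker daemons are started, and the local file system is used instead of HDFS. (In Pseudo-distributed mode each daemon runs in its own JVM.)
So in Standalone mode data serialization is not a major concern. In Pseudo-distributed mode, however, mappers and reducers exchange serialized data, so serialization matters.
(At least, that is what I read on StackOverflow.)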
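
To make the serialization point concrete, here is a minimal sketch of a record being written to bytes and read back, which is roughly what has to happen whenever data crosses from one JVM to another. The class WordCount and its roundTrip helper are hypothetical names made up for this note; the write/readFields pair only imitates the shape of Hadoop's Writable interface and uses nothing but the Java standard library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record type. It imitates the write/readFields shape of
// Hadoop's Writable interface but does not depend on Hadoop itself.
class WordCount {
    String word = "";
    int count;

    WordCount() {}

    WordCount(String word, int count) {
        this.word = word;
        this.count = count;
    }

    // Serialize the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeInt(count);
    }

    // Read the fields back in the same order they were written.
    void readFields(DataInput in) throws IOException {
        word = in.readUTF();
        count = in.readInt();
    }

    // Round trip through a byte array, standing in for the hop between JVMs.
    static WordCount roundTrip(WordCount original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            WordCount copy = new WordCount();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        WordCount copy = roundTrip(new WordCount("hadoop", 3));
        System.out.println(copy.word + " " + copy.count); // prints "hadoop 3"
    }
}
```

If readFields read the fields in a different order than write wrote them, the copy would come back corrupted, which is the kind of bug that only shows up once the data really leaves the JVM.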

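Hadoop has a master/slave architecture.
The master consists of the NameNode and the JobTracker (plus, additionally, the SecondaryNameNode),
and each slave consists of a DataNode and a TaskTracker.
The master/slave settings, the NameNode and DataNode settings, and the Job/Task tracker settings can all be edited directly in Hadoop's conf folder.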
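
As a rough illustration of how the configuration decides the mode, the sketch below guesses the mode from two Hadoop 1.x properties, fs.default.name (core-site.xml) and mapred.job.tracker (mapred-site.xml), whose defaults are file:/// and local. This is only a heuristic written for this note: Hadoop has no single "mode" switch, the class HadoopModeGuess is hypothetical, and treating "localhost" as the sign of a pseudo-distributed setup is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a heuristic, not something Hadoop itself provides.
class HadoopModeGuess {
    static String guessMode(Map<String, String> conf) {
        // Hadoop 1.x defaults: local file system and the in-process job runner.
        String fs = conf.getOrDefault("fs.default.name", "file:///");
        String tracker = conf.getOrDefault("mapred.job.tracker", "local");

        boolean localFs = fs.startsWith("file:");
        boolean localRunner = tracker.equals("local");
        if (localFs && localRunner) {
            return "standalone";
        }
        // Assumption: daemons that all point at localhost mean one machine.
        boolean singleHost = fs.startsWith("hdfs://localhost")
                && tracker.startsWith("localhost");
        return singleHost ? "pseudo-distributed" : "fully-distributed";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(guessMode(conf)); // prints "standalone"

        conf.put("fs.default.name", "hdfs://localhost:9000");
        conf.put("mapred.job.tracker", "localhost:9001");
        System.out.println(guessMode(conf)); // prints "pseudo-distributed"
    }
}
```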

What each component does
Copied from: http://amalgjose.wordpress.com/2012/12/08/making-a-pseudo-distributed-hadoop-cluster/

NameNodes

The NameNode is the master server of the cluster. It does not store any file data itself, but it knows where the blocks are stored on the child nodes, so it can hand out pointers to them and let a file be re-assembled. The NameNode keeps two structures: the FSImage and the edit log.

Features

Highly memory intensive

Keeping it safe and isolated is necessary

Manages the file system namespaces
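
The bookkeeping described above can be pictured with a toy model: one map from file path to block ids, one map from block id to the data nodes holding it, and a list standing in for the edit log. ToyNameNode and all of its method names are hypothetical and heavily simplified; this is not how the real NameNode is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of NameNode metadata. It holds no file contents, only pointers.
class ToyNameNode {
    // file path -> ordered block ids (the namespace)
    private final Map<String, List<Long>> fileToBlocks = new HashMap<>();
    // block id -> names of the data nodes that hold a copy
    private final Map<Long, List<String>> blockToNodes = new HashMap<>();
    // every namespace change is also appended here, like an edit log
    private final List<String> editLog = new ArrayList<>();

    void addFile(String path, List<Long> blockIds) {
        fileToBlocks.put(path, new ArrayList<>(blockIds));
        editLog.add("ADD " + path);
    }

    // A data node tells the name node which block it stores.
    void reportBlock(long blockId, String dataNode) {
        blockToNodes.computeIfAbsent(blockId, k -> new ArrayList<>()).add(dataNode);
    }

    // For each block of the file, in order, list the nodes that hold it.
    List<List<String>> locate(String path) {
        List<List<String>> locations = new ArrayList<>();
        for (long id : fileToBlocks.getOrDefault(path, new ArrayList<>())) {
            locations.add(blockToNodes.getOrDefault(id, new ArrayList<>()));
        }
        return locations;
    }

    List<String> editLog() {
        return editLog;
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.addFile("/logs/a.txt", List.of(1L, 2L));
        nn.reportBlock(1L, "dn1");
        nn.reportBlock(2L, "dn2");
        System.out.println(nn.locate("/logs/a.txt")); // prints "[[dn1], [dn2]]"
    }
}
```

Because all of these maps live in memory, the model also hints at why the name node is described as memory intensive.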

DataNodes

Child nodes are attached to the main node.

Features:

A data node has a configuration file that makes it available in the cluster. It also stores information about its own storage capacity (e.g. 5 out of 10 is available).

Data nodes are independent, since they do not point to any other data nodes.

Manages the storage attached to the node.

There will be multiple data nodes in a cluster.

Job Tracker

Schedules and assigns tasks to the different datanodes.

Work Flow

Takes the request.

Assigns the task.

Validates the requested work.

Checks whether all the data nodes are working properly.

If not, reschedules the tasks.
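
The workflow above can be sketched as a toy scheduler: tasks are handed out to registered task trackers, and when a tracker is found to be dead its tasks are handed out again. ToyJobTracker, its method names, and the round-robin policy are all assumptions made for illustration; the real JobTracker's scheduling is more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy scheduler: round-robin assignment plus rescheduling on failure.
class ToyJobTracker {
    private final List<String> liveTrackers = new ArrayList<>();
    private final Map<String, String> assignment = new LinkedHashMap<>(); // task -> tracker
    private int next = 0;

    void register(String tracker) {
        liveTrackers.add(tracker);
    }

    // Take the request and assign every task to some live tracker.
    void submit(List<String> tasks) {
        for (String task : tasks) {
            assign(task);
        }
    }

    private void assign(String task) {
        if (liveTrackers.isEmpty()) {
            throw new IllegalStateException("no live task trackers");
        }
        String tracker = liveTrackers.get(next % liveTrackers.size());
        next++;
        assignment.put(task, tracker);
    }

    // Called when a tracker is found not to be working properly.
    void markDead(String tracker) {
        liveTrackers.remove(tracker);
        // Collect the orphaned tasks first, then reschedule them.
        List<String> orphaned = new ArrayList<>();
        for (Map.Entry<String, String> e : assignment.entrySet()) {
            if (e.getValue().equals(tracker)) {
                orphaned.add(e.getKey());
            }
        }
        for (String task : orphaned) {
            assign(task);
        }
    }

    String trackerOf(String task) {
        return assignment.get(task);
    }

    public static void main(String[] args) {
        ToyJobTracker jt = new ToyJobTracker();
        jt.register("tt1");
        jt.register("tt2");
        jt.submit(List.of("map-0", "map-1", "map-2"));
        jt.markDead("tt1");
        System.out.println(jt.trackerOf("map-0")); // prints "tt2"
    }
}
```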

Task Tracker

The Job Tracker and the Task Tracker work in a master-slave model. Every datanode has a task tracker, which actually performs whatever task the Job Tracker assigns to it.

Secondary Name Node

The Secondary NameNode is not a redundant namenode; instead it periodically performs checkpointing and housekeeping tasks.
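
Checkpointing is usually described as merging the edit log into the FSImage so that the log does not grow without bound. A minimal sketch of that idea follows, assuming the image is just a set of file paths and each edit is a line of the form "ADD path" or "DELETE path". ToyCheckpoint and this edit format are hypothetical and are not the real on-disk formats.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy checkpoint: replay the edit log on top of the old image.
class ToyCheckpoint {
    static TreeSet<String> checkpoint(Set<String> fsImage, List<String> editLog) {
        TreeSet<String> merged = new TreeSet<>(fsImage);
        for (String edit : editLog) {
            String[] parts = edit.split(" ", 2);
            if (parts.length < 2) {
                continue; // skip malformed entries in this toy format
            }
            if (parts[0].equals("ADD")) {
                merged.add(parts[1]);
            } else if (parts[0].equals("DELETE")) {
                merged.remove(parts[1]);
            }
        }
        // The caller can now start a fresh, empty edit log.
        return merged;
    }

    public static void main(String[] args) {
        Set<String> image = new TreeSet<>(Arrays.asList("/a", "/b"));
        List<String> edits = Arrays.asList("ADD /c", "DELETE /a");
        System.out.println(checkpoint(image, edits)); // prints "[/b, /c]"
    }
}
```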

Tuesday, August 27, 2013