---
title: Hadoop fully distributed cluster setup notes
tags: []
categories: ["Middleware", "DistributedSystem", "Hadoop"]
date: 2010-09-14T17:25:08Z
updated: 2010-09-21T17:24:45Z
---

Work in progress; these are still rough notes.

### Machine layout

- themis ... namenode01
- maria ... datanode01
- pallas ... datanode02

### update-alternatives

```
$ sudo cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.cluster
$ sudo update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50
$ update-alternatives --display hadoop-0.20-conf
hadoop-0.20-conf - auto mode
link currently points to /etc/hadoop-0.20/conf.cluster
/etc/hadoop-0.20/conf.cluster - priority 50
/etc/hadoop-0.20/conf.empty - priority 10
/etc/hadoop-0.20/conf.pseudo - priority 30
Current 'best' version is /etc/hadoop-0.20/conf.cluster
```

### Configuration files

#### core-site.xml

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://themis:54310</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop-0.20/cache/${user.name}</value>
</property>
```

#### hdfs-site.xml

```xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>true</value>
</property>
<property>
  <name>dfs.hosts</name>
  <value>${hadoop.tmp.dir}/hosts.include</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>${hadoop.tmp.dir}/hosts.exclude</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>themis:50070</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/var/lib/hadoop-0.20/cache/hadoop/dfs/name</value>
</property>
```

#### mapred-site.xml

```xml
<property>
  <name>mapred.job.tracker</name>
  <value>themis:54311</value>
</property>
<property>
  <name>mapred.hosts</name>
  <value>${hadoop.tmp.dir}/hosts.include</value>
</property>
<property>
  <name>mapred.hosts.exclude</name>
  <value>${hadoop.tmp.dir}/hosts.exclude</value>
</property>
```
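The configs above refer to the machines by hostname, so each node must resolve themis, maria, and pallas consistently, for example via /etc/hosts. A hypothetical fragment (these IP addresses are placeholders, not from the actual setup):

```
192.168.0.10  themis   # namenode01
192.168.0.11  maria    # datanode01
192.168.0.12  pallas   # datanode02
```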

hosts.include and hosts.exclude must exist, or the namenode complains at startup.
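A sketch for creating those two files. `HOSTS_DIR` defaults to the current directory here for demonstration; on themis it should be whatever directory `hadoop.tmp.dir` resolves to for the hadoop user (e.g. /var/lib/hadoop-0.20/cache/hadoop, an assumption based on the config above):

```shell
#!/bin/sh
# Create the include/exclude files referenced by dfs.hosts and
# dfs.hosts.exclude. Point HOSTS_DIR at the hadoop.tmp.dir directory.
HOSTS_DIR="${HOSTS_DIR:-.}"
# hosts.include lists every datanode allowed to register.
printf '%s\n' maria pallas > "$HOSTS_DIR/hosts.include"
# hosts.exclude starts empty; add hosts here when decommissioning.
: > "$HOSTS_DIR/hosts.exclude"
```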

#### masters

```
themis
```

#### slaves

```
maria
pallas
```

All ports are left wide open (no firewall filtering between the nodes).

### Format the namenode

Run the format as the `hadoop` user:

```
maki@themis:~$ sudo su -s /bin/bash - hadoop -c "hadoop namenode -format"
```
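A quick sanity check: formatting writes an image under `dfs.name.dir`, including a `current/VERSION` file. A hypothetical helper (the function name is my own) that just tests for it:

```shell
#!/bin/sh
# Returns success if the given directory looks like a formatted
# namenode image directory ("hadoop namenode -format" creates
# current/VERSION inside dfs.name.dir).
namenode_formatted() {
  [ -f "$1/current/VERSION" ]
}
# On themis, with dfs.name.dir from hdfs-site.xml above:
# namenode_formatted /var/lib/hadoop-0.20/cache/hadoop/dfs/name && echo formatted
```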

### Start the daemons

#### namenode

```
$ sudo /etc/init.d/hadoop-0.20-namenode start
$ sudo /etc/init.d/hadoop-0.20-jobtracker start
```

#### datanode

```
$ sudo /etc/init.d/hadoop-0.20-datanode start
$ sudo /etc/init.d/hadoop-0.20-tasktracker start
```
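Once everything is up, `hadoop dfsadmin -report` run as the hadoop user on themis shows whether maria and pallas actually registered. A small helper to pull the count out of the report; the exact "Datanodes available" wording is my recollection of the 0.20 report format, so treat it as an assumption:

```shell
#!/bin/sh
# Extract the registered-datanode count from "hadoop dfsadmin -report"
# output read on stdin.
datanodes_available() {
  grep -o 'Datanodes available: [0-9]*' | grep -o '[0-9]*$'
}
# Usage on themis (expect 2 for maria + pallas):
# sudo -u hadoop hadoop dfsadmin -report | datanodes_available
```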
----

Scripts like the following come in handy.

#### node.sh

```sh
#!/bin/sh
# Node lists shared by the helper scripts below.
NAMENODE="themis"
DATANODE="maria pallas"
NODE="$NAMENODE $DATANODE"
```

#### sendAll.sh

```sh
#!/bin/sh
# Run the given command on every node over ssh.
. ~/node.sh
for n in $NODE; do
  CMD="ssh $n $*"
  echo "== $n =="
  $CMD
done
```

Example run:

```
$ ./sendAll.sh hostname
== themis ==
themis
== maria ==
maria
== pallas ==
pallas
```

#### sync.sh

```sh
#!/bin/sh
# Push the cluster config from this host to every other node.
# The glob stays literal in the assignment and is expanded by the
# shell at the unquoted echo/$CMD, which is why the example output
# below lists every file.
. ~/node.sh
for n in $NODE; do
  if [ `hostname` != "$n" ]; then
    CMD="sudo rsync --progress -av /etc/hadoop/conf.cluster/* $n:/etc/hadoop/conf.cluster"
    echo "== $n =="
    echo $CMD
    $CMD
  fi
done
```

Example run (edit on the master, then everything syncs):

```
$ ./sync.sh
== maria ==
sudo rsync --progress -av /etc/hadoop/conf.cluster/capacity-scheduler.xml /etc/hadoop/conf.cluster/configuration.xsl /etc/hadoop/conf.cluster/core-site.xml /etc/hadoop/conf.cluster/core-site.xml~ /etc/hadoop/conf.cluster/fair-scheduler.xml /etc/hadoop/conf.cluster/hadoop-env.sh /etc/hadoop/conf.cluster/hadoop-metrics.properties /etc/hadoop/conf.cluster/hadoop-policy.xml /etc/hadoop/conf.cluster/hdfs-site.xml /etc/hadoop/conf.cluster/hdfs-site.xml~ /etc/hadoop/conf.cluster/log4j.properties /etc/hadoop/conf.cluster/mapred-site.xml /etc/hadoop/conf.cluster/mapred-site.xml~ /etc/hadoop/conf.cluster/masters /etc/hadoop/conf.cluster/masters~ /etc/hadoop/conf.cluster/slaves /etc/hadoop/conf.cluster/slaves~ /etc/hadoop/conf.cluster/ssl-client.xml.example /etc/hadoop/conf.cluster/ssl-server.xml.example maria:/etc/hadoop/conf.cluster
sending incremental file list

sent 382 bytes  received 12 bytes  788.00 bytes/sec
total size is 24285  speedup is 61.64
== pallas ==
(same file list, targeting pallas:/etc/hadoop/conf.cluster)
sending incremental file list

sent 382 bytes  received 12 bytes  262.67 bytes/sec
total size is 24285  speedup is 61.64
```

Read [this][1].

[1]: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
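The run above also copies editor backup files (core-site.xml~ and friends). A sync.sh variant that filters them out with rsync's `--exclude`; the guard around `. ~/node.sh` is just so the sketch degrades gracefully when node.sh is missing:

```shell
#!/bin/sh
# sync.sh variant: push the config to all other nodes, but skip
# editor backup files ("*~").
[ -f ~/node.sh ] && . ~/node.sh
for n in ${NODE:-}; do
  if [ "$(hostname)" != "$n" ]; then
    echo "== $n =="
    sudo rsync --progress -av --exclude '*~' \
      /etc/hadoop/conf.cluster/ "$n:/etc/hadoop/conf.cluster/"
  fi
done
```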