---
title: Notes on Building a Fully Distributed Hadoop Environment
tags: []
categories: ["Middleware", "DistributedSystem", "Hadoop"]
date: 2010-09-14T17:25:08Z
updated: 2010-09-21T17:24:45Z
---
Work in progress; these notes are still incomplete.
### Machine configuration
- themis ... namenode01
- maria ... datanode01
- pallas ... datanode02
### update-alternatives
Copy the empty config into a new `conf.cluster` directory and register it as an alternative with a higher priority than the existing ones.

```
$ sudo cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.cluster
$ sudo update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50
$ update-alternatives --display hadoop-0.20-conf
hadoop-0.20-conf - auto mode
The link currently points to /etc/hadoop-0.20/conf.cluster
/etc/hadoop-0.20/conf.cluster - priority 50
/etc/hadoop-0.20/conf.empty - priority 10
/etc/hadoop-0.20/conf.pseudo - priority 30
The current `best' version is /etc/hadoop-0.20/conf.cluster
```
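To confirm from a script (rather than by reading the `--display` output) that the `hadoop-0.20-conf` link really resolves to `conf.cluster`, something like the following should work. This is a minimal sketch: the function name `check_conf_link` is hypothetical, and instead of querying update-alternatives it simply follows the symlink chain (`/etc/hadoop-0.20/conf` -> `/etc/alternatives/hadoop-0.20-conf` -> target) with GNU `readlink -f`.

```sh
#!/bin/sh
# Sketch (hypothetical helper): check that an alternatives-managed link
# resolves to the expected directory. Defaults are the paths used in these notes.
check_conf_link() {
  link="${1:-/etc/hadoop-0.20/conf}"
  want="${2:-/etc/hadoop-0.20/conf.cluster}"
  # Follow every symlink in the chain down to the real directory.
  got=$(readlink -f "$link") || return 1
  if [ "$got" = "$(readlink -f "$want")" ]; then
    echo "OK: $link -> $got"
  else
    echo "MISMATCH: $link -> $got (expected $want)" >&2
    return 1
  fi
}
```

Running `check_conf_link` with no arguments checks the real paths; a non-zero exit status means the link points somewhere else (for example, back at `conf.pseudo`).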
### Configuration files
#### core-site.xml
The namenode complains at startup unless hosts.include and hosts.exclude have been created.
themis
maria pallas
All ports are left open.
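Because the namenode complains at startup when hosts.include and hosts.exclude are missing, a small helper can create empty ones before the first start. A minimal sketch: the function name `ensure_host_files` is hypothetical, and it assumes both files live in the cluster conf directory (adjust the directory to wherever your `dfs.hosts` / `dfs.hosts.exclude` settings point). Files that already exist are left untouched.

```sh
#!/bin/sh
# Sketch (hypothetical helper): create empty hosts.include / hosts.exclude
# if they are missing. ASSUMPTION: they live in the cluster conf directory.
ensure_host_files() {
  dir="${1:-/etc/hadoop-0.20/conf.cluster}"
  for f in hosts.include hosts.exclude; do
    # Never overwrite a file that already has host entries in it.
    if [ ! -e "$dir/$f" ]; then
      touch "$dir/$f" && echo "created $dir/$f"
    fi
  done
}
```

Run it as a user that can write to the conf directory (the real path is root-owned).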
Format the namenode as the hadoop user:

```
maki@themis:~$ sudo su -s /bin/bash - hadoop -c "hadoop namenode -format"
```
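Re-formatting a namenode that already holds HDFS metadata discards that metadata, so when scripting the step it may help to guard it. A minimal sketch under stated assumptions: the helper names are hypothetical, the directory you pass must be whatever `dfs.name.dir` is set to in your config, and "formatted" is detected by the presence of `current/VERSION`, which a formatted 0.20 name directory contains.

```sh
#!/bin/sh
# Sketch (hypothetical helpers): format the namenode only once.
# ASSUMPTION: $1 is the directory configured as dfs.name.dir.

# Succeeds (exit 0) if the directory already looks formatted.
namenode_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Runs the same format command as above, unless already formatted.
format_namenode_once() {
  if namenode_formatted "$1"; then
    echo "skip: $1 is already formatted"
    return 0
  fi
  sudo su -s /bin/bash - hadoop -c "hadoop namenode -format"
}
```

Usage would be `format_namenode_once /path/to/dfs.name.dir`, with the path taken from your own configuration.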
On the namenode:

```
$ sudo /etc/init.d/hadoop-0.20-namenode start
$ sudo /etc/init.d/hadoop-0.20-jobtracker start
```
On each datanode:

```
$ sudo /etc/init.d/hadoop-0.20-datanode start
$ sudo /etc/init.d/hadoop-0.20-tasktracker start
```

----

It is handy to have scripts like the following.

#### node.sh

```
#!/bin/sh
NAMENODE="themis"
DATANODE="maria pallas"
NODE="$NAMENODE $DATANODE"
```

#### sendAll.sh

```
#!/bin/sh
. ~/node.sh
for n in $NODE; do
  CMD="ssh $n $*"
  echo "== $n =="
  $CMD
done
```

Example run:

```
$ ./sendAll.sh hostname
== themis ==
themis
== maria ==
maria
== pallas ==
pallas
```

#### sync.sh

```
#!/bin/sh
. ~/node.sh
for n in $NODE; do
  if [ `hostname` != "$n" ]; then
    CMD="sudo rsync --progress -av /etc/hadoop/conf.cluster/* $n:/etc/hadoop/conf.cluster"
    echo "== $n =="
    echo $CMD
    $CMD
  fi
done
```

Example run (edit the config on the master, and this syncs it to every other node):

```
$ ./sync.sh
== maria ==
sudo rsync --progress -av /etc/hadoop/conf.cluster/capacity-scheduler.xml /etc/hadoop/conf.cluster/configuration.xsl /etc/hadoop/conf.cluster/core-site.xml /etc/hadoop/conf.cluster/core-site.xml~ /etc/hadoop/conf.cluster/fair-scheduler.xml /etc/hadoop/conf.cluster/hadoop-env.sh /etc/hadoop/conf.cluster/hadoop-metrics.properties /etc/hadoop/conf.cluster/hadoop-policy.xml /etc/hadoop/conf.cluster/hdfs-site.xml /etc/hadoop/conf.cluster/hdfs-site.xml~ /etc/hadoop/conf.cluster/log4j.properties /etc/hadoop/conf.cluster/mapred-site.xml /etc/hadoop/conf.cluster/mapred-site.xml~ /etc/hadoop/conf.cluster/masters /etc/hadoop/conf.cluster/masters~ /etc/hadoop/conf.cluster/slaves /etc/hadoop/conf.cluster/slaves~ /etc/hadoop/conf.cluster/ssl-client.xml.example /etc/hadoop/conf.cluster/ssl-server.xml.example maria:/etc/hadoop/conf.cluster
sending incremental file list
sent 382 bytes received 12 bytes 788.00 bytes/sec
total size is 24285 speedup is 61.64
== pallas ==
sudo rsync --progress -av /etc/hadoop/conf.cluster/capacity-scheduler.xml /etc/hadoop/conf.cluster/configuration.xsl /etc/hadoop/conf.cluster/core-site.xml /etc/hadoop/conf.cluster/core-site.xml~ /etc/hadoop/conf.cluster/fair-scheduler.xml /etc/hadoop/conf.cluster/hadoop-env.sh /etc/hadoop/conf.cluster/hadoop-metrics.properties /etc/hadoop/conf.cluster/hadoop-policy.xml /etc/hadoop/conf.cluster/hdfs-site.xml /etc/hadoop/conf.cluster/hdfs-site.xml~ /etc/hadoop/conf.cluster/log4j.properties /etc/hadoop/conf.cluster/mapred-site.xml /etc/hadoop/conf.cluster/mapred-site.xml~ /etc/hadoop/conf.cluster/masters /etc/hadoop/conf.cluster/masters~ /etc/hadoop/conf.cluster/slaves /etc/hadoop/conf.cluster/slaves~ /etc/hadoop/conf.cluster/ssl-client.xml.example /etc/hadoop/conf.cluster/ssl-server.xml.example pallas:/etc/hadoop/conf.cluster
sending incremental file list
sent 382 bytes received 12 bytes 262.67 bytes/sec
total size is 24285 speedup is 61.64
```

Read [this][1].

[1]: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
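The per-node daemon start commands can also be driven from the master by reusing the `NAMENODE` / `DATANODE` variables from node.sh. A minimal sketch under stated assumptions: `cluster_start_cmds` is a hypothetical helper that only prints one `ssh ... start` line per daemon, namenode side first; it assumes the jobtracker runs on the namenode host, a tasktracker runs on every datanode, and ssh and sudo work without a password prompt.

```sh
#!/bin/sh
# Sketch (hypothetical helper): print the init.d start commands for the whole
# cluster, using NAMENODE / DATANODE as defined in node.sh.
# ASSUMPTION: jobtracker lives on the namenode host, tasktrackers on datanodes.
cluster_start_cmds() {
  for n in $NAMENODE; do
    for d in namenode jobtracker; do
      # -n keeps ssh from reading stdin when these lines are piped to sh.
      echo "ssh -n $n sudo /etc/init.d/hadoop-0.20-$d start"
    done
  done
  for n in $DATANODE; do
    for d in datanode tasktracker; do
      echo "ssh -n $n sudo /etc/init.d/hadoop-0.20-$d start"
    done
  done
}
```

Usage: `. ~/node.sh; cluster_start_cmds` to review the commands, then `cluster_start_cmds | sh` to run them. Printing first instead of executing directly makes it easy to check the host/daemon pairing before anything starts.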