The Beauty of Java [From Novice to Expert]: A Fully Distributed Hadoop Installation on Linux
Author: 二青
Email: xtfggef@gmail.com   Weibo: http://weibo.com/xtfggef
Originally I meant to stop at a single-node environment, but after finishing it I still felt unsatisfied, so today I kept going and did a fully distributed cluster installation. The software is the same as in the previous article on single-node Hadoop installation:
- Ubuntu 14.10 64 Bit Server Edition
- Hadoop2.6.0
- JDK 1.7.0_71
- ssh
- rsync
Preparing the Environment
Still VirtualBox + Ubuntu 14.10 64 Bit, only this time with three nodes. Without further ado, here is the preparation. I won't repeat the basic environment setup, including installing the JDK, ssh, rsync, and so on; see the previous article.

master  192.168.1.118  NameNode
slave1  192.168.1.189  DataNode1
slave2  192.168.1.116  DataNode2
Edit the hosts file on every machine, appending the following to the end of /etc/hosts:
192.168.1.118 master
192.168.1.189 slave1
192.168.1.116 slave2
Addendum (January 15, 2015):
1. Change each machine's hostname in /etc/hostname: set the master machine's hostname to master, and so on for the slaves.
2. Grant the current user administrator privileges (see the sketch after this list).
3. Switch to a static IP address by editing /etc/network/interfaces and adding the following:
auto eth0
iface eth0 inet static
address 192.168.1.118
netmask 255.255.255.0
gateway 192.168.1.1
A reboot is required for these changes to take effect.
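For step 2, a minimal sketch, assuming the user is named adam (the username used later in this article); run as root on each machine:
# add the user to Ubuntu's sudo (administrator) group
usermod -aG sudo adam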
Configuring passwordless SSH from the NameNode to the DataNodes
Run the following two commands directly on the NameNode:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Go into the NameNode user's home directory, then into the .ssh directory, and check the generated files: authorized_keys, id_dsa, id_dsa.pub

Distribute the authorized_keys file to each DataNode:
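A minimal sketch, assuming the same user adam exists on each DataNode and that its ~/.ssh directory already exists:
$ scp ~/.ssh/authorized_keys adam@slave1:~/.ssh/
$ scp ~/.ssh/authorized_keys adam@slave2:~/.ssh/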

Verify:
ssh 192.168.1.189
ssh 192.168.1.116
ssh slave1
ssh slave2
If you can log in without being asked for a password, it works; otherwise, redo the configuration.
Installing Hadoop
1. Download the hadoop 2.6.0 tar.gz file from the official site, then extract it into the user's home directory: tar -zxvf hadoop-2.6.0.tar.gz
2. Create a tmp folder inside the extracted hadoop-2.6.0 directory.
3. Configure environment variables
Append the following to the end of /etc/profile (this must be done on every machine).
# set hadoop path
export HADOOP_HOME=/home/adam/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin
Run . /etc/profile or source /etc/profile to apply the changes, then run hadoop version to print the Hadoop version and confirm the environment variables are set correctly.
4. Configure Hadoop. Go into the directory /home/adam/hadoop-2.6.0/etc/hadoop
a>. Edit core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/adam/hadoop-2.6.0/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
b>. Configure the JAVA_HOME environment variable in hadoop-env.sh and yarn-env.sh, as follows.
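For example, a sketch assuming the JDK is installed under /usr/lib/jvm/jdk1.7.0_71 (substitute your actual JDK path); add this line to both files:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71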

c>. Edit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop-cluster</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/adam/hadoop-2.6.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/adam/hadoop-2.6.0/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
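Note that dfs.replication is set to 1 here, so each HDFS block is stored on a single DataNode; with two DataNodes available, you could raise it to 2 for redundancy.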
d>. Edit mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>master:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
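Note: Hadoop 2.x tarballs typically ship only mapred-site.xml.template in etc/hadoop; if mapred-site.xml does not exist yet, create it from the template before adding the configuration above:
$ cp mapred-site.xml.template mapred-site.xml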
e>. Edit yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
f>. Edit the slaves file and add the following two lines:
slave1
slave2
g>. Copy the hadoop folder to the two slave nodes, as sketched below.
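A minimal sketch, assuming the same user and home directory layout on both slaves:
$ scp -r ~/hadoop-2.6.0 adam@slave1:~/
$ scp -r ~/hadoop-2.6.0 adam@slave2:~/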
Starting Hadoop
1. Format the NameNode
adam@ubuntu:~/hadoop-2.6.0/bin$ ./hdfs namenode -format
15/01/14 19:29:58 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/60.191.124.254
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath = /home/adam/hadoop-2.6.0/etc/hadoop:/home/adam/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:/home/adam/hadoop-2.6.0/share/hadoop/common/lib/jsr305-1.3.9.jar:/home/adam/h ...
jar:/home/adam/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.6.0.jar:/home/adam/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:/home/adam/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.6.0.jar:/home/adam/hadoop-2.6.0/contrib/capacity-scheduler/*.jar
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1; compiled by 'jenkins' on 2014-11-13T21:10Z
STARTUP_MSG: java = 1.7.0_71
************************************************************/
15/01/14 19:29:58 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/01/14 19:29:58 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-3f81e813-604e-4d60-93b1-9794d7c7c079
15/01/14 19:30:10 INFO namenode.FSNamesystem: No KeyProvider found.
15/01/14 19:30:10 INFO namenode.FSNamesystem: fsLock is fair:true
15/01/14 19:30:10 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/01/14 19:30:10 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/01/14 19:30:10 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/01/14 19:30:10 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Jan 14 19:30:10
15/01/14 19:30:10 INFO util.GSet: Computing capacity for map BlocksMap
15/01/14 19:30:10 INFO util.GSet: VM type = 64-bit
15/01/14 19:30:10 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
15/01/14 19:30:10 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/01/14 19:30:10 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/01/14 19:30:10 INFO blockmanagement.BlockManager: defaultReplication = 1
15/01/14 19:30:10 INFO blockmanagement.BlockManager: maxReplication = 512
15/01/14 19:30:10 INFO blockmanagement.BlockManager: minReplication = 1
15/01/14 19:30:10 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
15/01/14 19:30:10 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
15/01/14 19:30:10 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/01/14 19:30:10 INFO blockmanagement.BlockManager: encryptDataTransfer = false
15/01/14 19:30:10 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
15/01/14 19:30:10 INFO namenode.FSNamesystem: fsOwner = adam (auth:SIMPLE)
15/01/14 19:30:10 INFO namenode.FSNamesystem: supergroup = supergroup
15/01/14 19:30:10 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/01/14 19:30:10 INFO namenode.FSNamesystem: Determined nameservice ID: hadoop-cluster
15/01/14 19:30:10 INFO namenode.FSNamesystem: HA Enabled: false
15/01/14 19:30:10 INFO namenode.FSNamesystem: Append Enabled: true
15/01/14 19:30:16 INFO util.GSet: Computing capacity for map INodeMap
15/01/14 19:30:16 INFO util.GSet: VM type = 64-bit
15/01/14 19:30:16 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
15/01/14 19:30:16 INFO util.GSet: capacity = 2^20 = 1048576 entries
15/01/14 19:30:16 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/01/14 19:30:16 INFO util.GSet: Computing capacity for map cachedBlocks
15/01/14 19:30:16 INFO util.GSet: VM type = 64-bit
15/01/14 19:30:16 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
15/01/14 19:30:16 INFO util.GSet: capacity = 2^18 = 262144 entries
15/01/14 19:30:16 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/01/14 19:30:16 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/01/14 19:30:16 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
15/01/14 19:30:16 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/01/14 19:30:16 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/01/14 19:30:16 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/01/14 19:30:16 INFO util.GSet: VM type = 64-bit
15/01/14 19:30:16 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
15/01/14 19:30:16 INFO util.GSet: capacity = 2^15 = 32768 entries
15/01/14 19:30:16 INFO namenode.NNConf: ACLs enabled? false
15/01/14 19:30:16 INFO namenode.NNConf: XAttrs enabled? true
15/01/14 19:30:16 INFO namenode.NNConf: Maximum size of an xattr: 16384
15/01/14 19:30:16 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1507698623-60.191.124.254-1421235016468
15/01/14 19:30:16 INFO common.Storage: Storage directory /home/adam/hadoop-2.6.0/dfs/name has been successfully formatted.
15/01/14 19:30:17 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/01/14 19:30:17 INFO util.ExitUtil: Exiting with status 0
15/01/14 19:30:17 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/60.191.124.254
************************************************************/
2. Start Hadoop
From hadoop/sbin, run start-all.sh, or run start-dfs.sh followed by start-yarn.sh.
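For example, on the master:
adam@master:~/hadoop-2.6.0/sbin$ ./start-dfs.sh
adam@master:~/hadoop-2.6.0/sbin$ ./start-yarn.sh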
3. Verify the installation
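A quick way to check, assuming the JDK's jps tool is on the PATH, is to run jps on each node:
adam@master:~$ jps
adam@slave1:~$ jps
On the master you should see NameNode, SecondaryNameNode, and ResourceManager; on the slaves, DataNode and NodeManager. The ResourceManager web UI configured above should also be reachable at http://master:8088, and the NameNode web UI at its 2.6.0 default, http://master:50070.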


Start the Job History Server:
root@master:~/hadoop/sbin# ./mr-jobhistory-daemon.sh start historyserver
And with that, a fully distributed Hadoop cluster is up. There aren't many steps, so if you're interested, give it a try. If you run into any problems, feel free to contact me:
Weibo: http://weibo.com/xtfggef
Email: xtfggef@gmail.com
Life isn't easy, and coders work hard.