Hadoop: A Detailed Installation and Configuration Walkthrough
Learning Hadoop, Step One: Building the Base Environment

1. Download and install ubuntukylin-15.10-desktop-amd64.iso.

2. Install ssh:
sudo apt-get install openssh-server openssh-client

3. Set up vsftpd:
# sudo apt-get update
# sudo apt-get install vsftpd
For configuration, see:
www.linuxidc.com/Linux/2015-01/111970.htm
jingyan.baidu.com/article/67508eb4d6c4fd9ccb1ce470.html
zhidao.baidu.com/link?url=vEmPmg5sV6IUfT4qZqivtiHtXWUoAQalGAL7bOC5XrTumpLRDfa-OmFcTzPetNZUqAi0hgjBGGdpnldob6hL5IhgtGVWDGSmS88iLvhCO4C
Starting, stopping and restarting vsftpd:
$ sudo /etc/init.d/vsftpd start     # start
$ sudo /etc/init.d/vsftpd stop      # stop
$ sudo /etc/init.d/vsftpd restart   # restart

4. Install jdk1.7:
sudo chown -R hadoop:hadoop /opt
cp /soft/jdk-7u79-linux-x64.gz /opt
sudo vi /etc/profile                # add: alias untar='tar -zxvf'
source /etc/profile
untar jdk*
Environment variable configuration:
# vi /etc/profile
Append at the end of the profile:
# set java environment
export JAVA_HOME=/opt/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
Save and exit. To apply the change without rebooting:
# source /etc/profile
Test whether the installation succeeded:
# java -version

Other issues:
1. sudo prints "unable to resolve host" -- see the reference for the fix.
2. "Starting sendmail" hangs -- see the reference for the fix.
3. "E: Unable to locate package vsftpd" while installing -- see the reference.
4. Linux/Ubuntu vi/vim usage guide -- see www.cnblogs.com/emanlee/archive/2011/11/10/2243930.html

Clone the master VM to node1 and node2

Set the hostname on each clone accordingly: master on master, node1 on node1, node2 on node2. (When node1 and node2 first boot, the system assigns incrementing IPs by default, so no manual change is needed.) On every machine, edit /etc/hosts so it lists the IP and hostname of every node, including the other nodes' entries.

Configure passwordless SSH login

hadoop@node1:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_dsa.
Your public key has been saved in /home/hadoop/.ssh/id_dsa.pub.
The key fingerprint is:
SHA256:B8vBju/uc3kl/v9lrMqtltttttCcXgRkQPbVoU hadoop@node1
(randomart image omitted)
hadoop@node1:~$ cd .ssh
hadoop@node1:~/.ssh$ ll
total 16
drwx------  2 hadoop hadoop 4096 Jul 24 20:31 ./
drwxr-xr-x 18 hadoop hadoop 4096 Jul 24 20:31 ../
-rw-------  1 hadoop hadoop  668 Jul 24 20:31 id_dsa
-rw-r--r--  1 hadoop hadoop  602 Jul 24 20:31 id_dsa.pub
hadoop@node1:~/.ssh$ cat id_dsa.pub >> authorized_keys
hadoop@node1:~/.ssh$ ll
total 20
drwx------  2 hadoop hadoop 4096 Jul 24 20:32 ./
drwxr-xr-x 18 hadoop hadoop 4096 Jul 24 20:31 ../
-rw-rw-r--  1 hadoop hadoop  602 Jul 24 20:32 authorized_keys
-rw-------  1 hadoop hadoop  668 Jul 24 20:31 id_dsa
-rw-r--r--  1 hadoop hadoop  602 Jul 24 20:31 id_dsa.pub

Loopback test of passwordless ssh login on the machine itself:
hadoop@node1:~/.ssh$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:daO0dssyqt12tt9yGUauImOh6tt6A1SgxzSfSmpQqJVEiQTxas.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)
 * Documentation: https://help.ubuntu.com/
270 packages can be updated.
178 updates are security updates.
New release '16.04 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Sun Jul 24 20:21:39 2016 from 192.168.219.1
hadoop@node1:~$ exit
logout
Connection to localhost closed.
hadoop@node1:~/.ssh$

If you see the output above, the operation succeeded. Do the same on the other two nodes; a script that automates the key distribution across all three machines is sketched below.
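Instead of repeating the keygen/append steps by hand on every machine, the distribution can be scripted. A minimal sketch, run on master as the hadoop user: ssh-copy-id ships with the openssh-client package installed in step 2, and the node list is this guide's master/node1/node2 (adjust to your hosts). This helper is not part of the original walkthrough.

#!/bin/bash
# distribute-key.sh -- append master's public key to
# ~/.ssh/authorized_keys on each node, which is exactly what the
# manual scp + cat steps below achieve. Each target prompts for the
# hadoop password once; after that, logins are passwordless.
for node in master node1 node2; do
    ssh-copy-id -i ~/.ssh/id_dsa.pub "hadoop@${node}"
done
# Verify: each of these should print a hostname with no password prompt.
for node in master node1 node2; do
    ssh "hadoop@${node}" hostname
done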
Let the master node (master) log into the two slave nodes (node1, node2) over SSH without a password

hadoop@node1:~/.ssh$ scp hadoop@master:~/.ssh/id_dsa.pub ./master_dsa.pub
The authenticity of host 'master (192.168.219.128)' can't be established.
ECDSA key fingerprint is SHA256:daO0dssyqtt9yGUuImOh646A1SgxzSfatSmpQqJVEiQTxas.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master,192.168.219.128' (ECDSA) to the list of known hosts.
hadoop@master's password:
id_dsa.pub                                   100%  603     0.6KB/s   00:00
hadoop@node1:~/.ssh$ cat master_dsa.pub >> authorized_keys

The transcript above shows node1 using scp to fetch master's public key file into the current directory, a step that still requires password authentication. The master key is then appended to the authorized_keys file; once that is done, barring problems, master can connect to node1 over ssh without a password. Working on the master node:

hadoop@master:~/.ssh$ ssh node1
The authenticity of host 'node1 (192.168.219.129)' can't be established.
ECDSA key fingerprint is SHA256:daO0dssyqt9yGUuImOh3466A1SttgxzSfSmpQqJVEiQTxas.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.219.129' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)
...
Last login: Sun Jul 24 20:39:30 2016 from 192.168.219.1
hadoop@node1:~$ exit
logout
Connection to node1 closed.
hadoop@master:~/.ssh$

As you can see, the first connection to node1 requires a "yes" confirmation: master cannot yet connect unattended, a human has to answer the prompt. After typing yes the login succeeds, and we log out back to master. One step remains for truly prompt-free ssh to the other nodes: simply run ssh node1 one more time, and if it no longer asks for "yes", it worked:

hadoop@master:~/.ssh$ ssh node1
Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)
...
Last login: Sun Jul 24 20:47:20 2016 from 192.168.219.128
hadoop@node1:~$ exit
logout
Connection to node1 closed.
hadoop@master:~/.ssh$

master can now ssh into node1 without a password. Apply the same method to node2.

On the surface, passwordless ssh login to both slave nodes is now configured, but we still need to do the same work on the master node itself. This step looks puzzling, but there is a reason for it, even if it is hard to state precisely: reportedly it matters on real physical clusters, because the jobtracker may be placed on some other node; there is no guarantee the jobtracker lives on master.

Passwordless ssh login test for master itself:
hadoop@master:~/.ssh$ scp hadoop@master:~/.ssh/id_dsa.pub ./master_dsa.pub
The authenticity of host 'master (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:daO0dssttqt9yGUuImOahtt166AgxttzSfSmpQqJVEiQTxas.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master' (ECDSA) to the list of known hosts.
id_dsa.pub                                   100%  603     0.6KB/s   00:00
hadoop@master:~/.ssh$ cat master_dsa.pub >> authorized_keys
hadoop@master:~/.ssh$ ssh master
Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)
...
Last login: Sun Jul 24 20:39:24 2016 from 192.168.219.1
hadoop@master:~$ exit
logout
Connection to master closed.

At this point, passwordless SSH login is fully configured.

Unpack hadoop-2.6.4.tar.gz:
/opt$ untar hadoop-2.6.4.tar.gz
/opt$ mv hadoop-2.6.4 hadoop

Then update the environment variables:
vi /etc/profile

export JAVA_HOME=/opt/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

alias untar='tar -zxvf'
alias viprofile='vi /etc/profile'
alias sourceprofile='source /etc/profile'
alias catprofile='cat /etc/profile'
alias cdhadoop='cd /opt/hadoop/'
alias startdfs='$HADOOP_HOME/sbin/start-dfs.sh'
alias startyarn='$HADOOP_HOME/sbin/start-yarn.sh'
alias stopdfs='$HADOOP_HOME/sbin/stop-dfs.sh'
alias stopyarn='$HADOOP_HOME/sbin/stop-yarn.sh'

source /etc/profile
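A quick sanity check that the profile took effect can save debugging later. The expected values below assume the jdk1.7.0_79 and /opt/hadoop paths used in this guide:

java -version          # should report java version "1.7.0_79"
echo $JAVA_HOME        # should print /opt/jdk1.7.0_79
echo $HADOOP_HOME      # should print /opt/hadoop
hadoop version         # resolves via $HADOOP_HOME/bin on PATH
which start-dfs.sh     # resolves via $HADOOP_HOME/sbin on PATH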
Step 6: Modify the configuration

Seven files need to be modified:
$HADOOP_HOME/etc/hadoop/hadoop-env.sh
$HADOOP_HOME/etc/hadoop/yarn-env.sh
$HADOOP_HOME/etc/hadoop/core-site.xml
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
$HADOOP_HOME/etc/hadoop/mapred-site.xml
$HADOOP_HOME/etc/hadoop/yarn-site.xml
$HADOOP_HOME/etc/hadoop/slaves
where $HADOOP_HOME denotes the hadoop root directory.

a) hadoop-env.sh and yarn-env.sh
The main change in these two files is the directory after JAVA_HOME; point it at the machine's actual jdk location:
vi etc/hadoop/hadoop-env.sh   (and vi etc/hadoop/yarn-env.sh)
Find the line below and change it to your jdk path (adjust for your own setup):
export JAVA_HOME=/opt/jdk1.7.0_79
In hadoop-env.sh it is also recommended to add:
export HADOOP_PREFIX=/opt/hadoop

b) core-site.xml -- modify it using the following as a reference:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
  </property>
</configuration>
Note: if the /opt/hadoop/tmp directory does not exist, create it first with mkdir.
For the full set of core-site.xml parameters see hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/core-default.xml

c) hdfs-site.xml
<configuration>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50020</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50075</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
Note: dfs.replication is the number of data replicas; it should generally not exceed the number of datanodes.
For the full set of hdfs-site.xml parameters see hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

d) mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
For the full set of mapred-site.xml parameters see hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

e) yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
For the full set of yarn-site.xml parameters see hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
Also note that, compared with 2.x, many hadoop 1.x parameters are now marked as deprecated; see hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

Leave the last file, slaves, alone for now (you can rename it out of the way first with mv slaves slaves.bak). With the settings above in place, you can start a NameNode on master as a test:

$HADOOP_HOME/bin/hdfs namenode -format     # format first
16/07/25 20:34:42 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1076359968-127.0.0.1-140082506
16/07/25 20:34:42 INFO common.Storage: Storage directory /opt/hadoop/tmp/dfs/name has been successfully formatted.
16/07/25 20:34:43 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/07/25 20:34:43 INFO util.ExitUtil: Exiting with status 0
16/07/25 20:34:43 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/127.0.0.1
************************************************************/
When you see this, the format is OK. Then:

$HADOOP_HOME/sbin/start-dfs.sh
After startup completes, run jps (or ps -ef | grep ...) to check the processes. If you see these two:
5161 SecondaryNameNode
4989 NameNode
the master node is basically OK.

Next run $HADOOP_HOME/sbin/start-yarn.sh; when it finishes, run jps again:
5161 SecondaryNameNode
5320 ResourceManager
4989 NameNode
If you see these 3 processes, yarn is OK as well.

f) Modify /opt/hadoop/etc/hadoop/slaves
If you renamed the file earlier with mv slaves slaves.bak, first run mv slaves.bak slaves to restore its name. Then vi slaves and enter:
node1
node2
Save and exit, and finally run:
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/stop-yarn.sh
to stop the services started just now.

Step 7: Copy the hadoop directory on master to node1 and node2

Still on the master machine:
cd /opt
zip -r hadoop.zip hadoop
scp hadoop.zip hadoop@node1:/opt/
scp hadoop.zip hadoop@node2:/opt/
Then, on node1 and node2, run unzip hadoop.zip. A script that performs the whole push in one pass is sketched below.
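As promised above, here is a sketch of the whole step-7 push in one script, run on master. It leans on the passwordless SSH configured earlier, assumes unzip is installed on the nodes, and pre-creates the directory that core-site.xml's hadoop.tmp.dir points at (see the note that follows); the loop body is simply this guide's own zip/scp/unzip sequence.

#!/bin/bash
# push-hadoop.sh -- run on master as the hadoop user.
cd /opt
zip -r hadoop.zip hadoop
for node in node1 node2; do
    scp hadoop.zip "hadoop@${node}:/opt/"
    # Unpack remotely (-o overwrites without prompting), then
    # pre-create the temp dir that hadoop.tmp.dir expects to exist.
    ssh "hadoop@${node}" "cd /opt && unzip -o hadoop.zip && mkdir -p /opt/hadoop/tmp"
done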
Note: the hadoop temp directory (tmp) and data directory (data) on node1 and node2 still have to be created by hand first.

Step 8: Verify

On the master node, start everything again:
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

hadoop@master:/opt/hadoop/sbin$ start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /opt/hadoop/logs/hadoop-hadoop-namenode-master.out
node1: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-node1.out
node2: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-node2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out

hadoop@master:/opt/hadoop/sbin$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/logs/yarn-hadoop-resourcemanager-master.out
node1: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-node1.out
node2: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-node2.out

If all went well, the master node has the following 3 processes:
ps -ef | grep ResourceManager
ps -ef | grep SecondaryNameNode
ps -ef | grep NameNode
7482 ResourceManager
7335 SecondaryNameNode
7159 NameNode

and node1 and node2 each have these 2 processes:
ps -ef | grep DataNode
ps -ef | grep NodeManager
2296 DataNode
2398 NodeManager

At the same time, you can browse:
http://master:50070/
http://master:8088/
to check the cluster status. You can also get an HDFS status report via bin/hdfs dfsadmin -report.

Other notes:
a) To re-format the namenode on master, first empty the data directory on every datanode (best to clear the tmp directory along with it); otherwise, after the format completes, the datanodes will fail to start when dfs starts.
b) If running only a namenode on master feels wasteful and you want master to double as a datanode, simply add a line reading master to the slaves file.
c) For convenience, you can edit /etc/profile to put the jars hadoop needs on CLASSPATH and to add hadoop/bin and hadoop/sbin to PATH. Use the following as a reference (adjust for your own setup):
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export JAVA_HOME=/usr/java/jdk1.7.0_51
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

by colplay, 2016.07.25
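Postscript (a sketch added to this edition, not part of the original walkthrough): once all daemons are up, a one-file wordcount exercises HDFS and YARN end to end. The example-jar path and version below assume the hadoop-2.6.4 install and hadoop user from this guide.

echo "hello hadoop hello yarn" > /tmp/smoke.txt
hdfs dfs -mkdir -p /user/hadoop/smoke-in            # input dir in HDFS
hdfs dfs -put -f /tmp/smoke.txt /user/hadoop/smoke-in/
hdfs dfs -rm -r -f /user/hadoop/smoke-out           # wordcount refuses to overwrite
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar \
    wordcount /user/hadoop/smoke-in /user/hadoop/smoke-out
hdfs dfs -cat /user/hadoop/smoke-out/part-r-00000   # expect: hadoop 1, hello 2, yarn 1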