## 0. Common Commands

| Category | Command |
| --- | --- |
| Verify the HDFS cluster | `hdfs dfsadmin -report`; `hdfs haadmin -getAllServiceState` |
| Verify the YARN cluster | `yarn node -list -all -showDetails`; `yarn rmadmin -getAllServiceState` |
| Dynamically resize YARN node resources | `yarn rmadmin -updateNodeResource [NodeID] [MemSize] [vCores]` |

## 1. Host Role Assignment

| Host | Startup user | Roles |
| --- | --- | --- |
| c72 (192.168.56.72) | wang | NameNode, zkfc, JournalNode, ResourceManager, ZooKeeper, JobHistoryServer |
| c71 (192.168.56.71) | wang | NameNode, zkfc, JournalNode, ResourceManager, ZooKeeper |
| c7 (192.168.56.7) | wang | DataNode, JournalNode, NodeManager, ZooKeeper |

## 2. Host Setup and Hadoop Configuration

### 2.1 Host Setup

ulimit tuning (in `/etc/security/limits.conf`):

```
# ulimit -n
* soft nofile 65536    # soft limit on open files per user
* hard nofile 65536    # hard limit; exceeding it raises an error
# ulimit -u: the default of 4096 easily causes OutOfMemory errors
# for multithreaded applications such as MapReduce
* soft nproc 131072    # soft limit on processes/threads per user
* hard nproc 131072    # hard limit on processes/threads
```

General setup:

1. Configure identical `/etc/hosts` name resolution on all hosts.

2. Create the startup user `wang` on all hosts and grant it sudo:

```shell
useradd wang
su - wang -c "ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa"
echo 'wang ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
echo wang | passwd wang --stdin
sed -i '/StrictHostKeyChecking ask/a StrictHostKeyChecking no' /etc/ssh/ssh_config
```

Disable the firewall and SELinux, install the JDK, and synchronize time:

```shell
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
yum install ntp
ntpdate ntp1.aliyun.com
```

Set up passwordless SSH from the NameNode hosts to all DataNode hosts, and between nn1 and nn2 in both directions.

3. Unpack Hadoop into `/opt`:

```shell
chown wang. -R /opt/
echo 'export JAVA_HOME=/opt/jdk-1.8.0_211/' >> /opt/hadoop-3.3.3/etc/hadoop/hadoop-env.sh
```

4. Configure environment variables:

```shell
[root@c72 ~]# tail /etc/profile
export JAVA_HOME=/opt/jdk-1.8.0_211/
export HADOOP_HOME=/opt/hadoop-3.3.3/
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop/
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
```

5. Next: edit the Hadoop XML configuration files and sync them to all nodes.

### 2.2 Hadoop Configuration Files

```shell
[root@c72 ~]# cat /opt/hadoop-3.3.3/etc/hadoop/workers
c7
```

core-site.xml:

```xml
<!-- /opt/hadoop-3.3.3/etc/hadoop/core-site.xml -->
<configuration>
  <!-- the HDFS nameservice -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-ha-cluster1/</value>
  </property>
  <!-- Hadoop data directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/data/hadoop/</value>
  </property>
  <!-- ZooKeeper quorum -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>c7:2181,c71:2181,c72:2181</value>
  </property>
  <!-- timeout for Hadoop's ZooKeeper connections -->
  <property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>1000</value>
    <description>ms</description>
  </property>
  <!-- fixes beeline connection errors -->
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
    <description>Allow the superuser oozie to impersonate any members of the group group1 and group2</description>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
</configuration>
```
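`ha.zookeeper.quorum` here and `yarn.resourcemanager.zk-address` in yarn-site.xml below must list the same ensemble. As a minimal sketch (the helper name `zk_quorum` is my own, not from the tutorial), the quorum string can be built from the host list in section 1:

```shell
# Sketch: build the ZooKeeper quorum string used in core-site.xml
# from the host list in section 1 (2181 is the standard ZooKeeper client port).
zk_quorum() {
  local out="" h
  for h in "$@"; do out="$out,$h:2181"; done
  printf '%s\n' "${out#,}"   # strip the leading comma
}
zk_quorum c7 c71 c72   # → c7:2181,c71:2181,c72:2181
```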
hdfs-site.xml:

```xml
<!-- /opt/hadoop-3.3.3/etc/hadoop/hdfs-site.xml -->
<configuration>
  <!-- replication factor -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- NameNode and DataNode working/data directories -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/data/hadoop/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/data/hadoop/dn</value>
  </property>
  <!-- enable webhdfs -->
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <!-- The nameservice is hadoop-ha-cluster1 and must match core-site.xml.
       dfs.ha.namenodes.[nameservice id] gives every NameNode in the
       nameservice a unique identifier: a comma-separated list of NameNode IDs,
       which is how DataNodes recognize all the NameNodes. Here the
       nameservice ID is hadoop-ha-cluster1 and the NameNode IDs are nn1 and nn2. -->
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop-ha-cluster1</value>
  </property>
  <!-- the two NameNodes under hadoop-ha-cluster1: nn1 and nn2 -->
  <property>
    <name>dfs.ha.namenodes.hadoop-ha-cluster1</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.hadoop-ha-cluster1.nn1</name>
    <value>c72:9000</value>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.hadoop-ha-cluster1.nn1</name>
    <value>c72:50070</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.hadoop-ha-cluster1.nn2</name>
    <value>c71:9000</value>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.hadoop-ha-cluster1.nn2</name>
    <value>c71:50070</value>
  </property>
  <!-- Shared storage for the NameNode edits metadata, i.e. the JournalNode list.
       URL format: qjournal://host1:port1;host2:port2;host3:port3/journalId
       journalId should normally be the nameservice ID; the default port is 8485. -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://c7:8485;c71:8485;c72:8485/hadoop-ha-cluster1</value>
  </property>
  <!-- where the JournalNodes store data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/data/hadoop/jn</value>
  </property>
  <!-- enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- failover implementation -->
  <property>
    <name>dfs.client.failover.proxy.provider.hadoop-ha-cluster1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- fencing methods; multiple methods are separated by newlines, one per line -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence
shell(/bin/true)</value>
  </property>
  <!-- sshfence requires passwordless SSH; point at the startup user's key -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/wang/.ssh/id_rsa</value>
  </property>
  <!-- sshfence connect timeout -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
    <value>60000</value>
  </property>
</configuration>
```
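The shared-edits URI above follows the fixed shape described in the comment: `qjournal://`, a semicolon-separated JournalNode list, then `/` and the nameservice ID. A sketch assembling it from the host list (the helper name `qjournal_uri` is hypothetical):

```shell
# Sketch: assemble dfs.namenode.shared.edits.dir from the JournalNode hosts
# and the nameservice ID shared with core-site.xml.
qjournal_uri() {
  local ns="$1"; shift
  local out="" h
  for h in "$@"; do out="$out;$h:8485"; done   # 8485 is the JournalNode default port
  printf 'qjournal://%s/%s\n' "${out#;}" "$ns"
}
qjournal_uri hadoop-ha-cluster1 c7 c71 c72
# → qjournal://c7:8485;c71:8485;c72:8485/hadoop-ha-cluster1
```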
mapred-site.xml:

```xml
<!-- /opt/hadoop-3.3.3/etc/hadoop/mapred-site.xml -->
<configuration>
  <!-- run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- MapReduce JobHistory address -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>c72:10020</value>
  </property>
  <!-- JobHistory web UI address -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>c72:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop-3.3.3/</value>
    <!-- <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value> -->
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop-3.3.3/</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop-3.3.3/</value>
  </property>
</configuration>
```

yarn-site.xml:

```xml
<!-- /opt/hadoop-3.3.3/etc/hadoop/yarn-site.xml -->
<configuration>
  <!-- enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- RM cluster id -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster1</value>
  </property>
  <!-- RM identifiers -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- addresses of the individual RMs -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>c72</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>c71</value>
  </property>
  <!-- newer versions require the webapp addresses to be set explicitly;
       they must match the hostnames assigned to rm1/rm2 above -->
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>c72:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>c71:8088</value>
  </property>
  <!-- ZooKeeper quorum -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>c7:2181,c71:2181,c72:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
  </property>
  <!-- enable automatic recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- store ResourceManager state in the ZooKeeper cluster -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
</configuration>
```
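Section 2.1 ends with "sync the configs to all nodes". A sketch of that push from c72, relying on the passwordless SSH set up earlier (`sync_conf` is a hypothetical helper; it only prints the commands, so pipe its output to `sh` to actually run them):

```shell
# Sketch: print the scp commands that would push the edited config directory
# from c72 to the other cluster nodes.
sync_conf() {
  local h
  for h in "$@"; do
    echo scp -r /opt/hadoop-3.3.3/etc/hadoop/ "$h":/opt/hadoop-3.3.3/etc/
  done
}
sync_conf c71 c7
# → scp -r /opt/hadoop-3.3.3/etc/hadoop/ c71:/opt/hadoop-3.3.3/etc/
# → scp -r /opt/hadoop-3.3.3/etc/hadoop/ c7:/opt/hadoop-3.3.3/etc/
```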
## 3. Starting the Services

Run on c71:

```shell
# When starting hadoop-3.x as root it will error out unless these
# environment variables are set:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HDFS_ZKFC_USER=root
export HDFS_JOURNALNODE_USER=root

# Start the ZooKeeper cluster first (its configuration is omitted here)
./bin/hdfs zkfc -formatZK
./sbin/hadoop-daemons.sh start journalnode
./bin/hadoop namenode -format
./sbin/start-dfs.sh
./sbin/hadoop-daemons.sh start zkfc
# On the standby host run: hdfs namenode -bootstrapStandby
# (equivalent to copying the NameNode metadata into the same directory
# on the second NameNode)
./sbin/start-yarn.sh
./sbin/mr-jobhistory-daemon.sh start historyserver
```

```shell
[root@c71 ~]# jps
16579 ResourceManager
19142 NameNode
19231 JournalNode
19300 DFSZKFailoverController   # must be started on both NameNode hosts
8030 QuorumPeerMain
[root@c72 hadoop]# jps
7553 ResourceManager
3609 NameNode
508 JobHistoryServer
3885 JournalNode
4029 DFSZKFailoverController
26244 QuorumPeerMain
[root@c7 ~]# jps
10514 NodeManager
25363 DataNode
25460 JournalNode
5237 QuorumPeerMain
```

Run on c72:

```shell
./bin/hdfs namenode -bootstrapStandby
./sbin/hadoop-daemon.sh start namenode
```
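The `jps` listings above can be checked mechanically against the role table in section 1 — a minimal sketch (`check_roles` is a hypothetical helper, shown here against the c7 output from above):

```shell
# Sketch: verify that a `jps` listing contains every daemon expected on a host.
check_roles() {
  local listing="$1"; shift
  local d
  for d in "$@"; do
    echo "$listing" | grep -qw "$d" || { echo "missing: $d"; return 1; }
  done
  echo ok
}
c7_jps="10514 NodeManager
25363 DataNode
25460 JournalNode
5237 QuorumPeerMain"
check_roles "$c7_jps" NodeManager DataNode JournalNode QuorumPeerMain   # → ok
```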
## 4. Verification Errors

**Error: `hadoop jar xx-example.jar` fails with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster"**

Cause: yarn-site.xml is missing the following property:

```xml
<property>
  <name>yarn.application.classpath</name>
  <value><!-- paste the output of the `hadoop classpath` command here --></value>
</property>
```

**Error: `error starting MRAppMaster: YarnRuntimeException: java.lang.NullPointerException`, `RMCommunicator exception while registering: NullPointerException`, `MRClientService: Webapps failed to start. ignoring for now`**

Cause: in a YARN HA setup, yarn-site.xml is missing the following properties:

```xml
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>xxx1:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>xxx2:8088</value>
</property>
```
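Since the fix for the first error is literally pasting `hadoop classpath` output into the property value, that step can be scripted. A sketch (`yarn_cp_property` is a hypothetical helper; on a node with Hadoop on the PATH, call it as `yarn_cp_property "$(hadoop classpath)"`):

```shell
# Sketch: render the yarn.application.classpath property for yarn-site.xml
# from a classpath string (normally the output of `hadoop classpath`).
yarn_cp_property() {
  printf '<property>\n  <name>yarn.application.classpath</name>\n  <value>%s</value>\n</property>\n' "$1"
}
# Example with an abbreviated classpath:
yarn_cp_property '/opt/hadoop-3.3.3/etc/hadoop:/opt/hadoop-3.3.3/share/hadoop/common/*'
```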