# Hadoop

Preparation

Install the SSH service on the host and enable key-based login. Run ssh-keygen to generate id_rsa and id_rsa.pub under ~/.ssh/, then append the contents of id_rsa.pub to ~/.ssh/authorized_keys (create the file if it does not exist). The goal is to allow the host to SSH into itself without a password.
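A minimal sketch of those commands, assuming the default key path and no existing key pair:

# Generate an RSA key pair with an empty passphrase at the default location
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Authorize the public key for login to this host itself
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Verify that passwordless login to localhost works
ssh localhost exit && echo "SSH self-login OK"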

Copy hadoop-3.1.2.tar.gz and jdk-8u231-linux-x64.tar.gz to the server and extract them, for example under /data/.
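For example, assuming the two archives were uploaded to /data:

cd /data
tar -xzf jdk-8u231-linux-x64.tar.gz
tar -xzf hadoop-3.1.2.tar.gz

After extraction, the JDK directory should look roughly like this: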

[root@localhost data]# ls jdk1.8.0_231/ 
bin include jre LICENSE README.html src.zip THIRDPARTYLICENSEREADME.txt
COPYRIGHT javafx-src.zip lib man release THIRDPARTYLICENSEREADME-JAVAFX.txt
[root@localhost data]#

Environment Configuration

  1. Edit /etc/profile and append the following environment settings at the end (a quick verification follows the block):
#### Big data test environment configuration

# Global base path
WORK_DIR=/data

# Java
export JAVA_HOME=$WORK_DIR/jdk1.8.0_231
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib

# Hadoop
export HADOOP_HOME=$WORK_DIR/hadoop-3.1.2
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

# Export PATH
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH
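After saving the file, a quick sanity check that the variables are picked up (hadoop version already works here because the tarball was extracted above):

source /etc/profile
java -version
hadoop version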
  2. Create the Hadoop data directories:
mkdir /data/hadoop
mkdir /data/hadoop/tmp
mkdir /data/hadoop/var
mkdir /data/hadoop/dfs
mkdir /data/hadoop/dfs/name
mkdir /data/hadoop/dfs/data
  3. Edit hadoop-3.1.2/etc/hadoop/core-site.xml and add the following inside the configuration element (a quick check follows the block):
<configuration>

<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

</configuration>
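With the environment from /etc/profile loaded, hdfs getconf can confirm that this file is being read (fs.default.name is the legacy alias of fs.defaultFS, so the deprecated key still resolves):

hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://localhost:9000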
  4. Edit hadoop-3.1.2/etc/hadoop/hdfs-site.xml and add the following inside the configuration element:
<configuration>

<property>
<name>dfs.name.dir</name>
<value>/data/hadoop/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/hadoop/dfs/data</value>
<description>Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>Disable permission checking.</description>
</property>


</configuration>
  5. Edit hadoop-3.1.2/etc/hadoop/mapred-site.xml and add the following inside the configuration element:
<configuration>

<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/data/hadoop/var</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

</configuration>

  6. Edit hadoop-3.1.2/etc/hadoop/hadoop-env.sh and change the JAVA_HOME path to the following:
export JAVA_HOME=/data/jdk1.8.0_231

Additional Options

When starting the services, you may run into an error with the following message:

ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.

Solution:

Edit hadoop-3.1.2/sbin/start-dfs.sh and hadoop-3.1.2/sbin/stop-dfs.sh, adding the following at the top of both scripts:

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Edit hadoop-3.1.2/sbin/start-yarn.sh and hadoop-3.1.2/sbin/stop-yarn.sh, adding the following at the top of both scripts:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Running

First, load the configuration from /etc/profile:

source /etc/profile

Then format the Hadoop NameNode:

hadoop namenode -format

Start DFS:

/data/hadoop-3.1.2/sbin/start-dfs.sh

Start YARN:

/data/hadoop-3.1.2/sbin/start-yarn.sh
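If both scripts finish without errors, the running daemons can be listed with jps (shipped with the JDK); on this single-node setup the output should include NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:

jps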

Check the running status:

HADOOP_HOST below is the IP address of your server.

Visit http://HADOOP_HOST:8088/ (the YARN ResourceManager web UI).

Visit http://HADOOP_HOST:9870/ (the HDFS NameNode web UI).
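As a final sanity check, a few basic HDFS commands confirm the filesystem is writable (the path /smoke-test is an arbitrary example):

hdfs dfs -mkdir -p /smoke-test
hdfs dfs -put /etc/hosts /smoke-test/
hdfs dfs -ls /smoke-test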
