Configuration
System: Windows 7 Ultimate
Software and dependencies
Software/dependency | Version | Install method | Purpose | Environment variables
Java | 21.0.7 | installer | Java programming | new JAVA_HOME variable, value: D:\dev\java21
Vmware | 15.5.2 | installer | virtual machines | |
finalshell | 4.3.10 | installer | remote connection to the VMs | |
Linux | CentOS-6.5-x86_64-bin-DVD1. | image install | | |
jdk | 1.8.0_102 | tar extraction | | |
Hadoop | 2.8.0 | tar extraction | | |
IntelliJ IDEA | 2020.1.4 | installer | | |
Experiment 1:
Linux installation and configuration
Linux network configuration
Modify the hostname
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop01
NTPSERVERARGS=iburst
Edit the hosts file
vi /etc/hosts
File contents:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.150.130 hadoop01
192.168.150.131 hadoop02
192.168.150.132 hadoop03
Modify the IP address file
vi /etc/sysconfig/network-scripts/ifcfg-eth0
Modify the following:
DEVICE=eth0
HWADDR=00:0C:29:88:E6:5C
TYPE=Ethernet
UUID=8125ad2a-1604-4efa-8cc2-09264ef66060
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
IPADDR=192.168.150.130
NETMASK=255.255.255.0
GATEWAY=192.168.150.2
DNS1=192.168.150.2
DNS2=8.8.8.8
Restart the network service
service network restart
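To confirm the change took effect, a quick check (a sketch using the values configured above):
ifconfig eth0             # should now show 192.168.150.130
ping -c 3 192.168.150.2   # the gateway should be reachable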
Clone the virtual machines
Shut down the hadoop01 VM
Clone it into the hadoop02 and hadoop03 VMs
Modify the NIC configuration file on both hadoop02 and hadoop03
vi /etc/udev/rules.d/70-persistent-net.rules
Modify it: delete the rule left over from the original VM's MAC address and keep a single entry, named eth0, for the clone's new MAC:
# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.

# PCI device 0x8086:0x100f (e1000)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:20:9d:2d", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
Then update the IP and HWADDR in ifcfg-eth0 to match the MAC shown above (ATTR{address}=="00:0c:29:20:9d:2d"). The example below is hadoop03; hadoop02 is identical except for its own MAC and IPADDR=192.168.150.131.
DEVICE=eth0
HWADDR=00:0C:29:20:9d:2d
TYPE=Ethernet
UUID=8125ad2a-1604-4efa-8cc2-09264ef66060
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
IPADDR=192.168.150.132
NETMASK=255.255.255.0
GATEWAY=192.168.150.2
DNS1=192.168.150.2
DNS2=8.8.8.8
Modify the hostnames
vi /etc/sysconfig/network
On hadoop02:
NETWORKING=yes
HOSTNAME=hadoop02
NTPSERVERARGS=iburst
On hadoop03:
NETWORKING=yes
HOSTNAME=hadoop03
NTPSERVERARGS=iburst
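The /etc/sysconfig/network change only applies after a reboot; on CentOS 6 the name can also be set for the running session right away (a sketch, shown for hadoop02):
hostname hadoop02   # set the runtime hostname without rebooting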
Passwordless SSH configuration
On each of hadoop01, hadoop02, and hadoop03, delete the ~/.ssh directory with rm -rf .ssh (run from the home directory)
rm -rf .ssh
Generate a new SSH key pair on hadoop01 with the ssh-keygen -t rsa command, pressing Enter at every prompt
ssh-keygen -t rsa
On hadoop01, set up passwordless login to hadoop01, hadoop02, and hadoop03 with ssh-copy-id:
ssh-copy-id root@hadoop01
ssh-copy-id root@hadoop02
ssh-copy-id root@hadoop03
Test the connection:
ssh root@hadoop02
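A quick way to verify all three logins at once (a sketch; each hostname should print without a password prompt):
for h in hadoop01 hadoop02 hadoop03; do ssh root@$h hostname; done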
Experiment 2
Hadoop cluster setup
- Install the JDK
Put the JDK package jdk-8u102-linux-x64.tar.gz in /root/Downloads
Extract the JDK to /usr/local with tar:
tar -zxvf /root/Downloads/jdk-8u102-linux-x64.tar.gz -C /usr/local
Configure the JDK environment variables: edit the /etc/profile file with vi; the two export lines added at the end are shown below in context
vi /etc/profile
# /etc/profile

# System wide environment and startup programs, for login setup
# Functions and aliases go in /etc/bashrc

# It's NOT a good idea to change this file unless you know what you
# are doing. It's much better to create a custom.sh shell script in
# /etc/profile.d/ to make custom changes to your environment, as this
# will prevent the need for merging in future updates.

pathmunge () {
    case ":${PATH}:" in
        *:"$1":*)
            ;;
        *)
            if [ "$2" = "after" ] ; then
                PATH=$PATH:$1
            else
                PATH=$1:$PATH
            fi
    esac
}

if [ -x /usr/bin/id ]; then
    if [ -z "$EUID" ]; then
        # ksh workaround
        EUID=`id -u`
        UID=`id -ru`
    fi
    USER="`id -un`"
    LOGNAME=$USER
    MAIL="/var/spool/mail/$USER"
fi

# Path manipulation
if [ "$EUID" = "0" ]; then
    pathmunge /sbin
    pathmunge /usr/sbin
    pathmunge /usr/local/sbin
else
    pathmunge /usr/local/sbin after
    pathmunge /usr/sbin after
    pathmunge /sbin after
fi

HOSTNAME=`/bin/hostname 2>/dev/null`
HISTSIZE=1000
if [ "$HISTCONTROL" = "ignorespace" ] ; then
    export HISTCONTROL=ignoreboth
else
    export HISTCONTROL=ignoredups
fi

export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL

# By default, we want umask to get set. This sets it for login shell
# Current threshold for system reserved uid/gids is 200
# You could check uidgid reservation validity in
# /usr/share/doc/setup-*/uidgid file
if [ $UID -gt 199 ] && [ "`id -gn`" = "`id -un`" ]; then
    umask 002
else
    umask 022
fi

for i in /etc/profile.d/*.sh ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null 2>&1
        fi
    fi
done

unset i
unset -f pathmunge

export JAVA_HOME=/usr/local/jdk1.8.0_102
export PATH=$JAVA_HOME/bin:$PATH
Apply the configuration with source:
source /etc/profile
Verify the JDK installation:
java -version
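A further sanity check that the variables resolve as intended (a sketch):
echo $JAVA_HOME   # expect /usr/local/jdk1.8.0_102
which java        # expect /usr/local/jdk1.8.0_102/bin/java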
Download and extract the Hadoop package
Put hadoop-2.8.0.tar.gz in /root/Downloads
Extract it to /usr/local with tar:
tar -zxvf /root/Downloads/hadoop-2.8.0.tar.gz -C /usr/local
- Configure the environment variables: edit the /etc/profile file with vi and add the HADOOP_HOME entries
export JAVA_HOME=/usr/local/jdk1.8.0_102
export HADOOP_HOME=/usr/local/hadoop-2.8.0
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply the configuration with source:
source /etc/profile
Verify with hadoop version and which hadoop:
hadoop version
which hadoop
Hadoop cluster configuration
- Go to the /usr/local/hadoop-2.8.0/etc/hadoop/ directory and edit the hadoop-env.sh file with vi
vi hadoop-env.sh
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/usr/local/jdk1.8.0_102
Edit the core-site.xml file with vi and add the following (hadoop01 is this VM's hostname; adjust to your environment)
vi core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-hadoop01</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
Edit the hdfs-site.xml file with vi and add the following (hadoop02 is the second VM's hostname; adjust to your environment)
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop02:50090</value>
  </property>
</configuration>
Copy and rename the mapred-site.xml template with cp
cp mapred-site.xml.template mapred-site.xml
Edit the mapred-site.xml file with vi and add the following inside <configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Edit the yarn-site.xml file with vi and add the following
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Set the slave nodes: edit the slaves file with vi; its contents are
hadoop01
hadoop02
hadoop03
Distribute hadoop01's configuration and related files to the hadoop02 and hadoop03 VMs
scp /etc/profile hadoop02:/etc/profile
scp /etc/profile hadoop03:/etc/profile
scp -r /usr/local/hadoop-2.8.0/ hadoop02:/usr/local/
scp -r /usr/local/hadoop-2.8.0/ hadoop03:/usr/local/
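The four commands above can also be written as one loop (a sketch doing exactly the same distribution):
for h in hadoop02 hadoop03; do
  scp /etc/profile $h:/etc/profile
  scp -r /usr/local/hadoop-2.8.0/ $h:/usr/local/
done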
On both hadoop02 and hadoop03, run source /etc/profile to reload the configuration
source /etc/profile
On hadoop01, format the file system:
hdfs namenode -format
Output like the following indicates the format succeeded
25/06/05 14:45:22 INFO namenode.FSImage: Allocated new BlockPoolId: BP-666209927-192.168.150.130-1749105922358
25/06/05 14:45:23 INFO common.Storage: Storage directory /tmp/hadoop-hadoop01/dfs/name has been successfully formatted.
25/06/05 14:45:24 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop-hadoop01/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
25/06/05 14:45:26 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-hadoop01/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 2 seconds.
25/06/05 14:45:26 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
25/06/05 14:45:26 INFO util.ExitUtil: Exiting with status 0
25/06/05 14:45:26 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop01/192.168.150.130
************************************************************/
Start the Hadoop processes:
start-dfs.sh
start-yarn.sh
If a permission error occurs, grant access with chmod -R 777 <directory>
Check the Hadoop processes on each of the three VMs:
jps
hadoop01:
356 DataNode
80981 Jps
6614 ResourceManager
77549 NameNode
6782 NodeManager
hadoop02:
65776 SecondaryNameNode
70978 Jps
51675 NodeManager
51308 DataNode
hadoop03:
45218 NodeManager
64659 Jps
44718 DataNode
Disable the iptables firewall on hadoop01, hadoop02, and hadoop03, then open Hadoop's web page at http://192.168.150.130:50070/ from the host machine's browser (adjust the IP address to your environment)
service iptables stop
chkconfig iptables off
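To confirm the firewall is down now and stays off after reboots (a sketch):
service iptables status     # expect a "not running" status
chkconfig --list iptables   # every runlevel should show off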
Create a text file on the hadoop01 VM and upload it to HDFS:
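The original doesn't show the file being created; any contents will do, for example (the word.txt contents here are hypothetical):
echo "hello hadoop" > word.txt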
hdfs dfs -put word.txt /
View the uploaded file with the hdfs commands and confirm its contents match the local copy on the VM
hdfs dfs -cat /word.txt
On the local PC, open the web UI in a browser and download the file to inspect it
(if the download link uses the node's hostname, replace the hostname with its IP address)
Experiment 3: HDFS distributed file system configuration and startup
Windows local environment configuration
On Windows, extract the Hadoop package hadoop-2.8.0.tar.gz (the same package extracted on Linux in Experiment 2) to a directory of your choice (e.g. D:\Application)
Configure the Windows environment variables: right-click "This PC", click "Properties", then choose "Advanced system settings" in the window that appears
Under "System variables" click "New" and add HADOOP_HOME, with the value set to the path where the Hadoop package was extracted in the first step
Edit the PATH system variable, append %HADOOP_HOME%\bin and %HADOOP_HOME%\sbin, and click "OK"
Place the hadoop.dll and winutils.exe files under D:\Application\hadoop-2.8.0\bin
Edit hadoop-env.cmd under D:\Application\hadoop-2.8.0\etc\hadoop (open it with Notepad), set JAVA_HOME to C:\PROGRA~1\Java\jdk1.8.0_102 (the Java install path; PROGRA~1 stands for Program Files), then save and exit
Press Win+R, run cmd to open a Command Prompt, and enter hadoop version; if Hadoop's version information is shown, the Windows-side Hadoop installation succeeded
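From the Command Prompt, where (cmd's counterpart of which) can also confirm which hadoop binary is on the PATH (a sketch):
where hadoop
hadoop version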
Hive setup
- Install MySQL
Put the MySQL packages in /root/Downloads
Install via rpm: after downloading the matching versions from https://downloads.mysql.com/archives/community/, begin the installation
mkdir -p /usr/local/mysql
tar -xvf /root/Downloads/mysql-5.7.22-1.el6.x86_64.rpm-bundle.tar -C /usr/local/mysql
yum remove -y mysql-libs
yum install -y net-tools
rpm -ivh mysql-community-common-5.7.22-1.el6.x86_64.rpm
rpm -ivh mysql-community-libs-5.7.22-1.el6.x86_64.rpm
rpm -ivh mysql-community-client-5.7.22-1.el6.x86_64.rpm
rpm -ivh mysql-community-server-5.7.22-1.el6.x86_64.rpm
rpm -ivh mysql-community-libs-compat-5.7.22-1.el6.x86_64.rpm
service mysqld start   # CentOS 6 uses SysV init scripts, not systemctl
# Problem encountered: initialization failed
2025-06-21T23:21:39.197886Z 1 [ERROR] Failed to open the bootstrap file /var/lib/mysql-files/install-validate-password-plugin.JwJRa3.sql
2025-06-21T23:21:39.197955Z 1 [ERROR] 1105 Bootstrap file error, return code (0). Nearest query: 'LSE SET @sys.tmp.table_exists.SQL = CONCAT('SELECT COUNT(*) FROM `', in_db, '`.`', in_table, '`'); PREPARE stmt_select FROM @sys.tmp.table_exists.SQL; IF (NOT v_error) THEN DEALLOCATE PREPARE stmt_select; SET out_exists = 'TEMPORARY'; END IF; END IF; END;
Unresolved so far.
chkconfig --add mysqld
chkconfig mysqld on
chkconfig --list | grep mysqld
service mysqld restart
cat /var/log/mysqld.log | grep password
# Set the password
mysql -u root -p
set global validate_password_policy = LOW;
set password = password('root');
exit
service mysqld restart
service mysqld status
# Problem encountered:
2025-06-22T00:39:47.771513Z 0 [ERROR] InnoDB: The innodb_system data file 'ibdata1' must be writable
2025-06-22T00:39:47.771559Z 0 [ERROR] InnoDB: The innodb_system data file 'ibdata1' must be writable
2025-06-22T00:39:47.771586Z 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2025-06-22T00:39:48.377780Z 0 [ERROR] Plugin 'InnoDB' init function returned error.
2025-06-22T00:39:48.377985Z 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2025-06-22T00:39:48.378025Z 0 [ERROR] Failed to initialize builtin plugins.
2025-06-22T00:39:48.378054Z 0 [ERROR] Aborting
Fixing the ownership of the data directory resolves it (run from /var/lib):
chown -R mysql.mysql mysql
# Edit my.cnf at /etc/my.cnf: delete the existing [mysqld] section, then add the following
[client]
default-character-set = utf8

[mysql]
default-character-set = utf8

[mysqld]
character_set_server = utf8
# Restart
service mysqld restart
# Grant remote access
mysql -uroot -p
set global validate_password_policy = LOW;
grant all privileges on *.* to 'root'@'%' identified by 'root';
flush privileges;
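A quick check from the same mysql session that the grant took effect (a sketch; it lists which hosts each account may connect from):
select host, user from mysql.user;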
Install Hive
Put the Hive package apache-hive-2.3.3-bin.tar.gz in /root/Downloads
Extract Hive to /usr/local with tar:
tar -zxvf /root/Downloads/apache-hive-2.3.3-bin.tar.gz -C /usr/local
Create a symlink (run inside /usr/local):
ln -s apache-hive-2.3.3-bin hive
# Grant ownership
chown -R hadoop:hadoop apache-hive-2.3.3-bin/
That did nothing: chown: invalid user: "hadoop:hadoop"
I was working as root and had forgotten the username, so I looked it up:
who
chown -R Hadoop:Hadoop apache-hive-2.3.3-bin/
# Configure the environment variables
vi /etc/profile and add:
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=/usr/local/hive/conf
export PATH=$PATH:$HIVE_HOME/bin
source /etc/profile
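To verify the Hive binaries are on the PATH (a sketch):
which hive
hive --version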
# Modify the configuration file
[Hadoop@hadoop01 conf]$ cp hive-default.xml.template hive-site.xml
cp: cannot create regular file "hive-site.xml": Permission denied
This happens because the directory belongs to root; chown -R Hadoop:Hadoop apache-hive-2.3.3-bin/ fixes it
cp hive-default.xml.template hive-site.xml
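The original doesn't record which properties were edited in hive-site.xml; for a MySQL-backed metastore these are the ones typically changed (the values below are assumptions matching the MySQL setup above, root/root on hadoop01; the MySQL JDBC driver jar must also be placed in $HIVE_HOME/lib):
<!-- assumed values; adjust host, user, and password to your MySQL install -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop01:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>root</value>
</property>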
# Start Hive
nohup bin/hive --service metastore >> logs/metastore.log 2>&1 &
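To confirm the metastore came up, check its log and the Java process list (in jps the metastore typically appears as a RunJar process):
tail logs/metastore.log
jps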
# Start Spark
sbin/start-all.sh
Configure the yum repositories
# Back up the existing repos
cd /etc/yum.repos.d
cp CentOS-Base.repo CentOS-Base.repo.bak
mkdir -p /etc/yum.repos.d/bak
cp CentOS-Debuginfo.repo CentOS-Debuginfo.repo.bak
cp CentOS-Media.repo CentOS-Media.repo.bak
cp CentOS-Vault.repo CentOS-Vault.repo.bak
mv /etc/yum.repos.d/*.bak /etc/yum.repos.d/bak
Update the base repo
[base]
name=CentOS-6.10
enabled=1
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos-vault/6.10/os/$basearch/
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos-vault/RPM-GPG-KEY-CentOS-6

[updates]
name=CentOS-6.10
enabled=1
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos-vault/6.10/updates/$basearch/
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos-vault/RPM-GPG-KEY-CentOS-6

[extras]
name=CentOS-6.10
enabled=1
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos-vault/6.10/extras/$basearch/
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos-vault/RPM-GPG-KEY-CentOS-6

[epel]
name=Extra Packages for Enterprise Linux 6 - $basearch
enabled=1
failovermethod=priority
baseurl=http://mirrors.aliyun.com/epel-archive/6/$basearch
gpgcheck=0
gpgkey=http://mirrors.aliyun.com/epel-archive/RPM-GPG-KEY-EPEL-6
Update the Vault repo
[base]
name=CentOS-6.10 - Base - mirrors.aliyun.com
failovermethod=priority
baseurl=https://mirrors.aliyun.com/centos-vault/6.10/os/$basearch/
gpgcheck=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-6

#released updates
[updates]
name=CentOS-6.10 - Updates - mirrors.aliyun.com
failovermethod=priority
baseurl=https://mirrors.aliyun.com/centos-vault/6.10/updates/$basearch/
gpgcheck=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-6

#additional packages that may be useful
[extras]
name=CentOS-6.10 - Extras - mirrors.aliyun.com
failovermethod=priority
baseurl=https://mirrors.aliyun.com/centos-vault/6.10/extras/$basearch/
gpgcheck=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-6

#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-6.10 - Plus - mirrors.aliyun.com
failovermethod=priority
baseurl=https://mirrors.aliyun.com/centos-vault/6.10/centosplus/$basearch/
gpgcheck=0
enabled=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-6

#contrib - packages by Centos Users
[contrib]
name=CentOS-6.10 - Contrib - mirrors.aliyun.com
failovermethod=priority
baseurl=https://mirrors.aliyun.com/centos-vault/6.10/contrib/$basearch/
gpgcheck=0
enabled=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-6
Create epel.repo
# Aliyun epel repo
[epel-aliyun]
name=epel-aliyun-CentOS-$releasever
baseurl=https://mirrors.aliyun.com/epel-archive/$releasever/$basearch/
gpgcheck=0
Create elrepo.repo
### Name: ELRepo Community Enterprise Linux Repository for el6
### URL: https://elrepo.org/

[elrepo]
name=ELRepo Community Enterprise Linux Repository - el6
baseurl=https://mirrors.aliyun.com/elrepo/elrepo/el6/$basearch/
#mirrorlist=http://mirrors.elrepo.org/mirrors-elrepo.el6
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-elrepo
protect=0

[elrepo-kernel]
name=ELRepo Community Enterprise Linux Kernel Repository - el6
baseurl=https://mirrors.aliyun.com/elrepo/kernel/el6/$basearch/
#mirrorlist=http://mirrors.elrepo.org/mirrors-elrepo-kernel.el6
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-elrepo
protect=0

[elrepo-extras]
name=ELRepo Community Enterprise Linux Extras Repository - el6
baseurl=https://mirrors.aliyun.com/elrepo/extras/el6/$basearch/
#mirrorlist=http://mirrors.elrepo.org/mirrors-elrepo-extras.el6
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-elrepo
protect=0
# Update
yum clean all && yum makecache
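A quick check that the new repos are picked up (a sketch):
yum repolist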