Configuration

Host OS: Windows 7 Ultimate

Software and dependencies

Software/Dependency   Version                      Install method    Purpose                         Environment variables
Java                  21.0.7                       Installer         Java development                New JAVA_HOME variable, value: D:\dev\java21
VMware                15.5.2                       Installer         Virtual machines
FinalShell            4.3.10                       Installer         Remote connection to the VMs
Linux                 CentOS-6.5-x86_64-bin-DVD1   ISO image
JDK                   1.8.0_102                    tar extraction
Hadoop                2.8.0                        tar extraction
IntelliJ IDEA         2020.1.4                     Installer

Experiment 1:

Linux installation and configuration

Linux network configuration

Modify the hostname

vi /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=hadoop01
NTPSERVERARGS=iburst

Edit the hosts file

vi /etc/hosts

File contents:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.150.130 hadoop01
192.168.150.131 hadoop02
192.168.150.132 hadoop03
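
Once hadoop02 and hadoop03 exist (they are cloned below), name resolution can be sanity-checked from any node; this quick check is not part of the original steps:

ping -c 2 hadoop02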

Modify the IP address file

vi /etc/sysconfig/network-scripts/ifcfg-eth0

Modify the following entries:

DEVICE=eth0
HWADDR=00:0C:29:88:E6:5C
TYPE=Ethernet
UUID=8125ad2a-1604-4efa-8cc2-09264ef66060
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
IPADDR=192.168.150.130
NETMASK=255.255.255.0
GATEWAY=192.168.150.2
DNS1=192.168.150.2
DNS2=8.8.8.8

Restart the network service

service network restart
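
To confirm the new address took effect, a quick check (ifconfig is the standard tool on CentOS 6):

ifconfig eth0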

VM cloning

Shut down the hadoop01 VM

Clone it to create the hadoop02 and hadoop03 VMs

On both hadoop02 and hadoop03, edit the NIC udev rules file

vi /etc/udev/rules.d/70-persistent-net.rules

Modify it so that only the entry for the clone's new MAC address remains, named eth0

# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.

# PCI device 0x8086:0x100f (e1000)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:20:9d:2d", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

Then edit /etc/sysconfig/network-scripts/ifcfg-eth0 again, updating the IP address and HWADDR to the new MAC address from the rule above (ATTR{address}=="00:0c:29:20:9d:2d"); the example below is for hadoop03:

DEVICE=eth0
HWADDR=00:0C:29:20:9d:2d
TYPE=Ethernet
UUID=8125ad2a-1604-4efa-8cc2-09264ef66060
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
IPADDR=192.168.150.132
NETMASK=255.255.255.0
GATEWAY=192.168.150.2
DNS1=192.168.150.2
DNS2=8.8.8.8

Modify the hostname on each clone

vi /etc/sysconfig/network

On hadoop02:

NETWORKING=yes
HOSTNAME=hadoop02
NTPSERVERARGS=iburst

On hadoop03:

NETWORKING=yes
HOSTNAME=hadoop03
NTPSERVERARGS=iburst
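
Note that /etc/sysconfig/network is only read at boot; to apply the new name to the running session without rebooting, the hostname command can be used (shown for hadoop02, analogous on hadoop03):

hostname hadoop02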

SSH passwordless login configuration

On each of hadoop01, hadoop02 and hadoop03, delete the ~/.ssh directory with the rm -rf .ssh command (run from the home directory)

rm -rf .ssh

On hadoop01, generate an SSH key pair with the ssh-keygen -t rsa command, pressing Enter at every prompt

ssh-keygen -t rsa

On hadoop01, set up passwordless login to hadoop01, hadoop02 and hadoop03 with ssh-copy-id

ssh-copy-id root@hadoop01
ssh-copy-id root@hadoop02
ssh-copy-id root@hadoop03

Test the connection with ssh root@hadoop02

ssh root@hadoop02
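
A quick way to verify all three logins at once, assuming the keys were copied as above (each command should print the remote hostname without prompting for a password):

for h in hadoop01 hadoop02 hadoop03; do ssh root@$h hostname; done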

Experiment 2

Hadoop cluster setup

  • Install the JDK

Put the JDK package jdk-8u102-linux-x64.tar.gz into /root/Downloads

Extract it to /usr/local with tar

tar -zxvf /root/Downloads/jdk-8u102-linux-x64.tar.gz -C /usr/local

Configure the JDK environment variables: edit /etc/profile with vi and append the two export lines shown at the end (the default contents of the file are reproduced for reference)

vi /etc/profile
# /etc/profile

# System wide environment and startup programs, for login setup
# Functions and aliases go in /etc/bashrc

# It's NOT a good idea to change this file unless you know what you
# are doing. It's much better to create a custom.sh shell script in
# /etc/profile.d/ to make custom changes to your environment, as this
# will prevent the need for merging in future updates.

pathmunge () {
    case ":${PATH}:" in
        *:"$1":*)
            ;;
        *)
            if [ "$2" = "after" ] ; then
                PATH=$PATH:$1
            else
                PATH=$1:$PATH
            fi
    esac
}


if [ -x /usr/bin/id ]; then
    if [ -z "$EUID" ]; then
        # ksh workaround
        EUID=`id -u`
        UID=`id -ru`
    fi
    USER="`id -un`"
    LOGNAME=$USER
    MAIL="/var/spool/mail/$USER"
fi

# Path manipulation
if [ "$EUID" = "0" ]; then
    pathmunge /sbin
    pathmunge /usr/sbin
    pathmunge /usr/local/sbin
else
    pathmunge /usr/local/sbin after
    pathmunge /usr/sbin after
    pathmunge /sbin after
fi

HOSTNAME=`/bin/hostname 2>/dev/null`
HISTSIZE=1000
if [ "$HISTCONTROL" = "ignorespace" ] ; then
    export HISTCONTROL=ignoreboth
else
    export HISTCONTROL=ignoredups
fi

export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL

# By default, we want umask to get set. This sets it for login shell
# Current threshold for system reserved uid/gids is 200
# You could check uidgid reservation validity in
# /usr/share/doc/setup-*/uidgid file
if [ $UID -gt 199 ] && [ "`id -gn`" = "`id -un`" ]; then
    umask 002
else
    umask 022
fi

for i in /etc/profile.d/*.sh ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null 2>&1
        fi
    fi
done

unset i
unset -f pathmunge

export JAVA_HOME=/usr/local/jdk1.8.0_102
export PATH=$JAVA_HOME/bin:$PATH

Apply the configuration with source

source /etc/profile

Verify the JDK with java -version

java -version

Download and extract the Hadoop package

Put the package hadoop-2.8.0.tar.gz into /root/Downloads

Extract it to /usr/local with tar: tar -zxvf /root/Downloads/hadoop-2.8.0.tar.gz -C /usr/local

  1. Configure the environment variables: edit /etc/profile with vi and update the JAVA_HOME/HADOOP_HOME entries
export JAVA_HOME=/usr/local/jdk1.8.0_102

export HADOOP_HOME=/usr/local/hadoop-2.8.0

export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Apply the configuration with source

source /etc/profile

Verify with hadoop version and which hadoop

hadoop version
which hadoop


  • Hadoop cluster configuration

  1. Go to the /usr/local/hadoop-2.8.0/etc/hadoop/ directory and edit the hadoop-env.sh file with vi
    vi hadoop-env.sh
    # Licensed to the Apache Software Foundation (ASF) under one
    # or more contributor license agreements.  See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership.  The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License.  You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    # Set Hadoop-specific environment variables here.
    
    # The only required environment variable is JAVA_HOME.  All others are
    # optional.  When running a distributed configuration it is best to
    # set JAVA_HOME in this file, so that it is correctly defined on
    # remote nodes.
    
    # The java implementation to use.
    export JAVA_HOME=/usr/local/jdk1.8.0_102

Edit core-site.xml with vi and add the following (hadoop01 is this VM's hostname; adjust as needed)

vi core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop-hadoop01</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>
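
fs.defaultFS makes hdfs://hadoop01:8020 the default filesystem, so bare paths in hdfs commands resolve against it; once the cluster is running, the following two commands list the same directory:

hdfs dfs -ls /
hdfs dfs -ls hdfs://hadoop01:8020/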

Edit hdfs-site.xml with vi and add the following (hadoop02 is the second VM's hostname; adjust as needed)

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop02:50090</value>
    </property>
</configuration>
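
dfs.replication=3 means every HDFS block is stored on three DataNodes. Once the cluster is up and a file has been uploaded (word.txt is the file used later in this experiment), the actual replication can be checked with fsck:

hdfs fsck /word.txt -files -blocks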

Copy and rename the mapred-site.xml template with cp

cp mapred-site.xml.template mapred-site.xml

Edit mapred-site.xml with vi and add the following

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Edit yarn-site.xml with vi and add the following

<configuration>

<!-- Site specific YARN configuration properties -->
    
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop01</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

</configuration>

Set the worker nodes: edit the slaves file with vi so it contains

hadoop01
hadoop02
hadoop03

Distribute the configuration and Hadoop files from hadoop01 to hadoop02 and hadoop03

scp /etc/profile hadoop02:/etc/profile

scp /etc/profile hadoop03:/etc/profile

scp -r /usr/local/hadoop-2.8.0/ hadoop02:/usr/local/

scp -r /usr/local/hadoop-2.8.0/ hadoop03:/usr/local/

On both hadoop02 and hadoop03, run source /etc/profile to reload the configuration

source /etc/profile

On hadoop01, format the HDFS filesystem with hdfs namenode -format

hdfs namenode -format

Output like the following indicates that formatting succeeded

25/06/05 14:45:22 INFO namenode.FSImage: Allocated new BlockPoolId: BP-666209927-192.168.150.130-1749105922358
25/06/05 14:45:23 INFO common.Storage: Storage directory /tmp/hadoop-hadoop01/dfs/name has been successfully formatted.
25/06/05 14:45:24 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop-hadoop01/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
25/06/05 14:45:26 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-hadoop01/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 2 seconds.
25/06/05 14:45:26 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
25/06/05 14:45:26 INFO util.ExitUtil: Exiting with status 0
25/06/05 14:45:26 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop01/192.168.150.130
************************************************************/

Start the Hadoop daemons with the following commands:

start-dfs.sh

start-yarn.sh

If a permission error occurs, grant access with chmod -R 777 <directory>.

Check the Hadoop processes on each of the three VMs with jps

jps

 hadoop01:

356 DataNode
80981 Jps
6614 ResourceManager
77549 NameNode
6782 NodeManager

hadoop02:

65776 SecondaryNameNode
70978 Jps
51675 NodeManager
51308 DataNode

hadoop03:

45218 NodeManager
64659 Jps
44718 DataNode

Disable the iptables firewall on hadoop01, hadoop02 and hadoop03, then open the Hadoop web UI from the host machine's browser at http://192.168.108.141:50070/ (adjust the IP address to your NameNode's actual address; in this setup it is 192.168.150.130)

service iptables stop
chkconfig iptables off
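
A quick check that the firewall is stopped now and stays disabled across reboots:

service iptables status
chkconfig --list iptables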

Create a text file word.txt on the hadoop01 VM, then upload it to the HDFS root directory
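
For example (the file contents are arbitrary; this line is only a sample):

echo "hello hadoop hello hdfs" > word.txt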

hdfs dfs -put word.txt /

Use hdfs commands to view the uploaded file and confirm its contents match the local copy on the VM

hdfs dfs -cat /word.txt
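
The comparison can also be scripted; a minimal sketch using bash process substitution (no output means the two copies match):

diff <(hdfs dfs -cat /word.txt) word.txt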



On the local PC, open the web UI in a browser and download the file to check it

When downloading, replace the hostname in the download link with the IP address (the local PC cannot resolve the hadoop01 hostname unless it has a matching hosts entry)

Experiment 3: HDFS distributed filesystem configuration and startup

Windows local environment configuration

On Windows, extract the Hadoop package hadoop-2.8.0.tar.gz (the same package extracted on Linux in Experiment 2) to a directory of your choice (e.g. D:\Application)

Configure the Windows environment variables: right-click "This PC", click "Properties", then choose "Advanced system settings" in the window that appears

Under "System variables" click "New" and add a HADOOP_HOME variable whose value is the extraction path from the first step

Edit the PATH system variable, append %HADOOP_HOME%\bin and %HADOOP_HOME%\sbin, and click "OK"

Place the hadoop.dll and winutils.exe files into D:\Application\hadoop-2.8.0\bin

Edit the hadoop-env.cmd file under D:\Application\hadoop-2.8.0\etc\hadoop (open it with Notepad), set JAVA_HOME to C:\PROGRA~1\Java\jdk1.8.0_102 (the Java install path; PROGRA~1 stands for Program Files), then save and exit

Open a Command Prompt (Win+R, then type cmd) and run hadoop version; if it prints the Hadoop version information, the Windows-side Hadoop installation is configured correctly

Hive cluster setup

  • Install MySQL

Put the MySQL installation packages into /root/Downloads

Install via rpm. After downloading the matching version from https://downloads.mysql.com/archives/community/, begin the installation.


mkdir -p /usr/local/mysql
tar -xvf /root/Downloads/mysql-5.7.22-1.el6.x86_64.rpm-bundle.tar -C /usr/local/mysql
yum remove -y mysql-libs

yum install -y net-tools

cd /usr/local/mysql
rpm -ivh mysql-community-common-5.7.22-1.el6.x86_64.rpm
rpm -ivh mysql-community-libs-5.7.22-1.el6.x86_64.rpm
rpm -ivh mysql-community-client-5.7.22-1.el6.x86_64.rpm
rpm -ivh mysql-community-server-5.7.22-1.el6.x86_64.rpm
rpm -ivh mysql-community-libs-compat-5.7.22-1.el6.x86_64.rpm

# CentOS 6 uses SysV init (no systemctl)
service mysqld start

# Problem encountered: initialization failed

2025-06-21T23:21:39.197886Z 1 [ERROR] Failed to open the bootstrap file /var/lib/mysql-files/install-validate-password-plugin.JwJRa3.sql
2025-06-21T23:21:39.197955Z 1 [ERROR] 1105  Bootstrap file error, return code (0). Nearest query: 'LSE SET @sys.tmp.table_exists.SQL = CONCAT('SELECT COUNT(*) FROM `', in_db, '`.`', in_table, '`'); PREPARE stmt_select FROM @sys.tmp.table_exists.SQL; IF (NOT v_error) THEN DEALLOCATE PREPARE stmt_select; SET out_exists = 'TEMPORARY'; END IF; END IF; END;

Not resolved at this point.

chkconfig --add mysqld

chkconfig mysqld on
chkconfig --list | grep mysqld
service mysqld restart
cat /var/log/mysqld.log | grep password

# Set the password
mysql -u root -p

        set global validate_password_policy = LOW;
        set password = password('root');

        exit

service mysqld restart
service mysqld status

# Another problem encountered:

2025-06-22T00:39:47.771513Z 0 [ERROR] InnoDB: The innodb_system data file 'ibdata1' must be writable
2025-06-22T00:39:47.771559Z 0 [ERROR] InnoDB: The innodb_system data file 'ibdata1' must be writable
2025-06-22T00:39:47.771586Z 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2025-06-22T00:39:48.377780Z 0 [ERROR] Plugin 'InnoDB' init function returned error.
2025-06-22T00:39:48.377985Z 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2025-06-22T00:39:48.378025Z 0 [ERROR] Failed to initialize builtin plugins.
2025-06-22T00:39:48.378054Z 0 [ERROR] Aborting

Fixing the ownership of the MySQL data directory resolves it.

chown -R mysql:mysql /var/lib/mysql

# Edit /etc/my.cnf: remove the existing [mysqld] settings first, then add the following

[client]

default-character-set = utf8

[mysql]
default-character-set = utf8

[mysqld]
character_set_server = utf8

# Restart

service mysqld restart
# Grant remote access

mysql -uroot -p

        set global validate_password_policy = LOW;

        grant all privileges on *.* to 'root'@'%' identified by 'root';

        flush privileges;
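
To verify the grant, try logging in from another node (a quick check; assumes the mysql client is available there):

mysql -h hadoop01 -uroot -proot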

Install Hive

Put the Hive package apache-hive-2.3.3-bin.tar.gz into /root/Downloads

Extract it to /usr/local with tar: tar -zxvf /root/Downloads/apache-hive-2.3.3-bin.tar.gz -C /usr/local

Create a symlink (run from /usr/local)

ln -s apache-hive-2.3.3-bin hive

# Fix ownership

chown -R hadoop:hadoop apache-hive-2.3.3-bin/

That failed: chown: invalid user: "hadoop:hadoop"

I am logged in as root and forgot the actual username, so look it up.

who

The user turns out to be Hadoop (capitalized), so:

chown -R Hadoop:Hadoop apache-hive-2.3.3-bin/

# Configure the environment variables

Edit /etc/profile with vi and add:

export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=/usr/local/hive/conf
export PATH=$PATH:$HIVE_HOME/bin

source /etc/profile

# Edit the configuration files

[Hadoop@hadoop01 conf]$ cp hive-default.xml.template hive-site.xml
cp: cannot create regular file "hive-site.xml": Permission denied

This is because the files still belong to root; running chown -R Hadoop:Hadoop apache-hive-2.3.3-bin/ (as above) fixes it.

cp hive-default.xml.template hive-site.xml
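
The notes stop at copying the template; for a MySQL-backed metastore, hive-site.xml also needs the JDBC connection properties. A minimal sketch, assuming the root/root credentials granted above and a MySQL Connector/J jar placed in /usr/local/hive/lib (both assumptions, not steps from these notes):

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop01:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
</property>

Hive 2.x also expects the metastore schema to be initialized once, with schematool -dbType mysql -initSchema, before the first start.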

# Start the Hive metastore (run from /usr/local/hive)

nohup bin/hive --service metastore >> logs/metastore.log 2>&1 &
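
nohup appends to logs/metastore.log, so the logs directory must exist; create it first if needed (the path assumes the /usr/local/hive symlink created above):

mkdir -p /usr/local/hive/logs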

# Start Spark (run from the Spark installation directory)

sbin/start-all.sh

Configure the yum repositories

# Back up the existing repo files

cd /etc/yum.repos.d
mkdir -p /etc/yum.repos.d/bak
cp CentOS-Base.repo CentOS-Base.repo.bak
cp CentOS-Debuginfo.repo CentOS-Debuginfo.repo.bak
cp CentOS-Media.repo CentOS-Media.repo.bak
cp CentOS-Vault.repo CentOS-Vault.repo.bak
mv /etc/yum.repos.d/*.bak /etc/yum.repos.d/bak

Update CentOS-Base.repo

[base]
name=CentOS-6.10
enabled=1
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos-vault/6.10/os/$basearch/
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos-vault/RPM-GPG-KEY-CentOS-6
[updates]
name=CentOS-6.10
enabled=1
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos-vault/6.10/updates/$basearch/
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos-vault/RPM-GPG-KEY-CentOS-6
[extras]
name=CentOS-6.10
enabled=1
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos-vault/6.10/extras/$basearch/
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos-vault/RPM-GPG-KEY-CentOS-6
[epel]
name=Extra Packages for Enterprise Linux 6 - $basearch
enabled=1
failovermethod=priority
baseurl=http://mirrors.aliyun.com/epel-archive/6/$basearch
gpgcheck=0
gpgkey=http://mirrors.aliyun.com/epel-archive/RPM-GPG-KEY-EPEL-6

Update CentOS-Vault.repo

[base]
name=CentOS-6.10 - Base - mirrors.aliyun.com
failovermethod=priority
baseurl=https://mirrors.aliyun.com/centos-vault/6.10/os/$basearch/
gpgcheck=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-6

#released updates
[updates]
name=CentOS-6.10 - Updates - mirrors.aliyun.com
failovermethod=priority
baseurl=https://mirrors.aliyun.com/centos-vault/6.10/updates/$basearch/
gpgcheck=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-6

#additional packages that may be useful
[extras]
name=CentOS-6.10 - Extras - mirrors.aliyun.com
failovermethod=priority
baseurl=https://mirrors.aliyun.com/centos-vault/6.10/extras/$basearch/
gpgcheck=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-6

#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-6.10 - Plus - mirrors.aliyun.com
failovermethod=priority
baseurl=https://mirrors.aliyun.com/centos-vault/6.10/centosplus/$basearch/
gpgcheck=0
enabled=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-6

#contrib - packages by Centos Users
[contrib]
name=CentOS-6.10 - Contrib - mirrors.aliyun.com
failovermethod=priority
baseurl=https://mirrors.aliyun.com/centos-vault/6.10/contrib/$basearch/
gpgcheck=0
enabled=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-6

Create epel.repo

# Aliyun EPEL archive source
[epel-aliyun]
name=epel-aliyun-CentOS-$releasever
baseurl=https://mirrors.aliyun.com/epel-archive/$releasever/$basearch/
gpgcheck=0

Create elrepo.repo

### Name: ELRepo Community Enterprise Linux Repository for el6
### URL: https://elrepo.org/

[elrepo]
name=ELRepo Community Enterprise Linux Repository - el6
baseurl=https://mirrors.aliyun.com/elrepo/elrepo/el6/$basearch/
#mirrorlist=http://mirrors.elrepo.org/mirrors-elrepo.el6
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-elrepo
protect=0

[elrepo-kernel]
name=ELRepo Community Enterprise Linux Kernel Repository - el6
baseurl=https://mirrors.aliyun.com/elrepo/kernel/el6/$basearch/
#mirrorlist=http://mirrors.elrepo.org/mirrors-elrepo-kernel.el6
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-elrepo
protect=0

[elrepo-extras]
name=ELRepo Community Enterprise Linux Extras Repository - el6
baseurl=https://mirrors.aliyun.com/elrepo/extras/el6/$basearch/
#mirrorlist=http://mirrors.elrepo.org/mirrors-elrepo-extras.el6
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-elrepo
protect=0

# Rebuild the yum cache

yum clean all && yum makecache
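
If the repo files are correct, the new repositories should appear with non-zero package counts:

yum repolist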