Posted November 29, 2023

ETCD and related errors when setting up docker + rancher

Notes on errors when building k8s with docker + rancher

CentOS 7

docker v1.20.x

rancher v2.3.5

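Intranet environment with no outbound Internet access

PS: Nexus3 is used as the proxy for the docker image registry. Related: Nexus3 proxy installation and configuration.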

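Problem: ETCD cannot be created

Without Internet access, docker images frequently fail to pull. After rancher started normally, I logged into the web UI and began creating a k8s cluster, but it reported an error: etcd could not be created.

The rancher container's log showed the following: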

2022/02/14 13:34:14 [WARNING] Failed to create Docker container [etcd] on host [192.168.1.1]: Error response from daemon: No such image: rancher/coreos-etcd:v3.4.3-rancher1
2022/02/14 13:34:14 [ERROR] cluster [c-rc4nk] provisioning: [etcd] Failed to bring up Etcd Plane: Failed to create [etcd] container on host [192.168.1.1]: Failed to create Docker container [etcd] on host [192.168.1.1]: <nil>
2022/02/14 13:34:14 [INFO] kontainerdriver rancherkubernetesengine stopped
2022/02/14 13:34:14 [ERROR] ClusterController c-rc4nk [cluster-provisioner-controller] failed with : [etcd] Failed to bring up Etcd Plane: Failed to create [etcd] container on host [192.168.1.1]: Failed to create Docker container [etcd] on host [192.168.1.1]: <nil>
2022/02/14 13:37:44 [ERROR] Error parsing max age Error parsing auth refresh max age: time: invalid duration s

The docker image rancher/coreos-etcd:v3.4.3-rancher1 could not be pulled, so I had no choice but to pull it somewhere else and then push it into Nexus.
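The pull-elsewhere-then-push step can be scripted roughly as follows. This is a minimal sketch, not necessarily the exact steps used here. It assumes a machine that can reach both Docker Hub and the Nexus3 docker repository, that `docker login` to Nexus has already been done, and that the cluster is set up to pull system images through that registry prefix (for example via Rancher's system-default-registry setting). The helper name mirror_image and the registry address are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy an image from Docker Hub into a private registry
# (e.g. a Nexus3 docker repository). Hypothetical helper, not a Rancher tool.
# DRY_RUN=1 (the default) only prints the docker commands; DRY_RUN=0 runs them.
mirror_image() {
  local image="$1"      # e.g. rancher/coreos-etcd:v3.4.3-rancher1
  local registry="$2"   # e.g. nexus.example.local:8082 (hypothetical address)
  local target="${registry}/${image}"
  local run=""
  if [ "${DRY_RUN:-1}" = "1" ]; then
    run="echo"          # dry run: print instead of execute
  fi
  $run docker pull "$image" || return 1
  $run docker tag "$image" "$target" || return 1
  $run docker push "$target"
}
```

Usage: `DRY_RUN=0 mirror_image rancher/coreos-etcd:v3.4.3-rancher1 nexus.example.local:8082`. If the Internet-facing machine cannot reach Nexus directly, `docker save` on one side and `docker load` on the other can carry the image across before the tag and push.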

[etcd] Failed to bring up Etcd Plane

Etcd failing to start is a classic problem, and plenty of tutorials cover it online. The fix is to wipe everything clean and restart the docker service.

[etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [192.168.154.231] failed to report healthy. Check etcd container logs on each host for more information

The cleanup commands are:

docker stop $(docker ps -aq)

# Containers. Note: after the stop above, this removes ALL containers
docker system prune -f

# Volumes. Note: this wipes all of them
docker volume rm $(docker volume ls -q)

# Images. Note: this wipes all of them
docker image rm $(docker image ls -q)

rm -rf /etc/ceph \
       /etc/cni \
       /etc/kubernetes \
       /opt/cni \
       /opt/rke \
       /run/secrets/ \
       /run/calico \
       /run/flannel \
       /var/lib/calico \
       /var/lib/etcd \
       /var/lib/cni \
       /var/lib/kubelet \
       /var/lib/rancher/rke/log \
       /var/log/containers \
       /var/log/pods
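The `rm -rf` list above is destructive, so it can help to first see which of those leftovers actually exist on a node. Below is a minimal, read-only sketch. The helper name list_rke_leftovers and its optional root-prefix argument, which is handy for testing against a scratch directory, are hypothetical additions and not part of RKE. Nothing is deleted.

```shell
#!/usr/bin/env bash
# Sketch: print which of the RKE/CNI leftover paths from the cleanup list exist.
# Optional $1 is a root prefix (defaults to "/", i.e. the real host).
RKE_LEFTOVER_DIRS="/etc/ceph /etc/cni /etc/kubernetes /opt/cni /opt/rke
/run/secrets /run/calico /run/flannel /var/lib/calico /var/lib/etcd
/var/lib/cni /var/lib/kubelet /var/lib/rancher/rke/log /var/log/containers
/var/log/pods"

list_rke_leftovers() {
  local root="${1:-/}" d
  root="${root%/}"                 # "/" becomes "", so paths stay absolute
  for d in $RKE_LEFTOVER_DIRS; do  # unquoted on purpose: split on whitespace
    if [ -e "${root}${d}" ]; then
      echo "$d"
    fi
  done
}
```

Running `list_rke_leftovers` with no argument on a node that is about to be reused shows what the cleanup will actually touch.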

After restarting again and again, I finally thought victory was close, but there was still a bombshell waiting:

Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system

I searched everywhere for a fix to this damn thing for ages, and every attempt failed.

docker logs kubelet

E0215 14:34:53.376690 25851 :2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
W0215 14:34:54.249077 25851 :237] Unable to update cni config: no networks found in /etc/cni/net.d
E0215 14:34:55.238592 25851 aws_:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see tialsChainVerboseErrors
I0215 14:34:55.238747 25851 :98] Refreshing cache for provider: *tDockerConfigProvider
I0215 14:34:55.788537 25851 kube_docker_:345] Stop pulling image "rancher/pause:3.1": "7675586df687: Downloading "
E0215 14:34:55.788569 25851 remote_:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "/v2/": dial tcp: lookup on [::1]:53: read udp [::1]:52359->[::1]:53: read: connection refused
E0215 14:34:55.788592 25851 kuberuntime_:68] CreatePodSandbox for pod "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "/v2/": dial tcp: lookup on [::1]:53: read udp [::1]:52359->[::1]:53: read: connection refused
E0215 14:34:55.788601 25851 kuberuntime_:729] createPodSandbox for pod "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "/v2/": dial tcp: lookup on [::1]:53: read udp [::1]:52359->[::1]:53: read: connection refused
E0215 14:34:55.788624 25851 pod_:191] Error syncing pod 36f21b3d-16b8-4ca6-9b62-8f96be849d6c ("rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)"), skipping: failed to "CreatePodSandbox" for "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" with CreatePodSandboxError: "CreatePodSandbox for pod "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "registry-1./v2/": dial tcp: lookup on [::1]:53: read udp [::1]:52359->[::1]:53: read: connection refused"
E0215 14:34:58.384759 25851 :2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
W0215 14:34:58.917882 25851 :412] Failed to create summary reader for "/docker/24b652688425c7ce066f2b48c152e3517cefc5b6e6ddd6c678d0f3690ce85343": none of the resources are being tracked.
W0215 14:34:59.249222 25851 :237] Unable to update cni config: no networks found in /etc/cni/net.d
W0215 14:35:01.876641 25851 :412] Failed to create summary reader for "/docker/6c119f3554c2826383b27ada71887a6a7b91549440a2fc908a991fb6e99cca83": none of the resources are being tracked.
E0215 14:35:03.389468 25851 :2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
W0215 14:35:03.878133 25851 :412] Failed to create summary reader for "/docker/f1caf907cd832aa331d4421c745891abfd0ab0b4aa250be963797ddb49ede164": none of the resources are being tracked.
W0215 14:35:04.249318 25851 :237] Unable to update cni config: no networks found in /etc/cni/net.d
W0215 14:35:06.514104 25851 :412] Failed to create summary reader for "/docker/0a2415cd7536b2c252698e4c59ff5d2fb0b8f5888d11c4ccfb189082c850178e": none of the resources are being tracked.
E0215 14:35:08.395996 25851 :2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I0215 14:35:09.073937 25851 container_manager_:469] [ContainerManager]: Discovered runtime cgroups name: //e
W0215 14:35:09.213734 25851 :412] Failed to create summary reader for "/docker/7b9a4127f215057a403ff4747f5ee82e99bf58fe3db4da468573f987087cc73c": none of the resources are being tracked.
W0215 14:35:09.249559 25851 :237] Unable to update cni config: no networks found in /etc/cni/net.d
I0215 14:35:11.008180 25851 kuberuntime_:424] No sandbox for pod "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" can be found. Need to start a new one
W0215 14:35:11.676849 25851 :412] Failed to create summary reader for "/docker/49aa66c13dd22d740981bfd6cea750b674c5bcf6dc2b1486095723bf661439c0": none of the resources are being tracked.
E0215 14:35:13.401485 25851 :2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
W0215 14:35:14.249673 25851 :237] Unable to update cni config: no networks found in /etc/cni/net.d
W0215 14:35:14.609182 25851 :412] Failed to create summary reader for "/docker/0c05b25ff9b0f72ccbd8be4dd7767aac3c56cb279521d91a3ac4f703473227aa": none of the resources are being tracked.
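Most of this output is the same few messages repeating every few seconds, and a small filter can collapse it. klog-formatted lines begin with a severity letter (I, W, E or F) followed by the MMDD date. The filter keeps only the E/F lines, strips the timestamp/PID/source prefix, and counts duplicates, which leaves the distinct errors. This is a sketch: the function name kubelet_errors is hypothetical, and the prefix regex assumes the standard klog header layout.

```shell
#!/usr/bin/env bash
# Sketch: summarize kubelet (klog) log lines read from stdin.
# Keeps only E (error) and F (fatal) lines, strips the
# "E0215 14:34:53.376690   25851 file.go:123] " header, then counts repeats,
# most frequent first.
kubelet_errors() {
  grep -E '^[EF][0-9]{4} ' \
    | sed -E 's/^[EF][0-9]{4} [0-9:.]+ +[0-9]+ [^]]*] //' \
    | sort \
    | uniq -c \
    | sort -rn
}
```

Usage: `docker logs kubelet 2>&1 | kubelet_errors`. klog writes to stderr by default, which is why `2>&1` is needed before the pipe.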

I concluded that the key error was:

Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
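The message means kubelet found no usable network config in /etc/cni/net.d. As I understand it, the CNI library that kubelet uses only loads files ending in .conf, .conflist or .json from that directory, so a file with another name is ignored. The quick check below is a sketch built on that assumption. The function name list_cni_configs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list CNI network config files that kubelet could pick up.
# Assumption: only *.conf, *.conflist and *.json files are considered.
# Returns 1 when nothing is found (the "cni config uninitialized" case).
list_cni_configs() {
  local dir="${1:-/etc/cni/net.d}" f rc=1
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    [ -f "$f" ] || continue   # an unmatched glob stays literal; skip it
    echo "$f"
    rc=0
  done
  return "$rc"
}
```

Usage: `list_cni_configs || echo "no CNI config in /etc/cni/net.d"`.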

I tried the following: adding a CNI config file.

mkdir -p /etc/cni/net.d

vi /etc/cni/net.d/st

{
  "name": "cbr0",
  "cniVersion": "0.3.0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

Note on cniVersion: this value is the latest version I found in the official repository for rancher's flannel-cni docker image.
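JSON has no comment syntax, so an annotation such as a trailing `# ...` on a line makes the whole file unparsable. A quick syntax check before restarting can catch this. Below is a minimal sketch that assumes python3 is available on the host (its json.tool module is part of the standard library). The function name check_cni_json is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify that a CNI config file is syntactically valid JSON.
# Assumes python3 is installed; json.tool exits non-zero on a parse error.
check_cni_json() {
  local file="$1"
  if python3 -m json.tool "$file" >/dev/null 2>&1; then
    echo "OK: $file"
  else
    echo "INVALID JSON: $file" >&2
    return 1
  fi
}
```

Usage: `check_cni_json /etc/cni/net.d/<config-file>`. This only checks syntax. Whether kubelet accepts the content is a separate question, as the log further down shows.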

After a restart, the rancher web UI still reported the error:

Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system

However, checking the kubelet log again turned up something new:

docker logs kubelet

E0215 15:15:52.939226 25851 pod_:191] Error syncing pod ea4ac90a-352a-4edc-8a4b-347f5ce5405c ("rke-network-plugin-deploy-job-jdt6t_kube-system(ea4ac90a-352a-4edc-8a4b-347f5ce5405c)"), skipping: failed to "CreatePodSandbox" for "rke-network-plugin-deploy-job-jdt6t_kube-system(ea4ac90a-352a-4edc-8a4b-347f5ce5405c)" with CreatePodSandboxError: "CreatePodSandbox for pod "rke-network-plugin-deploy-job-jdt6t_kube-system(ea4ac90a-352a-4edc-8a4b-347f5ce5405c)" failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "/v2/": dial tcp: lookup on [::1]:53: read udp [::1]:46815->[::1]:53: read: connection refused"
W0215 15:15:54.359391 25851 :202] Error validating CNI config list {
  "name": "cbr0",
  "cniVersion": "0.3.0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
: [failed to find plugin "flannel" in path [/opt/cni/bin] failed to find plugin "portmap" in path [/opt/cni/bin]]
W0215 15:15:54.359428 25851 :237] Unable to update cni config: no valid networks found in /etc/cni/net.d

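This was probably caused by the earlier cleanup commands: /opt/cni/bin was left empty and was never repopulated. So I copied the files under /opt/cni/bin from another container built from the same image into /opt/cni/bin on the host.

After another restart the UI still showed the error. Checking the log again turned up something new, and I finally narrowed it down to the error below: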
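The "failed to find plugin" errors come from kubelet looking up each plugin "type" named in the config as an executable in /opt/cni/bin. The sketch below is a rough check for that. It extracts the "type" values with grep/sed instead of a real JSON parser, which is a simplifying assumption. Nested "type" keys, such as an ipam plugin, are also picked up, which is fine because those are binaries too. The function name missing_cni_plugins is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report plugin binaries referenced by a CNI config that are missing
# (or not executable) in the CNI bin dir. Returns 1 if any are missing.
# Parsing with grep/sed is a rough approximation of real JSON parsing.
missing_cni_plugins() {
  local conf="$1" bindir="${2:-/opt/cni/bin}" t rc=0
  [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }
  for t in $(grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' "$conf" \
               | sed 's/.*"\([^"]*\)"$/\1/'); do
    if [ ! -x "$bindir/$t" ]; then
      echo "missing: $bindir/$t"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `missing_cni_plugins /etc/cni/net.d/<config-file>` (the bin dir defaults to /opt/cni/bin).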

docker logs kubelet

E0215 15:47:53.338808 6762 remote_:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "/v2/": dial tcp: lookup on [::1]:53: read udp [::1]:33648->[::1]:53: read: connection refused

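The image rancher/pause:3.1 could not be pulled. The "lookup on [::1]:53 ... connection refused" part shows that docker tried to resolve the public registry's hostname and no DNS server answered. Once I fixed this image pull problem, the cluster finally came up normally.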
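To find every image that failed this way at once, the image names can be extracted from the kubelet log. Below is a sketch using the hypothetical helper name failed_images. It relies on the `failed pulling image "..."` wording seen in the log lines above.

```shell
#!/usr/bin/env bash
# Sketch: print the unique image names that kubelet failed to pull,
# reading kubelet log lines from stdin.
failed_images() {
  grep -o 'failed pulling image "[^"]*"' \
    | sed 's/^failed pulling image "\(.*\)"$/\1/' \
    | sort -u
}
```

Usage: `docker logs kubelet 2>&1 | failed_images`. Each name it prints then has to be made available in the internal registry, the same way as the etcd image earlier.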


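Make sure the server environment is clean: data left over from earlier attempts can affect the cluster.

The kubelet container mounts the /etc/cni and /opt/cni directories.

The etcd container mounts /var/lib/etcd.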
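To confirm which host directories a container actually mounts, and therefore which leftovers can leak into a new cluster, `docker inspect` can print the Mounts list through a Go template. This is a sketch: show_mounts is a hypothetical helper name, and the container names kubelet and etcd are the ones used on this RKE node.

```shell
#!/usr/bin/env bash
# Sketch: print "host source -> container destination" for each mount
# of the given containers, one container at a time.
show_mounts() {
  local c
  for c in "$@"; do
    echo "== $c"
    docker inspect -f '{{range .Mounts}}{{println .Source "->" .Destination}}{{end}}' "$c"
  done
}
```

Usage: `show_mounts kubelet etcd`.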

Finally, here are a few references: