Published November 29, 2023
ETCD and related errors when setting up docker + rancher
Notes on errors when building k8s with docker + rancher
CentOS 7
docker v1.20.x
rancher v2.3.5
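Intranet environment with no access to the external network
PS: Nexus3 serves as the proxy for the Docker image registry (installing and configuring the Nexus3 proxy is not covered here).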
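For reference, Docker nodes are typically pointed at a Nexus3 proxy through `registry-mirrors` in `/etc/docker/daemon.json`. The sketch below assumes a typical setup rather than the author's actual configuration: `nexus.example.local:8082` is a hypothetical Nexus docker connector, `write_daemon_json` is a hypothetical helper, and `insecure-registries` is only needed if the connector speaks plain HTTP. Note that `registry-mirrors` only applies to Docker Hub images, which covers the `rancher/*` images used here.

```shell
#!/usr/bin/env bash
# Sketch: point dockerd at a Nexus3 Docker proxy/group repository.
# The mirror URL passed in is an assumption; use your own Nexus connector.

# write_daemon_json FILE MIRROR_URL
# Writes a minimal daemon.json using MIRROR_URL as a registry mirror.
# WARNING: overwrites FILE completely.
write_daemon_json() {
  local file="$1" mirror="$2" host
  host="${mirror#*://}"   # strip the scheme: insecure-registries wants host:port
  host="${host%%/*}"      # drop any trailing path
  cat > "$file" <<EOF
{
  "registry-mirrors": ["$mirror"],
  "insecure-registries": ["$host"]
}
EOF
}
```

On a node this would be `write_daemon_json /etc/docker/daemon.json http://nexus.example.local:8082` followed by `systemctl restart docker`; merge by hand instead if the node already has a daemon.json.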
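ETCD cannot be created
Without external network access, Docker images frequently failed to pull. Rancher itself started normally, but after logging in to the web UI and starting to create a k8s cluster, I got an error: etcd could not be created.
The log of the running rancher container showed the following: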
2022/02/14 13:34:14 [WARNING] Failed to create Docker container [etcd] on host [192.168.1.1]: Error response from daemon: No such image: rancher/coreos-etcd:v3.4.3-rancher1
2022/02/14 13:34:14 [ERROR] cluster [c-rc4nk] provisioning: [etcd] Failed to bring up Etcd Plane: Failed to create [etcd] container on host [192.168.1.1]: Failed to create Docker container [etcd] on host [192.168.1.1]: <nil>
2022/02/14 13:34:14 [INFO] kontainerdriver rancherkubernetesengine stopped
2022/02/14 13:34:14 [ERROR] ClusterController c-rc4nk [cluster-provisioner-controller] failed with : [etcd] Failed to bring up Etcd Plane: Failed to create [etcd] container on host [192.168.1.1]: Failed to create Docker container [etcd] on host [192.168.1.1]: <nil>
2022/02/14 13:37:44 [ERROR] Error parsing max age Error parsing auth refresh max age: time: invalid duration s
rancher/coreos-etcd:v3.4.3-rancher1
This Docker image could not be pulled. With no other way around it, I pulled the image from elsewhere and pushed it into Nexus.
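A minimal sketch of that workaround, assuming `docker save`/`docker load` to carry the image across, a hypothetical Nexus docker hosted repository at `nexus.example.local:8083` (override with `NEXUS_PUSH`), and a Nexus group repository that both includes that hosted repository and serves as the nodes' Docker mirror. Under those assumptions, nodes can then pull `rancher/coreos-etcd:v3.4.3-rancher1` under its original name. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: move an image that cannot be pulled inside the intranet into Nexus.
# NEXUS_PUSH is a hypothetical Nexus docker *hosted* repository connector.
NEXUS_PUSH="${NEXUS_PUSH:-nexus.example.local:8083}"

# target_ref IMAGE -> the IMAGE reference inside the hosted repository
target_ref() {
  printf '%s/%s\n' "$NEXUS_PUSH" "$1"
}

# tarball_name IMAGE -> a filesystem-safe tar file name for docker save
tarball_name() {
  printf '%s.tar\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# Step 1: run on a machine WITH internet access.
export_image() {
  docker pull "$1"
  docker save -o "$(tarball_name "$1")" "$1"
}

# Step 2: run inside the intranet after copying the tar file over.
import_and_push() {
  local dst
  dst="$(target_ref "$1")"
  docker load -i "$(tarball_name "$1")"
  docker tag "$1" "$dst"
  docker push "$dst"
}
```

Run `export_image rancher/coreos-etcd:v3.4.3-rancher1` on the connected machine, copy the tar file into the intranet, then `docker login` against the hosted repository and run `import_and_push rancher/coreos-etcd:v3.4.3-rancher1`.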
[etcd] Failed to bring up Etcd Plane
The etcd startup failure is a classic problem with plenty of tutorials online: clean everything out thoroughly and restart the Docker service.
[etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [192.168.154.231] failed to report healthy. Check etcd container logs on each host for more information
The cleanup commands are:
# Stop all containers
docker stop $(docker ps -aq)
# Containers
# Note: this deletes all containers (they are all stopped at this point)
docker system prune -f
# Volumes
# Note: this removes all volumes
docker volume rm $(docker volume ls -q)
# Images
# Note: this removes all images
docker image rm $(docker image ls -q)
# Leftover Rancher/RKE directories
rm -rf /etc/ceph \
  /etc/cni \
  /etc/kubernetes \
  /opt/cni \
  /opt/rke \
  /run/secrets/ \
  /run/calico \
  /run/flannel \
  /var/lib/calico \
  /var/lib/etcd \
  /var/lib/cni \
  /var/lib/kubelet \
  /var/lib/rancher/rke/log \
  /var/log/containers \
  /var/log/pods
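Before re-adding the node it can help to confirm that none of these directories survived. The helper below is a hypothetical convenience, not part of the original steps: it only reads the filesystem, and its optional `ROOT` argument exists so it can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Sketch: report which of the directories from the cleanup list still exist.
RKE_LEFTOVER_DIRS="/etc/ceph /etc/cni /etc/kubernetes /opt/cni /opt/rke /run/secrets
/run/calico /run/flannel /var/lib/calico /var/lib/etcd /var/lib/cni
/var/lib/kubelet /var/lib/rancher/rke/log /var/log/containers /var/log/pods"

# leftover_dirs [ROOT] -> prints each listed directory that still exists under ROOT
leftover_dirs() {
  local root="${1:-}" d
  for d in $RKE_LEFTOVER_DIRS; do
    if [ -e "${root}${d}" ]; then
      printf '%s\n' "$d"
    fi
  done
}
```

Empty output from `leftover_dirs` means the paths are gone; after that, restart Docker (`systemctl restart docker` on CentOS 7) as described above.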
After restarting over and over, I thought victory was near, but a killer problem was still waiting:
Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system
This one had me hunting for solutions everywhere for ages, and every attempt failed.
docker logs kubelet
E0215 14:34:53.376690 25851 :2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
W0215 14:34:54.249077 25851 :237] Unable to update cni config: no networks found in /etc/cni/net.d
E0215 14:34:55.238592 25851 aws_:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see tialsChainVerboseErrors
I0215 14:34:55.238747 25851 :98] Refreshing cache for provider: *tDockerConfigProvider
I0215 14:34:55.788537 25851 kube_docker_:345] Stop pulling image "rancher/pause:3.1": "7675586df687: Downloading "
E0215 14:34:55.788569 25851 remote_:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "/v2/": dial tcp: lookup on [::1]:53: read udp [::1]:52359->[::1]:53: read: connection refused
E0215 14:34:55.788592 25851 kuberuntime_:68] CreatePodSandbox for pod "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "/v2/": dial tcp: lookup on [::1]:53: read udp [::1]:52359->[::1]:53: read: connection refused
E0215 14:34:55.788601 25851 kuberuntime_:729] createPodSandbox for pod "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "/v2/": dial tcp: lookup on [::1]:53: read udp [::1]:52359->[::1]:53: read: connection refused
E0215 14:34:55.788624 25851 pod_:191] Error syncing pod 36f21b3d-16b8-4ca6-9b62-8f96be849d6c ("rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)"), skipping: failed to "CreatePodSandbox" for "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" with CreatePodSandboxError: "CreatePodSandbox for pod "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "registry-1./v2/": dial tcp: lookup on [::1]:53: read udp [::1]:52359->[::1]:53: read: connection refused"
E0215 14:34:58.384759 25851 :2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
W0215 14:34:58.917882 25851 :412] Failed to create summary reader for "/docker/24b652688425c7ce066f2b48c152e3517cefc5b6e6ddd6c678d0f3690ce85343": none of the resources are being tracked.
W0215 14:34:59.249222 25851 :237] Unable to update cni config: no networks found in /etc/cni/net.d
W0215 14:35:01.876641 25851 :412] Failed to create summary reader for "/docker/6c119f3554c2826383b27ada71887a6a7b91549440a2fc908a991fb6e99cca83": none of the resources are being tracked.
E0215 14:35:03.389468 25851 :2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
W0215 14:35:03.878133 25851 :412] Failed to create summary reader for "/docker/f1caf907cd832aa331d4421c745891abfd0ab0b4aa250be963797ddb49ede164": none of the resources are being tracked.
W0215 14:35:04.249318 25851 :237] Unable to update cni config: no networks found in /etc/cni/net.d
W0215 14:35:06.514104 25851 :412] Failed to create summary reader for "/docker/0a2415cd7536b2c252698e4c59ff5d2fb0b8f5888d11c4ccfb189082c850178e": none of the resources are being tracked.
E0215 14:35:08.395996 25851 :2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I0215 14:35:09.073937 25851 container_manager_:469] [ContainerManager]: Discovered runtime cgroups name: //e
W0215 14:35:09.213734 25851 :412] Failed to create summary reader for "/docker/7b9a4127f215057a403ff4747f5ee82e99bf58fe3db4da468573f987087cc73c": none of the resources are being tracked.
W0215 14:35:09.249559 25851 :237] Unable to update cni config: no networks found in /etc/cni/net.d
I0215 14:35:11.008180 25851 kuberuntime_:424] No sandbox for pod "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" can be found. Need to start a new one
W0215 14:35:11.676849 25851 :412] Failed to create summary reader for "/docker/49aa66c13dd22d740981bfd6cea750b674c5bcf6dc2b1486095723bf661439c0": none of the resources are being tracked.
E0215 14:35:13.401485 25851 :2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
W0215 14:35:14.249673 25851 :237] Unable to update cni config: no networks found in /etc/cni/net.d
W0215 14:35:14.609182 25851 :412] Failed to create summary reader for "/docker/0c05b25ff9b0f72ccbd8be4dd7767aac3c56cb279521d91a3ac4f703473227aa": none of the resources are being tracked.
I took the key error to be:
Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
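"cni config uninitialized" together with "no networks found in /etc/cni/net.d" indicates that the kubelet found no usable network config in that directory. As far as I know, the kubelet's Docker (dockershim) CNI code only considers files ending in `.conf`, `.conflist` or `.json` there and uses the first valid one in name order. The sketch below (hypothetical helper `cni_conf_files`) lists those candidates so you can see what would be picked up.

```shell
#!/usr/bin/env bash
# Sketch: list CNI config files the kubelet would consider, in name order.
# cni_conf_files [DIR] -> base names of *.conf, *.conflist and *.json files in DIR
cni_conf_files() {
  local dir="${1:-/etc/cni/net.d}" f
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    # unmatched globs stay literal, so keep only real files
    [ -f "$f" ] && printf '%s\n' "${f##*/}"
  done | sort
}
```

Empty output from `cni_conf_files` matches the "no networks found" warning above.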
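I tried the following: adding a CNI configuration file.
mkdir -p /etc/cni/net.d
vi /etc/cni/net.d/st
The cniVersion value is the latest version I found for rancher's flannel-cni image in the official Docker repository. JSON does not allow comments, so keep this note out of the file.
{
  "name": "cbr0",
  "cniVersion": "0.3.0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}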
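The same file can be written without an editor. A sketch assuming the hypothetical file name `10-flannel.conflist` (the original name is not recoverable here) and a hypothetical helper `write_flannel_conflist`; the quoted heredoc delimiter stops the shell from expanding anything inside the JSON, and the content is the configuration above without the comment.

```shell
#!/usr/bin/env bash
# Sketch: write the flannel + portmap conflist shown above.
# write_flannel_conflist [DIR] [NAME]
# NAME defaults to the hypothetical 10-flannel.conflist; it must end in
# .conf, .conflist or .json to be picked up.
write_flannel_conflist() {
  local dir="${1:-/etc/cni/net.d}" name="${2:-10-flannel.conflist}"
  mkdir -p "$dir"
  cat > "$dir/$name" <<'EOF'
{
  "name": "cbr0",
  "cniVersion": "0.3.0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
EOF
}
```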
After restarting, the Rancher web UI still reported the error:
Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system
However, checking the kubelet logs again turned up something new:
docker logs kubelet
E0215 15:15:52.939226 25851 pod_:191] Error syncing pod ea4ac90a-352a-4edc-8a4b-347f5ce5405c ("rke-network-plugin-deploy-job-jdt6t_kube-system(ea4ac90a-352a-4edc-8a4b-347f5ce5405c)"), skipping: failed to "CreatePodSandbox" for "rke-network-plugin-deploy-job-jdt6t_kube-system(ea4ac90a-352a-4edc-8a4b-347f5ce5405c)" with CreatePodSandboxError: "CreatePodSandbox for pod "rke-network-plugin-deploy-job-jdt6t_kube-system(ea4ac90a-352a-4edc-8a4b-347f5ce5405c)" failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "/v2/": dial tcp: lookup on [::1]:53: read udp [::1]:46815->[::1]:53: read: connection refused"
W0215 15:15:54.359391 25851 :202] Error validating CNI config list {
"name": "cbr0",
"cniVersion": "0.3.0",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
: [failed to find plugin "flannel" in path [/opt/cni/bin] failed to find plugin "portmap" in path [/opt/cni/bin]]
W0215 15:15:54.359428 25851 :237] Unable to update cni config: no valid networks found in /etc/cni/net.d
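This was probably caused by the earlier Docker cleanup (the rm -rf list includes /opt/cni): /opt/cni/bin no longer contained any binaries and nothing had recreated them. I copied the files under /opt/cni/bin from another container of the same image into /opt/cni/bin on the host.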
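Every `type` in the conflist has to exist as an executable in `/opt/cni/bin`, which is exactly what the "failed to find plugin" error complains about. The helpers below (hypothetical names) check that. `cni_plugin_types` uses grep rather than a JSON parser, so treat it as a rough approximation that also reports nested `type` keys such as an IPAM plugin.

```shell
#!/usr/bin/env bash
# Sketch: find plugin types referenced by a conflist that have no binary.

# cni_plugin_types CONFLIST -> the value of every "type" key, one per line
cni_plugin_types() {
  grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# missing_cni_plugins CONFLIST [BIN_DIR] -> types with no executable in BIN_DIR
missing_cni_plugins() {
  local conf="$1" bin="${2:-/opt/cni/bin}" t
  for t in $(cni_plugin_types "$conf"); do
    [ -x "$bin/$t" ] || printf '%s\n' "$t"
  done
}
```

The copy itself can be done with `docker cp <container>:/opt/cni/bin/. /opt/cni/bin/`, where `<container>` is a placeholder for whichever container of the same image you use; afterwards `missing_cni_plugins` on the conflist should print nothing.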
After another restart the UI still showed the error. The kubelet logs revealed something new again, and I finally narrowed it down to this error:
docker logs kubelet
E0215 15:47:53.338808 6762 remote_:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "/v2/": dial tcp: lookup on [::1]:53: read udp [::1]:33648->[::1]:53: read: connection refused
The rancher/pause:3.1 image could not be pulled. Once this Docker image pull problem was solved, the cluster finally came up and ran normally.
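Both blockers in these notes turned out to be images that could not be pulled, so it can save time to extract the image names straight from the logs. A sketch with a hypothetical helper `failed_images` that reads a log on stdin and prints the unique image references from kubelet `failed pulling image` lines and Rancher `No such image:` lines. My reading of the `lookup on [::1]:53 ... connection refused` part, which the original notes do not confirm, is that Docker fell back to Docker Hub and could not even resolve it from the intranet, so each printed image has to be made available through Nexus the same way as the etcd image.

```shell
#!/usr/bin/env bash
# Sketch: list unique image references that failed to pull, read from stdin.
failed_images() {
  grep -oE 'failed pulling image "[^"]+"|No such image: [^[:space:]]+' \
    | sed -E 's/^failed pulling image "([^"]+)"$/\1/; s/^No such image: //' \
    | sort -u
}
```

Typical use is `docker logs kubelet 2>&1 | failed_images`; the `2>&1` is there because `docker logs` replays the container's stderr on stderr, and the kubelet logs to stderr.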
(A happy screenshot of the result was attached here.)
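Make sure the server environment is clean: data left over from earlier attempts affects the cluster.
The kubelet container mounts the /etc/cni and /opt/cni directories, and etcd mounts /var/lib/etcd.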