[TOC]
23 - Kubernetes Enterprise Operations Practice Notes
0x00 Practice 1. A Case Study of Handling Expired or Expiring Certificates in a Kubernetes v1.23.x Cluster
Description: The following walkthrough deals with a Kubernetes cluster whose certificates have already expired or are about to expire. Most of the write-ups you will find on Baidu or CSDN (scavenged content) only half understand the problem, and the steps differ slightly between K8S versions; follow them blindly and you may end up rebuilding the cluster. Don't ask how I know, I fell into that pit ( Ĭ ^ Ĭ ), so when you hit a problem, consult the official Kubernetes documentation first.
1. Introduction
On the first workday after the holidays, I logged into our K8S clusters as usual to check the applications one by one. On the dev/test cluster, kubectl commands failed with a certificate-expired error. My mood sank instantly, but there was nothing for it except renewing the certificates. Because this is a highly available cluster I hit quite a few pitfalls, so I put this article together to help myself and fellow operators solve the same problems.
# Connecting to the api-server fails with a "certificate has expired" error.
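For reference, the failure typically surfaces like this when running any kubectl command (this is the standard x509 expiry error that kubeadm's troubleshooting guide also mentions; exact wording may vary):

```bash
$ kubectl get nodes
# Unable to connect to the server: x509: certificate has expired or is not yet valid
```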
2. Environment
Cluster version and node description:
# Cluster version
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:39:51Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
# Cluster nodes (the system time is first set back to before expiry so kubectl can still talk to the cluster)
$ date -s "2023-01-01"
$ kubectl get node
NAME STATUS ROLES AGE VERSION
weiyigeek-107 Ready control-plane,master 381d v1.23.1
weiyigeek-108 Ready control-plane,master 380d v1.23.1
weiyigeek-109 Ready control-plane,master 380d v1.23.1
weiyigeek-223 Ready work 380d v1.23.1
weiyigeek-224 Ready work 380d v1.23.1
weiyigeek-225 Ready work 381d v1.23.1
weiyigeek-226 Ready work 220d v1.23.1
# On the importance of keeping the configuration files used during setup: back up the resource manifests when you build a cluster.
kubectl -n kube-system get cm kubeadm-config -o yaml > kubeadm-config-v1.23.1.yaml
3. Certificate Renewal
For a highly available K8S cluster, the certificate renewal procedure is as follows:
0. Before doing anything, make backups so you can roll back. I operate here on weiyigeek-107, one of the three master nodes, and unless stated otherwise all later commands also run on it; where another machine is needed I will say so.
# Back up the old configuration files.
cp -a /etc/kubernetes{,.bak}
cp -a /var/lib/kubelet{,.bak}
cp -a /var/lib/etcd /var/lib/etcd.bak
# Back up the cluster configuration (once the certificates have expired this step fails and can be skipped), but you can use the date command to set the system time back to before expiry.
date -s "2023-01-01" || timedatectl set-time "2023-01-01"
kubectl -n kube-system get cm kubeadm-config -o yaml > kubeadm-init-config.yaml # this original config file is used later.
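If the renewal goes wrong, the backups taken above allow a rollback; a minimal sketch, assuming the .bak copies are intact:

```bash
# Restore configuration and state from the backups taken in step 0
systemctl stop kubelet
rm -rf /etc/kubernetes /var/lib/kubelet /var/lib/etcd
cp -a /etc/kubernetes.bak /etc/kubernetes
cp -a /var/lib/kubelet.bak /var/lib/kubelet
cp -a /var/lib/etcd.bak /var/lib/etcd
systemctl start kubelet
```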
1. Use the openssl command to query each certificate's validity period and related information.
# The cluster's ca.crt certificate is valid for ten years.
# The apiserver.crt, kubelet.crt and etcd.crt certificates are valid for one year by default; you can also change that to ten years yourself (covered in a later article).
$ for i in $(ls /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt); do echo "===== $i ====="; openssl x509 -in $i -text -noout | grep -A 3 'Validity' ; done
# for item in `find /etc/kubernetes/pki -maxdepth 2 -name "*.crt"`;do echo ======================$item===============;openssl x509 -in $item -text -noout| grep -A 3 Not;done
===== /etc/kubernetes/pki/apiserver.crt =====
Validity
Not Before: Jan 15 10:42:56 2022 GMT # issue date
Not After : Jan 15 10:42:57 2023 GMT # expiry date
Subject: CN = kube-apiserver # common name
===== /etc/kubernetes/pki/apiserver-etcd-client.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 15 10:42:59 2023 GMT
Subject: O = system:masters, CN = kube-apiserver-etcd-client
===== /etc/kubernetes/pki/apiserver-kubelet-client.crt =====
Validity
Not Before: Jan 15 10:42:56 2022 GMT
Not After : Jan 15 10:42:57 2023 GMT
Subject: O = system:masters, CN = kube-apiserver-kubelet-client
===== /etc/kubernetes/pki/ca.crt =====
Validity
Not Before: Jan 15 10:42:56 2022 GMT
Not After : Jan 13 10:42:56 2032 GMT
Subject: CN = kubernetes
===== /etc/kubernetes/pki/etcd/ca.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 13 10:42:58 2032 GMT
Subject: CN = etcd-ca
===== /etc/kubernetes/pki/etcd/healthcheck-client.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 15 10:42:59 2023 GMT
Subject: O = system:masters, CN = kube-etcd-healthcheck-client
===== /etc/kubernetes/pki/etcd/peer.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 15 10:42:59 2023 GMT
Subject: CN = weiyigeek-107
===== /etc/kubernetes/pki/etcd/server.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 15 10:42:59 2023 GMT
Subject: CN = weiyigeek-107
===== /etc/kubernetes/pki/front-proxy-ca.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 13 10:42:58 2032 GMT
Subject: CN = front-proxy-ca
===== /etc/kubernetes/pki/front-proxy-client.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 15 10:42:58 2023 GMT
Subject: CN = front-proxy-client
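A more compact variant that prints only each certificate's expiry date, using openssl's -enddate option:

```bash
for i in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do
  printf '%-55s ' "$i"
  openssl x509 -enddate -noout -in "$i"   # prints: notAfter=<date>
done
```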
2. View the cluster-wide certificate information, including every certificate's name, issuing CA and expiry time; here you can see they have all expired.
$ sudo kubeadm certs check-expiration
# [check-expiration] Reading configuration from the cluster...
# [check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
# [check-expiration] Error reading configuration from the Cluster. Falling back to default configuration
# CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
# admin.conf Jan 15, 2023 10:43 UTC <invalid> ca no
# apiserver Jan 15, 2023 10:42 UTC <invalid> ca no
# apiserver-etcd-client Jan 15, 2023 10:42 UTC <invalid> etcd-ca no
# apiserver-kubelet-client Jan 15, 2023 10:42 UTC <invalid> ca no
# controller-manager.conf Jan 15, 2023 10:43 UTC <invalid> ca no
# etcd-healthcheck-client Jan 15, 2023 10:42 UTC <invalid> etcd-ca no
# etcd-peer Jan 15, 2023 10:42 UTC <invalid> etcd-ca no
# etcd-server Jan 15, 2023 10:42 UTC <invalid> etcd-ca no
# front-proxy-client Jan 15, 2023 10:42 UTC <invalid> front-proxy-ca no
# scheduler.conf Jan 15, 2023 10:43 UTC <invalid> ca no
# CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
# ca Jan 13, 2032 10:42 UTC 8y no
# etcd-ca Jan 13, 2032 10:42 UTC 8y no
# front-proxy-ca Jan 13, 2032 10:42 UTC 8y no
Tip: if etcd was created and is managed by kubeadm, its certificates can also be renewed this way; if you manage an external etcd for high availability, you must update its certificate configuration manually.
3. Use the renew subcommand of certs to renew all of the cluster's certificates for another year. The --config parameter here points to the initialization manifest used when I first created the cluster; if you no longer have it, regenerate it following step 0.
~/.k8s$ sudo kubeadm certs renew all --config=./kubeadm-init-config.yaml
# W1212 17:17:16.721037 1306627 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
# certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed # (admin.conf)
# certificate for serving the Kubernetes API renewed
# certificate the apiserver uses to access etcd renewed
# certificate for the API server to connect to kubelet renewed
# certificate embedded in the kubeconfig file for the controller manager to use renewed # (controller-manager.conf)
# certificate for liveness probes to healthcheck etcd renewed
# certificate for etcd nodes to communicate with each other renewed
# certificate for serving etcd renewed
# certificate for the front proxy client renewed
# certificate embedded in the kubeconfig file for the scheduler manager to use renewed # (scheduler.conf)
# The message below means the renewal succeeded; you must restart kube-apiserver, kube-controller-manager, kube-scheduler and etcd so that they pick up the new certificates.
Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
# Check the renewed certificates and their new expiry times
~/.k8s$ kubeadm certs check-expiration
# [check-expiration] Reading configuration from the cluster...
# [check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
# CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
# admin.conf Jan 31, 2024 09:26 UTC 364d ca no
# apiserver Jan 31, 2024 09:26 UTC 364d ca no
# apiserver-etcd-client Jan 31, 2024 09:26 UTC 364d etcd-ca no
# apiserver-kubelet-client Jan 31, 2024 09:26 UTC 364d ca no
# controller-manager.conf Jan 31, 2024 09:26 UTC 364d ca no
# etcd-healthcheck-client Jan 31, 2024 09:26 UTC 364d etcd-ca no
# etcd-peer Jan 31, 2024 09:26 UTC 364d etcd-ca no
# etcd-server Jan 31, 2024 09:26 UTC 364d etcd-ca no
# front-proxy-client Jan 31, 2024 09:26 UTC 364d front-proxy-ca no
# scheduler.conf Jan 31, 2024 09:26 UTC 364d ca no
# CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
# ca Jan 13, 2032 10:42 UTC 8y no
# etcd-ca Jan 13, 2032 10:42 UTC 8y no
# front-proxy-ca Jan 13, 2032 10:42 UTC 8y no
# Use stat to check the modification times of apiserver.key and apiserver.crt
/etc/kubernetes/pki$ stat apiserver.key apiserver.crt
# File: apiserver.key
# Size: 1675 Blocks: 8 IO Block: 4096 regular file
# Device: fd00h/64768d Inode: 3670556 Links: 1
# Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
# Access: 2022-04-28 12:55:13.456040564 +0800
# Modify: 2023-01-31 17:26:51.108767670 +0800
# Change: 2023-01-31 17:26:51.108767670 +0800
# Birth: -
# File: apiserver.crt
# Size: 1338 Blocks: 8 IO Block: 4096 regular file
# Device: fd00h/64768d Inode: 3670557 Links: 1
# Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
# Access: 2023-01-31 17:28:58.104917185 +0800
# Modify: 2023-01-31 17:26:51.108767670 +0800
# Change: 2023-01-31 17:26:51.108767670 +0800
# Birth: -
4. After the certificates are renewed, we must regenerate the configuration files the K8S master nodes need, i.e. the admin.conf / controller-manager.conf / kubelet.conf / scheduler.conf files under the /etc/kubernetes directory.
# Regenerate the required kubeconfig files
$ rm -rf /etc/kubernetes/*.conf
$ kubeadm init phase kubeconfig all --config=kubeadm-init-config.yaml
# [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
# [kubeconfig] Writing "admin.conf" kubeconfig file
# [kubeconfig] Writing "kubelet.conf" kubeconfig file
# [kubeconfig] Writing "controller-manager.conf" kubeconfig file
# [kubeconfig] Writing "scheduler.conf" kubeconfig file
# You can see the file timestamps have changed
$ ls -alh /etc/kubernetes/*.conf
-rw------- 1 root root 5.6K Jan 31 21:53 /etc/kubernetes/admin.conf
-rw------- 1 root root 5.6K Jan 31 21:53 /etc/kubernetes/controller-manager.conf
-rw------- 1 root root 5.6K Jan 31 21:53 /etc/kubernetes/kubelet.conf
-rw------- 1 root root 5.5K Jan 31 21:53 /etc/kubernetes/scheduler.conf
# To keep kubelet client certificate rotation from failing (a big pitfall here), delete the kubelet-client-* files; they are regenerated automatically when the kubelet service restarts.
# If rotation fails, you may see errors such as "x509: certificate has expired or is not yet valid" in the kube-apiserver logs.
# https://kubernetes.io/zh/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/#kubelet-client-cert
rm -rf /var/lib/kubelet/pki/kubelet-client-*
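Restarting the kubelet should then regenerate the client certificate; a quick check (kubelet-client-current.pem is the kubelet's standard rotated-certificate symlink):

```bash
systemctl restart kubelet
ls -l /var/lib/kubelet/pki/   # kubelet-client-current.pem should reappear with a fresh timestamp
```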
Supplement: to generate the kubeconfig files for another master node, refer to the following; this example targets the weiyigeek-108 control-plane node.
# $ mkdir -vp /tmp/kubernetes/
# $ kubeadm init phase kubeconfig all --node-name weiyigeek-108 --kubeconfig-dir /tmp/kubernetes/
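The generated files then need to reach the target node; a hypothetical copy step (destination path assumed, adjust to your environment):

```bash
scp /tmp/kubernetes/*.conf weiyigeek-108:/etc/kubernetes/
```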
5. As prompted, restart kube-apiserver, kube-controller-manager, kube-scheduler, etcd and related services on the current master node (weiyigeek-107).
# Copy the newly generated cluster kubeconfig over ~/.kube/config
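A sketch of one common way to restart the kubeadm static-pod components, by briefly moving their manifests out of the kubelet's watched directory (the pause length is an assumption; adjust to your environment):

```bash
# Restart the static-pod control-plane components (apiserver, controller-manager, scheduler, etcd)
mkdir -p /tmp/manifests
mv /etc/kubernetes/manifests/*.yaml /tmp/manifests/
sleep 20                                   # give the kubelet time to tear the pods down
mv /tmp/manifests/*.yaml /etc/kubernetes/manifests/
systemctl restart kubelet

# Refresh the local kubectl credentials with the regenerated admin.conf
cp /etc/kubernetes/admin.conf ~/.kube/config
```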
6. Now, running kubectl commands on the (weiyigeek-107) node no longer reports certificate expiry; the commands work normally again.
$ kubectl get nodes -o wide
# NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
# weiyigeek-107 Ready control-plane,master 381d v1.23.1 192.168.12.107 <none> Ubuntu 20.04.1 LTS 5.4.0-137-generic containerd://1.4.12
# weiyigeek-108 Ready control-plane,master 57m v1.23.1 192.168.12.108 <none> Ubuntu 20.04.3 LTS 5.4.0-137-generic containerd://1.4.12
# weiyigeek-109 Ready control-plane,master 2m19s v1.23.1 192.168.12.109 <none> Ubuntu 20.04.3 LTS 5.4.0-137-generic containerd://1.4.12
# weiyigeek-223 Ready work 380d v1.23.1 192.168.12.223 <none> Ubuntu 20.04.3 LTS 5.4.0-94-generic containerd://1.4.12
# weiyigeek-224 Ready work 380d v1.23.1 192.168.12.224 <none> Ubuntu 20.04.3 LTS 5.4.0-42-generic containerd://1.4.12
# weiyigeek-225 Ready work 381d v1.23.1 192.168.12.225 <none> Ubuntu 20.04.3 LTS 5.4.0-94-generic containerd://1.4.12
# weiyigeek-226 Ready work 220d v1.23.1 192.168.12.226 <none> Ubuntu 20.04.3 LTS 5.4.0-80-generic containerd://1.4.12
$ kubectl get pod -n kube-system | egrep "kube-apiserver|kube-controller-manager|kube-scheduler|etcd"
# etcd-weiyigeek-107 1/1 Running 1 380d
# etcd-weiyigeek-108 1/1 Running 0 96d
# etcd-weiyigeek-109 1/1 Running 0 380d
# kube-apiserver-weiyigeek-107 1/1 Running 0 380d
# kube-apiserver-weiyigeek-108 1/1 Running 0 380d
# kube-apiserver-weiyigeek-109 1/1 Running 0 380d
# kube-controller-manager-weiyigeek-107 1/1 Running 2 (380d ago) 380d
# kube-controller-manager-weiyigeek-108 1/1 Running 1 (15d ago) 380d
# kube-controller-manager-weiyigeek-109 1/1 Running 1 (380d ago) 380d
# kube-scheduler-weiyigeek-107 1/1 Running 3 (15d ago) 380d
# kube-scheduler-weiyigeek-108 1/1 Running 2 (15d ago) 380d
# kube-scheduler-weiyigeek-109 1/1 Running 1 (380d ago) 380d
7. Check the cluster's health, i.e. whether the scheduler, controller manager and the etcd database are all healthy.
$ kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true","reason":""}
Did you think the practice ends here? Not at all: because this is a highly available K8S cluster, the pitfalls after renewing certificates go far beyond this. If you are interested, continue to the next section.
4. calico && kube-proxy Operations After the Cluster Certificate Renewal
Description: This HA cluster was installed following my blog post "Installing and Experiencing an HA K8S Cluster on Ubuntu" ( https://blog.weiyigeek.top/2020/4-27-470.html#0x04-高可用集群使用初体验 ). When installing the calico network plugin I chose to store Calico's data in the etcd datastore, so calico must connect to etcd whenever it is installed or used. The etcd-ca, etcd-cert and etcd-key fields in the calico-etcd.yaml manifest still hold the old certificates; since we have renewed all component certificates, any further reads and writes can no longer reach etcd, so we must regenerate the calico-etcd.yaml resource file.
After the cluster certificates were renewed, calico and the business applications misbehaved; the most obvious symptom is that calico cannot start.
$ kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-6cf9b574f-zlrjn 0/1 Running 0 3s
calico-node-5q8lq 0/1 CrashLoopBackOff 11 (6s ago) 25m
calico-node-62zd9 0/1 CrashLoopBackOff 9 (5m6s ago) 25m
calico-node-85b7k 0/1 Running 11 (66s ago) 25m
calico-node-8mt8q 0/1 Running 3 (66s ago) 4m43s
calico-node-cdkf8 0/1 CrashLoopBackOff 9 (5m6s ago) 25m
calico-node-jgm6q 0/1 CrashLoopBackOff 9 (4m56s ago) 25m
calico-node-x2b9q 0/1 CrashLoopBackOff 9 (5m6s ago) 25m
coredns-65c54cc984-7vt8m 0/1 ContainerCreating 0 7m5s
coredns-65c54cc984-hf774 0/1 ContainerCreating 0 25m
Procedure:
Step 01. Following the article above, update the etcd connection certificate fields.
# (1) Install Calico with the etcd datastore
curl https://docs.projectcalico.org/manifests/calico-etcd.yaml -O
# Update the calico-etcd manifest's etcd cluster connection details (and specify the pod subnet here)
ETCD_CA=`cat /etc/kubernetes/pki/etcd/ca.crt | base64 | tr -d '\n'`
ETCD_CERT=`cat /etc/kubernetes/pki/etcd/server.crt | base64 | tr -d '\n'`
ETCD_KEY=`sudo cat /etc/kubernetes/pki/etcd/server.key | base64 | tr -d '\n'`
POD_SUBNET=`sudo cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep cluster-cidr= | awk -F= '{print $NF}'`
sed -i "s@# etcd-key: null@etcd-key: ${ETCD_KEY}@g; s@# etcd-cert: null@etcd-cert: ${ETCD_CERT}@g; s@# etcd-ca: null@etcd-ca: ${ETCD_CA}@g" calico-etcd.yaml
sed -i 's#etcd_ca: ""#etcd_ca: "/calico-secrets/etcd-ca"#g; s#etcd_cert: ""#etcd_cert: "/calico-secrets/etcd-cert"#g; s#etcd_key: "" #etcd_key: "/calico-secrets/etcd-key" #g' calico-etcd.yaml
sed -i 's#etcd_endpoints: "http://<ETCD_IP>:<ETCD_PORT>"#etcd_endpoints: "https://192.168.12.107:2379,https://192.168.12.108:2379,https://192.168.12.109:2379"#g' calico-etcd.yaml
sed -i 's@# - name: CALICO_IPV4POOL_CIDR@- name: CALICO_IPV4POOL_CIDR@g; s@# value: "192.168.0.0/16"@ value: '"${POD_SUBNET}"'@g' calico-etcd.yaml
# (2) Re-apply calico to the K8S cluster
kubectl apply -f calico-etcd.yaml
# (3) Diffing the two manifests shows the certificate differences.
diff calico-etcd.yaml calico-etcd.yaml.bak
17,18c17,18
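To confirm the cluster now carries the renewed certificate, you can decode it straight out of the Secret and check the expiry; a sketch assuming the stock Secret name calico-etcd-secrets from calico-etcd.yaml:

```bash
kubectl -n kube-system get secret calico-etcd-secrets \
  -o jsonpath='{.data.etcd-cert}' | base64 -d | openssl x509 -enddate -noout
```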
Step 02. On the master node (weiyigeek-107), run the following commands to restart the calico Pods and the kube-proxy Pods on each node.
# calico-node
kubectl delete pod -n kube-system -l k8s-app=calico-node
# kube-proxy
kubectl delete pod -n kube-system -l k8s-app=kube-proxy
Step 03. After waiting a while, verify that the calico-node and kube-proxy services have started.
$ kubectl get pod -n kube-system | egrep "calico|kube-proxy"
calico-kube-controllers-6cf9b574f-42jnz 1/1 Running 0 98m
calico-node-dvvxk 1/1 Running 0 98m
calico-node-g9svc 1/1 Running 0 98m
calico-node-ggxqp 1/1 Running 0 98m
calico-node-jps97 1/1 Running 0 98m
calico-node-qf7cj 1/1 Running 0 92m
calico-node-vvw9f 1/1 Running 0 98m
calico-node-zvz8r 1/1 Running 0 98m
kube-proxy-25p5s 1/1 Running 0 220d
kube-proxy-8bl7f 1/1 Running 0 94m
kube-proxy-8jxvr 1/1 Running 0 93m
kube-proxy-d79mp 1/1 Running 0 381d
kube-proxy-dtdtm 1/1 Running 0 108m
kube-proxy-l7jxp 1/1 Running 0 93m
kube-proxy-nlgln 1/1 Running 0 381d
Step 04. Access a business system exposed via NodePort to verify that requests are proxied and forwarded from any node.
$ kubectl get svc,pod -n devops -l app=jenkins
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# service/jenkins NodePort 10.109.163.223 <none> 8080:30001/TCP,50000:30634/TCP 380d
# NAME READY STATUS RESTARTS AGE
# pod/jenkins-7fc6f4fcf6-glqxj 1/1 Running 0 118m
$ curl -sI 10.109.163.223:8080 | head -n 1
HTTP/1.1 200 OK
$ curl -s 10.109.163.223:8080 | grep -oP "<title>\S.+</title>"
$ curl -s 192.168.12.107:30001 | grep -oP "<title>\S.+</title>"
<title>Dashboard [Jenkins]</title>
Good, that problem is solved too. Next, let's look at how to remove or add control-plane and work nodes after renewing the certificates.
5. Removing && Adding Control-Plane (Master) Nodes
Description: Sometimes after renewing the cluster certificates, one control-plane node stays unhealthy and no fix can be found; the last resort is to reset that master node and rejoin it to the K8S cluster.
Procedure
Step 01. Mark the master node to be removed, here the weiyigeek-108 node, as unschedulable. Friendly tip: back up the relevant data on that node before you start.
# Run the following on weiyigeek-107 to make the node unschedulable; its status becomes Ready,SchedulingDisabled
kubectl drain weiyigeek-108 --delete-emptydir-data --force --ignore-daemonsets
# node/weiyigeek-108 cordoned
# WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-qf7cj, kube-system/kube-proxy-8jxvr
# node/weiyigeek-108 drained
kubectl get node weiyigeek-108
NAME STATUS ROLES AGE VERSION
weiyigeek-108 Ready,SchedulingDisabled control-plane,master 3h50m v1.23.1
# Remove the weiyigeek-108 node from the cluster
kubectl delete node weiyigeek-108
Step 02. Generate the token and example commands for joining master and work nodes to the K8S cluster (worth learning from).
# (1) Check whether the token is still valid (the default lifetime is 24h); if it has expired, generate a new one and print the join command
kubeadm token list
kubeadm token create --print-join-command
# kubeadm join slb-vip.k8s:16443 --token vkhqa1.t3gtrbowlalt8um5 --discovery-token-ca-cert-hash sha256:bfc86e13da79a1ec5f53cef99661e4e3f51adda59c525cb9377cfe59956b1e59 # note: used in the manifest below
# (2) Upload the control-plane certificates so additional masters can join (recommended); this invokes a single phase of the init workflow
kubeadm init phase upload-certs --upload-certs
# [upload-certs] Using certificate key:
# c6a084cb06aaae2f4581145dbbe6057ce111c88fdac4ff4405a0a2db58882d76 # note: used in the manifest below
# (3) Compute the CA certificate's public key hash
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^ .* //'
# (stdin)= bfc86e13da79a1ec5f53cef99661e4e3f51adda59c525cb9377cfe59956b1e59
# This is the public key hash (as long as the CA certificate on the machine is unchanged, the sha256 value stays the same)
# (4) Manually assemble the join command for a master node from the values above:
kubeadm join slb-vip.k8s:16443 --token ejwx62.vqwog6il5p83uk7y \
--discovery-token-ca-cert-hash sha256:bfc86e13da79a1ec5f53cef99661e4e3f51adda59c525cb9377cfe59956b1e59 \
--control-plane --certificate-key c6a084cb06aaae2f4581145dbbe6057ce111c88fdac4ff4405a0a2db58882d76
Step 03. SSH into the weiyigeek-108 node and reset it.
systemctl stop kubelet
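The remaining reset steps likely mirror the work-node reset in section 6, step 02; a sketch under that assumption:

```bash
# Reset kubeadm state and clean up CNI, kubelet and iptables/ipvs leftovers
echo y | kubeadm reset
sudo rm -rf $HOME/.kube /var/lib/cni/ /etc/cni/ /var/lib/kubelet/*
ipvsadm --clear; iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
systemctl start kubelet
```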
Tip: if the primary master node fails during initialization and needs to be reconfigured, run the commands above to reset it.
Step 04. Prepare the JoinConfiguration resource manifest for the control-plane node to join the cluster.
$ vim join-k8s.yaml
apiVersion: kubeadm.k8s.io/v1beta3
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
    apiServerEndpoint: slb-vip.k8s:16443 # the highly available APIServer address
    token: vkhqa1.t3gtrbowlalt8um5 # the token generated in the step above
    caCertHashes:
      - "sha256:bfc86e13da79a1ec5f53cef99661e4e3f51adda59c525cb9377cfe59956b1e59" # the CA public key hash obtained above
  timeout: 5m0s
kind: JoinConfiguration
controlPlane:
  certificateKey: "c6a084cb06aaae2f4581145dbbe6057ce111c88fdac4ff4405a0a2db58882d76" # the certificate key obtained above
  localAPIEndpoint:
    advertiseAddress: 192.168.12.108 # the local APIServer address (i.e. the weiyigeek-108 machine)
    bindPort: 6443 # the local APIServer port
nodeRegistration:
  criSocket: /run/containerd/containerd.sock # important: before 1.24.x the default was docker-shim; here we specify containerd
  imagePullPolicy: IfNotPresent
  name: weiyigeek-108 # important: the node name
  taints:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
Tip: in the bootstrapToken block you can also skip the caCertHashes verification by setting the key unsafeSkipCAVerification: true.
Step 05. After the certificate renewal, the old etcd data must be handled before the removed master node rejoins.
A small aside: the master nodes form an HA cluster and each of them runs an etcd instance, making the etcd database itself highly available. Notes on how Kubernetes/etcd versions relate to the certificates:
- Versions up to and including v1.9: etcd does not use TLS by default and there are no etcd certificates; only the master certificates need updating.
- Versions v1.10 and later: etcd enables TLS by default, so both the etcd certificates and the master certificates must be updated.
Our etcd here is v3.5.x with TLS enabled. The etcd pods we restarted on the master nodes earlier automatically pick up the new certificates, but before removing and rejoining a master node the following must be done, or the join will fail (see the error examples at the end):
kubectl exec -n kube-system -it etcd-weiyigeek-107 -- /bin/sh
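Inside the etcd container, the stale member for weiyigeek-108 then has to be removed before the node can rejoin; a sketch using the standard etcdctl v3 TLS flags (certificate paths assume the kubeadm static-pod mounts):

```bash
export ETCDCTL_API=3
FLAGS="--endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key"
etcdctl $FLAGS member list                 # note the ID of the weiyigeek-108 member
etcdctl $FLAGS member remove <MEMBER_ID>   # remove it so the node can rejoin cleanly
```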
Step 06. Run the join command on the weiyigeek-108 node; if the join succeeds you can check it with the kubectl get nodes command.
# Stop all Pods on this node to prevent port conflicts
crictl stop $(crictl ps -a -q)
# Join the cluster using the JoinConfiguration manifest; --v=5 prints a much fuller log of the process, essential for troubleshooting.
kubeadm join --config=join-k8s.yaml --v=5
# If you see the following messages, the join succeeded; otherwise investigate the failure.
# This node has joined the cluster and a new control plane instance was created
# To start administering your cluster from this node, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ kubectl get node weiyigeek-108
# NAME STATUS ROLES AGE VERSION
# weiyigeek-108 Ready control-plane,master 3m9s v1.23.1
Tip: if joining the master node gets stuck at the pre-flight stage, run the following check on a few of the other nodes: curl -ik https://<APISERVER address>:6443/version
$ curl -ik https://slb-vip.k8s:16443/version
# Output in the healthy case
HTTP/2 200
audit-id: 77c614cb-0c27-42f5-a852-b5ef8415361f
cache-control: no-cache, private
content-type: application/json
x-kubernetes-pf-flowschema-uid: 41a01a35-c480-4cfd-8854-494261622406
x-kubernetes-pf-prioritylevel-uid: 4cfd380c-d39c-490f-96b0-dd4ed07be4e0
content-length: 263
date: Wed, 01 Feb 2023 07:34:01 GMT
{
"major": "1",
"minor": "23",
"gitVersion": "v1.23.0",
"gitCommit": "ab69524f795c42094a6630298ff53f3c3ebab7f4",
"gitTreeState": "clean",
"buildDate": "2021-12-07T18:09:57Z",
"goVersion": "go1.17.3",
"compiler": "gc",
"platform": "linux/amd64"
}
At this point, the practice is complete!
6. Removing && Adding Work Nodes
Description: Adding and removing work nodes in a k8s cluster works almost exactly like it does for master nodes; the difference lies in the join manifest, which you can compare against the master node's. So instead of repeating everything, straight to the commands.
Step 01. On a master node, mark the weiyigeek-226 node unschedulable so no new workloads are placed on it, and evict its pods to the other work nodes.
# Tip: drain automatically cordons the node, so the cordon command below can be omitted
kubectl cordon weiyigeek-226
kubectl drain weiyigeek-226 --delete-emptydir-data --force --ignore-daemonsets
Step 02. Run the reset commands on the weiyigeek-226 work node
systemctl stop kubelet
echo y | kubeadm reset
sudo rm -rf $HOME/.kube; sudo rm -rf /var/lib/cni/ /etc/cni/ /var/lib/kubelet/*
ipvsadm --clear; iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
systemctl start kubelet; systemctl status kubelet
Step 03. Join the work node to the K8S cluster in either of two ways, sketched below.
# 1. Command-line mode
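A sketch of both methods, reusing the token and hash generated in section 5, step 02:

```bash
# 1. Command-line mode: plain kubeadm join (no --control-plane flag for a work node)
kubeadm join slb-vip.k8s:16443 --token ejwx62.vqwog6il5p83uk7y \
  --discovery-token-ca-cert-hash sha256:bfc86e13da79a1ec5f53cef99661e4e3f51adda59c525cb9377cfe59956b1e59

# 2. Config-file mode: reuse join-k8s.yaml from section 5 with the controlPlane
#    block removed and nodeRegistration.name set to weiyigeek-226, then:
kubeadm join --config=join-k8s.yaml --v=5
```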
Step 04. On a master node, check the newly joined work node and set its work label.
kubectl label nodes weiyigeek-226 node-role.kubernetes.io/work=test
kubectl get nodes weiyigeek-226
# NAME STATUS ROLES AGE VERSION
# weiyigeek-226 Ready work 10m v1.23.1
Done!
n. Problems Encountered in Practice
Problem 1. Running kubectl on a master or non-master node reports the error: The connection to the server localhost:8080 was refused
Error message:
$ kubectl get cs
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Cause: usually the current user has no ~/.kube/config, or the KUBECONFIG environment variable is not set.
Solution:
# Option 1. Copy the admin.conf file from /etc/kubernetes into .kube/config under the current user's home directory
mkdir -p $HOME/.kube
echo 'yes' | sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Option 2. Use the KUBECONFIG environment variable, which holds a list of kubeconfig files.
export KUBECONFIG=/etc/kubernetes/admin.conf:~/.kube/devops.kubeconfig
# Option 3. Pass the --kubeconfig parameter when running a command
kubectl get nodes --kubeconfig=/etc/kubernetes/admin.conf
Problem 2. On a master node, calico-node stays at 0/1 Ready and reports "connect: connection refused" and "calico/node is not ready" errors
Error message:
$ kubectl get pod -n kube-system calico-node-v52sv
# NAME READY STATUS RESTARTS AGE
# calico-node-v52sv 0/1 Running 0 31m
$ kubectl describe pod -n kube-system calico-node-v52sv | grep "not ready"
# Warning Unhealthy 33m (x2 over 33m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
# calico/node is not ready: BIRD is not ready: BGP not established with 192.168.12.107,192.168.12.109,192.168.12.223,192.168.12.224,192.168.12.225,192.168.12.226
$ kubectl logs -f --tail 50 -n kube-system calico-node-v52sv | grep "interface"
# 2023-02-01 08:40:57.583 [INFO][69] monitor-addresses/startup.go 714: Using autodetected IPv4 address on interface br-b92e9270f33c: 172.22.0.1/16
# The corresponding calico Pod fails to start, reporting:
# Number of node(s) with BGP peering established = 0
Cause: docker is installed on that node and has created containers; Calico picked the problematic br interface, so the calico-node Pod cannot start.
# Calico's IP autodetection method defaults to the first valid IP address on the first valid interface:
IP_AUTODETECTION_METHOD=first-found
# A problematic interface has likely appeared on the node; inspect it with:
ip link | grep br
Extra background: by default the calico-node daemonset takes the IP of the first interface it finds as the calico node IP. Because interface names are not uniform across the cluster, calico may pick the wrong interface's IP; in that case the only fix is to set the IP_AUTODETECTION_METHOD field to a wildcard interface name or an IP address.
Solution:
# Method 1. Change the IP autodetection method in the yaml manifest by adding the following two lines under spec.containers.env. (recommended)
- name: IP_AUTODETECTION_METHOD
value: "interface=ens.*" # ens 根据实际网卡开头配置
# Method 2. Remove the problematic interface (recommended), i.e. take down the named br-prefixed interface.
ifconfig br-b92e9270f33c down
# Method 3. If the environment does not depend on docker, uninstall docker and reboot the system.
sudo apt-get autoremove docker docker-ce docker-engine docker.io
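For method 1, instead of editing the manifest you can patch the running DaemonSet directly; a sketch using kubectl set env (assumes the stock calico-node DaemonSet name):

```bash
kubectl -n kube-system set env daemonset/calico-node IP_AUTODETECTION_METHOD='interface=ens.*'
```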
Finally, restart the abnormal Pod:
kubectl delete pod -n kube-system calico-node-v52sv
kubectl get nodes weiyigeek-108
# NAME STATUS ROLES AGE VERSION
# weiyigeek-108 Ready control-plane,master 66m v1.23.1
Problem 3. A node joining the cluster reports the error: bridge-nf-call-iptables does not exist
Error message:
[preflight] Some fatal errors occurred:
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight
Solution:
# Load the br_netfilter module and configure the sysctl parameters
$ cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
$ cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
$ modprobe br_netfilter && sudo sysctl --system
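A quick verification that the module is loaded and the parameter now exists:

```bash
lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables   # should print: net.bridge.bridge-nf-call-iptables = 1
```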
Problem 4. Starting the kubelet on a node reports the error: Unable to read config path" err="path does not exist, ignoring
Error message:
3Jan 16 14:27:25 weiyigeek-226 kubelet[882231]: E0116 14:27:25.496423 882231 kubelet.go:2347] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns e>
Jan 16 14:27:26 weiyigeek-226 kubelet[882231]: E0116 14:27:26.482369 882231 file_linux.go:61] "Unable to read config path" err="path does not exist, ignoring" path="/etc/kubernetes/manifests"
Solution: check that the /etc/kubernetes/manifests directory exists and that its permissions are correct
mkdir -vp /etc/kubernetes/manifests
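After creating the directory, restarting the kubelet and tailing its log should confirm the error is gone:

```bash
systemctl restart kubelet && journalctl -u kubelet -n 20 --no-pager
```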
Problem 5. A node joining the cluster reports the error: the namespace "kube-system" ... error downloading the secret
Error message:
I0116 04:39:41.428788 184219 checks.go:246] validating the existence and emptiness of directory /var/lib/etcd
[preflight] Would pull the required images (like 'kubeadm config images pull')
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
secrets "kubeadm-certs" is forbidden: User "system:bootstrap:20w21w" cannot get resource "secrets" in API group "" in the namespace "kube-system"
error downloading the secret
Solution: upload the certificates to the kubeadm-certs Secret again.
kubeadm init phase upload-certs --upload-certs
# [upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
# [upload-certs] Using certificate key:
# 3a3d7610038c9d14edf377d92b9c6b44e049566ddd25b0e69bf571af58227ae7
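Then retry the join with the freshly printed certificate key; a sketch following the join command format from section 5 (token and hash placeholders to be filled in):

```bash
kubeadm join slb-vip.k8s:16443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key 3a3d7610038c9d14edf377d92b9c6b44e049566ddd25b0e69bf571af58227ae7
```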
Original path: _posts/虚拟云容/云容器/Kubernetes/23-Kubernetes扩展学习实践笔记.md
Please credit the source when reposting; original address: https://blog.weiyigeek.top/2022/12-18-691.html
Content on this site follows the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license