[TOC]

0x00 前言简述

本章主要讲述,如果使用kubeadm进行安装配置K8S集群,并指定使用containerd作为容器运行时搭配使用的具体安装步骤,以及尽可能在案例中加入k8s集群常用组件及其操作配置。


0x01 环境准备

描述: 此Ubuntu操作系统已做安全加固和内核优化(SecOpsDev/Ubuntu-InitializeSecurity.sh at master · WeiyiGeek/SecOpsDev (github.com),如你的Linux未进行相应配置环境可能与读者有些许差异。

加固脚本: https://github.com/WeiyiGeek/SecOpsDev/blob/master/OS-%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/Linux/Ubuntu/Ubuntu-InitializeSecurity.sh

主机环境:

1
2
3
4
5
操作系统版本: Ubuntu 20.04.2 LTS
操作内核版本: 5.4.0-78-generic
主机名称与IP:
* k8s-master-1 10.10.107.220 2C 4G @# 控制平面节点
* k8s-node-1 10.10.107.221 2C 4G @# 工作节点


软件环境:

1
2
3
kubernetes -- v1.20.8
containerd -- 1.4.6
calico -- v3.18


节点环境
Tips: 以下命令需要再k8s-master-1和k8s-node-1主机上都要运行。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
# 1.节点hosts信息以及确保每个节点上 MAC 地址和 product_uuid 的唯一性 
tee -a /etc/hosts <<'EOF'
10.10.107.220 k8s-master-1
10.10.107.221 k8s-node-1
10.10.107.220 newcluster.k8s
EOF
# 你可以使用命令 ip link 或 ifconfig -a 来获取网络接口的 MAC 地址
ifconfig -a
# 可以使用 sudo cat /sys/class/dmi/id/product_uuid 命令对 product_uuid 校验
sudo cat /sys/class/dmi/id/product_uuid
# k8s-master-1: d0154d56-ffdd-697d-09a6-34c851710f09
# k8s-node-1: f98c4d56-9fb2-bc92-98be-2648c83eb7b5


# 2.系统时间时区同步设置
date -R
sudo ntpdate ntp.aliyun.com
# chronyc sources
sudo timedatectl set-timezone Asia/Shanghai
# sudo dpkg-reconfigure tzdata
sudo timedatectl set-local-rtc 0
timedatectl
# Local time: Tue 2021-07-06 11:28:54 CST
# Universal time: Tue 2021-07-06 03:28:54 UTC
# RTC time: Tue 2021-07-06 03:28:55
# Time zone: Asia/Shanghai (CST, +0800)
# System clock synchronized: yes
# NTP service: active
# RTC in local TZ: no


# 3.禁用防火墙与swap分区(新手一定要有这一步操作)
ufw disable && systemctl disable ufw && systemctl stop ufw
swapoff -a && sed -i 's|^/swap.img|#/swap.ing|g' /etc/fstab


# 4.内核相关参数调整
egrep -q "^(#)?vm.swappiness.*" /etc/sysctl.conf && sed -ri "s|^(#)?vm.swappiness.*|vm.swappiness = 0|g" /etc/sysctl.conf || echo "vm.swappiness = 0" >> /etc/sysctl.conf
egrep -q "^(#)?net.ipv4.ip_forward.*" /etc/sysctl.conf && sed -ri "s|^(#)?net.ipv4.ip_forward.*|net.ipv4.ip_forward = 1|g" /etc/sysctl.conf || echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
# - 允许 iptables 检查桥接流量
egrep -q "^(#)?net.bridge.bridge-nf-call-iptables.*" /etc/sysctl.conf && sed -ri "s|^(#)?net.bridge.bridge-nf-call-iptables.*|net.bridge.bridge-nf-call-iptables = 1|g" /etc/sysctl.conf || echo "net.bridge.bridge-nf-call-iptables = 1" >> /etc/sysctl.conf
egrep -q "^(#)?net.bridge.bridge-nf-call-ip6tables.*" /etc/sysctl.conf && sed -ri "s|^(#)?net.bridge.bridge-nf-call-ip6tables.*|net.bridge.bridge-nf-call-ip6tables = 1|g" /etc/sysctl.conf || echo "net.bridge.bridge-nf-call-ip6tables = 1" >> /etc/sysctl.conf


# 5.ipvs负载均衡管理工具安装
apt install ipset ipvsadm -y


# 6.模块加载到内核并查看是否已经正确加载所需的内核模块
# 重启后永久生效
tee /etc/modules-load.d/k8s.conf <<'EOF'
# netfilter
br_netfilter
# containerd.
overlay
# ipvs
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF

# 临时生效
mkdir -vp /etc/modules.d/
cat > /etc/modules.d/k8s.modules <<EOF
#!/bin/bash
# 允许 iptables 检查桥接流量
modprobe -- br_netfilter
# containerd.
modprobe -- overlay
# ipvs
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack
EOF
chmod 755 /etc/modules.d/k8s.modules && bash /etc/modules.d/k8s.modules && lsmod | grep -e ip_vs -e nf_conntrack
sysctl --system
reboot

0x02 安装流程

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
# - 卸载原有Docker 以及 containerd
sudo apt-get remove docker docker-engine docker.io containerd runc
# - 更新apt程序包索引并安装程序包,以允许apt通过HTTPS使用存储库
sudo apt-get update
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg \
lsb-release

# - 添加Docker的官方GPG密钥:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

# - 使用以下命令设置稳定存储库。要添加nightly或test存储库,请在下面的命令中的单词stable后面添加单词nightly或test(或两者)。
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/container.list > /dev/null


# - 更新apt包索引,安装最新版本的containerd或进入下一步安装特定版本:
sudo apt-get update
# - 安装前可查看containerd.io可用的版本: 2021年7月6日 21:03:47 当前最新版本 1.4.6-1
apt-cache madison containerd.io && apt install -y containerd.io=1.4.6-1

# - 创建并修改 containerd 配置
mkdir -vp /etc/containerd/
containerd config default > /etc/containerd/config.toml
sed -i "s#k8s.gcr.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g" /etc/containerd/config.toml
sed -i '/containerd.runtimes.runc.options/a\ \ \ \ \ \ \ \ \ \ \ \ SystemdCgroup = true' /etc/containerd/config.toml
sed -i "s#https://registry-1.docker.io#https://xlx9erfu.mirror.aliyuncs.com#g" /etc/containerd/config.toml

# - 启动 Containerd
systemctl daemon-reload
systemctl enable containerd
systemctl restart containerd

# - 验证安装的 Containerd 状态及其版本
systemctl status containerd.service && ctr --version
# ctr containerd.io 1.4.6

温馨提示: 当前最新版本 1.5.11 【2022年4月18日 14:10:15】其配置文件发生改变, 需要如下初始化。

1
2
3
4
5
containerd config default > /etc/containerd/config.toml
sed -i "s#k8s.gcr.io/pause#registry.cn-hangzhou.aliyuncs.com/google_containers/pause#g" /etc/containerd/config.toml
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml
sed -i '/registry.mirrors]/a\ \ \ \ \ \ \ \ [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]' /etc/containerd/config.toml
sed -i '/registry.mirrors."docker.io"]/a\ \ \ \ \ \ \ \ \ \ endpoint = ["https://xlx9erfu.mirror.aliyuncs.com"]' /etc/containerd/config.toml


  • Step 2.在确保 Containerd安装完成后,现在我们就可以来安装kubernetes相关工具,同样是在两台主机中执行。
    参考地址:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# - 为了提升下载速度我们还是使用使用阿里云的源进行安装。
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
tee /etc/apt/sources.list.d/kubernetes.list <<'EOF'
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
# 或者
# $ sudo add-apt-repository "deb http://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main"

# 更新 apt 包索引,并查看可用的kubernetes版本为了环境的稳定性
apt-get update
apt-cache madison kubeadm | head -n 8
# kubeadm | 1.21.2-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
# kubeadm | 1.21.1-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
# kubeadm | 1.21.0-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
# kubeadm | 1.20.8-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
# kubeadm | 1.20.7-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
# kubeadm | 1.20.6-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
# kubeadm | 1.20.5-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
# kubeadm | 1.20.4-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages

# - 安装 kubelet、kubeadm 和 kubectl,并锁定其版本(此处选择1.20.8-00版本)
sudo apt-get install -y kubelet=1.20.8-00 kubeadm=1.20.8-00 kubectl=1.20.8-00
sudo apt-mark hold kubelet kubeadm kubectl


  • Step 3.设置容器运行时为containerd,到这里为止上面所有的操作都需要在所有节点执行配置。
1
2
3
4
5
6
# containerd - 运行时设置
crictl config runtime-endpoint /run/containerd/containerd.sock

# 重载systemd守护进程并将 kubelet 设置成开机启动
systemctl daemon-reload
systemctl enable kubelet && systemctl start kubelet

Tips : kubelet 现在每隔几秒就会重启,因为它陷入了一个等待 kubeadm 指令的死循环。

1
2
systemctl status kubelet.service
# Active: activating (auto-restart) (Result: exit-code) since Tue 2021-07-06 21:38:26 CST; 9s ago


接下来在 master 节点配置 kubeadm 初始化文件,可以通过如下命令导出默认的初始化配置:

1
root@k8s-master-1:~/k8s# kubeadm config print init-defaults > kubeadm.yaml

然后根据我们自己的需求修改配置,比如修改 imageRepository 的值,kube-proxy 的模式为 ipvs 以及 criSocket 设置为 /run/containerd/containerd.sock,需要注意的是由于我们使用的containerd作为运行时,所以在初始化节点的时候需要指定cgroupDriver为systemd

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
cat > kubeadm.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: abcdef.0123456789abcdef
ttl: 24h0m0s
usages:
- signing
- authentication
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 10.10.107.220
bindPort: 6443
nodeRegistration:
criSocket: /run/containerd/containerd.sock
name: k8s-master-1
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/master
---
apiServer:
timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
type: CoreDNS
etcd:
local:
dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.20.8
controlPlaneEndpoint: "newcluster.k8s:6443"
networking:
dnsDomain: cluster.local
podSubnet: 172.16.0.0/16
serviceSubnet: 10.96.0.0/12
scheduler: {}
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF

# 如果想设置APISERVER名称需要再/etc/hosts添加与其对应IP地址
tee -a /etc/hosts <<'EOF'
10.10.107.220 newcluster.k8s
EOF

然后使用上面的配置文件进行初始化Master节点:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
kubeadm init --config=kubeadm.yaml
# [preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
# [certs] Using certificateDir folder "/etc/kubernetes/pki"
# [certs] Generating "ca" certificate and key
# [certs] Generating "apiserver" certificate and key
# [certs] apiserver serving cert is signed for DNS names [k8s-master-1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local newcluster.k8s] and IPs [10.96.0.1 10.10.107.220]
......
# [addons] Applied essential addon: CoreDNS
# [addons] Applied essential addon: kube-proxy

# Your Kubernetes control-plane has initialized successfully!
# To start using your cluster, you need to run the following as a regular user:
# - 拷贝 kubeconfig 文件到当前用户的根目录,完毕后即可采用kubectl进行查看管理k8s集群。
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf

# You should now deploy a pod network to the cluster.
# Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

# You can now join any number of control-plane nodes by copying certificate authorities
# and service account keys on each node and then running the following as root:
# 从Master节点执行
kubeadm join newcluster.k8s:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:d57743fa8657a959e6f96ea1b2d16ce32c315a2a6dc080a65a2b0fc8849bfbd4 \
--control-plane

# Then you can join any number of worker nodes by running the following on each as root:
# 工作节点执行
kubeadm join newcluster.k8s:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:d57743fa8657a959e6f96ea1b2d16ce32c315a2a6dc080a65a2b0fc8849bfbd4

Tips : 由于 kubeadm 把 kubelet 视为一个系统服务来管理,所以对基于 kubeadm 的安装, 我们推荐使用 systemd 驱动,不推荐 cgroupfs 驱动。参考地址: https://kubernetes.io/zh/docs/tasks/administer-cluster/kubeadm/configure-cgroup-driver/


  • Step 5.添加节点需要完成初始化集群上面的配置和操作要提前做好,将 master 节点上面的 $HOME/.kube/config 文件拷贝到 node 节点对应的文件中,安装 kubeadm、kubelet、kubectl,然后在node节点上(k8s-node-1)执行上面初始化完成后提示的 join 命令即可:
1
2
kubeadm join newcluster.k8s:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:d57743fa8657a959e6f96ea1b2d16ce32c315a2a6dc080a65a2b0fc8849bfbd4


  • Step 6.通过在Master中执行kubectl get node命令可以看到是 NotReady 状态,这是因为还没有安装网络插件,现在应该在集群中部署一个pod网络,可以从kubernetes官方提供的各类组件中选择我们自己的网络插件,这里我们安装 calio (是一个安全的三层网络和网络策略驱动), 其Calico版本选择 https://docs.projectcalico.org/releases
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
# - 安装 Pod 网络前节点的状态为NotReady
~/k8s# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master-1 NotReady control-plane,master 11m v1.20.8
k8s-node-1 NotReady <none> 4m19s v1.20.8

# - 从calico官方下载calico插件的部署清单。
~/k8s# wget https://docs.projectcalico.org/v3.18/manifests/calico.yaml

# - 自定义更改calico插件的地址池。
# The default IPv4 pool to create on startup if none exists. Pod IPs will be chosen from this range. Changing this value after installation will have no effect. This should fall within `--cluster-cidr`.
vim calico.yaml
- name: CALICO_IPV4POOL_CIDR
value: "192.168.0.0/16"

# - 部署 calico 网络插件
kubectl apply -f calico.yaml

# - 部署后查看kube-system中和网络相关的pod的运行状态,一般的状态回从Pending -> Init -> ContainerCreate -> Running过程转变。
~/k8s# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-77dd468cdb-2lchv 0/1 Pending 0 18s
calico-node-pz9qx 0/1 Init:0/3 0 18s
calico-node-zvst7 0/1 Init:0/3 0 18s
coredns-54d67798b7-78n5j 0/1 Pending 0 15m
coredns-54d67798b7-z9c8f 0/1 Pending 0 15m

# - 等待几分钟后calico插件相关Pod已成功运行
~/k8s# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-77dd468cdb-2lchv 1/1 Running 0 2m18s
calico-node-pz9qx 1/1 Running 0 2m18s
calico-node-zvst7 1/1 Running 0 2m18s
coredns-54d67798b7-78n5j 1/1 Running 0 17m
coredns-54d67798b7-z9c8f 1/1 Running 0 17m

# - 同时可以看到其Node节点的状态已变为Ready,至此calico网络插件安装部署已完毕。
~/k8s# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master-1 Ready control-plane,master 17m v1.20.8
k8s-node-1 Ready <none> 10m v1.20.8


  • Step 7.最后添加命令自动补齐不全功能
1
2
3
4
apt install -y bash-completion
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc

0x03 基础实践

实践目标: 在Kubernetes集群中运行Nginx容器并设置nginx-status查看,并尽可能使用k8s相关组件以及控制器的简单使用。

实践流程:

  • Step 1.创建一个weiyigeek的名称空间进行部署 Nginx Web 容器。
1
2
$ kubectl create namespace weiyigeek
# namespace/weiyigeek created


  • Step 2.准备Nginx.conf配置文件,我们可以采用ConfigMap来存储Nginx.conf配置文件。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
# Yaml 方式创建 ConfigMap
tee nginx-conf.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: nginx-conf
namespace: weiyigeek
data:
nginx.conf: |
user nginx;
worker_processes auto;
worker_cpu_affinity 00000001 00000010 00000100 00001000;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
worker_rlimit_nofile 65536;

events {
worker_connections 65535;
accept_mutex on;
multi_accept on;
}

http {
include mime.types;
default_type application/octet-stream;
log_format access_json '{"@timestamp":"$time_iso8601",'
'"host":"$server_addr",'
'"clientip":"$remote_addr",'
'"size":$body_bytes_sent,'
'"responsetime":$request_time,'
'"upstreamtime":"$upstream_response_time",'
'"upstreamhost":"$upstream_addr",'
'"http_host":"$host",'
'"url":"$uri",'
'"domain":"$host",'
'"xff":"$http_x_forwarded_for",'
'"referer":"$http_referer",'
'"status":"$status"}';
access_log /var/log/nginx/access.log access_json;
client_max_body_size 50M;
keepalive_timeout 300;
fastcgi_buffers 8 128k;
fastcgi_buffer_size 128k;
fastcgi_busy_buffers_size 256k;
fastcgi_temp_file_write_size 256k;
proxy_connect_timeout 90;
proxy_read_timeout 300;
proxy_send_timeout 300;
sendfile on;

server {
listen 80;
server_name localhost;
add_header Cache-Control no-cache;
location / {
root /usr/share/nginx/html;
index index.html index.htm;
}
location /status
{
stub_status on;
access_log off;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}
# include /etc/nginx/conf.d/*.conf;
}
EOF


# 方式1.采用 Yaml 方式创建 ConfigMap 存储配置信息
$ kubectl apply -f nginx-conf.yaml
# configmap/nginx-conf created

# 方式2.采用 nginx.conf 配置创建 ConfigMap 存储配置信息
# kubectl create configmap nginx-conf --from-file=nginx.conf
# kubectl describe configmap nginx-conf

# 查看创建的 configmap
$ kubectl -n weiyigeek get configmap nginx-conf
# NAME DATA AGE
# nginx-conf 1 25m

# 设置的配置信息
$ kubectl describe -n weiyigeek configmap nginx-conf
# Name: nginx-conf
# Namespace: weiyigeek
# Labels: <none>
# Annotations: <none>

# Data
# ====
# nginx.conf:
# ----
# user nginx;
# worker_processes auto;
# ............

# events {
# ...................
# }

# http {
# ....................
# server {
# listen 80;
# server_name localhost;
# add_header Cache-Control no-cache;
# location / {
# root /usr/share/nginx/html;
# index index.html index.htm;
# }
# location /status
# {
# stub_status on;
# access_log off;
# }
# error_page 500 502 503 504 /50x.html;
# location = /50x.html {
# root html;
# }
# }
# }
# Events: <none>


  • Step 3.利用 Deployment 控制器编写部署 Nginx 资源清单。
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    70
    71
    tee nginx-deployment.yaml <<'EOF'
    apiVersion: apps/v1
    kind: Deployment
    metadata:
    name: web-deploy
    namespace: weiyigeek
    spec:
    replicas: 2
    selector:
    matchLabels:
    app: nginx-test
    template:
    metadata:
    labels:
    app: nginx-test
    spec:
    initContainers:
    - name: init-html
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ['sh', '-c', "env;echo ConfigMap:${MSG}--HostName-${HOSTNAME} > /usr/share/nginx/html/index.html"]
    volumeMounts:
    - name: web
    mountPath: "/usr/share/nginx/html"
    securityContext:
    privileged: true
    containers:
    - name: nginx
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 80
    volumeMounts:
    - name: nginx-conf
    mountPath: /etc/nginx/nginx.conf
    subPath: nginx.conf
    - name: web
    mountPath: "/usr/share/nginx/html"
    volumes:
    - name: nginx-conf
    configMap:
    name: nginx-conf
    items:
    - key: nginx.conf
    path: nginx.conf
    - name: web
    emptyDir: {}
    ---
    apiVersion: v1
    kind: Service
    metadata:
    name: nginx-service
    namespace: weiyigeek
    labels:
    app: nginx-test
    spec:
    type: NodePort
    ports:
    - name: nginx
    port: 80
    targetPort: 80
    nodePort: 30000
    protocol: TCP
    selector:
    app: nginx-test
    EOF

    # 利用 kubectl apply 部署 Nginx 控制器
    kubectl apply -f nginx-deployment.yaml
    # deployment.apps/web-deploy created
    # service/nginx-service created


  • Step 4.查看验证创建的 Nginx Pod 和 Services, 访问创建的 nginx Web容器。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
~/k8s/containerd# kubectl -n weiyigeek get pod
# NAME READY STATUS RESTARTS AGE
# web-deploy-99fbb677d-jbbwk 1/1 Running 0 48s
# web-deploy-99fbb677d-md5z9 1/1 Running 0 51s

~/k8s/containerd# kubectl -n weiyigeek get svc
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# nginx-service NodePort 10.105.172.104 <none> 80:30000/TCP 59m

# 可以看到以轮询的方式来访问nginx副本
~/k8s/containerd# curl http://10.105.172.104
# ConfigMap:--HostName-web-deploy-99fbb677d-md5z9
~/k8s/containerd# curl http://10.105.172.104
# ConfigMap:--HostName-web-deploy-99fbb677d-jbbwk

# 查看nginx的状态信息
~/k8s/containerd# curl http://10.105.172.104/status
# Active connections: 1
# server accepts handled requests
# 3 3 3
# Reading: 0 Writing: 1 Waiting: 0


  • Step 5.我们也可以采用 port-forward 子命令进行转发 web-deploy-99fbb677d-jbbwk Pod 到本机的 80 端口
1
2
3
4
kubectl -n weiyigeek port-forward --address 0.0.0.0 web-deploy-99fbb677d-jbbwk 80:80
# Forwarding from 0.0.0.0:80 -> 80
# Handling connection for 80
# Handling connection for 80
WeiyiGeek.port-forward

WeiyiGeek.port-forward

Nginx Status 详解:

  • Active connections - 活跃的连接数量
  • server accepts handled requests — 总共处理了7个连接 , 成功创建7次握手, 总共处理了36个请求。
  • Reading — 读取客户端的连接数。
  • Writing — 响应数据到客户端的数量。
  • Waiting — 开启 keep-alive 的情况下, 这个值等于 active - (reading+writing) 意思就是 Nginx 已经处理完正在等候下一次请求指令的驻留连接。


  • Step 6.至此,在使用containerd运行时的kubernetes集群中, 采用deployments控制器创建多个Nginx副本并访问。

0x04 入坑出坑

Containerd 相关问题

问题1.执行ctr命令时报 connection error 或者 :connect: permission denied 错误

  • 错误信息:
1
ctr: failed to dial "/run/containerd/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: permission denied"
  • 错误原因: 一是可能未启动containerd相关服务,二可能是该用户没有操作container的权限(一般会存在于低权限用户)
  • 解决办法:
1
2
3
4
5
6
# - 服务状态
systemctl status containerd.servic

# - 用户切换
su - root
ctr images ls


问题2.未指定名称空间(namespace)或者该操作对象不在该名称空间(namespace)中执行操作命令

  • 错误信息:
1
2
3
$ ctr container rm busybox
ERRO[0000] failed to delete container "busybox" error="container \"busybox\" in namespace \"default\": not found"
ctr: container "busybox" in namespace "default": not found
  • 错误原因: 由于 default 的默认名称空间内无busybox所以删除时报错.
  • 解决办法: 查看有那些名称空间ctr namespace ls


问题3.使用ctr拉取镜像时报INFO[0001] trying next host error="failed to authorize: failed to fetch anonymous token:错误

  • 错误信息:
1
2
3
4
5
$ ctr -n k8s.io images pull docker.io/library/busybox:latest
# docker.io/library/busybox:latest: resolving |--------------------------------------|
# elapsed: 1.1 s total: 0.0 B (0.0 B/s)
# INFO[0001] trying next host error="failed to authorize: failed to fetch anonymous token: Get https://auth.docker.io/token?scope=repository%3Alibrary%2Fbusybox%3Apull&service=registry.docker.io: read tcp 10.10.107.220:62946->107.23.149.57:443: read: connection reset by peer" host=registry-1.docker.io
# ctr: failed to resolve reference "docker.io/library/busybox:latest": failed to authorize: failed to fetch anonymous token: Get https://auth.docker.io/token?scope=repository%3Alibrary%2Fbusybox%3Apull&service=registry.docker.io: read tcp 10.10.107.220:62946->107.23.149.57:443: read: connection reset by peer
  • 解决办法: 不知道是ctr的Bug还是何处配置不对采用ctr images无法拉取而使用ctr image 则可以正常拉取。
1
2
3
4
5
6
7
8
9
10
# ctr command
# images, image, i manage images

ctr -n k8s.io image pull docker.io/library/busybox:latest
# docker.io/library/busybox:latest: resolved |++++++++++++++++++++++++++++++++++++++|
# index-sha256:930490f97e5b921535c153e0e7110d251134cc4b72bbb8133c6a5065cc68580d: done |++++++++++++++++++++++++++++++++++++++|
# manifest-sha256:dca71257cd2e72840a21f0323234bb2e33fea6d949fa0f21c5102146f583486b: done |++++++++++++++++++++++++++++++++++++++|
# layer-sha256:b71f96345d44b237decc0c2d6c2f9ad0d17fde83dad7579608f1f0764d9686f2: done |++++++++++++++++++++++++++++++++++++++|
# config-sha256:69593048aa3acfee0f75f20b77acb549de2472063053f6730c4091b53f2dfb02: done |++++++++++++++++++++++++++++++++++++++|
# elapsed: 2.5 s total: 0.0 B (0.0 B/s)


Kubernetes 相关问题(基于Containerd运行时)

1.Kubernetes 高可用集群搭建时错误信息排查方式(思路具体情况还需具体分析)

  • 错误信息: Unfortunately, an error has occurred: timed out waiting for the condition
  • 错误排查:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
# - 此错误可能由以下原因引起:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

# - 如果您使用的是systemd的系统,则可以尝试使用以下命令排除错误:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

# - 此外,当容器运行时启动时,控制平面组件可能已崩溃或退出。要进行故障排除,请使用首选容器运行时CLI列出所有容器。
- 'crictl --runtime-endpoint /run/containerd/containerd.sock ps -a | grep kube | grep -v pause'

# - 找到发生故障的容器后,可以使用以下工具检查其日志:
- 'crictl --runtime-endpoint /run/containerd/containerd.sock logs CONTAINERID'

# - 集群 pod 排错常用命令指南
kubectl get pod <pod-name> -o yaml # 查看 Pod 的配置是否正确
kubectl describe pod <pod-name> # 查看 Pod 的事件
kubectl logs <pod-name> [-c <container-name>] # 查看容器日志
  • 实际案例:
1
2
3
4
5
6
7
journalctl -xeu kubelet
# Jul 06 22:15:29 k8s-master-1 kubelet[11333]: E0706 22:15:29.108499 11333 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
# Jul 06 22:15:29 k8s-master-1 kubelet[11333]: E0706 22:15:29.206203 11333 kubelet.go:2263] node "k8s-master-1" not found
# Jul 06 22:15:29 k8s-master-1 kubelet[11333]: E0706 22:15:29.208552 11333 remote_runtime.go:116] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to get sandbox image "k8s.gcr.io/pause:3.2": failed to pull image ">
# Jul 06 22:15:29 k8s-master-1 kubelet[11333]: E0706 22:15:29.208609 11333 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "kube-controller-manager-k8s-master-1_kube-system(d1d11a3cb97124022c9d85b070508dfa)" failed: rpc error: code = Unknown de>
# Jul 06 22:15:29 k8s-master-1 kubelet[11333]: E0706 22:15:29.208621 11333 kuberuntime_manager.go:755] createPodSandbox for pod "kube-controller-manager-k8s-master-1_kube-system(d1d11a3cb97124022c9d85b070508dfa)" failed: rpc error: code = Unknown d>
# Jul 06 22:15:29 k8s-master-1 kubelet[11333]: E0706 22:15:29.208682 11333 pod_workers.go:191] Error syncing pod d1d11a3cb97124022c9d85b070508dfa ("kube-controller-manager-k8s-master-1_kube-system(d1d11a3cb97124022c9d85b070508dfa)"), skipping: fail>
  • 错误原因: 由于从k8s.gcr.io/pause:3.2仓库无法拉取pause镜像导致kubernetes集群没办法创建相应的PodSandbox而最终使得集群安装失败。
  • 解决办法: 修改 containerd 配置中pause镜像拉取的仓库地址(此处设置为阿里云)
1
sed -i "s#k8s.gcr.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g"  /etc/containerd/config.toml


2.使用crictl工具拉取镜像时报FATA[0000] pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image错误。

  • 错误信息:
1
2
$ crictl pull docker.io/pollyduan/ingress-nginx-controller:v0.47.0
# FATA[0000] pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/pollyduan/ingress-nginx-controller:v0.47.0": failed to resolve reference "docker.io/pollyduan/ingress-nginx-controller:v0.47.0": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
  • 错误原因: 未在 /etc/crictl.yaml 中设置 image-endpoint 对象, 其次是在 /etc/containerd/config.toml中设置docker.io镜像endpoint有误。
  • 解决办法: 设置 image-endpoint 端点。
1
2
3
4
5
6
7
8
9
10
11
12
$ tee /etc/crictl.yaml <<'EOF'
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: "unix:///run/containerd/containerd.sock"
timeout: 0
debug: false
EOF

$ grep -C 3 'registry.mirrors."docker.io" ' /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".registry]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["https://xlx9erfu.mirror.aliyuncs.com"]


3.创建Pod时报Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container.错误

  • 错误信息:
1
Warning  FailedCreatePodSandBox  89s               kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "1c97ad2710e2939c0591477f9d6dde8e0d7d31b3fbc138a7fa38aaa657566a9a" network for pod "coredns-7f89b7bc75-qg924": networkPlugin cni failed to set up pod "coredns-7f89b7bc75-qg924_kube-system" network: error getting ClusterInformation: Get "https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), failed to clean up sandbox container "1c97ad2710e2939c0591477f9d6dde8e0d7d31b3fbc138a7fa38aaa657566a9a" network for pod "coredns-7f89b7bc75-qg924": networkPlugin cni failed to teardown pod "coredns-7f89b7bc75-qg924_kube-system" network: error getting ClusterInformation: Get "https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]
  • 表现状态: coredns无法运行
    1
    2
    3
    4
    $ kubectl get pods -n kube-system
    NAME READY STATUS RESTARTS AGE
    coredns-7f89b7bc75-jzs26 0/1 ContainerCreating 0 63s
    coredns-7f89b7bc75-qg924 0/1 ContainerCreating 0 63s
  • 解决办法: 更改calico.yaml
1
2
3
4
5
6
7
8
9
10
11
12
$ vim calico.yaml
# Cluster type to identify the deployment type
- name: CLUSTER_TYPE
value: "k8s,bgp"
# 下方熙增新增
- name: IP_AUTODETECTION_METHOD
value: "interface=ens192"
# ens192为本地网卡名字

$ kubectl apply -f calico.yaml

$ kubectl get pods -n kube-system


4.Kubeadm 初始化 kubernetes 失败报[ERROR SystemVerification]: unexpected kernel config: CONFIG_CGROUP_PIDS错误

  • 错误信息:
1
2
3
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR SystemVerification]: unexpected kernel config: CONFIG_CGROUP_PIDS
[ERROR SystemVerification]: missing required cgroups: pids
  • 解决办法: 首先你要在 cat /boot/config-$(uname -r) | grep CGROUP 这个文件里面加CONFIG_CGROUP_PIDS=y
    然后你再升级一下内核就可以了。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
$ cat /boot/config-`uname -r` | grep CGROUP
CONFIG_CGROUPS=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
# CONFIG_BLK_CGROUP_IOLATENCY is not set
CONFIG_BLK_CGROUP_IOCOST=y
# CONFIG_BFQ_CGROUP_DEBUG is not set
CONFIG_NETFILTER_XT_MATCH_CGROUP=m
CONFIG_NET_CLS_CGROUP=m
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y


5.使用Kubectl查看工作节点时发现节点状态为NotReady,并报出Network plugin returns error: cni plugin not initialized错误解决办法.

  • 错误信息:
    1
    2
    3
    4
    5
    $ kubectl describe node master-1
    Ready False Fri, 22 Apr 2022 14:37:11 +0800 Fri, 22 Apr 2022 11:38:16 +0800 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

    $ journalctl -xeu kubelet
    Apr 22 14:41:34 master-1 kubelet[412039]: E0422 14:41:34.064340 412039 kubelet.go:2347] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
  • 问题原因: 由于我重新初始化集群节点后,删除了 /etc/cni/net.d/ 目录并未重启containerd服务。
  • 解决办法:
    1
    2
    3
    4
    # 重启 containerd
    $ systemctl restart containerd
    $ ls /etc/cni/net.d/
    10-calico.conflist calico-kubeconfig