[TOC]

0x00 How to elegantly inject metadata from a Kubernetes resource manifest into Pod containers through environment variables

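Description: Since Kubernetes 1.7, a container can read fields from its own Pod's spec and metadata. This is the Downward API at work: it exposes information about the Pod to the containers running in it, here in the form of environment variables.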

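A Pod can contain three kinds of containers:
• Infrastructure Container: the base container, which holds the network namespace for the whole Pod.
• InitContainers: initialization containers, which run before the application containers.
• Containers: application containers; when there are several, they are usually started in parallel.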

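Use case: suppose you need to pick a GPU index from the suffix of the hostname, or you need the IP address or labels of a Pod created by a workload controller. In both cases you can inject the values as environment variables (I hope this helps).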

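Goal: use env together with fieldRef (and resourceFieldRef for container resources) to inject Kubernetes metadata and container fields into the container as environment variables.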

The env object (valueFrom.fieldRef.fieldPath) currently supports injecting the following fields:

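    # Pod name (also the container hostname)
    metadata.name
    # Namespace
    metadata.namespace
    # Value of a single label; replace <KEY> with the label key
    metadata.labels['<KEY>']
    # Value of a single annotation; replace <KEY> with the annotation key
    metadata.annotations['<KEY>']
    # Node name
    spec.nodeName
    # Service account name
    spec.serviceAccountName
    # IP address of the host node
    status.hostIP
    # Primary IP address of the Pod (normally IPv4)
    status.podIP
    # All IP addresses of the Pod (IPv4 and IPv6)
    status.podIPs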

Example

    apiVersion: v1
    kind: Pod
    metadata:
      name: dapi-envars-fieldref
      namespace: devtest
      labels:
        app: downwardAPI
      annotations:
        demo: dapi-envars
    spec:
      containers:
      - name: test-container
        image: busybox:latest
        command: [ "sh", "-c"]
        # Print the injected variables every 10 seconds
        args:
        - while true; do
            echo -en '\n';
            printenv MY_NODE_NAME MY_POD_NAME MY_POD_NAMESPACE;
            printenv MY_POD_IP MY_POD_IPS MY_POD_SERVICE_ACCOUNT;
            printenv MY_POD_LABELS_APP MY_POD_ANNOTATIONS_DEMO;
            printenv MY_CPU_REQUEST MY_CPU_LIMIT;
            printenv MY_MEM_REQUEST MY_MEM_LIMIT;
            sleep 10;
          done;
        resources:
          requests:
            memory: "32Mi"
            cpu: "125m"
          limits:
            memory: "64Mi"
            cpu: "250m"
        env:
        # Pod-level fields are read with fieldRef
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: MY_POD_IPS
          valueFrom:
            fieldRef:
              fieldPath: status.podIPs
        - name: MY_POD_SERVICE_ACCOUNT
          valueFrom:
            fieldRef:
              fieldPath: spec.serviceAccountName
        - name: MY_POD_LABELS_APP
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: MY_POD_ANNOTATIONS_DEMO
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['demo']
        # Container resource requests and limits are read with resourceFieldRef
        - name: MY_CPU_REQUEST
          valueFrom:
            resourceFieldRef:
              containerName: test-container
              resource: requests.cpu
        - name: MY_CPU_LIMIT
          valueFrom:
            resourceFieldRef:
              containerName: test-container
              resource: limits.cpu
        - name: MY_MEM_REQUEST
          valueFrom:
            resourceFieldRef:
              containerName: test-container
              resource: requests.memory
        - name: MY_MEM_LIMIT
          valueFrom:
            resourceFieldRef:
              containerName: test-container
              resource: limits.memory
      restartPolicy: Never

After the Pod is running, check the injected environment variables:

    ~$ kubectl apply -f test-container.yaml
    pod/dapi-envars-fieldref created

    ~$ kubectl logs -n devtest dapi-envars-fieldref
    dapi-envars-fieldref
    devtest
    10.66.182.247
    10.66.182.247
    default
    downwardAPI
    dapi-envars
    1
    1
    33554432
    67108864

    ~$ kubectl exec -n devtest dapi-envars-fieldref -- printenv
    HOSTNAME=dapi-envars-fieldref
    MY_MEM_REQUEST=33554432
    HOST_IP=192.168.12.226
    MY_POD_NAME=dapi-envars-fieldref
    MY_POD_NAMESPACE=devtest
    MY_POD_IP=10.66.182.247
    MY_POD_IPS=10.66.182.247
    MY_POD_SERVICE_ACCOUNT=default
    MY_POD_ANNOTATIONS_DEMO=dapi-envars
    NODE_NAME=weiyigeek-226
    MY_POD_LABELS_APP=downwardAPI
    MY_CPU_REQUEST=1
    MY_CPU_LIMIT=1
    MY_MEM_LIMIT=67108864
    ....
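
The resource values in this output can look odd at first: a CPU request of 125m shows up as 1, and 32Mi shows up as 33554432. A resourceFieldRef value is divided by its divisor (1 by default, meaning one core for CPU and one byte for memory) and rounded up. The sketch below only reproduces that arithmetic in plain shell; `ceil_div` is a hypothetical helper written for this illustration, not part of Kubernetes or of the manifest above.

```shell
#!/bin/sh
# Hypothetical helper for illustration: integer ceiling division.
# $1 = value, $2 = divisor (both positive integers).
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# CPU: values are in millicores, the default divisor is 1 core = 1000 millicores.
echo "MY_CPU_REQUEST=$(ceil_div 125 1000)"
echo "MY_CPU_LIMIT=$(ceil_div 250 1000)"

# Memory: the default divisor is 1 byte, so Mi values are simply expanded.
echo "MY_MEM_REQUEST=$(( 32 * 1024 * 1024 ))"
echo "MY_MEM_LIMIT=$(( 64 * 1024 * 1024 ))"
```

If millicores are what you need, resourceFieldRef also accepts a `divisor` field; setting `divisor: 1m` on the CPU entries should report 125 and 250 instead of 1.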


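Practical example: take the number after the last `-` in the Pod name and use it as the index of the GPU that the Pod should use (that is, which GPU card it runs on).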

    apiVersion: v1
    kind: Service
    metadata:
      name: healthcode
      namespace: devtest
      labels:
        app: healthcode
        use: gpu
      annotations:
        author: weiyigeek
        blog: blog.weiyigeek.top
    spec:
      type: NodePort
      ports:
      - name: http
        port: 8000
        targetPort: 8000
        protocol: TCP
        nodePort: 30000
      selector:
        app: healthcode
        use: gpu
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: healthcode-0
      namespace: devtest
      labels:
        app: healthcode
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: healthcode
          use: gpu
      serviceName: "healthcode"
      template:
        metadata:
          labels:
            app: healthcode
            use: gpu
        spec:
          volumes:
          - name: workdir
            emptyDir: {}
          - name: workspace
            hostPath:
              path: /storage/webapp/project/MultiTravelcodeocr
              type: DirectoryOrCreate
          - name: model
            hostPath:
              path: /storage/webapp/project/.EasyOCR
              type: DirectoryOrCreate
          - name: img
            hostPath:
              path: /storage/webapp/project/upfile
              type: DirectoryOrCreate
          initContainers:
          - name: init  # An init container does the preparation
            image: busybox:1.35.0
            imagePullPolicy: IfNotPresent
            command:  # Write the GPU index this Pod should use into a file named after the Pod
            - /bin/sh
            - -c
            - "echo export CUDA_VISIBLE_DEVICES=${GPU_DEVICES##*-}> /app/${GPU_DEVICES}"
            env:
            - name: GPU_DEVICES
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # - name: CUDA_VISIBLE_DEVICES  # This does not work: env cannot slice a variable directly
            #   value: ${GPU_DEVICES##*-}
            volumeMounts:
            - name: workdir
              mountPath: /app/
          containers:
          - name: app
            image: harbor.weiyigeek.top/python/easyocr-healthcode:v1.6.2
            # Source the file written by the init container to load the variable into the environment.
            # You could also skip the init container and run
            # echo export CUDA_VISIBLE_DEVICES=${HOSTNAME##*-}> /app/${HOSTNAME}
            # in this container before the source command. All roads lead to Rome; the idea is what matters.
            command: ['/bin/bash', '-c', 'source /app/${HOSTNAME}; echo ${CUDA_VISIBLE_DEVICES}; python ./setup.py --imgdir=/imgs --logdir=/logs --gpu=True']
            imagePullPolicy: IfNotPresent
            resources:
              limits: {}
              # cpu: "8"
              # memory: 8Gi
            volumeMounts:
            - name: workdir
              mountPath: /app/
            - name: workspace
              mountPath: /workspace
            - name: model
              mountPath: /root/.EasyOCR
            - name: img
              mountPath: /imgs
            ports:
            - name: http
              protocol: TCP
              containerPort: 8000
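
The key step in the manifest is the shell parameter expansion `${GPU_DEVICES##*-}`, which strips the longest prefix ending in `-` and so leaves only the StatefulSet ordinal. A minimal sketch of that behaviour, runnable in any POSIX shell; the function name `gpu_index_from_name` is hypothetical and exists only for this illustration:

```shell
#!/bin/sh
# Hypothetical helper for illustration: print the text after the last "-"
# in a Pod name, which for a StatefulSet Pod is its ordinal.
gpu_index_from_name() {
  name=$1
  # "##*-" removes the longest leading match of "*-", keeping the suffix.
  printf '%s\n' "${name##*-}"
}

# Pod names as created by the StatefulSet named healthcode-0.
for pod in healthcode-0-0 healthcode-0-3 healthcode-0-5; do
  echo "$pod -> CUDA_VISIBLE_DEVICES=$(gpu_index_from_name "$pod")"
done
```

This is also why the commented-out `value: ${GPU_DEVICES##*-}` entry cannot work: the expansion is a shell feature, so it only happens when a shell evaluates the string, as in the `/bin/sh -c` command of the init container.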

Result:

    # 6 Pods in total, each using the matching GPU: for example 0-0 uses GPU 0 and 0-1 uses GPU 1
    $ kubectl get pod -n devtest
    NAME             READY   STATUS    RESTARTS   AGE
    healthcode-0-5   1/1     Running   0          15h
    healthcode-0-4   1/1     Running   0          15h
    healthcode-0-3   1/1     Running   0          15h
    healthcode-0-2   1/1     Running   0          15h
    healthcode-0-1   1/1     Running   0          15h
    healthcode-0-0   1/1     Running   0          15h

    # Check GPU usage on the GPU server
    $ nvidia-smi
    Fri Dec  9 10:08:32 2022
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA Tesla V1...  Off  | 00000000:1B:00.0 Off |                    0 |
    | N/A   41C    P0    36W / 250W |   6697MiB / 32510MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  NVIDIA Tesla V1...  Off  | 00000000:1D:00.0 Off |                    0 |
    | N/A   51C    P0    53W / 250W |   9489MiB / 32510MiB |     14%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   2  NVIDIA Tesla V1...  Off  | 00000000:3D:00.0 Off |                    0 |
    | N/A   53C    P0    42W / 250W |   5611MiB / 32510MiB |     20%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   3  NVIDIA Tesla V1...  Off  | 00000000:3F:00.0 Off |                    0 |
    | N/A   37C    P0    35W / 250W |  10555MiB / 32510MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   4  NVIDIA Tesla V1...  Off  | 00000000:40:00.0 Off |                    0 |
    | N/A   45C    P0    51W / 250W |   5837MiB / 32510MiB |      5%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   5  NVIDIA Tesla V1...  Off  | 00000000:41:00.0 Off |                    0 |
    | N/A   37C    P0    37W / 250W |  10483MiB / 32510MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A    167660      C   python                           6693MiB |
    |    1   N/A  N/A    166790      C   python                           9485MiB |
    |    2   N/A  N/A    165941      C   python                           5607MiB |
    |    3   N/A  N/A    165032      C   python                          10551MiB |
    |    4   N/A  N/A    164226      C   python                           5833MiB |
    |    5   N/A  N/A    163344      C   python                          10479MiB |
    +-----------------------------------------------------------------------------+

References: