[TOC]

0x00 Introduction and Environment Preparation

Description: This chapter walks through reproducing typical enterprise Prometheus use cases, using a docker-compose manifest to quickly build the prometheus_server, prometheus_pushgateway, prometheus_alertmanager, grafana, and related environments.

Main goals (features):

  • 0) Monitor and visualize Windows hosts
  • 1) Monitor and visualize MySQL and Redis databases
  • 2) Monitor and visualize an external Kubernetes cluster

Host overview:

# Kubernetes cluster 0: weiyigeek-lb-vip.k8s (production)
192.168.12.107 - master
192.168.12.108 - master
192.168.12.109 - master
192.168.12.223 - work
192.168.12.224 - work
192.168.12.225 - work

# Kubernetes cluster 1: k8s-test.weiyigeek (test environment, single master node)
192.168.12.111

# Kubernetes cluster 2: 192.168.12.226 (development environment, single master node)
192.168.12.226


Environment notes
Description: Docker is installed on all of the hosts above, and docker-compose is installed on 192.168.12.107; the configuration below is built up step by step.

# Components deployed on each host as the base environment (for installing and configuring node_exporter and cAdvisor, see "1.Prometheus(普罗米修斯)容器集群监控入门.md"; it is not repeated here)
192.168.12.107
- prometheus_server: 30090
- prometheus_pushgateway: 30091
- prometheus_alertmanager: 30093
- grafana: 30000

192.168.12.108~109
192.168.12.223~225
- node_exporter: 9100

192.168.12.111
- cAdvisor: 9100

# Not configured for now; used later when Prometheus monitors a third-party k8s cluster
192.168.12.226
- kubernetes Api Server: 6443

Directory layout at a glance:

$ tree -L 5
.
├── docker-compose.yml   # compose manifest
├── grafana              # Grafana UI: persisted data (plugins, dashboards, etc.)
│   └── data
└── prometheus           # Prometheus configuration and data persistence
    ├── conf
    │   ├── alertmanager.yaml   # alert notification configuration
    │   ├── conf.d
    │   │   ├── discovery       # service auto-discovery configuration files
    │   │   │   └── k8s_nodes.yaml
    │   │   ├── rules           # alerting rules
    │   │   │   └── alert.rules
    │   │   └── auth            # k8s and related authentication files
    │   │       ├── k8s_client.crt
    │   │       ├── k8s_client.key
    │   │       └── k8s_token
    │   └── prometheus.yaml
    └── data


Quick environment setup

  • 0. Generate the directory structure

    mkdir -vp /nfsdisk-31/monitor/prometheus/conf/conf.d/{discovery,rules,auth}
    mkdir -vp /nfsdisk-31/monitor/prometheus/data
    mkdir -vp /nfsdisk-31/monitor/grafana/data
  • 1. prometheus.yaml main configuration file (a validation sketch follows the block):

    tee prometheus.yaml <<'EOF'
    global:
      scrape_interval: 2m
      scrape_timeout: 10s
      evaluation_interval: 1m
      external_labels:
        monitor: 'prom-demo'
    scrape_configs:
      - job_name: 'prom-Server'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'cAdvisor'
        static_configs:
          - targets: ['192.168.12.111:9100']
      - job_name: 'prom-Host'
        file_sd_configs:
          - files:
              - /etc/prometheus/conf.d/discovery/k8s_nodes.yaml
            refresh_interval: 1m

    rule_files:
      - /etc/prometheus/conf.d/rules/*.rules

    alerting:
      alertmanagers:
        - scheme: http
          static_configs:
            - targets:
                - '192.168.12.107:30093'
    EOF
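
    Before wiring the file into the stack it can be sanity-checked with promtool, which ships inside the prom/prometheus image; a minimal sketch, assuming the host paths used elsewhere in this chapter:

    # Validate prometheus.yaml with promtool from the official image
    docker run --rm \
      -v /nfsdisk-31/monitor/prometheus/conf/prometheus.yaml:/etc/prometheus/prometheus.yaml \
      -v /nfsdisk-31/monitor/prometheus/conf/conf.d:/etc/prometheus/conf.d \
      --entrypoint /bin/promtool prom/prometheus:v2.26.0 \
      check config /etc/prometheus/prometheus.yaml
    # On success it prints SUCCESS for the config and each rule file found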
  • 2. alert.rules configuration file:

    tee alert.rules <<'EOF'
    groups:
    - name: node-normal
      rules:
      - alert: service_down
        expr: up == 0
        for: 2m
        labels:
          severity: 1
          team: node
        annotations:
          summary: "Host {{ $labels.instance }}: the monitored service has been down for more than 2m!"
      - alert: high_load
        expr: node_load1 > 0.7
        for: 5m
        labels:
          severity: 1
          team: node
        annotations:
          summary: "Host {{ $labels.instance }}: load1 has been above 0.7 for more than 5m!"
    EOF
  • 3. k8s_nodes.yaml file_sd_configs auto-discovery file.

    tee k8s_nodes.yaml <<'EOF'
    - targets: [ '192.168.12.107:9100','192.168.12.108:9100','192.168.12.109:9100' ]
      labels: {'env': 'prod','cluster': 'weiyigeek-lb-vip.k8s','nodeType': 'master'}
    - targets: [ '192.168.12.223:9100','192.168.12.224:9100','192.168.12.225:9100' ]
      labels: {'env': 'prod','cluster': 'weiyigeek-lb-vip.k8s','nodeType': 'work'}
    EOF
  • 4. alertmanager.yaml email alerting configuration

    tee alertmanager.yaml <<'EOF'
    global:
      resolve_timeout: 5m
      smtp_from: 'monitor@weiyigeek.top'
      smtp_smarthost: 'smtp.exmail.qq.com:465'
      smtp_auth_username: 'monitor@weiyigeek.top'
      smtp_auth_password: 'xxxxxxxxxxx'
      smtp_require_tls: false
      # smtp_hello: 'qq.com'
    route:
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 1m
      repeat_interval: 10m
      receiver: 'default-email'
    receivers:
    - name: 'default-email'
      email_configs:
      - to: 'master@weiyigeek.top'
        send_resolved: true
    # inhibit_rules:
    #   - source_match:
    #       severity: 'critical'
    #     target_match:
    #       severity: 'warning'
    #     equal: ['alertname', 'instance']
    EOF
    # Tip: the file can be validated with the amtool utility: `./amtool check-config alertmanager.yaml`; a containerized sketch follows.
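
    Since amtool is bundled in the prom/alertmanager image, the check can also be run without installing anything locally; a minimal sketch, assuming the host path used elsewhere in this chapter:

    # Run amtool from the alertmanager image to validate the config
    docker run --rm \
      -v /nfsdisk-31/monitor/prometheus/conf/alertmanager.yaml:/etc/alertmanager.yaml \
      --entrypoint /bin/amtool prom/alertmanager:v0.21.0 \
      check-config /etc/alertmanager.yaml
    # It reports the parsed receivers/templates and SUCCESS on a valid file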
  • 5. docker-compose.yml manifest:

    # Desc:  prometheus / pushgateway / alertmanager / grafana environment setup
    # author: WeiyiGeek
    # email: master@weiyigeek.top

    # Create a bridge network named monitor
    $ docker network create monitor --driver bridge

    tee docker-compose.yml <<'EOF'
    version: '3.2'
    services:
      prometheus:
        image: prom/prometheus:v2.26.0
        container_name: prometheus_server
        environment:
          TZ: Asia/Shanghai
        volumes:
          - /nfsdisk-31/monitor/prometheus/conf/prometheus.yaml:/etc/prometheus/prometheus.yaml
          - /nfsdisk-31/monitor/prometheus/conf/conf.d:/etc/prometheus/conf.d
          - /nfsdisk-31/monitor/prometheus/data:/prometheus/data
          - /etc/localtime:/etc/localtime
        command:
          - '--config.file=/etc/prometheus/prometheus.yaml'
          - '--storage.tsdb.path=/prometheus/data'
          - '--web.enable-admin-api'
          - '--web.enable-lifecycle'
        ports:
          - '30090:9090'
        restart: always
        networks:
          - monitor

      pushgateway:
        image: prom/pushgateway
        container_name: prometheus_pushgateway
        environment:
          TZ: Asia/Shanghai
        volumes:
          - /etc/localtime:/etc/localtime
        ports:
          - '30091:9091'
        restart: always
        networks:
          - monitor

      alertmanager:
        image: prom/alertmanager:v0.21.0
        container_name: prometheus_alertmanager
        environment:
          TZ: Asia/Shanghai
        volumes:
          - /nfsdisk-31/monitor/prometheus/conf/alertmanager.yaml:/etc/alertmanager.yaml
          - /etc/localtime:/etc/localtime
          # - /nfsdisk-31/monitor/prometheus/alertmanager:/alertmanager
        command:
          - '--config.file=/etc/alertmanager.yaml'
          - '--storage.path=/alertmanager'
        ports:
          - '30093:9093'
        restart: always
        networks:
          - monitor

      grafana:
        image: grafana/grafana:7.5.5
        container_name: grafana
        user: "472"
        environment:
          - TZ=Asia/Shanghai
          - GF_SECURITY_ADMIN_PASSWORD=weiyigeek
        volumes:
          - /nfsdisk-31/monitor/grafana/data:/var/lib/grafana
          - /etc/localtime:/etc/localtime
        ports:
          - '30000:3000'
        restart: always
        networks:
          - monitor
        dns:
          - 223.6.6.6
          - 192.168.12.254

    networks:
      monitor:
        external: true
    EOF

    # Validate the configuration
    docker-compose config

    # Create and start the containers in the background
    docker-compose up -d
  • 6. Verify the environment: visit the Prometheus server at http://192.168.12.107:30090/service-discovery to inspect service discovery and the monitored targets, as shown below.
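
    The same check can be scripted against the Prometheus HTTP API; a quick sketch, assuming jq is available on the host:

    # List active targets with their job name and health
    curl -s http://192.168.12.107:30090/api/v1/targets \
      | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'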

(Figure: WeiyiGeek - base environment verification)


0x01 Monitoring and Visualizing Windows Hosts

Description: To monitor Windows machines with Prometheus, an exporter must run on the host, just as the node_exporter binary runs on Linux; on Windows the equivalent is windows_exporter. The procedure is as follows:

  • Step 1. Download the windows_exporter executable from its releases page; either the exe or the msi installer can be used;

    # exe and msi downloads
    windows_exporter-0.16.0-amd64.exe
    windows_exporter-0.16.0-amd64.msi

    # msi - example installation invocation:
    msiexec /i <path-to-msi-file> ENABLED_COLLECTORS=os,iis LISTEN_PORT=5000
    # Example service collector with a custom query
    msiexec /i <path-to-msi-file> ENABLED_COLLECTORS=os,service --% EXTRA_FLAGS="--collector.service.services-where ""Name LIKE 'sql%'"""
    # On some older versions of Windows, parameter values may need to be wrapped in double quotes for the install command to be parsed correctly:
    msiexec /i C:\Users\Administrator\Downloads\windows_exporter.msi ENABLED_COLLECTORS="ad,iis,logon,memory,process,tcp,thermalzone" TEXTFILE_DIR="C:\custom_metrics\"
    # exe - examples
    # Enable only the service collector and specify a custom query
    .\windows_exporter.exe --collectors.enabled "service" --collector.service.services-where "Name='windows_exporter'"
    # Enable only the process collector and specify a custom query
    .\windows_exporter.exe --collectors.enabled "process" --collector.process.whitelist="firefox.+"
    # Use [defaults] with --collectors.enabled; it expands to all default collectors.
    .\windows_exporter.exe --collectors.enabled "[defaults],process,container"
  • Step 2. Run it with the config.yml configuration file; it then listens on port 9182 and the metrics can be viewed at http://127.0.0.1:9182/metrics.

    .\windows_exporter-0.16.0-amd64.exe --config.file=config.yml

    # config.yml
    # Collectors enabled by default plus some additional ones
    collectors:
      enabled: cpu,cs,logical_disk,net,os,system,service,logon,process,tcp
    collector:
      service:
        services-where: Name='windows_exporter'
    log:
      level: debug
    scrape:
      timeout-margin: 0.5
    telemetry:
      addr: ":9182"
      path: /metrics
      max-requests: 5

    # Firewall rule adjustment (restrict the remote address and open the local port)
    New-NetFirewallRule -Name prom-windows_exporter -Direction Inbound -DisplayName 'windows_exporter' -RemoteAddress 192.168.12.107 -LocalPort 9182 -Protocol 'TCP'
    # Name : prom-windows_exporter
    # DisplayName : windows_exporter
    # Description :
    # DisplayGroup :
    # Group :
    # Enabled : True
    # Profile : Any
    # Platform : {}
    # Direction : Inbound
    # Action : Allow
    # EdgeTraversalPolicy : Block
    # LooseSourceMapping : False
    # LocalOnlyMapping : False
    # Owner :
    # PrimaryStatus : OK
    # Status : The rule was parsed successfully from the store. (65536)
    # EnforcementStatus : NotApplicable
    # PolicyStoreSource : PersistentStore
    # PolicyStoreSourceType : Local
    # New-NetFirewallRule -Name powershell-remote-udp -Direction Inbound -DisplayName 'PowerShell remoting UDP' -LocalPort 9182 -Protocol 'UDP'  # HTTP runs over TCP, so no UDP firewall rule is needed
(Figure: WeiyiGeek - windows-metrics)

  • Step 3. Add the job to the prometheus.yaml main configuration file and reload the configuration; the machine is then discovered, as shown below (a reload sketch follows the block).
    # prometheus.yaml
    scrape_configs:
      - job_name: 'windows-exporter'
        file_sd_configs:
          - files:
              - /etc/prometheus/conf.d/discovery/win_nodes.yaml
            refresh_interval: 1m

    # vi win_nodes.yaml
    - targets: [ '192.168.12.240:9182' ]
      labels: {'env': 'temp','osType': 'windows','nodeType': 'master'}

    # PromQL
    windows_os_info or windows_exporter_build_info{instance='192.168.12.240:9182'} or windows_logical_disk_free_bytes{volume="C:"} / (1024^3) or windows_net_current_bandwidth
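
    Because the server was started with --web.enable-lifecycle, the new job can be applied without a restart; a quick sketch:

    # Hot-reload the configuration, then confirm the new job is present
    curl -X POST http://192.168.12.107:30090/-/reload
    curl -s http://192.168.12.107:30090/api/v1/targets | grep -o 'windows-exporter' | head -n 1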
(Figure: WeiyiGeek - windows_exporter_promQL)

(Figure: WeiyiGeek - Grafana_windows_export)

  • Step 4. Open the Grafana dashboard to verify that the collected Windows data is displayed


0x02 Monitoring and Visualizing MySQL and Redis Databases

Description: MySQL and Redis database monitoring can be configured using mysqld_exporter (https://github.com/prometheus/mysqld_exporter) and redis_exporter (https://github.com/oliver006/redis_exporter/).

  • Step 1. Prepare test MySQL and Redis databases, then collect their metrics with Docker containers;

    # - mysqld-exporter : export DATA_SOURCE_NAME='user:password@(hostname:3306)/'
    # (1) Create a monitoring user on the target MySQL database
    CREATE USER 'exporter'@'%' IDENTIFIED BY 'XXXXXXXX' WITH MAX_USER_CONNECTIONS 3;
    GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';
    # (2) Run the prom/mysqld-exporter container
    docker run -d -p 9104:9104 --name mysqld-exporter -e DATA_SOURCE_NAME="exporter:XXXXXXXX@(192.168.12.185:3306)/" prom/mysqld-exporter

    # - redis_exporter: Supports Redis 2.x, 3.x, 4.x, 5.x, and 6.x
    # Redis instance addresses can be tcp addresses: redis://localhost:6379, redis.example.com:6379 or e.g. unix sockets: unix:///tmp/redis.sock.
    docker run -d --name redis_exporter --network host -e REDIS_ADDR="redis://192.168.12.185:6379" -e REDIS_PASSWORD="weiyigeek.top" oliver006/redis_exporter # -p 9121:9121

    # Inspect the deployed exporters
    $ docker ps
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    c3a7a5663143 oliver006/redis_exporter "/redis_exporter" 9 minutes ago Up 9 minutes redis_exporter
    0a3d557bf36b prom/mysqld-exporter "/bin/mysqld_exporter" 16 minutes ago Up 16 minutes 0.0.0.0:9104->9104/tcp mysqld-exporter
  • Step 2. Fetch the metrics URLs of mysqld-exporter and redis_exporter respectively

    $ curl -s http://192.168.12.111:9104/metrics | tail -n -5
    # HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
    # TYPE promhttp_metric_handler_requests_total counter
    promhttp_metric_handler_requests_total{code="200"} 2
    promhttp_metric_handler_requests_total{code="500"} 2
    promhttp_metric_handler_requests_total{code="503"} 0

    $ curl -s http://192.168.12.111:9121/metrics | tail -n -5
    # TYPE redis_up gauge
    redis_up 1
    # HELP redis_uptime_in_seconds uptime_in_seconds metric
    # TYPE redis_uptime_in_seconds gauge
    redis_uptime_in_seconds 1.281979e+06
  • Step 3. Modify the prometheus.yaml main configuration file, adding:

    scrape_configs:
      - job_name: 'mysql_discovery'
        file_sd_configs:
          - files:
              - /etc/prometheus/conf.d/discovery/mysql_discovery.yaml
            refresh_interval: 1m
      - job_name: 'redis_discovery'
        file_sd_configs:
          - files:
              - /etc/prometheus/conf.d/discovery/redis_discovery.yaml
            refresh_interval: 1m

    # vi mysql_discovery.yaml
    - targets: [ '192.168.12.111:9104' ]
      labels: {'env': 'test','osType': 'container','nodeType': 'database'}

    # vi redis_discovery.yaml
    - targets: [ '192.168.12.111:9121' ]
      labels: {'env': 'test','osType': 'container','nodeType': 'database'}
  • Step 4. Hot-reload prometheus.yaml (or restart the Prometheus container) and verify the monitored targets with the PromQL expression redis_instance_info or mysql_version_info, as sketched below.
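
    The reload and the query can both go through the HTTP API; a quick sketch using the metric names above:

    # Hot-reload the configuration, then run an instant query
    curl -X POST http://192.168.12.107:30090/-/reload
    curl -s http://192.168.12.107:30090/api/v1/query \
      --data-urlencode 'query=redis_instance_info or mysql_version_info'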

(Figure: WeiyiGeek - redis&mysql_exporter)

(Figure: WeiyiGeek - Redis-Dashboard)

(Figure: WeiyiGeek - MySQL-Dashboard)


0x03 Monitoring and Visualizing Jenkins CI/CD

Goal: monitor the Jenkins CI server with Prometheus and display the monitoring data in Grafana.

  • Step 1. Install the Prometheus metrics plugin (Prometheus metrics: Expose Jenkins metrics in prometheus format)

  • Step 2. Under Manage Jenkins -> Configure System, configure the Prometheus plugin, mainly the Path and the URL namespace, then apply and save.

(Figure: WeiyiGeek - Prometheus metrics)

  • Step 3. Verify the plugin is running by visiting http://yourjenkinserver:port/prometheus
(Figure: WeiyiGeek - reuqets-Prometheus)

  • Step 4. On our Prometheus server, add the endpoint to prometheus.yml.

    - job_name: 'jenkins'
      metrics_path: '/prometheus/'
      scheme: 'http'
      bearer_token: 'bearer_token'
      static_configs:
        - targets: ['192.168.12.107:30001']
  • Step 5. Reload the prometheus.yml configuration (this only works when the server was started with the --web.enable-lifecycle flag), then verify the metric devops_jenkins_executors_available

    # Apply the modified configuration (no container restart needed)
    curl -X POST http://192.168.12.107:30090/-/reload

    # View the currently loaded configuration
    curl http://192.168.12.107:30090/api/v1/status/config
(Figure: WeiyiGeek - jenkins-prometheus)

(Figure: WeiyiGeek - a Jenkins performance and health overview)


0x04 Monitoring and Visualizing an External Kubernetes Cluster

Description: For learning and testing, Prometheus is usually installed inside the k8s cluster to scrape metrics. In real environments, however, most enterprises deploy Prometheus outside the cluster to monitor it; with several clusters, a separate Prometheus instance monitors each cluster and federation aggregates the results. Owing to the constraints of our lab environment, this chapter shows how to configure Prometheus to monitor an external Kubernetes cluster (inside a cluster, some of the approaches below apply as well).


Q: How does Prometheus collect data from a Kubernetes cluster?

A: If you are familiar with how an in-cluster Prometheus auto-discovers Kubernetes targets, the principle for monitoring an external cluster is the same; only the way of reaching the APIServer changes, from inCluster mode to KubeConfig mode. In inCluster mode the token and the ca.crt file for accessing the cluster are injected into the Pod automatically, which is very convenient; outside the cluster we have to provide those two files by hand for auto-discovery to work.


Q: How does Prometheus collect metrics of various dimensions through exporters?

A: Prometheus uses kubernetes_sd_configs to look up and pull targets from the Kubernetes REST API, staying in sync with the cluster state, with auto-discovery based on the endpoints, service, node, pod, and ingress roles:

  • endpoints: discovers the endpoints behind each service
  • node: discovers one target per cluster node, whose address defaults to the Kubelet's HTTP port, e.g. "https://192.168.3.217:10250/metrics"
  • service: discovers a target for each port of each service
  • pod: discovers all pods and their ports
  • ingress: discovers each path of each ingress


Q: Along which dimensions can monitoring metrics be collected?

Dimension        | Tool               | Metrics URL (__metrics_path__)                                                                    | Notes
Node performance | node-exporter      | /api/v1/nodes/node_name:9100/proxy/metrics                                                        | node status
Pod performance  | kubelet / cadvisor | /api/v1/nodes/node_name:10250/proxy/metrics, /api/v1/nodes/node_name:10250/proxy/metrics/cadvisor | container status
K8S resources    | kube-state-metrics | __scheme__://__address____metrics_path__                                                          | Deployments, DaemonSets, etc.

Tip: the dynamically discovered kube-state-metrics URL is completed from labels; the label values can be concatenated into the final scrape URL via Prometheus' relabel_config. Since deploying Prometheus outside the cluster differs from deploying it inside, an external Prometheus reaches the scrape URLs through the apiserver proxy URL to pull the metrics.


Q: How is the Apiserver proxy URL constructed?
Description: In a k8s cluster, nodes, pods, and services have their own private IPs which are unreachable from outside the cluster, but K8S offers several ways in: 1. access services via public IPs; 2. access nodes, pods, and services via the proxy; 3. access them indirectly through an in-cluster node or pod.

For example, the kubectl cluster-info command shows the proxy URLs of the kube-system namespace:

$ kubectl cluster-info
Kubernetes master is running at https://k8s-dev.weiyigeek:6443
KubeDNS is running at https://k8s-dev.weiyigeek:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

The default construction rules therefore look like this:

# other_apiserver_address = k8s-dev.weiyigeek:6443
# node access
https://${other_apiserver_address}/api/v1/nodes/node_name:[port_name]/proxy/metrics
# service access
https://${other_apiserver_address}/api/v1/namespaces/service_namespace/services/http:service_name[:port_name]/proxy/metrics
# pod access
https://${other_apiserver_address}/api/v1/namespaces/pod_namespace/pods/http:pod_name[:port_name]/proxy/metrics

Once the proxy URL construction is understood, an external Prometheus can build the URL itself via relabel_config; the URL can also be tested by hand first, as sketched below.
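
A minimal sketch of such a manual test, assuming the k8s_token file obtained later in this chapter and an exporter (e.g. the node-exporter deployed in section 2 below) listening on port 9100 of the node:

# Fetch metrics for one node through the apiserver proxy
TOKEN=$(cat k8s_token)
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  "https://192.168.12.226:6443/api/v1/nodes/weiyigeek-226:9100/proxy/metrics" | head -n 5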


1. Service auto-discovery via the endpoints role

Description: Here we deploy kube-state-metrics to monitor the k8s cluster itself; it listens to the Kubernetes API server and generates metrics about the associated objects. It is not concerned with the health of individual Kubernetes components, but with the state of the internal objects such as deployments, nodes, and pods.

Steps:

  • Step 1. First check the kube-state-metrics compatibility matrix against our Kubernetes cluster version (see the reference link); at most five kube-state-metrics releases and five Kubernetes versions are recorded below.
    kube-state-metrics | Kubernetes 1.17 | Kubernetes 1.18 | Kubernetes 1.19 | Kubernetes 1.20 | Kubernetes 1.21
    v1.8.0             | -               | -               | -               | -               | -
    v1.9.8             | -               | -               | -               | -               | -
    v2.0.0             | -/✓             | -/✓             | ✓               | ✓               | -/✓
    master             | -/✓             | -/✓             | ✓               | ✓               | ✓


  • Step 2. Create the k8s cluster ApiServer access account and bind the cluster role permissions.
    Description: accessing the K8S apiserver requires prior authorization. An in-cluster Prometheus can use the cluster's default configuration, whereas access from outside the cluster authenticates with a token plus a client cert, so RBAC authorization must be configured first.
    • Since this test needs to reach several namespaces, we bind cluster-admin for convenience; in production, always apply the principle of least privilege to keep things secure (demonstrated later).
      # 1. Create the namespace
      kubectl create ns monitor
      # namespace/monitor created

      # 2. Create the serviceaccount
      kubectl create sa prometheus --namespace monitor
      # serviceaccount/prometheus created

      # 3. Create the prometheus role binding against cluster-admin
      $ kubectl create clusterrolebinding prometheus --clusterrole cluster-admin --serviceaccount=monitor:prometheus
      # clusterrolebinding.rbac.authorization.k8s.io/prometheus created

      # 4. Inspect the token belonging to the created account
      $ kubectl get sa -n monitor
      # NAME SECRETS AGE
      # default 1 18d
      # prometheus 1 24s

      $ kubectl get sa prometheus -n monitor -o yaml
      # apiVersion: v1
      # kind: ServiceAccount
      # metadata:
      #   creationTimestamp: "2021-05-10T06:10:28Z"
      #   name: prometheus
      #   namespace: default
      #   resourceVersion: "3596438"
      #   selfLink: /api/v1/namespaces/default/serviceaccounts/prometheus
      #   uid: af6a884d-2670-4f46-836e-d8ccf9fd0c38
      # secrets:
      # - name: prometheus-token-ft8bd   # the secret holding the token - key point

      # 5. One-liner to extract the token
      kubectl get secret -n monitor $(kubectl get sa prometheus -n monitor -o yaml | tail -n 1 | cut -d " " -f 3) -o yaml | grep "token:" | head -n 1 | awk '{print $2}'|base64 -d > k8s_token

      # 6. Additionally fetch the cluster client cert and key
      # client-certificate-data
      ~$ grep 'client-certificate-data' ~/.kube/config | head -n 1 | awk '{print $2}' | base64 -d > k8s_client.crt
      # client-key-data
      ~$ grep 'client-key-data' ~/.kube/config | head -n 1 | awk '{print $2}' | base64 -d > k8s_client.key
      scp -P20211 weiyigeek@weiyigeek-226:~/.kube/k8s_client.crt ./conf.d/ssl/
      scp -P20211 weiyigeek@weiyigeek-226:~/.kube/k8s_client.key ./conf.d/ssl/
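
      An equivalent and somewhat more robust way to pull the token uses kubectl's jsonpath output instead of the grep/awk pipeline; a sketch under the same SA name:

      # Resolve the secret name from the ServiceAccount, then decode its token
      SECRET=$(kubectl -n monitor get sa prometheus -o jsonpath='{.secrets[0].name}')
      kubectl -n monitor get secret "${SECRET}" -o jsonpath='{.data.token}' | base64 -d > k8s_token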


  • Step 3. Deploy using the official resource manifest (see the reference link), adapted here to use the prometheus account created above.
    tee kube-state-metrics.yaml <<'EOF'
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.0.0
      name: kube-state-metrics
      namespace: kube-system
    spec:
      replicas: 1
      selector:
        matchLabels:
          app.kubernetes.io/name: kube-state-metrics
      template:
        metadata:
          labels:
            app.kubernetes.io/name: kube-state-metrics
            app.kubernetes.io/version: 2.0.0
        spec:
          containers:
          - image: bitnami/kube-state-metrics:2.0.0
            livenessProbe:
              httpGet:
                path: /healthz
                port: 8080
              initialDelaySeconds: 5
              timeoutSeconds: 5
            name: kube-state-metrics
            ports:
            - containerPort: 8080
              name: http-metrics
            - containerPort: 8081
              name: telemetry
            readinessProbe:
              httpGet:
                path: /
                port: 8081
              initialDelaySeconds: 5
              timeoutSeconds: 5
            securityContext:
              runAsUser: 65534
          nodeSelector:
            kubernetes.io/os: linux
          serviceAccountName: prometheus
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.0.0
      name: kube-state-metrics
      namespace: kube-system
      annotations:
        # Note: this annotation is required so that Prometheus can auto-discover the service.
        prometheus.io/scrape: 'true'
    spec:
      clusterIP: None
      ports:
      - name: http-metrics
        port: 8080
        targetPort: http-metrics
      - name: telemetry
        port: 8081
        targetPort: telemetry
      selector:
        app.kubernetes.io/name: kube-state-metrics
    EOF

    # - Create the monitor namespace
    kubectl create ns monitor

    sed -i "s#kube-system#monitor#g" kube-state-metrics.yaml

    # - Deploy kube-state-metrics
    kubectl apply -f kube-state-metrics.yaml
    # deployment.apps/kube-state-metrics created
    # service/kube-state-metrics created
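
    Before involving Prometheus, the metrics endpoint can be verified from the workstation with a port-forward; a quick sketch:

    # Forward the service port locally and peek at the metrics
    kubectl -n monitor port-forward svc/kube-state-metrics 8080:8080 &
    curl -s http://127.0.0.1:8080/metrics | head -n 5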


  • Step 4. Verify the deployment and obtain the authentication token

    $ kubectl get pod,svc,ep -n monitor --show-labels
    # NAME READY STATUS RESTARTS AGE LABELS
    # pod/kube-state-metrics-777789bc9d-9n6jf 1/1 Running 0 3m30s app.kubernetes.io/name=kube-state-metrics,app.kubernetes.io/version=2.0.0,pod-template-hash=777789bc9d

    # NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE LABELS
    # service/kube-state-metrics ClusterIP None <none> 8080/TCP,8081/TCP 3m30s app.kubernetes.io/name=kube-state-metrics,app.kubernetes.io/version=2.0.0,prometheus.io/scrape=true

    # NAME ENDPOINTS AGE LABELS
    # endpoints/kube-state-metrics 172.16.182.199:8081,172.16.182.199:8080 3m30s app.kubernetes.io/name=kube-state-metrics,app.kubernetes.io/version=2.0.0,prometheus.io/scrape=true,service.kubernetes.io/headless=

    # - ca.crt file and token (the deployment runs under the prometheus SA created earlier)
    # kubectl get secret -n monitor $(kubectl get sa prometheus -n monitor -o yaml | tail -n 1 | cut -d " " -f 3) -o yaml | grep "ca.crt:" | head -n 1 | awk '{print $2}' | base64 -d > k8s_ca.crt
    kubectl get secret -n monitor $(kubectl get sa prometheus -n monitor -o yaml | tail -n 1 | cut -d " " -f 3) -o yaml | grep "token:" | head -n 1 | awk '{print $2}'| base64 -d > k8s_token
  • Step 5. Copy the resulting k8s_ca.crt and k8s_token files into the directory referenced by the Prometheus main configuration;

    ansible weiyigeek-226 -m fetch -a "src=/home/weiyigeek/prometheus/k8s_ca.crt dest=/tmp"
    ansible weiyigeek-226 -m fetch -a "src=/home/weiyigeek/prometheus/k8s_token dest=/tmp"
    # weiyigeek-226 | CHANGED => {
    # "changed": true,
    # "checksum": "22c40a4f83ad82343affbab3f8a732c14accbdcd",
    # "dest": "/tmp/k8s_token/weiyigeek-226/home/weiyigeek/prometheus/k8s_kube-state-metrics_token",
    # "md5sum": "c9d780a62db497bbfd995b548887e4ed",
    # "remote_checksum": "22c40a4f83ad82343affbab3f8a732c14accbdcd",
    # "remote_md5sum": null
    # }


  • Step 6. Add a kubernetes_sd_configs block to the prometheus.yml main configuration to auto-discover the endpoints role.
    - job_name: 'k8s-endpoint-discover'
      scheme: https
      # token obtained from the apiserver authorization step, stored as a file
      tls_config:
        ca_file: /etc/prometheus/conf.d/auth/k8s_ca.crt
        insecure_skip_verify: true
      bearer_token_file: /etc/prometheus/conf.d/auth/k8s_token
      # Kubernetes service-discovery configuration
      kubernetes_sd_configs:
        # endpoints-level auto-discovery
        - role: endpoints
          api_server: 'https://192.168.12.226:6443'
          tls_config:
            ca_file: /etc/prometheus/conf.d/auth/k8s_ca.crt
            insecure_skip_verify: true
          bearer_token_file: /etc/prometheus/conf.d/auth/k8s_token
      relabel_configs:
        - source_labels: [__meta_kubernetes_service_name]
          # keep only targets matching the regex; drop the rest
          action: keep
          regex: '^(kube-state-metrics)$'
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          # keep only targets matching the regex; drop the rest
          action: keep
          regex: true
        - source_labels: [__address__]
          action: replace
          target_label: instance
        - target_label: __address__
          # replace the default __address__ with the replacement value
          replacement: 192.168.12.226:6443
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name, __meta_kubernetes_pod_container_port_number]
          # regex match
          regex: ([^;]+);([^;]+);([^;]+)
          # replace the default __metrics_path__ with a hand-built apiserver proxy URL
          target_label: __metrics_path__
          replacement: /api/v1/namespaces/${1}/pods/http:${2}:${3}/proxy/metrics
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          # rename __meta_kubernetes_namespace to kubernetes_namespace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          # rename __meta_kubernetes_service_name to service_name
          target_label: service_name


Tip: relabel_configs builds the URL the prometheus (endpoints) role uses to reach the API Server;

Label            | Default                          | Constructed
__scheme__       | https                            | https
__address__      | 172.16.182.200:8081              | 192.168.12.226:6443
__metrics_path__ | /metrics                         | /api/v1/namespaces/kube-system/pods/http:kube-state-metrics-6477678b78-6qkjg:8081/proxy/metrics
URL              | https://10.244.2.10:8081/metrics | https://192.168.12.226:6443/api/v1/namespaces/kube-system/pods/http:kube-state-metrics-6477678b78-6qkjg:8081/proxy/metrics


  • Step 7. After updating the main configuration, restart the Prometheus Server container and check its status; the figure shows the targets are monitored successfully.
    # k8s-endpoint-discover (2/2 up)
    # - Discovered Labels
    __meta_kubernetes_endpoint_address_target_kind="Pod"
    __meta_kubernetes_endpoint_address_target_name="kube-state-metrics-6477678b78-6qkjg"
    __meta_kubernetes_endpoint_node_name="weiyigeek-226"
    __meta_kubernetes_endpoint_port_name="telemetry"
    __meta_kubernetes_endpoint_port_protocol="TCP"
    __meta_kubernetes_endpoint_ready="true"
    __meta_kubernetes_endpoints_label_app_kubernetes_io_name="kube-state-metrics"
    __meta_kubernetes_endpoints_label_app_kubernetes_io_version="2.0.0"
    __meta_kubernetes_endpoints_labelpresent_app_kubernetes_io_name="true"
    __meta_kubernetes_endpoints_labelpresent_app_kubernetes_io_version="true"
    __meta_kubernetes_endpoints_labelpresent_service_kubernetes_io_headless="true"
    __meta_kubernetes_endpoints_name="kube-state-metrics"
    __meta_kubernetes_namespace="monitor"
    __meta_kubernetes_pod_annotation_cni_projectcalico_org_podIP="172.16.182.200/32"
    __meta_kubernetes_pod_annotation_cni_projectcalico_org_podIPs="172.16.182.200/32"
    __meta_kubernetes_pod_annotationpresent_cni_projectcalico_org_podIP="true"
    __meta_kubernetes_pod_annotationpresent_cni_projectcalico_org_podIPs="true"
    __meta_kubernetes_pod_container_name="kube-state-metrics"
    __meta_kubernetes_pod_container_port_name="telemetry"
    __meta_kubernetes_pod_container_port_number="8081"
    __meta_kubernetes_pod_container_port_protocol="TCP"
    __meta_kubernetes_pod_controller_kind="ReplicaSet"
    __meta_kubernetes_pod_controller_name="kube-state-metrics-6477678b78"
    __meta_kubernetes_pod_host_ip="192.168.12.226"
    __meta_kubernetes_pod_ip="172.16.182.200"
    __meta_kubernetes_pod_label_app_kubernetes_io_name="kube-state-metrics"
    __meta_kubernetes_pod_label_app_kubernetes_io_version="2.0.0"
    __meta_kubernetes_pod_label_pod_template_hash="6477678b78"
    __meta_kubernetes_pod_labelpresent_app_kubernetes_io_name="true"
    __meta_kubernetes_pod_labelpresent_app_kubernetes_io_version="true"
    __meta_kubernetes_pod_labelpresent_pod_template_hash="true"
    __meta_kubernetes_pod_name="kube-state-metrics-6477678b78-6qkjg"
    __meta_kubernetes_pod_node_name="weiyigeek-226"
    __meta_kubernetes_pod_phase="Running"
    __meta_kubernetes_pod_ready="true"
    __meta_kubernetes_pod_uid="70037554-7c4c-4372-9128-e9689b7cff10"
    __meta_kubernetes_service_annotation_kubectl_kubernetes_io_last_applied_configuration="{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/scrape":"true"},"labels":{"app.kubernetes.io/name":"kube-state-metrics","app.kubernetes.io/version":"2.0.0"},"name":"kube-state-metrics","namespace":"monitor"},"spec":{"clusterIP":"None","ports":[{"name":"http-metrics","port":8080,"targetPort":"http-metrics"},{"name":"telemetry","port":8081,"targetPort":"telemetry"}],"selector":{"app.kubernetes.io/name":"kube-state-metrics"}}} "
    __meta_kubernetes_service_annotation_prometheus_io_scrape="true"
    __meta_kubernetes_service_annotationpresent_kubectl_kubernetes_io_last_applied_configuration="true"
    __meta_kubernetes_service_annotationpresent_prometheus_io_scrape="true"
    __meta_kubernetes_service_label_app_kubernetes_io_name="kube-state-metrics"
    __meta_kubernetes_service_label_app_kubernetes_io_version="2.0.0"
    __meta_kubernetes_service_labelpresent_app_kubernetes_io_name="true"
    __meta_kubernetes_service_labelpresent_app_kubernetes_io_version="true"
    __meta_kubernetes_service_name="kube-state-metrics"
    __metrics_path__="/metrics"
    __scheme__="https"
    job="k8s-endpoint-discover"
    # Target Labels
    app_kubernetes_io_name="kube-state-metrics"
    app_kubernetes_io_version="2.0.0"
    instance="172.16.182.200:8081"
    job="k8s-endpoint-discover"
    kubernetes_namespace="monitor"
    service_name="kube-state-metrics"

    # PromQL expressions
    up{job="k8s-endpoint-discover"} or go_info{job="k8s-endpoint-discover"}
    # up{app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_version="2.0.0", instance="172.16.182.199:8080", job="k8s-endpoint-discover", kubernetes_namespace="monitor", service_name="kube-state-metrics"} 1
    # up{app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_version="2.0.0", instance="172.16.182.199:8081", job="k8s-endpoint-discover", kubernetes_namespace="monitor", service_name="kube-state-metrics"} 1
    # go_info{app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_version="2.0.0", instance="172.16.182.199:8081", job="k8s-endpoint-discover", kubernetes_namespace="monitor", service_name="kube-state-metrics", version="go1.16.3"} 1
(Figure: WeiyiGeek - k8s-endpoint-discover)


Supplementary note: comparing metrics-server and kube-state-metrics

Category     | metrics-server | kube-state-metrics
Overview     | Metrics Server exposes core Kubernetes metrics via the Metrics API | kube-state-metrics generates metrics from Kubernetes API objects without modifying them, so its metrics have the same stability as the Kubernetes API objects themselves
Monitored    | system metrics such as CPU, memory, and network of Nodes and Pods | state of internal objects such as Node, Deployment, Pod, Service, Namespace
Project URL  | https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/metrics-server (gone); https://github.com/kubernetes-sigs/metrics-server/ (recommended) | https://github.com/kubernetes/kube-state-metrics
Service port | 443 | 8080

Example: node information collected by kube-state-metrics. To verify that the metrics are collected successfully, request the kube-state-metrics pod IP on port 8080; if the metrics page appears, everything is normal.

$ kube_node_info{job="k8s-endpoint-discover"}
# kube_node_info{app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_version="2.0.0", container_runtime_version="docker://19.3.15", instance="172.16.182.200:8080", internal_ip="192.168.12.226", job="k8s-endpoint-discover", kernel_version="5.4.0-73-generic", kubelet_version="v1.19.10", kubeproxy_version="v1.19.10", kubernetes_namespace="monitor", node="weiyigeek-226", os_image="Ubuntu 20.04.2 LTS", pod_cidr="172.16.0.0/24", service_name="kube-state-metrics"}

(Figure: kube-state-metrics)


2. Service auto-discovery via the node role

Description: node-exporter collects server-level data from the cluster nodes, such as CPU, memory, disk, and network traffic. node-exporter can of course be deployed independently on each node server, but manually adding each one to the monitoring configuration is very inconvenient.

Steps:

  • Step 1. Deploy node-exporter as a DaemonSet here, which pairs conveniently with Prometheus dynamic discovery.

    tee node-exporter.yaml <<'EOF'
    kind: DaemonSet
    apiVersion: apps/v1
    metadata:
      name: node-exporter
      namespace: monitor
      annotations:
        prometheus.io/scrape: 'true'
    spec:
      selector:
        matchLabels:
          app: node-exporter
      template:
        metadata:
          labels:
            app: node-exporter
          name: node-exporter
        spec:
          containers:
          - image: prom/node-exporter:v1.1.2
            name: node-exporter
            ports:
            - containerPort: 9100
              hostPort: 9100
              name: node-exporter
          hostNetwork: true
          hostPID: true
          tolerations:
          - key: "node-role.kubernetes.io/master"
            operator: "Exists"
            effect: "NoSchedule"
    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: node-exporter
      namespace: monitor
      labels:
        app: node-exporter
      annotations:
        prometheus.io/scrape: 'true'
    spec:
      type: ClusterIP
      clusterIP: None
      ports:
      - name: node-exporter
        port: 9100
        protocol: TCP
      selector:
        app: node-exporter
    EOF

    ~$ kubectl apply -f node-exporter.yaml
    # daemonset.apps/node-exporter created

    ~$ kubectl get pod -n monitor
    # NAME READY STATUS RESTARTS AGE
    # node-exporter-p5tbp 1/1 Running 0 20s
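
    Since the DaemonSet uses hostNetwork with hostPort 9100, the exporter can be spot-checked directly against the node before discovery is wired up; a quick sketch:

    # node-exporter answers on the node's own address thanks to hostNetwork/hostPort
    curl -s http://192.168.12.226:9100/metrics | grep -m 3 '^node_'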
  • Step 2. Create the SA account and set up its RBAC permissions (principle of least privilege)

    $ kubectl create sa prometheus -n monitor
    # serviceaccount/prometheus created

    # ClusterRole declaring the RBAC permissions
    tee prometheus-clusterRole.yaml <<'EOF'
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus
      namespace: monitor
    rules:
    - apiGroups:
      - ""
      resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - "extensions"
      resources:
      - ingresses
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - ""
      resources:
      - configmaps
      - nodes/metrics
      verbs:
      - get
    - nonResourceURLs:
      - /metrics
      verbs:
      - get
    EOF

    # Create the cluster role
    $ kubectl create -f prometheus-clusterRole.yaml
    # clusterrole.rbac.authorization.k8s.io/prometheus created
    # Bind the cluster role
    $ kubectl create clusterrolebinding prometheus --clusterrole prometheus --serviceaccount=monitor:prometheus
    # Or do both in one step (the YAML equivalent of the command above)
    # apiVersion: rbac.authorization.k8s.io/v1beta1
    # kind: ClusterRoleBinding
    # metadata:
    #   name: prometheus
    # roleRef:
    #   apiGroup: rbac.authorization.k8s.io
    #   kind: ClusterRole
    #   name: prometheus
    # subjects:
    # - kind: ServiceAccount
    #   name: prometheus
    #   namespace: monitor


    # Fetch the authentication token
    kubectl get secret -n monitor $(kubectl get sa prometheus -n monitor -o yaml | tail -n 1 | cut -d " " -f 3) -o yaml | grep "token:" | head -n 1 | awk '{print $2}'| base64 -d > k8s_prometheuser_token

    # Copy k8s_prometheuser_token to the Prometheus server
    ansible weiyigeek-226 -m fetch -a "src=/home/weiyigeek/prometheus/k8s_prometheuser_token dest=/tmp"
    # weiyigeek-226 | CHANGED => {
    #     "changed": true,
    #     "checksum": "d4a16cebda1b6037dcb68004d0ff4cdf4079bbc5",
    #     "dest": "/tmp/weiyigeek-226/home/weiyigeek/prometheus/k8s_prometheuser_token",
    #     "md5sum": "bdcd6c4a77ab6ee2afa5ac6f78ddb94a",
    #     "remote_checksum": "d4a16cebda1b6037dcb68004d0ff4cdf4079bbc5",
    #     "remote_md5sum": null
    # }


  • Step 3. Add a kubernetes_sd_configs block to the Prometheus.yaml main configuration using node-level auto-discovery;
    - job_name: 'k8s-nodes-discover'
      scheme: https
      # token obtained from the apiserver authorization step, stored as a file
      tls_config:
        # ca_file: /etc/prometheus/conf.d/auth/k8s_kube-state-metrics_ca.crt
        insecure_skip_verify: true
      bearer_token_file: /etc/prometheus/conf.d/auth/k8s_prometheuser_token
      # Kubernetes service-discovery configuration
      kubernetes_sd_configs:
        # node-level auto-discovery
        - role: node
          api_server: 'https://192.168.12.226:6443'
          tls_config:
            # ca_file: /etc/prometheus/conf.d/auth/k8s_kube-state-metrics_ca.crt
            insecure_skip_verify: true
          bearer_token_file: /etc/prometheus/conf.d/auth/k8s_prometheuser_token
      relabel_configs:
        #- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        #  # keep only targets matching the regex; drop the rest
        #  action: keep
        #  regex: true
        - target_label: __address__
          # replace the default __address__ with the replacement value
          replacement: 192.168.12.226:6443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          # replace the default __metrics_path__; see below for collecting via the kubelet instead
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}:9100/proxy/metrics
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          # rename __meta_kubernetes_service_name to service_name
          target_label: service_name
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          # rename __meta_kubernetes_namespace to kubernetes_namespace
          target_label: kubernetes_namespace


Tip: relabel_configs builds the URL the prometheus (node) role uses to reach the API Server;

Label                            | Default                             | Constructed
__scheme__                       | https                               | https
__address__                      | 192.168.3.217:10250                 | 192.168.3.217:6443
__metrics_path__ (node_exporter) | /metrics                            | /api/v1/nodes/uvmsvr-3-217:9100/proxy/metrics
URL                              | https://192.168.3.217:10250/metrics | https://192.168.3.217:6443/api/v1/nodes/uvmsvr-3-217:9100/proxy/metrics
__metrics_path__ (kubelet)       | /metrics                            | /api/v1/nodes/uvmsvr-3-217:10250/proxy/metrics
URL                              | https://192.168.3.217:10250/metrics | https://192.168.3.217:6443/api/v1/nodes/uvmsvr-3-217:10250/proxy/metrics
__metrics_path__ (cadvisor)      | /metrics                            | /api/v1/nodes/uvmsvr-3-217:10250/proxy/metrics/cadvisor
URL                              | https://192.168.3.217:10250/metrics | https://192.168.3.217:6443/api/v1/nodes/uvmsvr-3-217:10250/proxy/metrics/cadvisor


  • Step 4. Restart the service and check whether the target and the service discovery are being monitored successfully.
    # (1) Status of the job
    k8s-nodes-discover (1/1 up)
    Endpoint State Labels Last Scrape Scrape Duration Error
    https://192.168.12.226:6443/api/v1/nodes/weiyigeek-226:9100/proxy/metrics UP instance="weiyigeek-226" job="k8s-nodes-discover"

    # (2) PromQL queries
    up{job="k8s-nodes-discover"} or go_info{job="k8s-nodes-discover"}
    # up{instance="weiyigeek-226", job="k8s-nodes-discover"} 1
    # go_info{instance="weiyigeek-226", job="k8s-nodes-discover", version="go1.15.8"} 1
(Figure: WeiyiGeek - k8s-nodes-discover-9100)


  • Step 5. Alternatively, replace __metrics_path__ with /api/v1/nodes/${1}:10250/proxy/metrics, so the metrics are pulled through the kubelet instead, as sketched below.
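
    Concretely, only the replacement in the __metrics_path__ relabel rule changes; a sketch of the rewritten rule:

    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      # switch from the node-exporter hostPort to the kubelet's own metrics endpoint
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}:10250/proxy/metrics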
(Figure: WeiyiGeek - k8s-nodes-discover-10250)


3. Putting it together: the cAdvisor + kube-state-metrics + Grafana combo

Description: Grafana reads the monitoring metrics from the Prometheus data source and graphs them. Its website offers many templates, so for each monitoring dimension we can pick a template we like and import it directly by its Dashboard id.

For example, different Dashboard panels suit different scenarios:


Goal of this exercise: use cadvisor to collect pod/container information, kube-state-metrics to collect cluster information, and Grafana to display the data Prometheus has collected.

Steps:

  • Step 1. On top of the base environment above, the modified Prometheus.yaml main configuration reads:

    tee prometheus.yaml <<'EOF'
    global:
      scrape_interval: 2m
      scrape_timeout: 10s
      evaluation_interval: 1m
      external_labels:
        monitor: 'prom-demo'

    alerting:
      alertmanagers:
        - scheme: http
          static_configs:
            - targets:
                - '192.168.12.107:30093'

    rule_files:
      - /etc/prometheus/conf.d/rules/*.rules

    scrape_configs:
      - job_name: 'prom-Server'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'cAdvisor'
        static_configs:
          - targets: ['192.168.12.111:9100']
      - job_name: 'linux_exporter'
        file_sd_configs:
          - files:
              - /etc/prometheus/conf.d/discovery/k8s_nodes.yaml
            refresh_interval: 1m
      - job_name: 'windows-exporter'
        file_sd_configs:
          - files:
              - /etc/prometheus/conf.d/discovery/win_nodes.yaml
            refresh_interval: 1m
      - job_name: 'mysql_discovery'
        file_sd_configs:
          - files:
              - /etc/prometheus/conf.d/discovery/mysql_discovery.yaml
      - job_name: 'redis_discovery'
        file_sd_configs:
          - files:
              - /etc/prometheus/conf.d/discovery/redis_discovery.yaml
      - job_name: 'k8s-endpoint-discover'
        scheme: https
        # token obtained from the apiserver authorization step, stored as a file
        tls_config:
          # ca_file: /etc/prometheus/conf.d/auth/k8s_kube-state-metrics_ca.crt
          insecure_skip_verify: true
        bearer_token_file: /etc/prometheus/conf.d/auth/k8s_token
        # Kubernetes service-discovery configuration
        kubernetes_sd_configs:
          # endpoints-level auto-discovery
          - role: endpoints
            api_server: 'https://192.168.12.226:6443'
            tls_config:
              # ca_file: /etc/prometheus/conf.d/auth/k8s_kube-state-metrics_ca.crt
              insecure_skip_verify: true
            bearer_token_file: /etc/prometheus/conf.d/auth/k8s_token
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            # keep only targets matching the regex; drop the rest
            action: keep
            regex: '^(kube-state-metrics)$'
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            # keep only targets matching the regex; drop the rest
            action: keep
            regex: true
          - source_labels: [__address__]
            action: replace
            target_label: instance
          - target_label: __address__
            # replace the default __address__ with the replacement value
            replacement: 192.168.12.226:6443
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name, __meta_kubernetes_pod_container_port_number]
            # regex match
            regex: ([^;]+);([^;]+);([^;]+)
            # replace the default __metrics_path__ with a hand-built apiserver proxy URL
            target_label: __metrics_path__
            replacement: /api/v1/namespaces/${1}/pods/http:${2}:${3}/proxy/metrics
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            # rename __meta_kubernetes_namespace to kubernetes_namespace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            # rename __meta_kubernetes_service_name to service_name
            target_label: service_name

      - job_name: 'k8s-cadvisor'
        scheme: https
        # token obtained from the apiserver authorization step, stored as a file
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /etc/prometheus/conf.d/auth/k8s_token
        metrics_path: /metrics/cadvisor
        kubernetes_sd_configs:
          - role: node
            api_server: 'https://192.168.12.226:6443'
            tls_config:
              insecure_skip_verify: true
            bearer_token_file: /etc/prometheus/conf.d/auth/k8s_token
        relabel_configs:
          - source_labels: [__address__]
            action: replace
            target_label: instance
          - target_label: __address__
            # replace the default __address__ with the replacement value
            replacement: 192.168.12.226:6443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            # replace the default __metrics_path__; collected via the kubelet here
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}:10250/proxy/metrics/cadvisor
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
        metric_relabel_configs:
          - source_labels: [instance]
            separator: ;
            regex: (.+)
            target_label: node
            replacement: $1
            action: replace
          - source_labels: [pod_name]
            separator: ;
            regex: (.+)
            target_label: pod
            replacement: $1
            action: replace
          - source_labels: [container_name]
            separator: ;
            regex: (.+)
            target_label: container
            replacement: $1
            action: replace
          - source_labels: [origin_prometheus]
            separator: ;
            regex: (.+)
            target_label: node
            replacement: $1
            action: replace
    EOF
  • Step 2. Key configuration notes: because our Prometheus is deployed outside the k8s cluster, __metrics_path__ has to be rebuilt so that scrapes go through the apiserver proxy.

    # - k8s-cAdvisor
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      # replace the default __metrics_path__; collected via the kubelet's cadvisor endpoint
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}:10250/proxy/metrics/cadvisor

    # - kube-state-metrics
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name, __meta_kubernetes_pod_container_port_number]
      # regex match
      regex: ([^;]+);([^;]+);([^;]+)
      # replace the default __metrics_path__ with a hand-built apiserver proxy URL
      target_label: __metrics_path__
      replacement: /api/v1/namespaces/${1}/pods/http:${2}:${3}/proxy/metrics


  • Step 3. Restart our Prometheus service and verify the service discovery and targets.
    # - k8s-cadvisor (1/1 up) : https://192.168.12.226:6443/api/v1/nodes/weiyigeek-226:10250/proxy/metrics/cadvisor

    # - k8s-endpoint-discover (2/2 up)
    # https://192.168.12.226:6443/api/v1/namespaces/monitor/pods/http:kube-state-metrics-6477678b78-6qkjg:8080/proxy/metrics
    # https://192.168.12.226:6443/api/v1/namespaces/monitor/pods/http:kube-state-metrics-6477678b78-6qkjg:8081/proxy/metrics
(Figure: WeiyiGeek - k8s-cadvisor)


(Figure: WeiyiGeek - cadvisor+Dashboard)

  • Step 4. That completes this exercise.

Tip: Dashboard templates have to be chosen and combined by hand, which is flexible but not very standardized. Grafana's dedicated Kubernetes monitoring plugin grafana-kubernetes-app bundles four dashboards (cluster, node, pod/container, and deployment), but its author no longer maintains it, so KubeGraf is now more commonly used. That plugin visualizes and analyzes the performance of a Kubernetes cluster, presents the metrics and characteristics of the cluster's main services intuitively in various charts, and can also be used to inspect application lifecycles and error logs.