doina

A rookie ops engineer.

Prometheus configuration file

The global configuration file

Official documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/

Global configuration options:

global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ] # how often targets are scraped; default is once per minute

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ] # scrape timeout; default is 10 seconds

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ] # how often recording and alerting rules are evaluated; default is 1 minute

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ] # external labels

rule_files:
  [ - <filepath_glob> ... ] # rule files (recording and alerting rules)
scrape_configs:
  [ - <scrape_config> ... ] # scrape jobs: which targets to monitor and how
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ] # relabeling applied to alerts before they are sent to Alertmanager
  alertmanagers:
    [ - <alertmanager_config> ... ] # how to reach the Alertmanager instances
remote_write:
  [ - <remote_write> ... ] # remote storage: write path
remote_read:
  [ - <remote_read> ... ] # remote storage: read path

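Configuring scrape targets: scrape_configs

Notes:
A single Prometheus server can run multiple jobs.
Each scrape_config inherits the settings from global by default.
If a scrape_config sets one of those options itself, its value overrides the global value for that job only.
http://192.168.1.155:9090/metrics is the metrics endpoint of the Prometheus server itself (metrics_path defaults to /metrics).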
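The inheritance rule above can be sketched in Python. This is a simplified model, assuming the override behaviour described here plus two checks that, as far as I know, Prometheus's config loader applies: a job-level scrape_timeout larger than the job's scrape_interval is rejected, and an inherited timeout is capped at the job's own interval. The duration unit set and the helper names parse_duration and effective_scrape_settings are assumptions for illustration, not part of Prometheus.

```python
import re

# Seconds per unit. Assumption: the units Prometheus accepts for <duration>.
UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
DURATION_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")

# Defaults from the global section: scrape_interval 1m, scrape_timeout 10s.
GLOBAL_DEFAULTS = {"scrape_interval": "1m", "scrape_timeout": "10s"}


def parse_duration(text: str) -> float:
    """Parse a duration such as '15s', '1m' or '1h30m' into seconds."""
    if not text:
        raise ValueError("empty duration")
    pos, total = 0, 0.0
    while pos < len(text):
        part = DURATION_PART.match(text, pos)
        if part is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(part.group(1)) * UNIT_SECONDS[part.group(2)]
        pos = part.end()
    return total


def effective_scrape_settings(global_cfg: dict, job_cfg: dict) -> tuple:
    """Return (scrape_interval, scrape_timeout) in seconds for one job."""
    defaults = {**GLOBAL_DEFAULTS, **global_cfg}
    # A value set on the job wins; otherwise the global value is inherited.
    interval = parse_duration(job_cfg.get("scrape_interval", defaults["scrape_interval"]))
    if "scrape_timeout" in job_cfg:
        timeout = parse_duration(job_cfg["scrape_timeout"])
        if timeout > interval:
            raise ValueError(
                f"scrape timeout greater than scrape interval for job {job_cfg.get('job_name')!r}"
            )
    else:
        # An inherited timeout is capped at the job's own interval.
        timeout = min(parse_duration(defaults["scrape_timeout"]), interval)
    return interval, timeout
```

For example, a job that sets only scrape_interval: 5s gets a 5-second timeout rather than the inherited 10 seconds, because a scrape cannot be allowed to run longer than the interval between scrapes.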

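Basic job settings (interval and timeout default to the global values)

job_name: <job_name> # name of the job
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ] # defaults to the global setting
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ] # defaults to the global setting
[ metrics_path: <path> | default = /metrics ] # HTTP path the metrics are scraped from
[ honor_labels: <boolean> | default = false ] # label conflict handling: false (default) renames clashing scraped labels to exported_<name>; true keeps the scraped values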
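honor_labels only matters when the scraped endpoint exposes a label that Prometheus also attaches to the target (job, instance, static labels). A simplified sketch of the two behaviours, assuming that with honor_labels: false a conflicting scraped label is kept under an exported_ prefix (repeated if that name is also taken); merge_labels is a hypothetical helper.

```python
def merge_labels(scraped: dict, target: dict, honor_labels: bool = False) -> dict:
    """Merge labels exposed by the endpoint with the labels Prometheus attaches to the target."""
    merged = dict(scraped)
    if honor_labels:
        # Scraped values win; target labels only fill in labels the endpoint did not set.
        for name, value in target.items():
            if not merged.get(name):
                merged[name] = value
        return merged
    for name, value in target.items():
        existing = merged.get(name)
        if existing:
            # Keep the scraped value under exported_<name> (prefix repeated if taken).
            new_name = "exported_" + name
            while new_name in merged:
                new_name = "exported_" + new_name
            merged[new_name] = existing
        # Target labels win.
        merged[name] = value
    return merged
```

honor_labels: true is typically used for federation or Pushgateway, where the scraped series already carry the job/instance labels you want to keep.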

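Scrape protocol and authentication

[ scheme: <scheme> | default = http ] # protocol used to scrape targets: http or https
params:
  [ <string>: [<string>, ...] ] # optional URL query parameters sent with each scrape request
basic_auth: # HTTP basic authentication, for targets that require a login before metrics can be scraped
  [ username: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]
[ bearer_token: <secret> ] # bearer token used for authentication
[ bearer_token_file: /path/to/bearer/token/file ]
tls_config: # TLS settings, e.g. the CA certificate
  [ <tls_config> ]
[ proxy_url: <string> ] # HTTP proxy used for scraping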

Service discovery: discovering targets dynamically

consul_sd_configs:
  [ - <consul_sd_config> ... ]
dns_sd_configs:
  [ - <dns_sd_config> ... ]
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]
openstack_sd_configs:
  [ - <openstack_sd_config> ... ]
file_sd_configs:
  [ - <file_sd_config> ... ]
gce_sd_configs:
  [ - <gce_sd_config> ... ]
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]

Targets and labels

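static_configs:
  [ - <static_config> ... ] # statically configured targets
relabel_configs:
  [ - <relabel_config> ... ] # relabel targets and their labels before scraping, e.g. rename labels
metric_relabel_configs:
  [ - <relabel_config> ... ] # relabel scraped samples after the scrape, before they are stored

[ sample_limit: <int> | default = 0 ] # per-scrape limit on the number of samples; if a scrape returns more (after metric relabeling), the whole scrape fails. 0 means no limit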

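Relabeling

In Prometheus's data model, a time series is identified by its metric name plus a set of labels, which together form a multi-dimensional data model. Complex queries rely on these dimensions: the more label dimensions a series carries, the more finely the data can be sliced. Relabeling is the mechanism for processing those labels.

relabel_configs: modifies any target (monitored endpoint) and its labels before the target is scraped

  • Why relabel:
    • rename labels
    • drop labels
    • filter targets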
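relabel_configs
[ source_labels: '[' <labelname> [, ...] ']' ] # source labels whose values are read
[ separator: <string> | default = ; ] # placed between the values of multiple source labels; default is ;
[ target_label: <labelname> ] # label that receives the result (required for replace and hashmod)
[ regex: <regex> | default = (.*) ] # matched against the joined source-label values and anchored at both ends; the default (.*) matches any value
[ modulus: <uint64> ] # modulus used by the hashmod action
[ replacement: <string> | default = $1 ] # replacement value; can reference capture groups as $1, $2, $3 ...
[ action: <relabel_action> | default = replace ] # what to do with the labels matched above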
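How these fields combine for the default action (replace) can be sketched in Python: join the source values with separator, match regex against the whole joined string, expand the replacement and write it to target_label. This is a simplified model, not Prometheus's code: Python's re stands in for Go's RE2 (the syntax differs in edge cases), target_label is used literally, and relabel_replace and expand are hypothetical names.

```python
import re

# $1, ${1}, $name or ${name}; "$$" is a literal dollar sign.
REF = re.compile(r"\$(?:\{(\w+)\}|(\w+)|\$)")


def expand(template: str, match: re.Match) -> str:
    """Expand capture-group references in a replacement string."""
    def substitute(ref: re.Match) -> str:
        name = ref.group(1) or ref.group(2)
        if name is None:
            return "$"
        try:
            return match.group(int(name) if name.isdigit() else name) or ""
        except IndexError:
            return ""  # an unknown group expands to an empty string
    return REF.sub(substitute, template)


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Apply one `action: replace` step and return the new label set."""
    # 1. Join the source label values (missing labels count as "").
    value = separator.join(labels.get(name, "") for name in source_labels)
    # 2. The regex must match the whole joined value.
    match = re.fullmatch(regex, value)
    result = dict(labels)
    if match is None:
        return result  # no match: labels are left unchanged
    # 3. Expand $1, $2 ... and write the result to target_label.
    new_value = expand(replacement, match)
    if new_value:
        result[target_label] = new_value
    else:
        result.pop(target_label, None)  # an empty result removes the label
    return result
```

With source_labels ['job'] and target_label idc, a target labelled job="bj" gains idc="bj"; with regex (.*):\d+ on __address__ the port is stripped from the address.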

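Adding extra labels

Example: aggregate CPU usage per data center (IDC) using a label
process_cpu_seconds_total{instance="localhost:9090",job="prometheus"}
// process_cpu_seconds_total - metric name
// instance="localhost:9090" - label attached by default
// job="prometheus" - label attached by default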

# Add an extra label (idc) to the target
vim /usr/local/prometheus/prometheus.yml
static_configs:
    - targets: ['localhost:9090']
      labels:
        idc: bj

# Check the Prometheus configuration file
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml

# Reload the Prometheus configuration without a restart (hot reload)
ps -ef|grep prome
root    24513    1  0 07:18 ?    00:00:01 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus --web.external-url=http://0.0.0.0:9090 --web.max-connections=512

kill -HUP 24513
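The ps/kill pair above can be scripted. A Linux-only sketch, assuming the process can be found by the basename of its argv[0] in /proc; find_pids, reload_via_signal and reload_via_http are hypothetical helpers. The HTTP route (POST to /-/reload) is only enabled when Prometheus is started with --web.enable-lifecycle, which the command line shown above does not include, so on that server only the signal route works.

```python
import os
import signal
import urllib.request
from pathlib import Path


def find_pids(binary_name: str) -> list:
    """PIDs whose executable name (basename of argv[0]) equals binary_name.

    Reads /proc, so this only works on Linux; elsewhere it returns [].
    """
    proc = Path("/proc")
    if not proc.is_dir():
        return []
    pids = []
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            argv0 = (entry / "cmdline").read_bytes().split(b"\0")[0]
        except OSError:
            continue  # the process exited or its cmdline is not readable
        if os.path.basename(argv0.decode(errors="replace")) == binary_name:
            pids.append(int(entry.name))
    return pids


def reload_via_signal(pid: int) -> None:
    """Equivalent of `kill -HUP <pid>`: Prometheus re-reads its configuration."""
    os.kill(pid, signal.SIGHUP)


def reload_via_http(base_url: str, timeout: float = 5.0) -> int:
    """POST to /-/reload; only enabled when Prometheus runs with --web.enable-lifecycle."""
    request = urllib.request.Request(base_url.rstrip("/") + "/-/reload", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `for pid in find_pids("prometheus"): reload_via_signal(pid)` does the same as the two commands above; it must run as a user allowed to signal the Prometheus process (root in the ps output above).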

# Verify that the label was added
http://192.168.1.155:9090/graph

# Run any PromQL query; the returned series now carries idc="bj", for example:
prometheus_config_last_reload_successful{idc="bj",instance="localhost:9090",job="prometheus"}

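# Total CPU usage of everything in the bj data center
http://192.168.1.155:9090/graph

# Run this PromQL; process_cpu_seconds_total is a counter, so take its per-second rate before summing
sum(rate(process_cpu_seconds_total{idc="bj"}[5m]))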
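Why rate() matters here: process_cpu_seconds_total only ever grows, so summing the raw values adds up the CPU seconds each process has used since it started, not its current usage. A simplified sketch, with hypothetical sample values, of what a per-IDC query such as sum by (idc) (rate(process_cpu_seconds_total[5m])) computes, assuming just two samples per series; PromQL's real rate() uses every sample in the window and extrapolates to its edges. per_second_rate and sum_rate_by are hypothetical helpers.

```python
from collections import defaultdict


def per_second_rate(t0: float, v0: float, t1: float, v1: float) -> float:
    """Average per-second increase of a counter between two samples."""
    # A counter only goes down when the process restarts; then it counts from zero.
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / (t1 - t0)


def sum_rate_by(series, label: str) -> dict:
    """series: iterable of (labels, (t0, v0), (t1, v1)). Sum the rates per value of `label`."""
    totals = defaultdict(float)
    for labels, (t0, v0), (t1, v1) in series:
        totals[labels.get(label, "")] += per_second_rate(t0, v0, t1, v1)
    return dict(totals)
```

The result is CPU seconds per second, i.e. the number of cores kept busy: two processes that each used 30 CPU seconds in 60 seconds add up to 1.0 for their IDC.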


Renaming labels

Change the job_name
vim /usr/local/prometheus/prometheus.yml
static_configs:
    - targets: ['localhost:9090']
      labels: # delete this line
        idc: bj # delete this line
Then change job_name from
- job_name: 'prometheus'
to
- job_name: 'bj'

Check the Prometheus configuration file
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml

Find the PID of the Prometheus process
ps -ef |grep prometheus|grep -v grep
root        643      1  0 15:04 ?        00:00:09 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml

Reload the Prometheus configuration
kill -HUP 643

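Verify that the labels changed
http://192.168.21.37:9090/graph
Run any PromQL query. The graph shows the series from before the reload (old labels) and the series scraped after it (job="bj"):
process_cpu_seconds_total{idc="bj",instance="localhost:9090",job="prometheus"}
process_cpu_seconds_total{instance="localhost:9090",job="bj"}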
------------------------------------------------------------------------------------------------------------------------------
Copying the job label into a new idc label with relabel_configs
vim /usr/local/prometheus/prometheus.yml
    static_configs:
    - targets: ['localhost:9090']
    relabel_configs:
    - action: replace
      source_labels: ['job']
      regex: (.*)
      replacement: $1
      target_label: idc

Check the Prometheus configuration file
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml

Find the PID of the Prometheus process
ps -ef |grep prometheus|grep -v grep
root        24513      1  0 15:04 ?        00:00:09 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml

Reload the Prometheus configuration
kill -HUP 24513

Verify that the label was added
http://192.168.1.155:9090/graph
Run any PromQL query:
process_cpu_seconds_total{idc="bj",instance="localhost:9090",job="bj"}

Sum the CPU usage of the bj IDC again
sum(rate(process_cpu_seconds_total{idc="bj"}[5m]))

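Filtering targets by label

  • action (what a relabel step does):
    • replace - the default. Match regex against the joined source_labels values; if it matches, set target_label to replacement, with capture-group references such as $1 expanded
    • keep - drop targets for which regex does not match the joined source_labels values; keep those that match
    • drop - drop targets for which regex matches the joined source_labels values; keep those that do not match
    • labeldrop - remove every label whose name matches regex
    • labelkeep - remove every label whose name does not match regex
    • hashmod - set target_label to a hash of the joined source_labels values, modulo modulus
    • labelmap - match regex against all label names, then copy the values of matching labels to new label names given by replacement, with group references (${1}, ${2}, ...) substituted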
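The actions listed above can be modelled in a few lines of Python. This is a simplified sketch, not Prometheus's implementation: Python's re stands in for RE2, target_label is used literally, and the hashmod bucket is computed from MD5 the way I understand Prometheus does it (low 8 bytes of the digest), so exact bucket numbers are an assumption. apply_step and relabel are hypothetical names.

```python
import hashlib
import re

REF = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def expand(template, match):
    """Expand $1 / ${1} style references; unknown groups become ""."""
    def substitute(ref):
        name = ref.group(1) or ref.group(2)
        try:
            return match.group(int(name) if name.isdigit() else name) or ""
        except IndexError:
            return ""
    return REF.sub(substitute, template)


def apply_step(labels, action="replace", source_labels=(), regex="(.*)", separator=";",
               target_label="", replacement="$1", modulus=0):
    """Apply one relabel step. Returns the new labels, or None if the target is dropped."""
    pattern = re.compile(regex)
    joined = separator.join(labels.get(name, "") for name in source_labels)
    out = dict(labels)
    if action == "replace":
        match = pattern.fullmatch(joined)
        if match:
            value = expand(replacement, match)
            if value:
                out[target_label] = value
            else:
                out.pop(target_label, None)
        return out
    if action == "keep":
        return out if pattern.fullmatch(joined) else None
    if action == "drop":
        return None if pattern.fullmatch(joined) else out
    if action == "labeldrop":
        return {k: v for k, v in labels.items() if not pattern.fullmatch(k)}
    if action == "labelkeep":
        return {k: v for k, v in labels.items() if pattern.fullmatch(k)}
    if action == "hashmod":
        # Assumption: MD5 of the joined value, low 8 bytes read as a big-endian integer.
        digest = hashlib.md5(joined.encode()).digest()
        out[target_label] = str(int.from_bytes(digest[8:], "big") % modulus)
        return out
    if action == "labelmap":
        for name, value in labels.items():
            match = pattern.fullmatch(name)
            if match:
                out[expand(replacement, match)] = value
        return out
    raise ValueError(f"unknown action: {action}")


def relabel(labels, steps):
    """Run the steps in order, like a relabel_configs list; None means the target was dropped."""
    current = dict(labels)
    for step in steps:
        current = apply_step(current, **step)
        if current is None:
            return None
    return current
```

Run against the labeldrop example later in this article (replace job into idc, keep on job, labeldrop job), a target {"job": "bj", "instance": "localhost:9090"} ends up as {"instance": "localhost:9090", "idc": "bj"}.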
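drop

drop selects which targets (monitored endpoints) are scraped. For example, when many servers are already monitored, you can scrape only a few of them, or skip a few.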

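Select scrape targets
vim /usr/local/prometheus/prometheus.yml
Change the action:
    relabel_configs:
    - action: replace
      source_labels: ['job']
      regex: (.*)
      replacement: $1
      target_label: idc
    - action: drop # add this line
      source_labels: ['job'] # add this line; with the default regex (.*) every value matches, so every target of this job is dropped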

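keep

Select scrape targets
vim /usr/local/prometheus/prometheus.yml
Change the action:
    relabel_configs:
    - action: replace
      source_labels: ['job']
      regex: (.*)
      replacement: $1
      target_label: idc
    - action: keep # change this line to keep
      source_labels: ['job'] # with the default regex (.*) every target is kept; set regex to the values you want to keep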
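Neither the drop example nor the keep example sets regex, so the default (.*) applies, and (.*) matches any value, even an empty one: the drop step removes every target of the job and the keep step keeps them all. A sketch with hypothetical targets showing the difference once regex is set; filter_targets is a hypothetical helper, and Python's re stands in for RE2.

```python
import re


def filter_targets(targets, action, source_labels, regex="(.*)", separator=";"):
    """Return the targets that survive a single keep or drop step."""
    if action not in ("keep", "drop"):
        raise ValueError("only keep and drop filter targets")
    pattern = re.compile(regex)
    survivors = []
    for labels in targets:
        value = separator.join(labels.get(name, "") for name in source_labels)
        matched = pattern.fullmatch(value) is not None
        # keep retains matching targets; drop retains non-matching ones.
        if matched == (action == "keep"):
            survivors.append(labels)
    return survivors
```

In prometheus.yml the equivalent of the regex argument is a regex: line under the keep or drop step, for example regex: sh to exclude only the targets whose job is sh.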

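Dropping labels

labeldrop - remove labels you do not want from the target
labelkeep - keep only the labels whose names match regex and remove all others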

vim /usr/local/prometheus/prometheus.yml
scrape_configs:
  - job_name: 'bj'

    static_configs:
    - targets: ['localhost:9090']
    relabel_configs:
    - action: replace
      source_labels: ['job']
      regex: (.*)
      replacement: $1
      target_label: idc
    - action: keep
      source_labels: ['job']
    - action: labeldrop
      regex: job

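File-based service discovery

// First, remove the self-monitoring target that Prometheus ships with by default

# Restore the default configuration file content, then comment out the static target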
vim /usr/local/prometheus/prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    #static_configs: # comment out this line
    #- targets: ['localhost:9090'] # comment out this line

# Reload the Prometheus configuration (use the PID of your own Prometheus process)
kill -HUP 644
// Add file-based service discovery

# Create a directory for the service discovery target files
mkdir -p /usr/local/prometheus/sd_config

# Edit the Prometheus configuration file
vim /usr/local/prometheus/prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.


    file_sd_configs:
      - files: ['/usr/local/prometheus/sd_config/*.yml']
        refresh_interval: 3s # how often the files are re-read (default 5m); changes are also detected through file watches
    #static_configs:
    #- targets: ['localhost:9090']

# Create the target file test.yml
vim /usr/local/prometheus/sd_config/test.yml
- targets: ['localhost:9090']
  labels:
    idc: bj
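Target files like test.yml are often generated by scripts rather than edited by hand, and because Prometheus watches them, a generator should replace them atomically so a half-written file is never read. A sketch assuming a JSON target file: file_sd also accepts JSON (.json) and picks the parser from the file extension, so the *.yml glob configured above would need a second entry such as '/usr/local/prometheus/sd_config/*.json' (hypothetical). write_sd_file is a hypothetical helper.

```python
import json
import os
import tempfile


def write_sd_file(path: str, groups: list) -> None:
    """Atomically write a file_sd target file in JSON format.

    groups: a list of {"targets": [...], "labels": {...}} dicts, the same
    structure as test.yml above.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory; its .tmp suffix keeps it
    # out of globs such as *.yml or *.json.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".sd-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(groups, tmp, indent=2)
        # Atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

For example, `write_sd_file("/usr/local/prometheus/sd_config/test.json", [{"targets": ["localhost:9090"], "labels": {"idc": "bj"}}])` produces the JSON equivalent of test.yml.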

# Check that the Prometheus configuration file is valid
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml

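# Reload the Prometheus configuration. This is needed once, for the change to prometheus.yml; later edits to files in sd_config are picked up without a reload
kill -HUP 644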
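To confirm that the discovered target arrived with idc="bj" without opening the web UI, you can query the HTTP API. A sketch assuming the /api/v1/targets endpoint, whose data.activeTargets entries carry scrapeUrl, health and labels (targets removed by keep/drop relabeling are listed under data.droppedTargets instead); fetch_targets and summarize_targets are hypothetical helpers.

```python
import json
import urllib.request


def fetch_targets(base_url: str, timeout: float = 5.0) -> dict:
    """GET /api/v1/targets from a Prometheus server and decode the JSON body."""
    url = base_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_targets(payload: dict) -> list:
    """Return (scrapeUrl, health, labels) for every active target."""
    if payload.get("status") != "success":
        raise ValueError(f"API error: {payload.get('error')}")
    return [
        (target["scrapeUrl"], target["health"], target["labels"])
        for target in payload["data"]["activeTargets"]
    ]
```

For example, `summarize_targets(fetch_targets("http://192.168.1.155:9090"))` should list localhost:9090 with the idc label taken from test.yml.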