Global configuration file
Official docs: https://prometheus.io/docs/prometheus/latest/configuration/configuration/
Global configuration options:
global:
# How frequently to scrape targets by default.
[ scrape_interval: <duration> | default = 1m ] # how often to scrape targets; defaults to once per minute
# How long until a scrape request times out.
[ scrape_timeout: <duration> | default = 10s ] # scrape timeout; defaults to 10 seconds
# How frequently to evaluate rules.
[ evaluation_interval: <duration> | default = 1m ] # rule evaluation interval; defaults to 1 minute
# The labels to add to any time series or alerts when communicating with
# external systems (federation, remote storage, Alertmanager).
external_labels:
[ <labelname>: <labelvalue> ... ] # external labels attached when talking to external systems
rule_files:
[ - <filepath_glob> ... ] # alerting/recording rule files
scrape_configs:
[ - <scrape_config> ... ] # scrape target configuration
alert_relabel_configs:
[ - <relabel_config> ... ] # alert relabeling
alertmanagers:
[ - <alertmanager_config> ... ] # connection addresses of the Alertmanager component
remote_write:
[ - <remote_write> ... ] # remote storage write endpoints
remote_read:
[ - <remote_read> ... ] # remote storage read endpoints
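Putting the options above together, a minimal prometheus.yml might look like this sketch (the external label, rule glob, and target address are placeholders):

```yaml
global:
  scrape_interval: 1m          # scrape targets every minute (the default)
  scrape_timeout: 10s
  evaluation_interval: 1m
  external_labels:
    region: bj                 # placeholder; attached when talking to external systems

rule_files:
  - "rules/*.yml"              # placeholder glob for alerting/recording rules

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
```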
Configuring scrape targets: scrape_configs
Configuration notes:
A single Prometheus instance can run multiple jobs.
scrape_configs inherit Prometheus's global configuration by default.
Settings overridden inside a scrape_config take precedence over the global values for that job.
http://192.168.1.155:9090/metrics is the default metrics endpoint.
Job-level settings
job_name: <job_name> # name of the job
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ] # inherited from the global config
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ] # inherited from the global config
[ metrics_path: <path> | default = /metrics ] # default metrics endpoint path
[ honor_labels: <boolean> | default = false ] # whether to keep conflicting labels from scraped data; by default they are not honored
[ scheme: <scheme> | default = http ] # protocol used to scrape targets: http or https
params:
[ <string>: [<string>, ...] ] # optional URL query parameters sent with each scrape
basic_auth: # credentials, if the target requires login before metrics can be scraped
[ username: <string> ]
[ password: <secret> ]
[ password_file: <string> ]
[ bearer_token: <secret> ] # bearer token for authentication
[ bearer_token_file: /path/to/bearer/token/file ]
tls_config: # TLS settings, e.g. the CA certificate
[ <tls_config> ]
[ proxy_url: <string> ] # proxy URL to scrape through
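As a sketch, a job combining these connection and authentication options might look like the following (the job name, credentials, paths, query parameter, and hostname are all placeholders):

```yaml
scrape_configs:
  - job_name: 'secure-app'              # hypothetical job
    scheme: https                       # scrape over HTTPS instead of the default http
    metrics_path: /metrics
    params:
      module: ['http_2xx']              # example URL query parameter appended to each scrape
    basic_auth:
      username: prom                    # placeholder credentials
      password_file: /etc/prometheus/passwd
    tls_config:
      ca_file: /etc/prometheus/ca.crt   # CA used to verify the target's certificate
    static_configs:
      - targets: ['app.example.com:443']
```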
Service discovery: dynamically configure scrape targets
consul_sd_configs:
[ - <consul_sd_config> ... ]
dns_sd_configs:
[ - <dns_sd_config> ... ]
ec2_sd_configs:
[ - <ec2_sd_config> ... ]
openstack_sd_configs:
[ - <openstack_sd_config> ... ]
file_sd_configs:
[ - <file_sd_config> ... ]
gce_sd_configs:
[ - <gce_sd_config> ... ]
kubernetes_sd_configs:
[ - <kubernetes_sd_config> ... ]
Label configuration
static_configs:
[ - <static_config> ... ] # statically configured targets
relabel_configs:
[ - <relabel_config> ... ] # relabel targets and their labels before scraping, e.g. to rename labels
metric_relabel_configs:
[ - <relabel_config> ... ] # relabel metrics after scraping, before ingestion
[ sample_limit: <int> | default = 0 ] # per-scrape sample limit; a scrape exceeding it is treated as failed (0 = no limit)
Relabeling
In Prometheus's data model, a metric name plus a set of labels identifies a multi-dimensional time series. Complex queries rely on these label dimensions to slice the data; the finer the dimensions, the more precise the query. Relabeling is how those labels are manipulated.
relabel_configs allow modifying any target and its labels before scraping.
- Purposes of relabeling:
- rename a label
- drop a label
- filter targets
relabel_configs
[ source_labels: '[' <labelname> [, ...] ']' ] # source labels
[ separator: <string> | default = ; ] # separator used to join multiple source label values; defaults to ;
[ target_label: <labelname> ] # label to write the result to
[ regex: <regex> | default = (.*) ] # regex matched against the joined source label values; defaults to (.*), matching everything
[ modulus: <uint64> ] # modulus to take of the hash of the source label values (used by hashmod)
[ replacement: <string> | default = $1 ] # replacement value; capture groups are referenced as $1, $2, $3, ...
[ action: <relabel_action> | default = replace ] # action performed based on the regex match; defaults to replace
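These fields combine as follows: the source label values are joined with separator, regex is matched against the joined string, and replacement (with capture group references) is written into target_label. A hypothetical rule merging two discovery labels into one:

```yaml
relabel_configs:
  - source_labels: ['__meta_dc', '__meta_rack']  # hypothetical discovery labels
    separator: ';'                               # joined value would be e.g. "bj;r01"
    regex: '(.*);(.*)'                           # capture both halves of the joined value
    target_label: location
    replacement: '$1-$2'                         # result: location="bj-r01"
    action: replace
```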
Adding extra labels
Example: sum total CPU usage for a data center by label
process_cpu_seconds_total{instance="localhost:9090",job="prometheus"}
// process_cpu_seconds_total - metric name
// instance="localhost:9090" - automatically attached label
// job="prometheus" - automatically attached label
# Add an extra label
vim /usr/local/prometheus/prometheus.yml
static_configs:
- targets: ['localhost:9090']
labels:
idc: bj
# Check the Prometheus config file
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
# Reload the Prometheus config file (hot reload)
ps -ef | grep prome
root 24513 1 0 07:18 ? 00:00:01 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus --web.external-url=http://0.0.0.0:9090 --web.max-connections=512
kill -HUP 24513
# Verify the label was added
http://192.168.1.155:9090/graph
# Run any PromQL query
prometheus_config_last_reload_successful{idc="bj",instance="localhost:9090",job="prometheus"}
# Sum CPU usage across the bj data center
http://192.168.1.155:9090/graph
# Run the PromQL
sum(process_cpu_seconds_total{idc="bj"})
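Besides sending SIGHUP, a running Prometheus can also be reloaded over HTTP, but only when it was started with the --web.enable-lifecycle flag (not present in the process listing above, so this is an optional alternative):

```shell
# requires Prometheus started with --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload
```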
Renaming labels
Change the job_name
vim /usr/local/prometheus/prometheus.yml
static_configs:
- targets: ['localhost:9090']
labels: # delete this line
idc: bj # delete this line
Change the job_name
from
- job_name: 'prometheus'
to
- job_name: 'bj'
Check the Prometheus config file
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
Find the Prometheus process PID
ps -ef | grep prometheus | grep -v grep
root 643 1 0 15:04 ? 00:00:09 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
Reload the Prometheus config file
kill -HUP 643
Verify the label changed
http://192.168.21.37:9090/graph
Run any PromQL query
process_cpu_seconds_total{idc="bj",instance="localhost:9090",job="prometheus"} # old series: returns nothing after the change
process_cpu_seconds_total{instance="localhost:9090",job="bj"} # new series with the renamed job
------------------------------------------------------------------------------------------------------------------------------
Recreate the idc label from the job label via relabel_configs
vim /usr/local/prometheus/prometheus.yml
static_configs:
- targets: ['localhost:9090']
relabel_configs:
- action: replace
source_labels: ['job']
regex: (.*)
replacement: $1
target_label: idc
Check the Prometheus config file
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
Find the Prometheus process PID
ps -ef | grep prometheus | grep -v grep
root 24513 1 0 15:04 ? 00:00:09 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
Reload the Prometheus config file
kill -HUP 24513
Verify the label changed
http://192.168.1.155:9090/graph
Run any PromQL query
process_cpu_seconds_total{idc="bj",instance="localhost:9090",job="bj"}
Sum CPU usage for the bj IDC again
sum(process_cpu_seconds_total{idc="bj"})
Filtering targets by label
- action - relabel actions:
- replace - the default; match regex against the source_labels values and write the replacement (with capture group references) to target_label
- keep - drop targets whose joined source_labels do NOT match regex; keep the matching targets
- drop - drop targets whose joined source_labels match regex; keep the non-matching targets
- labeldrop - remove labels whose names match regex
- labelkeep - remove labels whose names do NOT match regex
- hashmod - set target_label to the modulus of a hash of the joined source_labels values
- labelmap - match regex against all label names; copy the matching labels' values to label names given by replacement, with group references (${1}, ${2}, ...)
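Of the actions above, hashmod is the only one not demonstrated in the walkthroughs below; it is typically used to shard targets across several Prometheus servers. A sketch (the modulus, temporary label name, and shard number are illustrative):

```yaml
relabel_configs:
  - source_labels: ['__address__']
    modulus: 4                  # split targets into 4 shards
    target_label: __tmp_hash    # temporary label holding hash(address) % 4
    action: hashmod
  - source_labels: ['__tmp_hash']
    regex: '0'                  # this server keeps only shard 0
    action: keep
```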
# drop
drop filters scrape targets. For example, when many servers are already monitored, it lets you scrape only a few of them, or exclude a few of them.
Select scrape targets
vim /usr/local/prometheus/prometheus.yml
Modify the action
relabel_configs:
- action: replace
source_labels: ['job']
regex: (.*)
replacement: $1
target_label: idc
- action: drop #add this line
source_labels: ['job'] #add this line
keep
Select scrape targets
vim /usr/local/prometheus/prometheus.yml
Modify the action
relabel_configs:
- action: replace
source_labels: ['job']
regex: (.*)
replacement: $1
target_label: idc
- action: keep #change this line to keep
source_labels: ['job']
Dropping labels
labeldrop - remove unwanted labels from the current instance
labelkeep - keep only the matching labels
vim /usr/local/prometheus/prometheus.yml
scrape_configs:
- job_name: 'bj'
static_configs:
- targets: ['localhost:9090']
relabel_configs:
- action: replace
source_labels: ['job']
regex: (.*)
replacement: $1
target_label: idc
- action: keep
source_labels: ['job']
- action: labeldrop
regex: job
File-based service discovery
// Remove Prometheus's default self-monitoring target
# Restore the config file to its default contents
vim /usr/local/prometheus/prometheus.yml
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
#static_configs: #comment out this line
#- targets: ['localhost:9090'] #comment out this line
# Reload the Prometheus config file
kill -HUP 644
// Add file-based service discovery
# Create a directory to hold the file-SD target files
mkdir -p /usr/local/prometheus/sd_config
# Edit the Prometheus config file
vim /usr/local/prometheus/prometheus.yml
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
file_sd_configs:
- files: ['/usr/local/prometheus/sd_config/*.yml']
refresh_interval: 3s
#static_configs:
#- targets: ['localhost:9090']
# Edit test.yml
vim /usr/local/prometheus/sd_config/test.yml
- targets: ['localhost:9090']
labels:
idc: bj
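file_sd_configs also accepts JSON target files; note that the files glob above only matches *.yml, so a *.json pattern would need to be added to use one. An equivalent JSON version of test.yml would be:

```json
[
  {
    "targets": ["localhost:9090"],
    "labels": { "idc": "bj" }
  }
]
```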
# Check the Prometheus config file is valid
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
# Reload the Prometheus config file
kill -HUP 644