
Prometheus configuration explained

This article summarizes the configuration syntax, and the features it enables, following the official documentation.

1. Prometheus configuration file structure

# This section holds Prometheus's global configuration, such as the scrape
# interval and the scrape timeout.
global:
  # How frequently to scrape targets.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # Labels attached when communicating with external systems.
  external_labels:
    [ <labelname>: <labelvalue> ... ]

  # File to which PromQL queries are logged.
  # Reloading the configuration will reopen the file.
  [ query_log_file: <string> ]

# This section lists the alerting-rule files; based on these rules, Prometheus
# pushes alerts to Alertmanager.
rule_files:
  [ - <filepath_glob> ... ]

# This section holds the scrape configurations; Prometheus's data collection
# is configured here.
scrape_configs:
  [ - <scrape_config> ... ]

# This section configures alerting, chiefly the Alertmanager instance
# addresses to which Prometheus pushes the alerts fired by its rules.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# The write API endpoints of remote storage backends.
remote_write:
  [ - <remote_write> ... ]

# The read API endpoints of remote storage backends.
remote_read:
  [ - <remote_read> ... ]

2. scrape_configs explained

A scrape_config section specifies a set of targets and parameters: the targets are the instances, i.e. the endpoints to scrape, and the parameters describe how to scrape them. The format is as follows:

# The job name assigned to scraped metrics by default.
job_name: <job_name>

# How frequently to scrape targets from this job. Defaults to the global value.
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]

# Per-scrape timeout when scraping this job. Defaults to the global value.
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]

# The HTTP resource path on which to fetch metrics from targets.
[ metrics_path: <path> | default = /metrics ]

# honor_labels controls how Prometheus handles conflicts between labels that are
# already present in scraped data and labels that Prometheus would attach
# server-side ("job" and "instance" labels, manually configured target
# labels, and labels generated by service discovery implementations).
#
# If honor_labels is set to "true", label conflicts are resolved by keeping label
# values from the scraped data and ignoring the conflicting server-side labels.
#
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_<original-label>" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels.
#
# Setting honor_labels to "true" is useful for use cases such as federation and
# scraping the Pushgateway, where all labels specified in the target should be
# preserved.
#
# Note that any globally configured "external_labels" are unaffected by this
# setting. In communication with external systems, they are always applied only
# when a time series does not have a given label yet and are ignored otherwise.
[ honor_labels: <boolean> | default = false ]

# honor_timestamps controls whether Prometheus respects the timestamps present
# in scraped data.
#
# If honor_timestamps is set to "true", the timestamps of the metrics exposed
# by the target will be used.
#
# If honor_timestamps is set to "false", the timestamps of the metrics exposed
# by the target will be ignored.
[ honor_timestamps: <boolean> | default = true ]

# The protocol scheme used for scraping: http or https.
[ scheme: <scheme> | default = http ]

# Optional HTTP URL parameters.
params:
  [ <string>: [<string>, ...] ]

# Optional authentication information.
basic_auth:
  [ username: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]

# Bearer token value, used to authenticate when fetching metrics.
[ bearer_token: <secret> ]

# File from which the bearer token is read, used to authenticate when
# fetching metrics.
[ bearer_token_file: /path/to/bearer/token/file ]

# TLS settings used when fetching metrics.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# List of Azure service discovery configurations.
azure_sd_configs:
  [ - <azure_sd_config> ... ]

# List of Consul service discovery configurations.
consul_sd_configs:
  [ - <consul_sd_config> ... ]

# List of DNS service discovery configurations.
dns_sd_configs:
  [ - <dns_sd_config> ... ]

# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]

# List of OpenStack service discovery configurations.
openstack_sd_configs:
  [ - <openstack_sd_config> ... ]

# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]

# List of GCE service discovery configurations.
gce_sd_configs:
  [ - <gce_sd_config> ... ]

# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]

# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ - <marathon_sd_config> ... ]

# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ - <nerve_sd_config> ... ]

# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ - <serverset_sd_config> ... ]

# List of Triton service discovery configurations.
triton_sd_configs:
  [ - <triton_sd_config> ... ]

# Statically specified scrape targets for this job.
static_configs:
  [ - <static_config> ... ]

# Controls which target labels are attached to scraped data; unnecessary
# labels can be dropped.
relabel_configs:
  [ - <relabel_config> ... ]

# Adds, edits, or rewrites a metric's label values or label names after
# scraping.
metric_relabel_configs:
  [ - <relabel_config> ... ]

# Per-scrape limit on number of scraped samples that will be accepted.
# If more than this number of samples are present after metric relabelling
# the entire scrape will be treated as failed. 0 means no limit.
[ sample_limit: <int> | default = 0 ]

Since my deployment runs in Kubernetes, I only care about kubernetes_sd_configs-based service discovery and static_configs static file discovery.

2.1 relabel_configs

relabel_configs is a powerful tool: relabeling lets Prometheus dynamically rewrite label values based on a target instance's metadata before scraping. Beyond that, it lets us decide, based on that metadata, whether to scrape or ignore the target instance.

The relabel_config format is as follows:

# The source labels select values from existing labels. Their content is concatenated
# using the configured separator and matched against the configured regular expression
# for the replace, keep, and drop actions.
[ source_labels: '[' <labelname> [, ...] ']' ]

# Separator placed between concatenated source label values.
[ separator: <string> | default = ; ]

# Label to which the resulting value is written in a replace action.
# It is mandatory for replace actions. Regex capture groups are available.
[ target_label: <labelname> ]

# Regular expression against which the extracted value is matched.
[ regex: <regex> | default = (.*) ]

# Modulus to take of the hash of the source label values.
[ modulus: <uint64> ]

# Replacement value against which a regex replace is performed if the
# regular expression matches. Regex capture groups are available.
[ replacement: <string> | default = $1 ]

# Action to perform based on regex matching.
[ action: <relabel_action> | default = replace ]

The action can be one of:

replace

keep

drop

hashmod

labelmap

labeldrop

labelkeep

replace: the default; matches regex against the concatenated source_labels values and writes replacement, which may reference the regex capture groups, to target_label.

keep: drops targets for which regex does not match the concatenated source_labels values.

drop: drops targets for which regex matches the concatenated source_labels values.

labeldrop: removes labels whose name matches regex.

labelkeep: removes labels whose name does not match regex.

hashmod: sets target_label to the modulus of a hash of the concatenated source_labels values.

labelmap: matches regex against all label names, then copies the values of matching labels to new label names given by replacement, with capture-group references ($1, $2, ...).

Labels in Prometheus are key:value pairs: replace, keep, and drop operate on label values, while labelmap, labeldrop, and labelkeep operate on label names.

replace usage

replace is the default action: regex is matched against the source_labels values, and replacement, which may reference the capture groups of the match, is written to target_label.

- action: replace
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $1:$2
  source_labels:
  - __address__
  - __meta_kubernetes_service_annotation_prometheus_io_port
  target_label: __address__

In the example above, the new __address__ value is $1:$2, where $1 is the host captured by ([^:]+)(?::\d+)? from the original __address__, and $2 is the port captured by (\d+) from the __meta_kubernetes_service_annotation_prometheus_io_port annotation. The final __address__ value is, for example, 192.168.1.1:9100.
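The replace mechanics above can be sketched in Python. This is a minimal illustration of the documented behavior, not Prometheus's actual Go implementation; the function name relabel_replace is made up here:

```python
import re

def relabel_replace(labels, source_labels, regex, replacement,
                    target_label, separator=";"):
    """Sketch of the 'replace' relabel action."""
    # Concatenate the source label values with the separator.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Prometheus anchors relabel regexes at both ends, hence fullmatch.
    m = re.fullmatch(regex, value)
    if m is None:
        return labels  # no match: labels stay untouched
    out = dict(labels)
    # Translate $1/$2 references into Python's \1/\2 template syntax.
    out[target_label] = m.expand(replacement.replace("$", "\\"))
    return out

result = relabel_replace(
    {"__address__": "192.168.1.1",
     "__meta_kubernetes_service_annotation_prometheus_io_port": "9100"},
    source_labels=["__address__",
                   "__meta_kubernetes_service_annotation_prometheus_io_port"],
    regex=r"([^:]+)(?::\d+)?;(\d+)",
    replacement="$1:$2",
    target_label="__address__",
)
# result["__address__"] == "192.168.1.1:9100"
```

The concatenated value here is "192.168.1.1;9100", which the regex splits back into host and port.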

keep usage

relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
  action: keep
  regex: true

In this example, a target is kept only when __meta_kubernetes_service_annotation_prometheus_io_probe=true matches; conversely, if the source_labels value does not match the regex, the target is dropped.

drop usage

drop behaves exactly opposite to keep. Taking the keep example and switching the action:

relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
  action: drop
  regex: true

In this example, any target whose __meta_kubernetes_service_annotation_prometheus_io_probe label is true is dropped; conversely, targets with __meta_kubernetes_service_annotation_prometheus_io_probe!=true are kept.
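The symmetry between keep and drop can be sketched in Python (an illustrative model, not Prometheus's implementation; relabel_keep_drop is a name invented here):

```python
import re

def relabel_keep_drop(targets, source_labels, regex, action, separator=";"):
    """Sketch of the 'keep'/'drop' relabel actions over a target list."""
    kept = []
    for labels in targets:
        value = separator.join(labels.get(n, "") for n in source_labels)
        matched = re.fullmatch(regex, value) is not None
        # keep: retain only matching targets; drop: retain only non-matching.
        if matched == (action == "keep"):
            kept.append(labels)
    return kept

targets = [
    {"__meta_kubernetes_service_annotation_prometheus_io_probe": "true"},
    {"__meta_kubernetes_service_annotation_prometheus_io_probe": "false"},
]
src = ["__meta_kubernetes_service_annotation_prometheus_io_probe"]
after_keep = relabel_keep_drop(targets, src, "true", "keep")  # first target only
after_drop = relabel_keep_drop(targets, src, "true", "drop")  # second target only
```

With the same regex, the two actions always partition the target list between them.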

labelmap usage

labelmap differs from replace, keep, and drop described above: labelmap matches label names, while replace, keep, and drop match label values.

relabel_configs:
- action: labelmap
  regex: __meta_kubernetes_service_label_(.+)

In this example, every label whose name matches the regular expression __meta_kubernetes_service_label_(.+) is mapped to a label named after the content captured by (.+). The effect:

Original label: __meta_kubernetes_service_label_test=111

After mapping: test=111
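The same mapping can be sketched in Python. Note that labelmap copies the value to the new name rather than deleting the original (discovery metadata labels are discarded later anyway); relabel_labelmap is a name invented here:

```python
import re

def relabel_labelmap(labels, regex):
    """Sketch of 'labelmap': the regex is matched against label *names*;
    for each match the value is copied to the name captured by group 1
    (the default replacement, $1)."""
    out = dict(labels)
    for name, value in labels.items():
        m = re.fullmatch(regex, name)
        if m:
            out[m.group(1)] = value
    return out

labels = {"__meta_kubernetes_service_label_test": "111", "job": "demo"}
mapped = relabel_labelmap(labels, r"__meta_kubernetes_service_label_(.+)")
# mapped now also contains test="111"; unmatched labels are untouched
```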

hashmod usage

hashmod sets target_label to the hash of the concatenated source_labels values, modulo modulus; a common use is sharding scrape targets across several Prometheus servers.
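What hashmod computes can be sketched in Python. This is illustrative only: the hash below is MD5-derived, and Prometheus's exact hashing differs in detail, so the bucket values will not match a real server's; relabel_hashmod is a name invented here:

```python
import hashlib

def relabel_hashmod(labels, source_labels, modulus, target_label,
                    separator=";"):
    """Sketch of 'hashmod': target_label is set to
    hash(concatenated source label values) % modulus."""
    value = separator.join(labels.get(n, "") for n in source_labels)
    digest = hashlib.md5(value.encode("utf-8")).digest()
    hashed = int.from_bytes(digest[8:], "big")  # fold to a uint64
    out = dict(labels)
    out[target_label] = str(hashed % modulus)
    return out

sharded = relabel_hashmod({"__address__": "10.40.58.153:9100"},
                          ["__address__"], 4, "__tmp_hash")
# sharded["__tmp_hash"] is one of "0".."3", stable for a given address
```

Combined with a keep rule on __tmp_hash, each of four Prometheus servers can scrape only its own quarter of the targets.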

labeldrop usage

labeldrop filters a target's labels, deleting every label that matches the filter, for example:

relabel_configs:
- action: labeldrop
  regex: __meta_kubernetes_service_label_(.+)

This configuration matches the regex against all labels of the current target and deletes the labels that match; conversely, labels that do not match are kept.

labelkeep usage

labelkeep filters a target's labels, keeping only the labels that match the filter, for example:

relabel_configs:
- action: labelkeep
  regex: __meta_kubernetes_service_label_(.+)

This configuration matches the regex against all labels of the current target and keeps the labels that match; conversely, labels that do not match are removed.
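labeldrop and labelkeep are mirror images of each other, which a short Python sketch makes plain (illustrative model; relabel_label_filter is a name invented here):

```python
import re

def relabel_label_filter(labels, regex, action):
    """Sketch of 'labeldrop'/'labelkeep': match the regex against every
    label name; labeldrop removes matches, labelkeep removes non-matches."""
    matched = {n for n in labels if re.fullmatch(regex, n)}
    if action == "labeldrop":
        return {n: v for n, v in labels.items() if n not in matched}
    if action == "labelkeep":
        return {n: v for n, v in labels.items() if n in matched}
    raise ValueError("unsupported action: " + action)

labels = {"__meta_kubernetes_service_label_app": "web", "job": "demo"}
pattern = r"__meta_kubernetes_service_label_(.+)"
after_labeldrop = relabel_label_filter(labels, pattern, "labeldrop")
after_labelkeep = relabel_label_filter(labels, pattern, "labelkeep")
# after_labeldrop keeps only job; after_labelkeep keeps only the matched label
```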

2.2 metric_relabel_configs

As noted above, relabel_configs rewrites labels before metrics are fetched; metric_relabel_configs, by contrast, operates on labels after metrics are fetched. metric_relabel_configs determines which metrics we keep, which we drop, and what those metrics will look like. Its configuration is essentially the same as relabel_configs; for the parameters, refer to the relabel_configs section above.

2.3 static_configs

Its main purpose is to statically specify the targets from which exporter metrics are fetched, e.g. Prometheus itself, MySQL, or NGINX:

scrape_configs:
- job_name: prometheus
  static_configs:
  - targets:
    - localhost:9090

This rule scrapes Prometheus's own metrics; the entries in the targets list are the address and port from which Prometheus fetches metrics. Since no metrics_path is specified, the default /metrics path is used.

You can also point at the addresses of other exporters, for example to fetch node_exporter data:

scrape_configs:
- job_name: node
  static_configs:
  - targets:
    - 10.40.58.153:9100
    - 10.40.61.116:9100
    - 10.40.58.154:9100
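Conceptually, each static target combines with the job's scheme and metrics_path defaults to form the URL Prometheus scrapes; a tiny sketch (scrape_urls is a name invented here):

```python
def scrape_urls(targets, scheme="http", metrics_path="/metrics"):
    """Sketch: compose the scrape URL from a target address and the
    job's scheme and metrics_path (shown here with their defaults)."""
    return ["%s://%s%s" % (scheme, t, metrics_path) for t in targets]

urls = scrape_urls(["10.40.58.153:9100", "10.40.61.116:9100"])
# ["http://10.40.58.153:9100/metrics", "http://10.40.61.116:9100/metrics"]
```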

2.4 kubernetes_sd_configs

Kubernetes service discovery can scrape the following kinds of objects:

node

service

pod

endpoints

ingress

By setting the kubernetes_sd_config role to endpoints, Prometheus automatically discovers all endpoints objects in Kubernetes and uses them as the job's monitoring targets, as shown below:

kubernetes_sd_configs:

- role: endpoints

Example 1

This configuration uses Kubernetes service discovery to find the kube-apiservers:

scrape_configs:
- bearer_token_file: /var/run/secrets//serviceaccount/token
  job_name: kubernetes-apiservers
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: default;kubernetes;https
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_service_name
    - __meta_kubernetes_endpoint_port_name
  scheme: https
  tls_config:
    ca_file: /var/run/secrets//serviceaccount/
    insecure_skip_verify: true

The scrape configuration above defines the following:

The job name is kubernetes-apiservers (job_name: kubernetes-apiservers).

It discovers endpoints objects in Kubernetes (role: endpoints).

It scrapes over HTTPS (scheme: https).

A target must live in the default namespace, belong to the service named kubernetes, and expose the endpoint port named https:

__meta_kubernetes_namespace=~default

__meta_kubernetes_service_name=~kubernetes

__meta_kubernetes_endpoint_port_name=~https
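The keep rule above concatenates its three source labels with the default ';' separator before matching; a quick Python sketch, with hypothetical metadata values for illustration:

```python
import re

# Hypothetical discovered metadata for the API-server endpoint.
labels = {
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_service_name": "kubernetes",
    "__meta_kubernetes_endpoint_port_name": "https",
}
source_labels = [
    "__meta_kubernetes_namespace",
    "__meta_kubernetes_service_name",
    "__meta_kubernetes_endpoint_port_name",
]
# Source label values are joined with the default ';' separator...
value = ";".join(labels[n] for n in source_labels)  # "default;kubernetes;https"
# ...and the target is kept only if the regex matches the whole string.
kept = re.fullmatch("default;kubernetes;https", value) is not None  # True
```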

Example 2

This configuration auto-discovers endpoints in Kubernetes:

- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node

There are quite a few rules in relabel_configs; in detail:

The job name is kubernetes-service-endpoints (job_name: kubernetes-service-endpoints).

It discovers endpoints objects in Kubernetes (role: endpoints).

It scrapes over HTTP by default (no scheme is configured), unless the scheme annotation below overrides it.

The relabel rules:

The annotation /scrape: "true" must be present on the service for the target to be discovered by Prometheus.

__scheme__ is set to the value of __meta_kubernetes_service_annotation_prometheus_io_scheme when it matches the regex (https?).

__metrics_path__ is set to the value of __meta_kubernetes_service_annotation_prometheus_io_path when it matches the regex (.+).

__address__ is rewritten into IP:port form.

kubernetes_namespace is set to the value of __meta_kubernetes_namespace.

kubernetes_name is set to the value of __meta_kubernetes_service_name.

kubernetes_node is set to the value of __meta_kubernetes_pod_node_name.

A metric obtained with this configuration looks like: up{app="prometheus",app_kubernetes_io_managed_by="Helm",chart="prometheus-11.3.0",component="node-exporter",heritage="Helm",instance="10.40.61.116:910