
Prometheus Tech Share: Parsing and Configuring Custom Prometheus Alerting Rules

2022/11/08 | Category: Prometheus technical resources | Tags: Prometheus, Prometheus alerting, Prometheus rules

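In the previous installment, 尊龙时凯 covered installing and configuring Prometheus. For operations monitoring, dashboards are only part of the job; the other key requirement is alerting. Good alerting helps operations staff find and handle problems promptly and prevent them before they escalate, which makes it an indispensable part of operations work. This installment explains how custom Prometheus alerting rules are structured and how to configure them.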

1. Standard alerting rule example and the role of each component

The code is as follows:

groups:
- name: example
  rules:
  - alert: HighErrorRate
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency
      description: description info

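In an alerting rule file, a set of related rules can be defined under one group, and each group can contain multiple alerting rules (rule). An alerting rule consists mainly of the following parts:

alert: the name of the alerting rule.

expr: the trigger condition, written as a PromQL expression. It is evaluated to determine whether any time series satisfies the condition.

for: the evaluation wait time, an optional parameter. The alert is sent only after the trigger condition has held continuously for this duration. Alerts that newly arise during the wait are in the pending state.

labels: custom labels. This lets the user specify a set of additional labels to attach to the alert.

annotations: additional descriptive information attached to the alert, such as the summary and description text.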
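The effect of the optional for field can be seen by placing two rules side by side in one group. The following is a minimal sketch, not taken from the original article: the rule names TargetDownImmediate and TargetDownSustained and the 30s group interval are assumptions chosen for illustration, and up is the standard per-target health metric that Prometheus records for each scrape.

```yaml
groups:
  - name: example              # one group can hold several related rules
    interval: 30s              # optional per-group evaluation interval (assumed value)
    rules:
      # No "for": the alert goes straight to firing on the first
      # evaluation in which the expression returns a series.
      - alert: TargetDownImmediate    # hypothetical rule name
        expr: up == 0
        labels:
          severity: warning
      # With "for": the alert stays pending until the expression has
      # held for 10 minutes without interruption, then becomes firing.
      - alert: TargetDownSustained    # hypothetical rule name
        expr: up == 0
        for: 10m
        labels:
          severity: page
```

A short for window reacts faster but is more likely to page on brief scrape failures; a longer one trades reaction time for fewer false alarms.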

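2. Templated alerting rules

In general, the annotations of an alerting rule use summary to describe the alert in brief and description to describe it in detail. The Alertmanager UI also displays alert information based on these two annotation values. To make alert information more readable, Prometheus supports templating the values of labels and annotations. The $labels.<labelname> variable accesses the value of the specified label on the current alert instance, and $value returns the sample value computed by the current PromQL expression.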

The code is as follows:

# To insert a firing element's label values:
{{ $labels.<labelname> }}
# To insert the numeric expression value of the firing element:
{{ $value }}

For example, templating can make the summary and description content more readable.

The code is as follows:

groups:
- name: example
  rules:
  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
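One way to confirm that the templates expand as intended is a promtool rule unit test. The sketch below assumes the InstanceDown rule above has been saved as example.rules (a hypothetical file name) next to the test file. The input series, its job and instance label values, and the evaluation time are made up for the test.

```yaml
# Unit test for the InstanceDown rule; file names and label values are assumptions.
rule_files:
  - example.rules                # hypothetical file containing the rules above
evaluation_interval: 1m
tests:
  - interval: 1m                 # spacing of the synthetic samples below
    input_series:
      # A target that reports up == 0 for the whole test window.
      - series: 'up{job="prometheus", instance="localhost:9090"}'
        values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'
    alert_rule_test:
      # At 10m the condition has held longer than "for: 5m",
      # so the alert is expected to be firing.
      - eval_time: 10m
        alertname: InstanceDown
        exp_alerts:
          - exp_labels:
              severity: page
              instance: localhost:9090
              job: prometheus
            exp_annotations:
              summary: "Instance localhost:9090 down"
              description: "localhost:9090 of job prometheus has been down for more than 5 minutes."
```

Such a file is run with promtool test rules followed by the test file name. The expected annotations show $labels.instance and $labels.job replaced by the label values of the input series.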

3. Modify the Prometheus configuration file prometheus.yml

rule_files:
  - /etc/prometheus/rules/*.rules
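For context, the rule_files entry usually sits alongside the settings that control how often rules are evaluated and where firing alerts are sent. The fragment below is a sketch under assumptions: the 15s evaluation interval and the Alertmanager address localhost:9093 are illustrative values, not settings from the original article.

```yaml
global:
  evaluation_interval: 15s       # how often rules are evaluated (assumed value)

rule_files:
  - /etc/prometheus/rules/*.rules

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093     # assumed Alertmanager address
```

Without an alerting section, Prometheus still evaluates the rules and shows pending and firing alerts in its own UI, but nothing is forwarded to Alertmanager.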

In the directory /etc/prometheus/rules/, create the alert file hoststats-alert.rules with the following content:

groups:
- name: hostStatsAlert
  rules:
  - alert: hostCpuUsageAlert
    expr: sum(avg without (cpu)(irate(node_cpu{mode!='idle'}[5m]))) by (instance) > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
  - alert: hostMemUsageAlert
    expr: (node_memory_MemTotal - node_memory_MemAvailable)/node_memory_MemTotal > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"
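The metric names node_cpu, node_memory_MemTotal and node_memory_MemAvailable come from older node_exporter releases. Starting with node_exporter 0.16.0 these metrics were renamed with unit suffixes, so on a newer exporter the rules above would match no series and never fire. The following sketch shows the same two rules with the renamed metrics; the printf formatting of $value is an optional addition for readability and not part of the original rules.

```yaml
groups:
  - name: hostStatsAlert
    rules:
      - alert: hostCpuUsageAlert
        # node_cpu was renamed to node_cpu_seconds_total in node_exporter 0.16.0
        expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[5m]))) by (instance) > 0.85
        for: 1m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} CPU usage high"
          # printf limits the ratio to two decimal places (optional formatting)
          description: '{{ $labels.instance }} CPU usage above 85% (current value: {{ printf "%.2f" $value }})'
      - alert: hostMemUsageAlert
        # memory metrics gained a _bytes suffix in node_exporter 0.16.0
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes > 0.85
        for: 1m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} MEM usage high"
          description: '{{ $labels.instance }} MEM usage above 85% (current value: {{ printf "%.2f" $value }})'
```

Which variant applies depends on the node_exporter version in use; checking the exporter's /metrics output for the actual metric names settles it.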

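Summary

That covers parsing and configuring custom Prometheus alerting rules. If you found it helpful, keep following the 尊龙时凯 official website, where technical articles are posted regularly. For more on open-source monitoring, visit the 尊龙时凯 community (http://forum.jxwba.com/).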
