Deploying Prometheus + Grafana + Zabbix + Alertmanager

I. Prepare the environment

Role	OS	IP
Prometheus & Grafana server	CentOS 7.6	192.168.66.52
Prometheus node	CentOS 7.6	192.168.66.53

II. Deploy Prometheus

Deploy the Prometheus service on the Prometheus & Grafana server node.

1. Download and deploy

Download from the official site: https://prometheus.io/


[root@prometheus ~]# mkdir /soft
[root@prometheus ~]# cd /soft/

[root@prometheus soft]# wget https://github.com/prometheus/prometheus/releases/download/v2.12.0/prometheus-2.12.0.linux-amd64.tar.gz

# Deploy to /usr/local/
# Prometheus does not need to be compiled; the extracted directory already contains the configuration file and the binary
[root@prometheus soft]# tar -zxvf prometheus-2.12.0.linux-amd64.tar.gz -C /usr/local/ 

[root@prometheus soft]# cd /usr/local/
[root@prometheus local]# mv prometheus-2.12.0.linux-amd64 prometheus

# Verify
[root@prometheus local]# cd prometheus/
[root@prometheus prometheus]# ./prometheus --version

2. Create the user


# Add a user; the service will later run under this account
[root@prometheus prometheus]# groupadd prometheus
[root@prometheus prometheus]# useradd -g prometheus -s /sbin/nologin prometheus

# Set ownership
[root@prometheus prometheus]# cd ~
[root@prometheus ~]# chown -R prometheus:prometheus /usr/local/prometheus/

# Create the Prometheus data directory
[root@prometheus ~]# mkdir -p /var/lib/prometheus
[root@prometheus ~]# chown -R prometheus:prometheus /var/lib/prometheus/

3. Enable start on boot


[root@prometheus ~]# vim /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
# With Type=notify the service restarts continuously, presumably because Prometheus does not send the readiness notification systemd waits for
Type=simple
User=prometheus
# --storage.tsdb.path is optional; by default data is stored in ./data under the working directory
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus
Restart=on-failure

[Install]
WantedBy=multi-user.target

Enable and start the service


[root@prometheus ~]# systemctl daemon-reload
[root@prometheus ~]# systemctl enable prometheus
[root@prometheus ~]# systemctl start prometheus

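4. Configure iptables or firewalld

Use whichever firewall the host actually runs; only one of the two options is needed.

# Option A: iptables service
[root@prometheus ~]# vim /etc/sysconfig/iptables
-A INPUT -p tcp -m state --state NEW -m tcp --dport 9090 -j ACCEPT

[root@prometheus ~]# service iptables restart

# Option B: firewalld (enabled by default on CentOS 7)
# List the ports that are currently open
[root@prometheus ~]# firewall-cmd --zone=public --list-ports

# Open port 9090 permanently
[root@prometheus ~]# firewall-cmd --permanent --zone=public --add-port=9090/tcp

# Reload so the rule takes effect
[root@prometheus ~]# firewall-cmd --reload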

5. Start and verify

1) Check the service status


[root@prometheus ~]# systemctl status prometheus 
[root@prometheus ~]# ss -tunlp | grep 9090
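
Besides checking the listening port, the server can be polled over HTTP until it reports ready. The following is a minimal sketch, assuming curl is installed and the server is reachable at the address used in this guide; wait_for_http is a hypothetical helper name, while /-/ready is the readiness endpoint exposed by Prometheus 2.x.

```shell
# Poll an HTTP endpoint until it answers successfully or the attempts run out.
# wait_for_http is a hypothetical helper name, not part of Prometheus.
wait_for_http() {
  url="$1"
  tries="${2:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl return non-zero on HTTP errors; -s silences progress output
    if curl -fs -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready after $tries attempts: $url" >&2
  return 1
}

# Usage (run on a host that can reach the server):
#   wait_for_http http://192.168.66.52:9090/-/ready 30
```

Polling is mainly useful in scripts, where the next step should only run once the service actually answers.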

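2) Web UI

Prometheus ships with a simple UI at http://192.168.66.52:9090

The Status menu contains Configuration, Rules, Targets, and other pages.
Status --> Configuration shows the contents of prometheus.yml.

Status --> Targets lists the scrape targets. At this point the "linux" target has no node_exporter running yet, so no data is scraped from it.

6. Graphing

Visit http://192.168.66.52:9090/metrics to see the raw metrics that can be scraped (here, the ones Prometheus exposes about itself).
Visit http://192.168.66.52:9090, type the name of any scraped metric into the expression box, click "Execute", and switch to the "Graph" tab to see the series plotted. The time range and resolution can be adjusted there.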
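
The same expressions can also be evaluated from the command line through the Prometheus HTTP API. This is a small sketch, assuming curl is available; prom_query and the PROM_URL variable are hypothetical names introduced for illustration, while /api/v1/query is the instant-query endpoint of the API.

```shell
# Run an instant PromQL query through the HTTP API and print the raw JSON response.
# prom_query is a hypothetical helper; PROM_URL defaults to the server used in this guide.
PROM_URL="${PROM_URL:-http://192.168.66.52:9090}"

prom_query() {
  # -G sends the data as a GET query string; --data-urlencode escapes the expression
  curl -sG --data-urlencode "query=$1" "$PROM_URL/api/v1/query"
}

# Usage:
#   prom_query 'up'
#   prom_query 'prometheus_build_info'
```

URL-encoding matters as soon as the expression contains braces, quotes, or spaces, which is why the query is not simply appended to the URL.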

III. Deploy node_exporter

node_exporter collects system-level data from a machine. This guide uses the exporter provided by the Prometheus project; besides node_exporter, the project also provides exporters for consul, memcached, haproxy, mysqld, and others (see the official site).
The service is deployed on the Prometheus node.

1. Download and deploy


# Download
[root@node1 ~]# cd /soft
[root@node1 soft]# wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz

# Deploy
[root@node1 soft]# tar -zxvf node_exporter-0.18.1.linux-amd64.tar.gz -C /usr/local/
[root@node1 soft]# cd /usr/local
[root@node1 local]# mv node_exporter-0.18.1.linux-amd64 node_exporter

2. Create the user


[root@node1 ~]# groupadd prometheus
[root@node1 ~]# useradd -g prometheus -s /sbin/nologin prometheus
[root@node1 ~]# chown -R prometheus:prometheus /usr/local/node_exporter/

3. Enable start on boot


[root@node1 ~]# vim /etc/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target


[root@node1 ~]# systemctl daemon-reload
[root@node1 ~]# systemctl enable node_exporter
[root@node1 ~]# systemctl start node_exporter

4. Configure iptables or firewalld


# The official node_exporter listens on port 9100 by default
# Option A: iptables service
[root@node1 ~]# vim /etc/sysconfig/iptables
-A INPUT -p tcp -m state --state NEW -m tcp --dport 9100 -j ACCEPT

[root@node1 ~]# service iptables restart

# Option B: firewalld (enabled by default on CentOS 7)
# List the ports that are currently open
[root@node1 ~]# firewall-cmd --zone=public --list-ports

# Open port 9100 permanently
[root@node1 ~]# firewall-cmd --permanent --zone=public --add-port=9100/tcp

# Reload so the rule takes effect
[root@node1 ~]# firewall-cmd --reload

# List the ports again to confirm
[root@node1 ~]# firewall-cmd --zone=public --list-ports

# Check the service status
[root@node1 ~]# systemctl status node_exporter
[root@node1 ~]#  ss -tnlp | grep 9100
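
A listening port does not yet prove that metrics are being served. As a rough sketch, the exporter's text output can be fetched and a single sample extracted with awk; metric_value is a hypothetical helper name, and node_load1 is used only as an example of an unlabelled metric.

```shell
# Print the value of an unlabelled metric from Prometheus text-format input on stdin.
# metric_value is a hypothetical helper name.
metric_value() {
  # Lines starting with '#' are HELP/TYPE comments; plain samples are "<name> <value>"
  awk -v name="$1" '$1 == name { print $2 }'
}

# Usage against the exporter deployed above:
#   curl -s http://192.168.66.53:9100/metrics | metric_value node_load1
```

Metrics that carry labels (for example per-CPU counters) have the labels fused into the first field, so this simple match only covers unlabelled ones.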

5. Add the host to Prometheus


# For a quick check, keep the default configuration and add only the 'linux' job at the end
[root@prometheus ~]# vim /usr/local/prometheus/prometheus.yml
# Global settings
global:
  scrape_interval:     15s # How often to scrape (pull) targets; the default is 1m
  evaluation_interval: 15s # How often to evaluate rules; the default is 1m
  # scrape_timeout is set to the global default (10s).

# Alertmanager settings; not used yet, left at the defaults
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Rule files to load and evaluate periodically at evaluation_interval; not used yet, left at the defaults
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape (pull) configuration, i.e. the monitored targets
# By default only Prometheus itself is monitored
scrape_configs:
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'linux'
    static_configs:
      - targets: ['192.168.66.53:9100']   # IP of the monitored server
        # Labels listed here are attached to every time series scraped from this target
        labels:
          instance: node1

Note: every job_name must be unique.


# Restart the Prometheus service
[root@prometheus ~]# systemctl restart prometheus
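
Because YAML indentation mistakes stop Prometheus from starting, it can help to validate the file before restarting. The sketch below assumes the promtool binary that ships in the same release tarball and the paths used in this guide; safe_restart_prometheus and the two variables are hypothetical names.

```shell
# Validate prometheus.yml with promtool and restart only if the check passes.
# safe_restart_prometheus is a hypothetical helper; the paths follow the layout used in this guide.
PROMTOOL="${PROMTOOL:-/usr/local/prometheus/promtool}"
PROM_CONFIG="${PROM_CONFIG:-/usr/local/prometheus/prometheus.yml}"

safe_restart_prometheus() {
  if "$PROMTOOL" check config "$PROM_CONFIG"; then
    systemctl restart prometheus
  else
    echo "config check failed; service left untouched" >&2
    return 1
  fi
}

# Usage (as root on the Prometheus server):
#   safe_restart_prometheus
```

Checking first keeps the running instance alive when the edited file is broken, instead of replacing it with a service that fails to start.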

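6. Verify

Visit http://192.168.66.52:9090; the node1 host now appears as a monitored target.

IV. Deploy Grafana

Deploy the Grafana service on the Prometheus & Grafana server node.

1. Download and install


Download page: https://grafana.com/grafana/download
# Download; alternatively, fetch the package elsewhere and upload it to the server
[root@prometheus ~]# cd /soft/
[root@prometheus soft]# wget https://dl.grafana.com/oss/release/grafana-6.3.3-1.x86_64.rpm

# Install the local RPM package with localinstall
[root@prometheus soft]# yum localinstall grafana-6.3.3-1.x86_64.rpm

2. Configuration file

The configuration file is /etc/grafana/grafana.ini; the defaults are sufficient for now.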

3. Enable start on boot


[root@prometheus soft]# systemctl enable grafana-server
[root@prometheus soft]# systemctl start grafana-server

# Check the service status
[root@prometheus soft]# systemctl status grafana-server
[root@prometheus soft]# ss -tnlp | grep 3000

4. Configure iptables or firewalld


# grafana-server listens on port 3000 by default
# Option A: iptables service
[root@prometheus soft]# vim /etc/sysconfig/iptables
-A INPUT -p tcp -m state --state NEW -m tcp --dport 3000 -j ACCEPT

[root@prometheus soft]# service iptables restart

# Option B: firewalld (enabled by default on CentOS 7)
# List the ports that are currently open
[root@prometheus soft]# firewall-cmd --zone=public --list-ports

# Open port 3000 permanently
[root@prometheus soft]# firewall-cmd --permanent --zone=public --add-port=3000/tcp

# Reload so the rule takes effect
[root@prometheus soft]# firewall-cmd --reload

# List the ports again to confirm
[root@prometheus soft]# firewall-cmd --zone=public --list-ports

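5. Add a data source

1) Log in

Visit http://192.168.66.52:3000; the default username/password is admin/admin.

2) Add the data source

On the home page after login, click "Add data source" to open the data source page, and configure it as follows:

Name: prometheus
Type: Prometheus
URL: http://localhost:9090/ # address of the Prometheus server; use localhost when it runs on the same machine
Access: Server

Leave everything else at the defaults and click "Add".
On the "Dashboards" tab, import the bundled dashboards.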
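
The data source can also be created without the UI through Grafana's HTTP API. The following is a sketch only, assuming the default admin/admin credentials are still in place and the same names and URLs as in the form above; add_prometheus_datasource and GRAFANA_URL are hypothetical names.

```shell
# Create the Prometheus data source through Grafana's HTTP API instead of the UI.
# add_prometheus_datasource is a hypothetical helper; credentials are the defaults from this guide.
GRAFANA_URL="${GRAFANA_URL:-http://192.168.66.52:3000}"

add_prometheus_datasource() {
  # "access":"proxy" corresponds to the "Server" access mode in the UI
  curl -s -u admin:admin \
    -H 'Content-Type: application/json' \
    -X POST "$GRAFANA_URL/api/datasources" \
    -d '{"name":"prometheus","type":"prometheus","url":"http://localhost:9090","access":"proxy"}'
}

# Usage:
#   add_prometheus_datasource
```

Scripting this step is convenient when the same Grafana setup has to be rebuilt on several hosts.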

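6. Import a dashboard

Download a dashboard from the Grafana site (https://grafana.com/) to your local machine:
Grafana home page --> top of the page --> Dashboards --> in the "Filter by" panel on the left, select the Prometheus data source.
Upload the downloaded JSON file to the Grafana server at 192.168.66.52:3000 (or enter the dashboard ID instead).
For the data source, select "prometheus", the name given to the data source added earlier, and click "Import".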

7. Install the pie chart plugin


# Install piechart-panel from the command line with grafana-cli:
[root@prometheus ~]# grafana-cli plugins install grafana-piechart-panel

# The plugin is installed into the Grafana plugin directory; for package installs the default is /var/lib/grafana/plugins
[root@prometheus ~]# cd /var/lib/grafana/plugins/

# Restart Grafana
[root@prometheus ~]# systemctl restart grafana-server

Pie charts in the dashboard now render correctly.

8. View the dashboard

Grafana home page --> icon in the upper left --> Home. The Home list shows the two dashboards that were added, "Node Exporter 0.16 0.17 for Prometheus 监控展示看板" and "Node dashboard Copy"; select either one.

9. Rename a dashboard

10. Delete a dashboard

V. Deploy mysqld_exporter

1. Deployment architecture diagram

2. Install mysqld_exporter


# Install on the node that runs MySQL
[root@node2 ~]# wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.11.0/mysqld_exporter-0.11.0.linux-amd64.tar.gz

[root@node2 ~]# tar xzvf mysqld_exporter-0.11.0.linux-amd64.tar.gz -C /usr/local/

[root@node2 ~]# cd /usr/local/

[root@node2 local]# mv mysqld_exporter-0.11.0.linux-amd64/ mysqld_exporter

3. Create the account


[root@node2 ~]# groupadd prometheus

[root@node2 ~]# useradd -g prometheus -s /sbin/nologin prometheus

# Change ownership of the directory
[root@node2 ~]# chown -R prometheus:prometheus /usr/local/mysqld_exporter/

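4. Add a MySQL user for remote login

Note: MySQL 5.7 enforces password strength according to validate_password_policy, and at the lowest policy it still enforces validate_password_length (8 by default). To allow a simple password such as the one used here, lower both values first:


mysql> set global validate_password_policy=0;

mysql> set global validate_password_length=6;

mysql> GRANT REPLICATION CLIENT, PROCESS ON *.* TO 'mysqld_exporter'@'%' identified by '123456';

mysql> GRANT SELECT ON performance_schema.* TO 'mysqld_exporter'@'%';

mysql> flush privileges;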

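5. Create a configuration file for connecting to MySQL

By default mysqld_exporter reads ~/.my.cnf. Here the file is created in the mysqld_exporter installation directory instead and passed to the exporter explicitly with --config.my-cnf.


[root@node2 ~]# vim /usr/local/mysqld_exporter/.my.cnf
[client]
# The user created in the previous step
user=mysqld_exporter
# Login password
password=123456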
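
Since this file holds a password in clear text, it is reasonable to create it so that only its owner can read it. The following is a minimal sketch under that assumption; write_my_cnf is a hypothetical helper name, and the restrictive mode only applies when the file does not exist yet.

```shell
# Write a client option file that only its owner can read.
# write_my_cnf is a hypothetical helper; arguments are path, user, password.
write_my_cnf() {
  (
    umask 077   # a newly created file gets mode 600
    printf '[client]\nuser=%s\npassword=%s\n' "$2" "$3" > "$1"
  )
}

# Usage on the database host, then hand the file to the service account:
#   write_my_cnf /usr/local/mysqld_exporter/.my.cnf mysqld_exporter 'your-password'
#   chown prometheus:prometheus /usr/local/mysqld_exporter/.my.cnf
```

The umask is set inside a subshell so that it does not leak into the rest of the session.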

6. Create the systemd service


[root@node2 ~]# vim /etc/systemd/system/mysqld_exporter.service

[Unit]
Description=mysql_exporter
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter/.my.cnf
Restart=on-failure
[Install]
WantedBy=multi-user.target

7. Start mysqld_exporter


[root@node2 ~]# systemctl daemon-reload

[root@node2 ~]# systemctl enable mysqld_exporter

[root@node2 ~]# systemctl start mysqld_exporter

# Check port 9104
[root@node2 ~]# lsof -i:9104
COMMAND    PID       USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
mysqld_ex 2573 prometheus    3u  IPv6 23391639      0t0  TCP *:peerwire (LISTEN)

8. Configure iptables or firewalld


# The official mysqld_exporter listens on port 9104 by default
# Option A: iptables service
[root@node2 ~]# vim /etc/sysconfig/iptables
-A INPUT -p tcp -m state --state NEW -m tcp --dport 9104 -j ACCEPT

[root@node2 ~]# service iptables restart

# Option B: firewalld (enabled by default on CentOS 7)
# List the ports that are currently open
[root@node2 ~]# firewall-cmd --zone=public --list-ports

# Open port 9104 permanently
[root@node2 ~]# firewall-cmd --permanent --zone=public --add-port=9104/tcp

# Reload so the rule takes effect
[root@node2 ~]# firewall-cmd --reload

# List the ports again to confirm
[root@node2 ~]# firewall-cmd --zone=public --list-ports

9. Edit prometheus.yml and add the following scrape target


[root@prometheus ~]# vim /usr/local/prometheus/prometheus.yml
  - job_name: mysql
    static_configs:
      - targets: ['192.168.66.11:9104']
        labels:
          instance: db1

10. Restart Prometheus


[root@prometheus ~]# systemctl restart prometheus

11. Import a template in Grafana

11.1 Upload the file

In the left-hand menu, go to Dashboards --> Home to see the dashboard that was just created.

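VI. Displaying Zabbix data in Grafana

1. Install the Zabbix plugin on the Grafana server


[root@prometheus ~]# grafana-cli plugins install alexanderzobnin-zabbix-app
# Restart Grafana
[root@prometheus ~]# systemctl restart grafana-server

2. Add the data source

URL: http://<Zabbix server IP>/zabbix/api_jsonrpc.php
Zabbix API details
Username: the Zabbix login user
Password: that user's password

Select the version that matches the installed Zabbix.
Click Save & Test to save.

3. Create a panel

1. Select the data source: zabbix
2. Select a host group defined in Zabbix
3. Select a host
4. Select an application
5. Select an item
6. Give the series a name (it is shown below the graph, so viewers can tell what each curve represents)
7. To add more curves, click Add Query and repeat steps 2-6
8. When finished, click the white X on the right to display the graph.

Click the save icon in the upper right corner to save.
Enter a name for the dashboard.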

VII. Alertmanager alerting

First, add an application in WeChat Work (Enterprise WeChat).
Note the application ID; it is needed later.
Note the unique enterprise ID; it is needed later.

1. Install Alertmanager

Download from https://prometheus.io/download/


[root@prometheus ~]# tar -xzvf alertmanager-0.18.0.linux-amd64.tar.gz -C /usr/local/
[root@prometheus ~]# cd /usr/local/

[root@prometheus local]# mv alertmanager-0.18.0.linux-amd64/ alertmanager

2. Configure prometheus.yml


[root@prometheus ~]# vim /usr/local/prometheus/prometheus.yml
![image.png](https://b3logfile.com/file/2019/10/image-abf214bb.png)
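
The edit for this step is only available as a screenshot. As a rough guess at what it contains, the sketch below prints an alerting fragment for prometheus.yml, assuming Alertmanager runs on the same host on its default port 9093 and the rule file is the rules.yml created in the Prometheus directory; both values, and the helper name alerting_fragment, are assumptions rather than the author's exact configuration.

```shell
# Print a candidate alerting fragment for prometheus.yml.
# alerting_fragment is a hypothetical helper; its defaults are assumptions, not values from the screenshot.
# Arguments: Alertmanager target (default localhost:9093), rule file (default rules.yml).
alerting_fragment() {
  cat <<EOF
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - ${1:-localhost:9093}

rule_files:
  - "${2:-rules.yml}"
EOF
}

# Review the output, then merge it into the existing alerting and rule_files sections by hand
alerting_fragment
```

These two sections replace the commented-out placeholders of the default configuration; they should not be appended as duplicates.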

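3. Create the alert rule file


[root@prometheus ~]# cd /usr/local/prometheus/
[root@prometheus prometheus]# vim rules.yml
groups:
- name: node
  rules:
  - alert: server_status   # alert name
    expr: up{job="linux"} == 0   # job is a job_name from prometheus.yml; 'linux' is the node_exporter job stopped in the test below
    for: 15s   # the condition must hold for 15s before the alert fires
    labels:   # custom labels
      severity: page
    annotations:   # extra information attached to the alert; not used to identify it
      summary: "Host {{ $labels.instance }} is down"   # alert title
      description: "Check immediately"   # alert details

alert: the name of the alert rule.
expr: the trigger condition as a PromQL expression; it is evaluated to see whether any time series satisfies it.
for: optional evaluation wait time. The alert is sent only after the condition has held for this long; during the wait a newly raised alert is in the pending state.
labels: custom labels; a set of additional labels to attach to the alert.
annotations: a set of additional information, such as text describing the alert in detail. Annotations are sent to Alertmanager together with the alert when it fires.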

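4. Configure alertmanager.yml


[root@prometheus prometheus]# cd /usr/local/alertmanager/
[root@prometheus alertmanager]# vim alertmanager.yml
global:
    resolve_timeout: 5m  # how long Alertmanager waits without receiving an alert before marking it resolved; this affects when recovery notifications arrive, so tune it to your environment (default: 5m)

templates:       # custom notification template files
- '/usr/local/alertmanager/wechat.tmpl'

route:
    group_by: ['alertname'] # incoming alerts that share these labels are grouped together
    group_wait: 10s      # how long to wait after a group is created before sending its first notification, so that more alerts can be batched into it
    group_interval: 10s   # after the first notification, how long to wait before sending a batch of new alerts for the same group
    repeat_interval: 1h   # how long to wait before re-sending a notification that was already sent successfully
    receiver: 'wechat'    # default receiver
receivers:
- name: 'wechat'
  wechat_configs:
  - corp_id: 'xxx'
    to_party: '1'
    agent_id: '1000002'
    api_secret: 'xxx'
    send_resolved: true

corp_id: the unique ID of the WeChat Work account; shown under "My Company".
to_party: the department (group) that receives the message.
agent_id: the ID of the third-party application; shown on the detail page of the application you created.
api_secret: the secret of the third-party application; shown on the same detail page.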

5. Configure the WeChat alert template


[root@prometheus ~]# vim /usr/local/alertmanager/wechat.tmpl

{{ define "wechat.default.message" }}
{{ range .Alerts }}
========start==========
Alert source: prometheus_alert
Severity: {{ .Labels.severity }}
Alert type: {{ .Labels.alertname }}
Affected host: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
========end==========
{{ end }}
{{ end }}

6. Configure the systemd service


[root@prometheus ~]# vim /etc/systemd/system/alertmanager.service
[Unit]
Description=Prometheus: the alerting system
Documentation=http://prometheus.io/docs/
After=prometheus.service

[Service]
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target

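7. Start Alertmanager


[root@prometheus ~]# systemctl daemon-reload
[root@prometheus ~]# systemctl start alertmanager
[root@prometheus ~]# systemctl enable alertmanager

8. Check the startup status


[root@prometheus ~]# ss -tnlp | grep 9093

Note: if nothing is listening on port 9093, check the system log.
If the error is: did not find expected key
Cause: alertmanager.yml is malformed, typically stray spaces or wrong indentation.
Fix: correct the configuration file.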
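
Such YAML mistakes can be caught before the service is started. The sketch below assumes the amtool binary that ships in the Alertmanager release tarball and the installation path used in this guide; check_alertmanager_config and AMTOOL are hypothetical names.

```shell
# Validate alertmanager.yml before (re)starting the service.
# check_alertmanager_config is a hypothetical helper; paths follow this guide's layout.
AMTOOL="${AMTOOL:-/usr/local/alertmanager/amtool}"

check_alertmanager_config() {
  "$AMTOOL" check-config "${1:-/usr/local/alertmanager/alertmanager.yml}"
}

# Usage:
#   check_alertmanager_config && systemctl restart alertmanager
#   journalctl -u alertmanager --no-pager -n 50   # recent log lines of the service
```

With Restart=always in the unit file a broken configuration otherwise shows up only as a service that keeps restarting, so checking the file directly tends to be quicker.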

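9. Check in the browser

Open the Prometheus page
Open the Alertmanager page
http://ip:9093

10. Stop node_exporter manually as a test


[root@node1 ~]# systemctl stop node_exporter

11. Check the alert on the Prometheus Alerts page

12. Check the Alertmanager alerts page

13. Check the WeChat message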
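
To exercise routing and the WeChat receiver without stopping a real exporter, a synthetic alert can be posted straight to Alertmanager. This is a sketch only, assuming Alertmanager is reachable on the Prometheus server at port 9093 and that its v2 HTTP API is available; send_test_alert, ALERTMANAGER_URL, and the alert name and label values are made up for the test.

```shell
# Post a synthetic alert to Alertmanager to test routing and the WeChat receiver.
# send_test_alert is a hypothetical helper; the alert name and label values are invented for the test.
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://192.168.66.52:9093}"

send_test_alert() {
  curl -s -X POST "$ALERTMANAGER_URL/api/v2/alerts" \
    -H 'Content-Type: application/json' \
    -d '[{"labels":{"alertname":"TestAlert","severity":"page","instance":"node1"},"annotations":{"summary":"test alert","description":"sent by hand"}}]'
}

# Usage:
#   send_test_alert
```

The labels and annotations mirror the fields the WeChat template reads, so the resulting message shows whether the template renders as expected.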

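14. Common alert rules

If you are unsure how to write an alert rule, the expressions can be copied directly from Grafana.
One rule file can define multiple alerts.

1. Modify the Prometheus configuration file


[root@prometheus ~]# cd /usr/local/prometheus/
[root@prometheus prometheus]# vim prometheus.yml

2. Create the directory and the alert rule file


[root@prometheus prometheus]# mkdir rules
[root@prometheus prometheus]# chown -R prometheus:prometheus rules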

[root@prometheus prometheus]# cd rules/
[root@prometheus rules]# vim cpu_rules.yml
groups:
- name: hostStatsAlert
  rules:
  - alert: hostCpuUsageAlert
    # Metric names follow node_exporter 0.16 and later (node_cpu_seconds_total, *_bytes), matching the 0.18.1 release installed above
    expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[5m]))) by (instance) > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
  - alert: hostMemUsageAlert
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"

3. Restart Prometheus


[root@prometheus rules]# systemctl restart prometheus

4. Check the page

5. Test the alert

Edit the alert rule file (for example, lower a threshold) to trigger an alert.
