A brief introduction to Prometheus:

 

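Prometheus is written in Go and was inspired by Borgmon, Google's internal monitoring system. I won't go into how it came about here; what matters is that it is pleasant to work with, simple and easy to use.

In 2016 Prometheus joined the Cloud Native Computing Foundation (CNCF), a Linux Foundation project that Google helped launch, as its second hosted project. Prometheus has a very active open-source community (which shows in the large number of exporters and integrations), and it was the second project to graduate from the CNCF, after Kubernetes. Like Kubernetes and etcd, it is written in Go, and a Kubernetes cluster can be hooked up to Prometheus with very little effort. In the cloud-native world Prometheus is the natural companion to Kubernetes and usually the default choice.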

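  • At the development level

Prometheus has client libraries for many languages (official ones for Go, Java, Python and Ruby, plus third-party open-source clients for other languages). With a client library we can instrument core business flows, for example placing an order or adding an item to the shopping cart.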

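  • At the application level, as an application monitoring system

For many mainstream applications, official or third-party exporters collect the core metrics, for example Redis, MySQL, MongoDB, Nginx, HAProxy and Kubernetes clusters.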

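  • At the system level, as a system monitor

Besides common software, Prometheus also has exporters for the system and network layers, used to monitor servers and networks.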

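  • Integrating other monitoring systems

Through various exporters Prometheus can integrate with other monitoring systems and collect their data, for example AWS CloudWatch, JMX and Pingdom.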
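To make the instrumentation idea from the development-level point more concrete, here is a minimal sketch of counting business events such as placed orders. It uses only the Python standard library and is not the official client library; the `Counter` class, the `place_order` function and the metric name `shop_orders_total` are all hypothetical names chosen for illustration. It only shows the shape of the data a client library would expose in the Prometheus text format.

```python
from collections import defaultdict


class Counter:
    """Toy labeled counter (illustrative only, not the official client)."""

    def __init__(self, name, help_text, label_names):
        self.name = name
        self.help_text = help_text
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        # A counter may only go up; resets happen only on process restart.
        if amount < 0:
            raise ValueError("counters can only go up")
        key = tuple(labels[n] for n in self.label_names)
        self._values[key] += amount

    def render(self):
        # One HELP line, one TYPE line, then one sample line per label set.
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key in sorted(self._values):
            pairs = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{pairs}}} {self._values[key]}")
        return "\n".join(lines) + "\n"


# Hypothetical metric for the order flow.
orders = Counter("shop_orders_total", "Orders placed, by result.", ["result"])


def place_order(ok):
    # Business logic would go here; we only record the outcome.
    orders.inc(result="success" if ok else "failed")


place_order(True)
place_order(True)
place_order(False)
print(orders.render(), end="")
```

In a real service you would most likely use one of the official client libraries instead, which also take care of serving the metrics over HTTP.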

Prometheus does have weaknesses, though. Its data visualization is fairly weak, and that is where Grafana comes in.

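A brief introduction to Grafana:

Grafana is an open-source program for visualizing large volumes of measurement data. It offers a powerful and elegant way to create, share and explore data, and a dashboard displays data from your different metric data sources.

Grafana is an open-source metrics analytics platform with rich dashboards and chart editing. Unlike Kibana, Grafana focuses on time series charts, and it supports many data sources, such as Graphite, InfluxDB, Elasticsearch, MySQL, K8s and Zabbix.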

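Going by that description, it is mostly useful for operations metrics.

Grafana started out as a fork of Kibana 3. It has its own permission and user management, which Kibana lacked at the time. Kibana is tightly coupled with Elasticsearch and supports its powerful query syntax, so it suits multi-dimensional analysis and ad hoc queries, whereas Grafana is better suited to display, and its charts look considerably nicer than Kibana's.

In other words, Grafana fits well into an operations platform, but display alone does not help operations much. It also needs alerting and interactive querying, and it has to do well on performance and usability before a team will choose it. The ready-made dashboard templates are also a useful reference.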

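I.

Prometheus architecture

Prometheus is at its core a time series database written in Go, with client libraries for many languages. Because it is a database first, its built-in data visualization is limited, which is exactly why Grafana enters the picture.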

About TSDBs

A TSDB (Time Series Database) can be thought of simply as software optimized for handling time series data, in which arrays of values are indexed by time.

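Characteristics of a time series database

Most operations are writes.

Writes are almost always sequential appends; most of the time data arrives already ordered by time.

Writes rarely touch data from long ago, and existing data is rarely updated. In most cases data is written to the database within seconds or minutes of being collected.

Deletes are usually done in blocks: pick a starting point in the past and remove the blocks from there. Deleting a single timestamp, or scattered random timestamps, is rare.

The dataset is large, usually larger than memory. A query typically selects only a small part of it with no regular pattern, so caching helps very little.

Reads are typically sequential, in ascending or descending time order.

Highly concurrent reads are very common.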
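The access pattern described above (in-order appends, sequential range reads, deletes in contiguous blocks) can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not how the Prometheus TSDB is implemented; the `Series` class and its method names are made up for illustration.

```python
import bisect


class Series:
    """Append-mostly list of (timestamp_ms, value) samples, kept sorted by time."""

    def __init__(self):
        self._ts = []
        self._vals = []

    def append(self, ts_ms, value):
        # Samples normally arrive in time order, so a write is a cheap append.
        if self._ts and ts_ms <= self._ts[-1]:
            raise ValueError("out-of-order or duplicate sample rejected")
        self._ts.append(ts_ms)
        self._vals.append(value)

    def query(self, start_ms, end_ms):
        # Sequential read of every sample with start_ms <= ts <= end_ms.
        lo = bisect.bisect_left(self._ts, start_ms)
        hi = bisect.bisect_right(self._ts, end_ms)
        return list(zip(self._ts[lo:hi], self._vals[lo:hi]))

    def drop_before(self, cutoff_ms):
        # Retention: remove one contiguous block of old samples, never single points.
        cut = bisect.bisect_left(self._ts, cutoff_ms)
        del self._ts[:cut]
        del self._vals[:cut]
        return cut  # number of samples removed


s = Series()
for i in range(6):
    s.append(1000 * i, float(i))  # one sample per second

print(s.query(2000, 4000))   # samples between second 2 and second 4
print(s.drop_before(3000))   # how many old samples were dropped
```

Rejecting out-of-order writes is a simplification that keeps both reads and block deletes down to a binary search plus a slice.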

Common time series databases

  • InfluxDB
  • RRDtool
  • Graphite
  • OpenTSDB
  • Kdb+
  • Druid
  • KairosDB
  • Prometheus

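The Prometheus ecosystem

The Prometheus ecosystem consists of multiple components, some of them optional. Most Prometheus components are written in Go, which makes them easy to build and deploy.

1.Prometheus Server

Scrapes and stores the data and provides the PromQL query language.

2.Client SDKs

Official client libraries exist for Go, Java, Scala, Python and Ruby, and there are many third-party libraries for Node.js, PHP, Erlang and others.

3.Push Gateway

An intermediate gateway to which short-lived jobs can push their metrics.

4.PromDash

A dashboard built with Rails for visualizing metric data. (PromDash has since been deprecated in favor of Grafana.)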

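5.Exporter

Exporter is the collective name for Prometheus's data collection components. An exporter gathers data from its target and converts it into a format Prometheus understands. Unlike traditional collection agents, it does not send data to a central server; it waits for the central server to come and scrape it.

Prometheus offers many kinds of exporters for collecting the runtime state of different services. Currently covered are databases, hardware, message queues, storage systems, HTTP servers, JMX and more.

6.Alertmanager

The alert manager, which handles alerting.

7.prometheus_cli

A command-line tool.

8.Other supporting tools

Special-purpose exporters that turn data from tools such as HAProxy, StatsD and Graphite into Prometheus metrics.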
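The pull model described for exporters (item 5) can be sketched with nothing but the Python standard library: the process computes its metrics only when someone requests `/metrics`, and never pushes anything itself. This is an illustrative toy, not a real exporter; the metric name `demo_uptime_seconds`, the `serve` helper and port 8000 are assumptions made for the example.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START = time.monotonic()


def render_metrics():
    """Compute values at scrape time and format them as Prometheus text."""
    uptime = time.monotonic() - START
    return (
        "# HELP demo_uptime_seconds Seconds since this process started.\n"
        "# TYPE demo_uptime_seconds gauge\n"
        f"demo_uptime_seconds {uptime}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def serve(port=8000):
    """Block and wait for the Prometheus server to come and scrape /metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the listener; a scrape job pointing at that host and port would then pull the single gauge on every scrape interval.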

The architecture diagram is as follows:

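II.

Ways to deploy Prometheus

  • 1. Binary deployment
  • 2. Docker deployment
  • 3. Deployment inside a Kubernetes cluster

This article uses the binary deployment.

On server 192.168.217.24, install the Prometheus server together with node_exporter, the node metrics collector.

On server 192.168.217.23, install mysqld_exporter, the MySQL metrics collector, together with node_exporter (MySQL runs on server 23).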

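III.

Installing the Prometheus server

Download page: Download | Prometheus

My machine is amd64, so I chose the linux-amd64 build and the long-term support release 2.37.2. Upload the downloaded archive to server 24 and unpack it.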

tar zxf prometheus-2.37.2.linux-amd64.tar.gz
mv prometheus-2.37.2.linux-amd64 /usr/local/prometheus
[root@node4 prometheus]# ll
total 202256
drwxr-xr-x. 2 root root        38 May  8  2022 console_libraries # libraries used by the web console
drwxr-xr-x. 2 root root       173 May  8  2022 consoles # web console page templates
drwxr-xr-x. 6 root root       126 Nov 15 22:12 data   # time series database data
-rw-r--r--. 1 root root     11357 Apr 21  2022 LICENSE # license text
-rw-r--r--. 1 root root      3773 Apr 21  2022 NOTICE  # legal notices
-rwxr-xr-x  1 3434 3434 109691493 Nov  4 19:09 prometheus # main program (executable)
-rw-r--r--. 1 root root      1148 Nov 15 21:09 prometheus.yml # main Prometheus configuration file
-rwxr-xr-x. 1 root root  97394322 Apr 21  2022 promtool  # Prometheus utility: inspect the TSDB, test alerting rule files, and more

For example, list the blocks in the time series database:

[root@node4 prometheus]# ./promtool tsdb list
BLOCK ULID                  MIN TIME       MAX TIME       DURATION     NUM SAMPLES  NUM CHUNKS   NUM SERIES   SIZE
01GHXKCBSD5JWBT1YTDZE3ME78  1668503659419  1668506400000  45m40.581s   162609       1212         1212         494354
01GHXP04ETXXVQ5W28XXGZDEC4  1668506400000  1668513600000  2h0m0s       591840       4896         1308         1609719
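
The MIN TIME and MAX TIME columns are Unix timestamps in milliseconds, which are hard to read at a glance. As a small worked example, the sketch below converts the two rows from the listing above to UTC and recomputes the DURATION column; the helper names `ms_to_utc` and `block_span` are mine, not part of promtool.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# (BLOCK ULID, MIN TIME, MAX TIME) copied from the listing above
BLOCKS = [
    ("01GHXKCBSD5JWBT1YTDZE3ME78", 1668503659419, 1668506400000),
    ("01GHXP04ETXXVQ5W28XXGZDEC4", 1668506400000, 1668513600000),
]


def ms_to_utc(ms):
    """Convert a millisecond Unix timestamp to an aware UTC datetime."""
    # Integer timedelta arithmetic avoids float rounding of the milliseconds.
    return EPOCH + timedelta(milliseconds=ms)


def block_span(min_ms, max_ms):
    """Length of a block as a timedelta."""
    return timedelta(milliseconds=max_ms - min_ms)


for ulid, min_ms, max_ms in BLOCKS:
    print(ulid, ms_to_utc(min_ms).isoformat(), ms_to_utc(max_ms).isoformat(),
          block_span(min_ms, max_ms))
```

The spans come out as 45m40.581s and exactly 2h, matching the DURATION column, and the second block runs from 10:00 to 12:00 UTC on 2022-11-15.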

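Prometheus can of course be started in the foreground by running ./prometheus, but that ties up a shell every time you start or stop it, which is hardly convenient. So give it a systemd unit instead: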

cat >/etc/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/docs/

[Service]
# Adjust the paths to match your installation.
# The TSDB path defaults to ./data, so set the working directory explicitly.
WorkingDirectory=/usr/local/prometheus
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --web.listen-address=:9090

[Install]
WantedBy=multi-user.target
EOF
systemctl enable prometheus && systemctl start prometheus

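Check the service status. Green (active (running)) means it started normally; otherwise you need to troubleshoot. In the log you should see "TSDB started" and the message that the web endpoint is ready, "Server is ready to receive web requests."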

[root@node4 prometheus]# systemctl status prometheus
● prometheus.service
   Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2022-11-16 11:09:20 CST; 20min ago
 Main PID: 3925 (prometheus)
   CGroup: /system.slice/prometheus.service
           └─3925 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --web.listen-address=:9090

Nov 16 11:09:21 node4 prometheus[3925]: ts=2022-11-16T03:09:21.098Z caller=main.go:993 level=info fs_type=XFS_SUPER_MAGIC
Nov 16 11:09:21 node4 prometheus[3925]: ts=2022-11-16T03:09:21.098Z caller=main.go:996 level=info msg="TSDB started"
Nov 16 11:09:21 node4 prometheus[3925]: ts=2022-11-16T03:09:21.098Z caller=main.go:1177 level=info msg="Loading configuration file" filename=/usr/local/prometheus/prometheus.yml
Nov 16 11:09:21 node4 prometheus[3925]: ts=2022-11-16T03:09:21.234Z caller=main.go:1214 level=info msg="Completed loading of configuration file" filename=/usr/local/prometheus/prometheus.yml totalDuration=135.316399ms db_storage=1.16…µs
Nov 16 11:09:21 node4 prometheus[3925]: ts=2022-11-16T03:09:21.235Z caller=main.go:957 level=info msg="Server is ready to receive web requests."
Nov 16 11:09:21 node4 prometheus[3925]: ts=2022-11-16T03:09:21.236Z caller=manager.go:941 level=info component="rule manager" msg="Starting rule manager..."
Nov 16 11:09:27 node4 prometheus[3925]: ts=2022-11-16T03:09:27.708Z caller=compact.go:510 level=info component=tsdb msg="write block resulted in empty block" mint=1668528000000 maxt=1668535200000 duration=36.455932ms
Nov 16 11:09:27 node4 prometheus[3925]: ts=2022-11-16T03:09:27.713Z caller=head.go:842 level=info component=tsdb msg="Head GC completed" duration=3.84614ms
Nov 16 11:09:27 node4 prometheus[3925]: ts=2022-11-16T03:09:27.714Z caller=checkpoint.go:97 level=info component=tsdb msg="Creating checkpoint" from_segment=0 to_segment=1 mint=1668535200000
Nov 16 11:09:27 node4 prometheus[3925]: ts=2022-11-16T03:09:27.760Z caller=head.go:1011 level=info component=tsdb msg="WAL checkpoint complete" first=0 last=1 duration=46.524627ms
Hint: Some lines were ellipsized, use -l to show in full.

At this point, edit the configuration file to look like this:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["192.168.217.24:9090"]  # this host's IP and port; nothing else needs changing

     

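Open a browser and go to the address configured on that last line (192.168.217.24:9090):

The target's State should be UP, shown in green. You can click the endpoint to see what it returns:

OK, the Prometheus server is now installed.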
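
The same check can be scripted instead of done in the browser. The sketch below, which assumes the server address used in this article, asks the Prometheus HTTP API for the `up` metric (1 means the last scrape of that target succeeded, 0 means it failed). The function names are hypothetical, and `query_up` needs network access to the server, so only the pure helpers are exercised here.

```python
import json
import urllib.parse
import urllib.request

PROM_URL = "http://192.168.217.24:9090"  # the server set up in this article


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload):
    """Turn an instant-vector response into {(job, instance): float_value}."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    out = {}
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        _ts, value = sample["value"]  # [unix_time, "value as a string"]
        out[(labels.get("job"), labels.get("instance"))] = float(value)
    return out


def query_up(base_url=PROM_URL):
    """Fetch the current value of `up` for every target (needs network access)."""
    with urllib.request.urlopen(instant_query_url(base_url, "up"), timeout=5) as resp:
        return parse_vector(json.load(resp))
```

With only the self-scrape job configured, `query_up()` should return a single entry for the `prometheus` job.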

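IV.

Installing and configuring node_exporter

node_exporter is essentially a client-side collector. It gathers basic data such as CPU and memory from Unix-like operating systems. The exact scope of what it collects is listed in its help output:

Collectors such as cpu, edac and ipvs are enabled by default, while others are not, for example ntp (time server). The defaults already cover the vast majority of everyday needs.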

[root@node4 prometheus]# node_exporter --help
(excerpt showing only the collector flags)
                                 test fixtures to use for wifi collector metrics
      --collector.arp            Enable the arp collector (default: enabled).
      --collector.bcache         Enable the bcache collector (default: enabled).
      --collector.bonding        Enable the bonding collector (default: enabled).
      --collector.btrfs          Enable the btrfs collector (default: enabled).
      --collector.buddyinfo      Enable the buddyinfo collector (default: disabled).
      --collector.cgroups        Enable the cgroups collector (default: disabled).
      --collector.conntrack      Enable the conntrack collector (default: enabled).
      --collector.cpu            Enable the cpu collector (default: enabled).
      --collector.cpufreq        Enable the cpufreq collector (default: enabled).
      --collector.diskstats      Enable the diskstats collector (default: enabled).
      --collector.dmi            Enable the dmi collector (default: enabled).
      --collector.drbd           Enable the drbd collector (default: disabled).
      --collector.drm            Enable the drm collector (default: disabled).
      --collector.edac           Enable the edac collector (default: enabled).
      --collector.entropy        Enable the entropy collector (default: enabled).
      --collector.ethtool        Enable the ethtool collector (default: disabled).
      --collector.fibrechannel   Enable the fibrechannel collector (default: enabled).
      --collector.filefd         Enable the filefd collector (default: enabled).
      --collector.filesystem     Enable the filesystem collector (default: enabled).
      --collector.hwmon          Enable the hwmon collector (default: enabled).
      --collector.infiniband     Enable the infiniband collector (default: enabled).
      --collector.interrupts     Enable the interrupts collector (default: disabled).
      --collector.ipvs           Enable the ipvs collector (default: enabled).
      --collector.ksmd           Enable the ksmd collector (default: disabled).
      --collector.lnstat         Enable the lnstat collector (default: disabled).
      --collector.loadavg        Enable the loadavg collector (default: enabled).
      --collector.logind         Enable the logind collector (default: disabled).
      --collector.mdadm          Enable the mdadm collector (default: enabled).
      --collector.meminfo        Enable the meminfo collector (default: enabled).
      --collector.meminfo_numa   Enable the meminfo_numa collector (default: disabled).
      --collector.mountstats     Enable the mountstats collector (default: disabled).
      --collector.netclass       Enable the netclass collector (default: enabled).
      --collector.netdev         Enable the netdev collector (default: enabled).
      --collector.netstat        Enable the netstat collector (default: enabled).
      --collector.network_route  Enable the network_route collector (default: disabled).
      --collector.nfs            Enable the nfs collector (default: enabled).
      --collector.nfsd           Enable the nfsd collector (default: enabled).
      --collector.ntp            Enable the ntp collector (default: disabled).
      --collector.nvme           Enable the nvme collector (default: enabled).
      --collector.os             Enable the os collector (default: enabled).
      --collector.perf           Enable the perf collector (default: disabled).
      --collector.powersupplyclass  
                                 Enable the powersupplyclass collector (default: enabled).
      --collector.pressure       Enable the pressure collector (default: enabled).
      --collector.processes      Enable the processes collector (default: disabled).
      --collector.qdisc          Enable the qdisc collector (default: disabled).
      --collector.rapl           Enable the rapl collector (default: enabled).
      --collector.runit          Enable the runit collector (default: disabled).
      --collector.schedstat      Enable the schedstat collector (default: enabled).
      --collector.selinux        Enable the selinux collector (default: enabled).
      --collector.slabinfo       Enable the slabinfo collector (default: disabled).
      --collector.sockstat       Enable the sockstat collector (default: enabled).
      --collector.softnet        Enable the softnet collector (default: enabled).
      --collector.stat           Enable the stat collector (default: enabled).
      --collector.supervisord    Enable the supervisord collector (default: disabled).
      --collector.sysctl         Enable the sysctl collector (default: disabled).
      --collector.systemd        Enable the systemd collector (default: disabled).
      --collector.tapestats      Enable the tapestats collector (default: enabled).
      --collector.tcpstat        Enable the tcpstat collector (default: disabled).
      --collector.textfile       Enable the textfile collector (default: enabled).
      --collector.thermal_zone   Enable the thermal_zone collector (default: enabled).
      --collector.time           Enable the time collector (default: enabled).
      --collector.timex          Enable the timex collector (default: enabled).
      --collector.udp_queues     Enable the udp_queues collector (default: enabled).
      --collector.uname          Enable the uname collector (default: enabled).
      --collector.vmstat         Enable the vmstat collector (default: enabled).
      --collector.wifi           Enable the wifi collector (default: disabled).
      --collector.xfs            Enable the xfs collector (default: enabled).
      --collector.zfs            Enable the zfs collector (default: enabled).
      --collector.zoneinfo       Enable the zoneinfo collector (default: disabled).

This collector is written in Go and ships as a single executable. Upload node_exporter-1.4.0.linux-amd64.tar.gz to the server, unpack it, and move the executable into a directory on the PATH.

tar zxf node_exporter-1.4.0.linux-amd64.tar.gz
mv node_exporter-1.4.0.linux-amd64/node_exporter /usr/local/bin/

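As before, manage it with a systemd unit:

One more note: the collector customization mentioned above is done right here in the unit file, by adding flags to ExecStart. This example sticks to the defaults, so no extra flags are written.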

cat >/etc/systemd/system/node_exporter.service <<EOF
[Unit]
Description=node_exporter Monitoring System
Documentation=https://github.com/prometheus/node_exporter
 
[Service]
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100
 
[Install]
WantedBy=multi-user.target
EOF
systemctl enable node_exporter && systemctl start node_exporter

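Check the service status; green means it is running. (The two "Unknown lvalue" warnings at the end of the output below were logged because the unit file on this host had the Description and Documentation keys misspelled. systemd ignores unknown keys, so the service runs regardless.)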

[root@node4 ~]# systemctl status node_exporter
● node_exporter.service
   Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2022-11-16 12:27:46 CST; 1min 23s ago
 Main PID: 7519 (node_exporter)
   CGroup: /system.slice/node_exporter.service
           └─7519 /usr/local/bin/node_exporter --web.listen-address=:9100

Nov 16 12:27:46 node4 node_exporter[7519]: ts=2022-11-16T04:27:46.961Z caller=node_exporter.go:115 level=info collector=timex
Nov 16 12:27:46 node4 node_exporter[7519]: ts=2022-11-16T04:27:46.961Z caller=node_exporter.go:115 level=info collector=udp_queues
Nov 16 12:27:46 node4 node_exporter[7519]: ts=2022-11-16T04:27:46.961Z caller=node_exporter.go:115 level=info collector=uname
Nov 16 12:27:46 node4 node_exporter[7519]: ts=2022-11-16T04:27:46.961Z caller=node_exporter.go:115 level=info collector=vmstat
Nov 16 12:27:46 node4 node_exporter[7519]: ts=2022-11-16T04:27:46.961Z caller=node_exporter.go:115 level=info collector=xfs
Nov 16 12:27:46 node4 node_exporter[7519]: ts=2022-11-16T04:27:46.961Z caller=node_exporter.go:115 level=info collector=zfs
Nov 16 12:27:46 node4 node_exporter[7519]: ts=2022-11-16T04:27:46.961Z caller=node_exporter.go:199 level=info msg="Listening on" address=:9100
Nov 16 12:27:46 node4 node_exporter[7519]: ts=2022-11-16T04:27:46.961Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false
Nov 16 12:28:10 node4 systemd[1]: [/etc/systemd/system/node_exporter.service:2] Unknown lvalue 'Descriptinotallow' in section 'Unit'
Nov 16 12:28:10 node4 systemd[1]: [/etc/systemd/system/node_exporter.service:3] Unknown lvalue 'Documentatinotallow' in section 'Unit'
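
Besides systemctl status, you can confirm the exporter works by reading what it serves on /metrics. Here is a rough sketch of parsing that text format with the standard library. It is deliberately simplified (it does not unescape label values and ignores optional timestamps), the function names are mine, and the sample values are made up for illustration; `node_load1` and `node_cpu_seconds_total` are metric names node_exporter exposes.

```python
import re
import urllib.request

SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition-format text into a list of (name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://192.168.217.24:9100/metrics"):
    """Download and parse the exporter's metrics (needs network access)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Illustrative input with made-up values.
SAMPLE = (
    "# HELP node_load1 1m load average.\n"
    "# TYPE node_load1 gauge\n"
    "node_load1 0.21\n"
    'node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5\n'
)
for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```
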

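The node collector is now working. The last step is to feed its data into Prometheus, which is done by editing the Prometheus configuration file and adding targets:

(Install and deploy node_exporter on server 23 in the same way and start the service there as well.)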

[root@node4 ~]# cat /etc/systemd/system/node_exporter.service 
[Unit]
Description=node_exporter Monitoring System
Documentation=https://github.com/prometheus/node_exporter
 
[Service]
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100
 
[Install]
WantedBy=multi-user.target
[root@node4 ~]# cat /usr/local/prometheus/prometheus.yml 
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["192.168.217.24:9090"]
  - job_name: "server"
    static_configs:
      - targets: ["192.168.217.24:9100"]
      - targets: ["192.168.217.23:9100"]

Restart the Prometheus server (systemctl restart prometheus), and the browser now shows two additional targets:

V.

Installing and configuring the MySQL collector (run on server 192.168.217.23)

Unpack the archive and move it to /usr/local/ under a new name:

tar zxf mysqld_exporter-0.14.0.linux-amd64.tar.gz
mv mysqld_exporter-0.14.0.linux-amd64 /usr/local/mysqld_exporter

Create a dedicated database user:

create user 'exporter'@'%' identified by '123456';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%'  WITH MAX_USER_CONNECTIONS 3;
flush privileges;

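Write a MySQL client configuration file for the exporter:

It holds the MySQL port and password. My instance does not use the default port but 3311, and MySQL runs on 192.168.217.23.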

cat >/usr/local/mysqld_exporter/my.cnf <<EOF
[client]
host = 192.168.217.23
port = 3311
user = exporter
password = 123456
[mysqladmin]
host = 192.168.217.23
port = 3311
user = exporter
password = 123456
EOF

Create the systemd unit:

cat >/usr/lib/systemd/system/mysqld-exporter.service <<EOF
[Unit]
Description=mysqld_exporter
[Service]
# No User= line: the service runs as root, like the units above.
# To use a dedicated account, create it first and make sure it can read my.cnf.
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter \
--config.my-cnf=/usr/local/mysqld_exporter/my.cnf \
--web.listen-address=:9104 \
--collect.slave_status \
--collect.binlog_size \
--collect.info_schema.processlist \
--collect.info_schema.innodb_metrics \
--collect.engine_innodb_status \
--collect.perf_schema.file_events \
--collect.perf_schema.replication_group_member_stats
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemctl enable mysqld-exporter && systemctl start mysqld-exporter

All of these flags come from the mysqld_exporter help output. If you are curious, compare the help text below with the flags used above:

[root@node3 ~]# mysqld_exporter --help
usage: mysqld_exporter [<flags>]

Flags:
  -h, --help                   Show context-sensitive help (also try --help-long and --help-man).
      --exporter.lock_wait_timeout=2  
                               Set a lock_wait_timeout (in seconds) on the connection to avoid long metadata locking.
      --exporter.log_slow_filter  
                               Add a log_slow_filter to avoid slow query logging of scrapes. NOTE: Not supported by Oracle MySQL.
      --collect.heartbeat.database="heartbeat"  
                               Database from where to collect heartbeat data
      --collect.heartbeat.table="heartbeat"  
                               Table from where to collect heartbeat data
      --collect.heartbeat.utc  Use UTC for timestamps of the current server (`pt-heartbeat` is called with `--utc`)
      --collect.info_schema.processlist.min_time=0  
                               Minimum time a thread must be in each state to be counted
      --collect.info_schema.processlist.processes_by_user  
                               Enable collecting the number of processes by user
      --collect.info_schema.processlist.processes_by_host  
                               Enable collecting the number of processes by host
      --collect.info_schema.tables.databases="*"  
                               The list of databases to collect table stats for, or '*' for all
      --collect.mysql.user.privileges  
                               Enable collecting user privileges from mysql.user
      --collect.perf_schema.eventsstatements.limit=250  
                               Limit the number of events statements digests by response time
      --collect.perf_schema.eventsstatements.timelimit=86400  
                               Limit how old the 'last_seen' events statements can be, in seconds
      --collect.perf_schema.eventsstatements.digest_text_limit=120  
                               Maximum length of the normalized statement text
      --collect.perf_schema.file_instances.filter=".*"  
                               RegEx file_name filter for performance_schema.file_summary_by_instance
      --collect.perf_schema.file_instances.remove_prefix="/var/lib/mysql/"  
                               Remove path prefix in performance_schema.file_summary_by_instance
      --collect.perf_schema.memory_events.remove_prefix="memory/"  
                               Remove instrument prefix in performance_schema.memory_summary_global_by_event_name
      --web.config.file=""     [EXPERIMENTAL] Path to configuration file that can enable TLS or authentication.
      --web.listen-address=":9104"  
                               Address to listen on for web interface and telemetry.
      --web.telemetry-path="/metrics"  
                               Path under which to expose metrics.
      --timeout-offset=0.25    Offset to subtract from timeout in seconds.
      --config.my-cnf="/root/.my.cnf"  
                               Path to .my.cnf file to read MySQL credentials from.
      --tls.insecure-skip-verify  
                               Ignore certificate and server verification when using a tls connection.
      --collect.global_variables  
                               Collect from SHOW GLOBAL VARIABLES
      --collect.slave_status   Collect from SHOW SLAVE STATUS
      --collect.info_schema.processlist  
                               Collect current thread state counts from the information_schema.processlist
      --collect.mysql.user     Collect data from mysql.user
      --collect.info_schema.tables  
                               Collect metrics from information_schema.tables
      --collect.info_schema.innodb_tablespaces  
                               Collect metrics from information_schema.innodb_sys_tablespaces
      --collect.info_schema.innodb_metrics  
                               Collect metrics from information_schema.innodb_metrics
      --collect.global_status  Collect from SHOW GLOBAL STATUS
      --collect.binlog_size    Collect the current size of all registered binlog files
      --collect.perf_schema.tableiowaits  
                               Collect metrics from performance_schema.table_io_waits_summary_by_table
      --collect.perf_schema.indexiowaits  
                               Collect metrics from performance_schema.table_io_waits_summary_by_index_usage
      --collect.perf_schema.tablelocks  
                               Collect metrics from performance_schema.table_lock_waits_summary_by_table
      --collect.perf_schema.eventsstatements  
                               Collect metrics from performance_schema.events_statements_summary_by_digest
      --collect.perf_schema.eventsstatementssum  
                               Collect metrics of grand sums from performance_schema.events_statements_summary_by_digest
      --collect.perf_schema.eventswaits  
                               Collect metrics from performance_schema.events_waits_summary_global_by_event_name
      --collect.auto_increment.columns  
                               Collect auto_increment columns and max values from information_schema
      --collect.perf_schema.file_instances  
                               Collect metrics from performance_schema.file_summary_by_instance
      --collect.perf_schema.memory_events  
                               Collect metrics from performance_schema.memory_summary_global_by_event_name
      --collect.perf_schema.replication_group_members  
                               Collect metrics from performance_schema.replication_group_members
      --collect.perf_schema.replication_group_member_stats  
                               Collect metrics from performance_schema.replication_group_member_stats
      --collect.perf_schema.replication_applier_status_by_worker  
                               Collect metrics from performance_schema.replication_applier_status_by_worker
      --collect.info_schema.userstats  
                               If running with userstat=1, set to true to collect user statistics
      --collect.info_schema.clientstats  
                               If running with userstat=1, set to true to collect client statistics
      --collect.perf_schema.file_events  
                               Collect metrics from performance_schema.file_summary_by_event_name
      --collect.info_schema.schemastats  
                               If running with userstat=1, set to true to collect schema statistics
      --collect.info_schema.innodb_cmp  
                               Collect metrics from information_schema.innodb_cmp
      --collect.info_schema.innodb_cmpmem  
                               Collect metrics from information_schema.innodb_cmpmem
      --collect.info_schema.query_response_time  
                               Collect query response time distribution if query_response_time_stats is ON.
      --collect.engine_tokudb_status  
                               Collect from SHOW ENGINE TOKUDB STATUS
      --collect.engine_innodb_status  
                               Collect from SHOW ENGINE INNODB STATUS
      --collect.heartbeat      Collect from heartbeat
      --collect.info_schema.tablestats  
                               If running with userstat=1, set to true to collect table statistics
      --collect.info_schema.replica_host  
                               Collect metrics from information_schema.replica_host_status
      --collect.slave_hosts    Scrape information from 'SHOW SLAVE HOSTS'
      --log.level=info         Only log messages with the given severity or above. One of: [debug, info, warn, error]
      --log.format=logfmt      Output format of log messages. One of: [logfmt, json]
      --version                Show application version.

Check the ports:

[root@node3 ~]# netstat -antup |grep 3311
tcp        0      0 192.168.217.23:59276    192.168.217.23:3311     TIME_WAIT   -                   
tcp        0      0 192.168.217.23:59278    192.168.217.23:3311     TIME_WAIT   -                   
tcp        0      0 192.168.217.23:59274    192.168.217.23:3311     TIME_WAIT   -                   
tcp        0      0 192.168.217.23:59270    192.168.217.23:3311     TIME_WAIT   -                   
tcp        0      0 192.168.217.23:59272    192.168.217.23:3311     TIME_WAIT   -                   
tcp6       0      0 :::3311                 :::*                    LISTEN      2859/mysqld         
tcp6       0      0 192.168.217.23:3311     192.168.217.23:59270    TIME_WAIT   -             
[root@node3 ~]# netstat -antup |grep 9104
tcp6       0      0 :::9104                 :::*                    LISTEN      7041/mysqld_exporte 
tcp6       0      0 192.168.217.23:9104     192.168.217.24:60422    ESTABLISHED 7041/mysqld_exporte 

Hook the MySQL collector into Prometheus:

As before, edit the Prometheus configuration file and add one more target:

[root@node4 ~]# cat /usr/local/prometheus/prometheus.yml 
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["192.168.217.24:9090"]
  - job_name: "server"
    static_configs:
      - targets: ["192.168.217.24:9100"]
      - targets: ["192.168.217.23:9100"]
  - job_name: "mysqld"
    static_configs:
      - targets: ["192.168.217.23:9104"]

Restart the Prometheus server, open the browser again and check the targets. A green UP state means the exporter has been hooked in successfully:
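
With three jobs configured, it can be handy to check target health from a script rather than the Targets page. The sketch below reads the `/api/v1/targets` endpoint of the Prometheus HTTP API and groups the active targets by job. The function names are hypothetical, the server address is the one used in this article, and `fetch_targets` needs network access.

```python
import json
import urllib.request


def summarize_targets(payload):
    """Return ({job: {health: count}}, [scrape URLs of targets that are not up])."""
    by_job = {}
    not_up = []
    for target in payload["data"]["activeTargets"]:
        job = target["labels"].get("job", "")
        health = target["health"]  # "up", "down" or "unknown"
        counts = by_job.setdefault(job, {})
        counts[health] = counts.get(health, 0) + 1
        if health != "up":
            not_up.append(target["scrapeUrl"])
    return by_job, not_up


def fetch_targets(base_url="http://192.168.217.24:9090"):
    """Download the target list from the Prometheus server (needs network access)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

Calling `summarize_targets(fetch_targets())` on the setup above should report one healthy target each for `prometheus` and `mysqld` and two for `server`, with an empty list of failing scrape URLs.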

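VI.

Deploying Grafana (on 192.168.217.24)

Download page: Download Grafana | Grafana Labs

Once the yum installation has finished and the grafana-server service is started, Grafana is ready to use. Open 192.168.217.24:3000 in a browser to log in. The initial username/password is admin/admin. After logging in you will be asked to change the initial password; just follow the prompt. (Remember the new password.)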

After logging in, integrate Prometheus by choosing a data source:

Click Settings next to it:
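
The data source can also be registered without clicking through the UI, using Grafana's HTTP API. The following is a sketch under assumptions rather than a tested recipe for this setup: it builds (but does not send) a POST to `/api/datasources` with basic authentication, the function name is hypothetical, and the password is whatever you chose after the first login.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, user, password, prom_url):
    """Build the request that registers Prometheus as a Grafana data source."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prom_url,
        "access": "proxy",   # Grafana's backend talks to Prometheus
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen`, for example with `http://192.168.217.24:3000` as the Grafana URL and `http://192.168.217.24:9090` as the Prometheus URL.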

 

 

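Dashboard templates are usually JSON files, and the official site provides them at: Dashboards | Grafana Labs

For example, the template for the node exporter collector featured on the front page:

Click the Import button shown above and import this file: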

 

 

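Likewise, the mysqld_exporter collector needs a JSON template to generate its dashboard. Just look for one on the official site, generally picking one that is popular, for example:

MySQL Overview | Grafana Labs. This template, ID 7362, has been downloaded more than 300,000 times, which suggests it is fairly reliable.

That's the binary deployment of Prometheus + Grafana. Pretty simple, isn't it? A few easy steps and you have a slick-looking ops monitoring setup.

I plan to write about alerting next, so stay tuned.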
