Prometheus

Introductory material

https://speakerdeck.com/superbrothers/introduction-to-prometheus?slide=2

https://speakerdeck.com/yosshi_/purodakutodan-sheng-falsebei-jing-karaxue-bu-prometheustografana-loki

Alert Rule && Recording Rule

https://zaki-hmkc.hatenablog.com/entry/2020/10/19/213855

https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/

https://knowledge.sakura.ad.jp/11635/

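An alert rule defines a condition that raises an alert when it is met.

When the condition is met (and has held for the `for` duration, if one is set), the alert enters the firing state.

On its own, however, this only helps if users open the Prometheus UI and look.

For that reason, #AlertManager is configured separately so that alerts are pushed out as notifications instead of having to be looked up.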

code:yaml

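 # A single alerting rule; in a rule file it sits under groups[].rules
 alert: HighCPUUsage
 # CPU usage = 100 - idle percentage, averaged per instance over 5 minutes
 expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
 for: 5m
 labels:
   severity: warning
 annotations:
   summary: "High CPU usage detected"
   description: "CPU usage is above 80% on {{ $labels.instance }}"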
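As a minimal sketch of the Alertmanager hookup mentioned above, Prometheus can be pointed at an Alertmanager instance through the `alerting` section of prometheus.yaml. The address localhost:9093 is an assumption (9093 is Alertmanager's default port). Where notifications finally go (mail, chat, and so on) is configured on the Alertmanager side, not here.

code:yaml
 alerting:
   alertmanagers:
     - static_configs:
         - targets:
             - "localhost:9093"   # assumed address of the Alertmanager instance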

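A Recording Rule evaluates a complex query ahead of time and stores the result as a new metric.

For example, a rule can store its result under a metric name such as job:http_inprogress_requests:sum. The general form is as follows.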

code:yaml

 groups:
   - name: <group name>
     rules:
       - record: <new metric name>
         expr: <query expression>
         labels:
           <label name>: <value>   # one entry per label to attach
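Filled in with the metric name mentioned above, the template might look like the following sketch. The group name `example` and the source metric `http_inprogress_requests` are assumptions for illustration. The record name follows the common level:metric:operations naming convention.

code:yaml
 groups:
   - name: example   # hypothetical group name
     rules:
       # Pre-compute the per-job sum so dashboards and alerts can query it cheaply
       - record: job:http_inprogress_requests:sum
         expr: sum by (job) (http_inprogress_requests)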

Files containing these rules are listed under `rule_files` in prometheus.yaml.

Both alerting rules and recording rules can be written in these files.
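A minimal sketch of the corresponding section of prometheus.yaml is shown here. Both file paths are hypothetical. Entries may also be glob patterns.

code:yaml
 rule_files:
   - "rules/alerts.yml"      # hypothetical file holding alerting rules
   - "rules/recording.yml"   # hypothetical file holding recording rules

A rule file can be checked before Prometheus loads it with `promtool check rules <file>`.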

In Prometheus Operator as well, both kinds of rule are managed together as a single resource type, PrometheusRule.

https://scrapbox.io/files/671481902cb9af1a72453f3b.png
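As a rough sketch, a PrometheusRule manifest carrying one recording rule and one alerting rule could look like this. The resource name, the `release: prometheus` label, and the source metric `http_inprogress_requests` are assumptions. Which labels are actually required depends on the ruleSelector of the Prometheus resource in the cluster.

code:yaml
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
   name: example-rules        # hypothetical name
   labels:
     release: prometheus      # assumed; must match the Prometheus ruleSelector
 spec:
   groups:
     - name: example
       rules:
         - record: job:http_inprogress_requests:sum
           expr: sum by (job) (http_inprogress_requests)
         - alert: HighCPUUsage
           expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
           for: 5m
           labels:
             severity: warning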

Node-Exporter

The best-known exporter. By default it listens on port 9100 on each node.

code:bash

 # -s hides curl's progress meter, which otherwise gets mixed into the piped output
 ❯ curl -s localhost:9100/metrics | head
 # HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
 # TYPE go_gc_duration_seconds summary
 go_gc_duration_seconds{quantile="0"} 1.9951e-05
 go_gc_duration_seconds{quantile="0.25"} 2.8636e-05
 go_gc_duration_seconds{quantile="0.5"} 3.631e-05
 go_gc_duration_seconds{quantile="0.75"} 4.5303e-05
 go_gc_duration_seconds{quantile="1"} 9.2775e-05
 go_gc_duration_seconds_sum 3.065867826
 go_gc_duration_seconds_count 68995
 # HELP go_goroutines Number of goroutines that currently exist.
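For Prometheus to collect these metrics, the exporter has to be registered as a scrape target. A minimal sketch follows, assuming node_exporter runs on the same host as Prometheus. The job name `node` and the target address are assumptions.

code:yaml
 scrape_configs:
   - job_name: node               # hypothetical job name
     static_configs:
       - targets:
           - "localhost:9100"     # node_exporter's default port; /metrics is the default path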

#prometheus