Commit 5a07010

(fleet/prometheus-alerts) add pvc alert
1 parent 86b1526 commit 5a07010

4 files changed: +26 -5 lines changed


fleet/lib/kube-prometheus-stack/overlays/antu/values.yaml

Lines changed: 1 addition & 1 deletion
@@ -192,7 +192,7 @@ alertmanager:
  - site
  group_wait: 30s
  group_interval: 5m
- repeat_interval: 24h
+ repeat_interval: 8760h
  receiver: blackhole
  routes:
  - receiver: blackhole

fleet/lib/prometheus-alerts/README.md

Lines changed: 4 additions & 3 deletions
@@ -27,6 +27,7 @@ file with the `rules.namespace` key.
  of receivers, pre- and suffixed with `,` to make regex matching easier in the
  alertmanager. For example: `,slack,squadcast,email,` The receivers are defined
  in the alertmanager configuration.
- Currently (20240503) the following receivers are configured:
- * `slack-test`
- * `squadcast-test`
+
+ Currently (20250616) the following receivers are configured:
+ * `gnocpush`: Requires label `gnoc = "true"`
+ * `squadcast-alertmanager`: Requires label `prod = "true"`
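
The README hunk above says the receiver list is wrapped in commas to make regex matching easier in the alertmanager. A minimal sketch of how such a route might look follows. The label name `receivers` is an assumption (the hunk does not show the actual label name), and the pairing of each regex with the `prod`/`gnoc` label is inferred from the README bullets, not taken from the real configuration. Alertmanager anchors regex matchers at both ends, hence the leading and trailing `.*`.

```yaml
# Hypothetical Alertmanager routing fragment; not the configuration from this commit.
route:
  receiver: blackhole            # default receiver, as seen in the values.yaml hunk
  routes:
    - receiver: squadcast-alertmanager
      matchers:
        # "receivers" is an assumed label name holding e.g. ",gnocpush,squadcast-alertmanager,"
        - 'receivers =~ ".*,squadcast-alertmanager,.*"'
        - 'prod = "true"'
      continue: true             # keep evaluating so a second receiver can also match
    - receiver: gnocpush
      matchers:
        - 'receivers =~ ".*,gnocpush,.*"'
        - 'gnoc = "true"'
```

Because every name in the list is delimited by `,` on both sides, a pattern such as `,slack,` cannot accidentally match a longer name like `slack-test`.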
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+ groups:
+ - name: k8s.rules
+   rules:
+   - alert: PVCLowFreeSpace
+     annotations:
+       summary: PVC {{ $labels.persistentvolumeclaim }} is low on free space
+       description: >
+         PVC {{ $labels.persistentvolumeclaim }} in namespace {{ $labels.namespace }}
+         has less than 20% free space.
+     expr: |
+       (kubelet_volume_stats_available_bytes{job="kubelet",metrics_path="/metrics",namespace=~".*"}
+       / kubelet_volume_stats_capacity_bytes{job="kubelet",metrics_path="/metrics",namespace=~".*"}) < 0.20
+       and kubelet_volume_stats_used_bytes{job="kubelet",metrics_path="/metrics",namespace=~".*"} > 0
+     for: 2m
+     labels:
+       prod: "true"
+       severity: warning
+       node_name: '{{ $labels.prom_cluster }}'
+       device: null
+       service_name: null
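
The new `PVCLowFreeSpace` rule could be exercised with a `promtool test rules` unit test along the following lines. This is only a sketch: the file names `pvc-rules.yaml` and `pvc-rules-test.yaml`, the namespace `demo`, and the PVC names are hypothetical, and it has not been run against this repository. The first test group checks that a volume with 50% free space raises no alert; the second evaluates the rule's expression directly for a volume with 12.5% free space.

```yaml
# pvc-rules-test.yaml (hypothetical); run with: promtool test rules pvc-rules-test.yaml
rule_files:
  - pvc-rules.yaml   # hypothetical path to the rules file added in this commit

evaluation_interval: 1m

tests:
  # Healthy volume: 500 of 1000 bytes free (50%), so no alert is expected.
  - interval: 1m
    input_series:
      - series: 'kubelet_volume_stats_available_bytes{job="kubelet",metrics_path="/metrics",namespace="demo",persistentvolumeclaim="healthy"}'
        values: '500+0x10'
      - series: 'kubelet_volume_stats_capacity_bytes{job="kubelet",metrics_path="/metrics",namespace="demo",persistentvolumeclaim="healthy"}'
        values: '1000+0x10'
      - series: 'kubelet_volume_stats_used_bytes{job="kubelet",metrics_path="/metrics",namespace="demo",persistentvolumeclaim="healthy"}'
        values: '500+0x10'
    alert_rule_test:
      - eval_time: 5m
        alertname: PVCLowFreeSpace
        exp_alerts: []   # empty list asserts that the alert is not firing

  # Nearly full volume: 125 of 1000 bytes free (12.5%), below the 20% threshold.
  - interval: 1m
    input_series:
      - series: 'kubelet_volume_stats_available_bytes{job="kubelet",metrics_path="/metrics",namespace="demo",persistentvolumeclaim="full"}'
        values: '125+0x10'
      - series: 'kubelet_volume_stats_capacity_bytes{job="kubelet",metrics_path="/metrics",namespace="demo",persistentvolumeclaim="full"}'
        values: '1000+0x10'
      - series: 'kubelet_volume_stats_used_bytes{job="kubelet",metrics_path="/metrics",namespace="demo",persistentvolumeclaim="full"}'
        values: '875+0x10'
    promql_expr_test:
      # Same expression as the rule; the filter keeps the free-space ratio as the sample value.
      - expr: |
          (kubelet_volume_stats_available_bytes{job="kubelet",metrics_path="/metrics",namespace=~".*"}
          / kubelet_volume_stats_capacity_bytes{job="kubelet",metrics_path="/metrics",namespace=~".*"}) < 0.20
          and kubelet_volume_stats_used_bytes{job="kubelet",metrics_path="/metrics",namespace=~".*"} > 0
        eval_time: 5m
        exp_samples:
          - labels: '{job="kubelet",metrics_path="/metrics",namespace="demo",persistentvolumeclaim="full"}'
            value: 0.125
```

A positive `alert_rule_test` is left out on purpose: it would have to list the exact label set of the fired alert, and how the `null` values for `device` and `service_name` end up in that set is not something this diff shows.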

fleet/lib/prometheus-alerts/rules/prometheusrule-net.yaml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ groups:
  - alert: HostDown
    annotations:
      summary: Host {{ $labels.instance }} is down
-     description: Host {{ $labels.instance }} is down. Maybe it is on fire??? 🗑🔥
+     description: Host {{ $labels.instance }} is down. Maybe it is on fire???
    expr: probe_success != 1
    for: 1m
    labels:
