prometheus.rules.yml: restrict tooManyFiles rule
Checking this alert for all present partitions is nice; however, we saw that
the one holding the OS files often hits some of the thresholds and starts creating noise.

In fact, we only care about the partition that holds the scylla 'data' directory, and by
default this partition is /var/lib/scylla.

Let's restrict these alerts (there are 3 with different thresholds and corresponding severities)
to the mount point above only.

Any user whose scylla 'data' directory lives on a different mount point will have to
adjust the filtering accordingly.
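
A minimal sketch of such an adjustment, for the lowest-severity rule only. The mount point /mnt/scylla-data is a hypothetical example, not a Scylla default, and the indentation is assumed from the usual Prometheus rule-file layout; the two higher thresholds would need the same change.

```yaml
# Hypothetical override: /mnt/scylla-data is an assumed mount point.
# Replace it with the mount point that hosts your scylla 'data' directory.
- alert: tooManyFiles
  expr: (node_filesystem_files{mountpoint="/mnt/scylla-data"} - node_filesystem_files_free{mountpoint="/mnt/scylla-data"}) / on(instance) group_left count(scylla_reactor_cpu_busy_ms) by (instance)>20000
  for: 10s
  labels:
    severity: "info"
    description: 'Over 20k open files in /mnt/scylla-data per shard {{ $labels.instance }}'
    summary: There are over 20K open files per shard on Instance {{ $labels.instance }}
```

The same label matcher has to appear on both node_filesystem_files and node_filesystem_files_free so that the subtraction pairs series from the same filesystem.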

Fixes #2113
vladzcloudius committed Nov 6, 2023
1 parent e040a90 commit 4bd2ec8
Showing 1 changed file with 9 additions and 9 deletions.
18 changes: 9 additions & 9 deletions prometheus/prom_rules/prometheus.rules.yml
@@ -262,26 +262,26 @@ groups:
 description: 'OOM Kill on {{ $labels.instance }}'
 summary: A process was terminated on Instance {{ $labels.instance }}
 - alert: tooManyFiles
-expr: (node_filesystem_files - node_filesystem_files_free) / on(instance) group_left count(scylla_reactor_cpu_busy_ms) by (instance)>20000
+expr: (node_filesystem_files{mountpoint="/var/lib/scylla"} - node_filesystem_files_free{mountpoint="/var/lib/scylla"}) / on(instance) group_left count(scylla_reactor_cpu_busy_ms) by (instance)>20000
 for: 10s
 labels:
 severity: "info"
-description: 'Over 20k open files per shard {{ $labels.instance }}'
-summary: There are over 20K open files per shard on Insace {{ $labels.instance }}
+description: 'Over 20k open files in /var/lib/scylla per shard {{ $labels.instance }}'
+summary: There are over 20K open files per shard on Instance {{ $labels.instance }}
 - alert: tooManyFiles
-expr: (node_filesystem_files - node_filesystem_files_free) / on(instance) group_left count(scylla_reactor_cpu_busy_ms) by (instance)>30000
+expr: (node_filesystem_files{mountpoint="/var/lib/scylla"} - node_filesystem_files_free{mountpoint="/var/lib/scylla"}) / on(instance) group_left count(scylla_reactor_cpu_busy_ms) by (instance)>30000
 for: 10s
 labels:
 severity: "warn"
-description: 'Over 30k open files per shard {{ $labels.instance }}'
-summary: There are over 30K open files per shard on Insace {{ $labels.instance }}
+description: 'Over 30k open files in /var/lib/scylla per shard {{ $labels.instance }}'
+summary: There are over 30K open files per shard on Instance {{ $labels.instance }}
 - alert: tooManyFiles
-expr: (node_filesystem_files - node_filesystem_files_free) / on(instance) group_left count(scylla_reactor_cpu_busy_ms) by (instance)>40000
+expr: (node_filesystem_files{mountpoint="/var/lib/scylla"} - node_filesystem_files_free{mountpoint="/var/lib/scylla"}) / on(instance) group_left count(scylla_reactor_cpu_busy_ms) by (instance)>40000
 for: 10s
 labels:
 severity: "error"
-description: 'Over 40k open files per shard {{ $labels.instance }}'
-summary: There are over 40K open files per shard on Insace {{ $labels.instance }}
+description: 'Over 40k open files in /var/lib/scylla per shard {{ $labels.instance }}'
+summary: There are over 40K open files per shard on Instance {{ $labels.instance }}
 - alert: nodeInJoinMode
 expr: scylla_node_operation_mode == 2
 for: 5h