Implemented Prometheus Rule for automated alerts #193

Merged 4 commits on Mar 1, 2024

@itay-grudev (Collaborator) commented Feb 22, 2024

Adds the following alerts and associated runbooks:

  • CNPGClusterHACritical - No ready replicas
  • CNPGClusterHAWarning - Fewer than 2 ready replicas
  • CNPGClusterOffline - No ready cluster instances at all
  • CNPGClusterHighConnectionsWarning - Connections to any instance exceed 85% of its capacity
  • CNPGClusterHighConnectionsCritical - Connections to any instance exceed 95% of its capacity
  • CNPGClusterHighReplicationLag - Replication lag exceeds 1s
  • CNPGClusterInstancesOnSameNode - Cluster instances scheduled on the same node
  • CNPGClusterZoneSpreadWarning - Cluster instances scheduled in the same zone
  • CNPGClusterLowDiskSpaceWarning - Usage of any cluster disk exceeds 70%
  • CNPGClusterLowDiskSpaceCritical - Usage of any cluster disk exceeds 90%
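
The thresholds in the list above can be read as a small decision table. The following is a minimal Python sketch of that logic, not the PromQL shipped in the chart. The function names `severity` and `ha_alert` are hypothetical, "exceeds" is assumed to mean strictly greater than, and `ha_alert` returns only the most severe matching alert, whereas real Prometheus rules may fire several at once.

```python
from typing import Optional

# Thresholds (in percent) taken from the alert descriptions in this PR.
CONNECTIONS_WARNING = 85.0
CONNECTIONS_CRITICAL = 95.0
DISK_WARNING = 70.0
DISK_CRITICAL = 90.0


def severity(percent_used: float, warning: float, critical: float) -> Optional[str]:
    """Hypothetical helper: map a usage percentage to an alert severity.

    Assumes "exceeds" means strictly greater than the threshold.
    """
    if percent_used > critical:
        return "critical"
    if percent_used > warning:
        return "warning"
    return None


def ha_alert(ready_instances: int, ready_replicas: int) -> Optional[str]:
    """Hypothetical helper: pick the most severe availability alert name."""
    if ready_instances == 0:
        return "CNPGClusterOffline"      # no ready cluster instances at all
    if ready_replicas == 0:
        return "CNPGClusterHACritical"   # no ready replicas
    if ready_replicas < 2:
        return "CNPGClusterHAWarning"    # fewer than 2 ready replicas
    return None
```

For example, `severity(88.0, CONNECTIONS_WARNING, CONNECTIONS_CRITICAL)` falls in the warning band, and `ha_alert(1, 0)` describes a cluster whose primary is up but which has no ready replica.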

[Screenshot: Alertmanager alerts]

Closes #192

Signed-off-by: Gabriele Bartolini <[email protected]>
@itay-grudev itay-grudev merged commit b2088c4 into main Mar 1, 2024
4 checks passed
@itay-grudev itay-grudev deleted the dev/192 branch March 1, 2024 10:02
itay-grudev added a commit that referenced this pull request Mar 1, 2024
* Implemented Prometheus Rule for automated alerts (#193)
* Renamed: `cluster.monitoring.enablePodMonitor` to `cluster.monitoring.podMonitor.enabled`
* New configuration option: `cluster.monitoring.prometheusRule.enabled` defaults to `true`

Signed-off-by: Itay Grudev <[email protected]>
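
To illustrate the rename and the new default described in the commit message, here is a minimal Python sketch that rewrites a values dictionary from the old key layout to the new one. The helper `migrate_monitoring_values` is hypothetical and not part of the chart. It assumes the values are nested under a top-level `cluster` key, as the dotted paths suggest, and it sets `prometheusRule.enabled` to `True` only to mirror the stated default.

```python
import copy


def migrate_monitoring_values(values: dict) -> dict:
    """Hypothetical helper: return a copy of Helm values using the renamed keys."""
    new_values = copy.deepcopy(values)  # leave the caller's dict untouched
    monitoring = new_values.setdefault("cluster", {}).setdefault("monitoring", {})

    # cluster.monitoring.enablePodMonitor -> cluster.monitoring.podMonitor.enabled
    if "enablePodMonitor" in monitoring:
        old_flag = monitoring.pop("enablePodMonitor")
        # An explicitly set new-style key wins over the migrated old flag.
        monitoring.setdefault("podMonitor", {}).setdefault("enabled", old_flag)

    # cluster.monitoring.prometheusRule.enabled defaults to true per the commit message.
    monitoring.setdefault("prometheusRule", {}).setdefault("enabled", True)
    return new_values


if __name__ == "__main__":
    old = {"cluster": {"monitoring": {"enablePodMonitor": True}}}
    print(migrate_monitoring_values(old))
```

Because `prometheusRule.enabled` defaults to `true`, anyone who does not want the alert rules created would presumably need to set it to `false` explicitly in their values.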
@itay-grudev itay-grudev added the `chart( cluster )` label (Related to the cluster chart) May 25, 2024
Linked issue: Feature: Cluster Chart PrometheusRule alerts