
Commit

docs: minimal changes
Signed-off-by: Gabriele Bartolini <[email protected]>
gbartolini committed Mar 1, 2024
1 parent 02a4ee9 commit 7357634
Showing 8 changed files with 32 additions and 19 deletions.
4 changes: 2 additions & 2 deletions charts/cluster/docs/runbooks/CNPGClusterHACritical.md
@@ -6,10 +6,10 @@ Meaning

The `CNPGClusterHACritical` alert is triggered when the CloudNativePG cluster has no ready standby replicas.

-This can happen during a normal fail-over or automated minor version upgrades in a cluster with 2 or less
+This can happen during either a normal failover or automated minor version upgrades in a cluster with 2 or fewer
instances. The replaced instance may need some time to catch up with the cluster primary instance.

-This alarm will be always trigger if your cluster is configured to run with only 1 instance. In this case you
+This alarm will always be triggered if your cluster is configured to run with only 1 instance. In this case you
may want to silence it.
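
If the single-instance setup is intentional, one way to silence the alert is through Alertmanager's `amtool`. This is only a sketch: the Alertmanager URL, label matcher, and duration below are assumptions to adapt to your monitoring stack.

```bash
# Silence CNPGClusterHACritical for a deliberately single-instance cluster.
# URL, matcher, and duration are examples; adjust them to your environment.
amtool silence add alertname=CNPGClusterHACritical \
  --alertmanager.url=http://alertmanager.monitoring.svc:9093 \
  --comment="Single-instance cluster: no standby replicas by design" \
  --duration=720h
```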

Impact
2 changes: 1 addition & 1 deletion charts/cluster/docs/runbooks/CNPGClusterHAWarning.md
@@ -15,7 +15,7 @@ Impact
Having less than two available replicas puts your cluster at risk if another instance fails. The cluster is still able
to operate normally, although the `-ro` and `-r` endpoints operate at reduced capacity.

-This can happen during a normal fail-over or automated minor version upgrades. The replaced instance may need some time
+This can happen during a normal failover or automated minor version upgrades. The replaced instance may need some time
to catch up with the cluster primary instance, which will trigger the alert if the operation takes more than 5 minutes.

At `0` available ready replicas, a `CNPGClusterHACritical` alert will be triggered.
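
To see how many ready instances currently back the `-ro` and `-r` endpoints, a quick check along these lines usually suffices; the namespace and cluster name are placeholders, and the pod label follows the usual CloudNativePG convention:

```bash
# Cluster-level view: the READY column reports how many instances are available
kubectl get cluster --namespace <namespace> <cluster_name>

# Pod-level view of the instances behind the cluster services
kubectl get pods --namespace <namespace> -l cnpg.io/cluster=<cluster_name>
```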
charts/cluster/docs/runbooks/CNPGClusterHighConnectionsCritical.md
@@ -4,12 +4,12 @@ CNPGClusterHighConnectionsCritical
Meaning
-------

-This alert is triggered when the number of connections to the CNPG cluster instance exceeds 95% of its capacity.
+This alert is triggered when the number of connections to the CloudNativePG cluster instance exceeds 95% of its capacity.

Impact
------

-At 100% capacity, the CNPG cluster instance will not be able to accept new connections. This will result in a service
+At 100% capacity, the CloudNativePG cluster instance will not be able to accept new connections. This will result in a service
disruption.

Diagnosis
@@ -20,5 +20,5 @@ Use the [CloudNativePG Grafana Dashboard](https://grafana.com/grafana/dashboards
Mitigation
----------

-* Increase the maximum number of connections by increasing the `max_connections` Postgresql parameter.
+* Increase the maximum number of connections by increasing the `max_connections` PostgreSQL parameter.
* Use connection pooling by enabling PgBouncer to reduce the number of connections to the database.
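
As a sketch of the first option, `max_connections` can be raised through the `spec.postgresql.parameters` map of the `Cluster` resource. The fragment below shows only the relevant excerpt of an existing manifest, and the value is an example to size against the instance's available memory:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: <cluster_name>
spec:
  postgresql:
    parameters:
      # Example value only: higher max_connections increases per-instance memory usage
      max_connections: "300"
```

Keep in mind that `max_connections` requires a restart of the PostgreSQL instances to take effect, which the operator carries out automatically.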
charts/cluster/docs/runbooks/CNPGClusterHighConnectionsWarning.md
@@ -4,12 +4,12 @@ CNPGClusterHighConnectionsWarning
Meaning
-------

-This alert is triggered when the number of connections to the CNPG cluster instance exceeds 85% of its capacity.
+This alert is triggered when the number of connections to the CloudNativePG cluster instance exceeds 85% of its capacity.

Impact
------

-At 100% capacity, the CNPG cluster instance will not be able to accept new connections. This will result in a service
+At 100% capacity, the CloudNativePG cluster instance will not be able to accept new connections. This will result in a service
disruption.

Diagnosis
@@ -20,5 +20,5 @@ Use the [CloudNativePG Grafana Dashboard](https://grafana.com/grafana/dashboards
Mitigation
----------

-* Increase the maximum number of connections by increasing the `max_connections` Postgresql parameter.
+* Increase the maximum number of connections by increasing the `max_connections` PostgreSQL parameter.
* Use connection pooling by enabling PgBouncer to reduce the number of connections to the database.
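
For the PgBouncer option, a minimal sketch of a CloudNativePG `Pooler` resource in front of the cluster's read-write service; the name, instance count, and pool sizes are placeholders to tune for your workload:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: <cluster_name>-pooler-rw
spec:
  cluster:
    name: <cluster_name>
  instances: 2
  type: rw
  pgbouncer:
    poolMode: transaction
    parameters:
      # Example sizing: many client connections funneled into a small server-side pool
      max_client_conn: "1000"
      default_pool_size: "20"
```

Applications then connect to the service created for the pooler (named after the `Pooler` resource) instead of the `<cluster_name>-rw` service.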
4 changes: 2 additions & 2 deletions charts/cluster/docs/runbooks/CNPGClusterHighReplicationLag.md
@@ -4,7 +4,7 @@ CNPGClusterHighReplicationLag
Meaning
-------

-This alert is triggered when the replication lag of the CNPG cluster exceed `1s`.
+This alert is triggered when the replication lag of the CloudNativePG cluster exceeds `1s`.

Impact
------
@@ -21,7 +21,7 @@ High replication lag can be caused by a number of factors, including:
* Network issues
* High load on the primary or replicas
* Long running queries
-* Suboptimal Postgres configuration, in particular small numbers of `max_wal_senders`.
+* Suboptimal PostgreSQL configuration, in particular small numbers of `max_wal_senders`.

```bash
kubectl exec --namespace <namespace> --stdin --tty services/<cluster_name>-rw -- psql -c "SELECT * from pg_stat_replication;"
@@ -25,3 +25,4 @@ Mitigation

1. Verify you have more than a single node without taints that prevent pods from being scheduled there.
2. Verify your [affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/) configuration.
+3. For more information, please refer to the ["Scheduling"](https://cloudnative-pg.io/documentation/current/scheduling/) section in the documentation.
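
To verify the first point, standard `kubectl` is enough; no CloudNativePG-specific tooling is assumed, and the namespace and cluster name are placeholders:

```bash
# List every node with its taints; more than one node should report <none>
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'

# Review the affinity rules declared on the Cluster resource
kubectl get cluster --namespace <namespace> <cluster_name> -o jsonpath='{.spec.affinity}'
```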
14 changes: 10 additions & 4 deletions charts/cluster/docs/runbooks/CNPGClusterLowDiskSpaceCritical.md
@@ -4,11 +4,11 @@ CNPGClusterLowDiskSpaceCritical
Meaning
-------

-This alert is triggered when the disk space on the CNPG cluster exceeds 90%. It can be triggered by either:
+This alert is triggered when the disk space on the CloudNativePG cluster exceeds 90%. It can be triggered by either:

-* Data PVC
-* WAL PVC
-* Tablespace PVC
+* the PVC hosting the `PGDATA` (`storage` section)
+* the PVC hosting WAL files (`walStorage` section), where applicable
+* any PVC hosting a tablespace (`tablespaces` section)

Impact
------
@@ -23,3 +23,9 @@ Diagnosis

Mitigation
----------

+If you experience issues with the WAL (Write-Ahead Logging) volume and have
+set up continuous archiving, ensure that WAL archiving is functioning
+correctly. This is crucial to avoid a buildup of WAL files in the `pg_wal`
+folder. Monitor the `cnpg_collector_pg_wal_archive_status` metric, specifically
+ensuring that the number of `ready` files does not increase linearly.
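
A hedged sketch of the corresponding Prometheus check; the `value="ready"` matcher reflects the usual label layout of this metric, so adjust it if your exporter labels differ:

```promql
# WAL segments waiting to be archived; this should stay close to zero
cnpg_collector_pg_wal_archive_status{value="ready"}

# Flag sustained growth of ready segments over the last 30 minutes
delta(cnpg_collector_pg_wal_archive_status{value="ready"}[30m]) > 0
```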
14 changes: 10 additions & 4 deletions charts/cluster/docs/runbooks/CNPGClusterLowDiskSpaceWarning.md
@@ -4,11 +4,11 @@ CNPGClusterLowDiskSpaceWarning
Meaning
-------

-This alert is triggered when the disk space on the CNPG cluster exceeds 70%. It can be triggered by either:
+This alert is triggered when the disk space on the CloudNativePG cluster exceeds 70%. It can be triggered by either:

-* Data PVC
-* WAL PVC
-* Tablespace PVC
+* the PVC hosting the `PGDATA` (`storage` section)
+* the PVC hosting WAL files (`walStorage` section), where applicable
+* any PVC hosting a tablespace (`tablespaces` section)

Impact
------
@@ -23,3 +23,9 @@ Diagnosis

Mitigation
----------

+If you experience issues with the WAL (Write-Ahead Logging) volume and have
+set up continuous archiving, ensure that WAL archiving is functioning
+correctly. This is crucial to avoid a buildup of WAL files in the `pg_wal`
+folder. Monitor the `cnpg_collector_pg_wal_archive_status` metric, specifically
+ensuring that the number of `ready` files does not increase linearly.
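
To confirm that archiving is keeping up, PostgreSQL's archiver statistics can be queried through the same `kubectl exec` pattern used in the other runbooks; namespace and cluster name are placeholders:

```bash
# last_archived_wal should keep advancing and failed_count should not grow
kubectl exec --namespace <namespace> --stdin --tty services/<cluster_name>-rw -- \
  psql -c "SELECT archived_count, last_archived_wal, last_archived_time, failed_count, last_failed_wal FROM pg_stat_archiver;"
```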
