Merge branch 'main' into kuttl
Signed-off-by: Itay Grudev <[email protected]>
itay-grudev committed Mar 28, 2024
2 parents af5ab64 + 0a85ff4 commit 0e26b38
Showing 19 changed files with 277 additions and 172 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/lint.yml
@@ -20,7 +20,7 @@ jobs:
with:
version: v3.4.0

- uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5
- uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5.1.0
with:
python-version: 3.7

4 changes: 2 additions & 2 deletions RELEASE.md
@@ -32,7 +32,7 @@ In order to create a new release of the `cloudnative-pg` chart, follow these steps:
```
3. Create a branch named `release/cloudnative-pg-vX.Y.Z` and switch to it:
```bash
git checkout -b release/cloudnative-pg-v$NEW_VERSION
git switch --create release/cloudnative-pg-v$NEW_VERSION
```
4. Update the `.version` in the [Chart.yaml](./charts/cloudnative-pg/Chart.yaml) file to `"X.Y.Z"`
```bash
@@ -111,7 +111,7 @@ In order to create a new release of the `cluster` chart, follow these steps:
```
3. Create a branch named `release/cluster-vX.Y.Z` and switch to it:
```bash
git checkout -b release/cluster-v$NEW_VERSION
git switch --create release/cluster-v$NEW_VERSION
```
4. Update the `.version` in the [Chart.yaml](./charts/cluster/Chart.yaml) file to `"X.Y.Z"`
```bash
2 changes: 1 addition & 1 deletion charts/cluster/Chart.yaml
@@ -18,7 +18,7 @@ name: cluster
description: Deploys and manages a CloudNativePG cluster and its associated resources.
icon: https://raw.githubusercontent.com/cloudnative-pg/artwork/main/cloudnativepg-logo.svg
type: application
version: 0.0.3
version: 0.0.5
sources:
- https://github.com/cloudnative-pg/charts
keywords:
8 changes: 6 additions & 2 deletions charts/cluster/README.md
@@ -1,6 +1,6 @@
# cluster

![Version: 0.0.3](https://img.shields.io/badge/Version-0.0.3-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square)
![Version: 0.0.5](https://img.shields.io/badge/Version-0.0.5-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square)

> **Warning**
> ### This chart is under active development.
@@ -125,7 +125,8 @@ refer to the [CloudNativePG Documentation](https://cloudnative-pg.io/documentat
| backups.data.jobs | int | `2` | Number of data files to be archived or restored in parallel. |
| backups.destinationPath | string | `""` | Overrides the provider specific default path. Defaults to: S3: s3://<bucket><path> Azure: https://<storageAccount>.<serviceName>.core.windows.net/<clusterName><path> Google: gs://<bucket><path> |
| backups.enabled | bool | `false` | You need to configure backups manually, so backups are disabled by default. |
| backups.endpointCA | object | `{}` | Specify a secret containing the CA bundle of the barman endpoint. Useful when using self-signed certificates to avoid errors with certificate issuer and barman-cloud-wal-archive |
| backups.endpointCA | object | `{"create":false,"key":"","name":"","value":""}` | Specifies a CA bundle to validate a privately signed certificate. |
| backups.endpointCA.create | bool | `false` | Creates a secret with the given value if true, otherwise uses an existing secret. |
| backups.endpointURL | string | `""` | Overrides the provider specific default endpoint. Defaults to: S3: https://s3.<region>.amazonaws.com" |
| backups.google.applicationCredentials | string | `""` | |
| backups.google.bucket | string | `""` | |
@@ -161,6 +162,7 @@ refer to the [CloudNativePG Documentation](https://cloudnative-pg.io/documentat
| cluster.monitoring.enabled | bool | `false` | Whether to enable monitoring |
| cluster.monitoring.podMonitor.enabled | bool | `true` | Whether to enable the PodMonitor |
| cluster.monitoring.prometheusRule.enabled | bool | `true` | Whether to enable the PrometheusRule automated alerts |
| cluster.monitoring.prometheusRule.excludeRules | list | `[]` | Exclude specified rules |
| cluster.postgresGID | int | `26` | The GID of the postgres user inside the image, defaults to 26 |
| cluster.postgresUID | int | `26` | The UID of the postgres user inside the image, defaults to 26 |
| cluster.postgresql | object | `{}` | Configuration of the PostgreSQL server. See: https://cloudnative-pg.io/documentation/current/cloudnative-pg.v1/#postgresql-cnpg-io-v1-PostgresConfiguration |
@@ -193,6 +195,8 @@ refer to the [CloudNativePG Documentation](https://cloudnative-pg.io/documentat
| recovery.backupName | string | `""` | Backup Recovery Method |
| recovery.clusterName | string | `""` | Object Store Recovery Method |
| recovery.destinationPath | string | `""` | Overrides the provider specific default path. Defaults to: S3: s3://<bucket><path> Azure: https://<storageAccount>.<serviceName>.core.windows.net/<clusterName><path> Google: gs://<bucket><path> |
| recovery.endpointCA | object | `{"create":false,"key":"","name":"","value":""}` | Specifies a CA bundle to validate a privately signed certificate. |
| recovery.endpointCA.create | bool | `false` | Creates a secret with the given value if true, otherwise uses an existing secret. |
| recovery.endpointURL | string | `""` | Overrides the provider specific default endpoint. Defaults to: S3: https://s3.<region>.amazonaws.com" Leave empty if using the default S3 endpoint |
| recovery.google.applicationCredentials | string | `""` | |
| recovery.google.bucket | string | `""` | |
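The `endpointCA` entries above replace the previous free-form object with an explicit structure. Below is a minimal values sketch, assuming a backup endpoint behind a privately signed certificate; the field names come from the table, while the certificate value itself is a placeholder:

```yaml
# Hypothetical values.yaml excerpt for the new endpointCA structure
backups:
  endpointCA:
    create: true            # render a Secret from `value` below
    name: ""                # empty: fall back to the chart-generated name
    key: ca-bundle.crt      # key under which the bundle is stored
    value: ""               # base64-encoded CA bundle goes here (placeholder)
```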
24 changes: 24 additions & 0 deletions charts/cluster/prometheus_rules/cluster-ha-critical.yaml
@@ -0,0 +1,24 @@
{{- $alert := "CNPGClusterHACritical" -}}
{{- if not (has $alert .excludeRules) -}}
alert: {{ $alert }}
annotations:
summary: CNPG Cluster has no standby replicas!
description: |-
CloudNativePG Cluster "{{ .labels.job }}" has no ready standby replicas. Your cluster is at severe
risk of data loss and downtime if the primary instance fails.
The primary instance is still online and able to serve queries, although connections to the `-ro` endpoint
will fail. The `-r` endpoint is operating at reduced capacity and all traffic is being served by the primary.
This can happen during a normal fail-over or automated minor version upgrades in a cluster with 2 or fewer
instances. The replaced instance may need some time to catch up with the cluster primary instance.
This alarm will always trigger if your cluster is configured to run with only 1 instance. In this
case you may want to silence it.
runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterHACritical.md
expr: |
max by (job) (cnpg_pg_replication_streaming_replicas{namespace="{{ .namespace }}"} - cnpg_pg_replication_is_wal_receiver_up{namespace="{{ .namespace }}"}) < 1
for: 5m
labels:
severity: critical
{{- end -}}
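Each of the new rule templates is wrapped in the same `has $alert .excludeRules` guard, so individual alerts can be dropped from values. A sketch, assuming the `cluster.monitoring.prometheusRule.excludeRules` key from the README table is what feeds the `.excludeRules` context used by the guard:

```yaml
# Hypothetical values.yaml excerpt: suppress selected alerts
cluster:
  monitoring:
    prometheusRule:
      enabled: true
      excludeRules:
        - CNPGClusterHAWarning   # e.g. for intentionally single-instance clusters
```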
22 changes: 22 additions & 0 deletions charts/cluster/prometheus_rules/cluster-ha-warning.yaml
@@ -0,0 +1,22 @@
{{- $alert := "CNPGClusterHAWarning" -}}
{{- if not (has $alert .excludeRules) -}}
alert: {{ $alert }}
annotations:
summary: CNPG Cluster has fewer than 2 standby replicas.
description: |-
CloudNativePG Cluster "{{ .labels.job }}" has only {{ .value }} standby replicas, putting
your cluster at risk if another instance fails. The cluster is still able to operate normally, although
the `-ro` and `-r` endpoints operate at reduced capacity.
This can happen during a normal fail-over or automated minor version upgrades. The replaced instance may
need some time to catch up with the cluster primary instance.
This alarm will be constantly triggered if your cluster is configured to run with fewer than 3 instances.
In this case you may want to silence it.
runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterHAWarning.md
expr: |
max by (job) (cnpg_pg_replication_streaming_replicas{namespace="{{ .namespace }}"} - cnpg_pg_replication_is_wal_receiver_up{namespace="{{ .namespace }}"}) < 2
for: 5m
labels:
severity: warning
{{- end -}}
@@ -0,0 +1,15 @@
{{- $alert := "CNPGClusterHighConnectionsCritical" -}}
{{- if not (has $alert .excludeRules) -}}
alert: {{ $alert }}
annotations:
summary: CNPG Instance maximum number of connections critical!
description: |-
CloudNativePG Cluster "{{ .cluster }}" instance {{ .labels.pod }} is using {{ .value }}% of
the maximum number of connections.
runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterHighConnectionsCritical.md
expr: |
sum by (pod) (cnpg_backends_total{namespace=~"{{ .namespace }}", pod=~"{{ .podSelector }}"}) / max by (pod) (cnpg_pg_settings_setting{name="max_connections", namespace=~"{{ .namespace }}", pod=~"{{ .podSelector }}"}) * 100 > 95
for: 5m
labels:
severity: critical
{{- end -}}
@@ -0,0 +1,15 @@
{{- $alert := "CNPGClusterHighConnectionsWarning" -}}
{{- if not (has $alert .excludeRules) -}}
alert: {{ $alert }}
annotations:
summary: CNPG Instance is approaching the maximum number of connections.
description: |-
CloudNativePG Cluster "{{ .cluster }}" instance {{ .labels.pod }} is using {{ .value }}% of
the maximum number of connections.
runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterHighConnectionsWarning.md
expr: |
sum by (pod) (cnpg_backends_total{namespace=~"{{ .namespace }}", pod=~"{{ .podSelector }}"}) / max by (pod) (cnpg_pg_settings_setting{name="max_connections", namespace=~"{{ .namespace }}", pod=~"{{ .podSelector }}"}) * 100 > 80
for: 5m
labels:
severity: warning
{{- end -}}
17 changes: 17 additions & 0 deletions charts/cluster/prometheus_rules/cluster-high_replication_lag.yaml
@@ -0,0 +1,17 @@
{{- $alert := "CNPGClusterHighReplicationLag" -}}
{{- if not (has $alert .excludeRules) -}}
alert: {{ $alert }}
annotations:
summary: CNPG Cluster high replication lag
description: |-
CloudNativePG Cluster "{{ .cluster }}" is experiencing a high replication lag of
{{ .value }}ms.
High replication lag indicates network issues, busy instances, slow queries or suboptimal configuration.
runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterHighReplicationLag.md
expr: |
max(cnpg_pg_replication_lag{namespace=~"{{ .namespace }}",pod=~"{{ .podSelector }}"}) * 1000 > 1000
for: 5m
labels:
severity: warning
{{- end -}}
@@ -0,0 +1,17 @@
{{- $alert := "CNPGClusterInstancesOnSameNode" -}}
{{- if not (has $alert .excludeRules) -}}
alert: {{ $alert }}
annotations:
summary: CNPG Cluster instances are located on the same node.
description: |-
CloudNativePG Cluster "{{ .cluster }}" has {{ .value }}
instances on the same node {{ .labels.node }}.
A failure or scheduled downtime of a single node will lead to a potential service disruption and/or data loss.
runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterInstancesOnSameNode.md
expr: |
count by (node) (kube_pod_info{namespace=~"{{ .namespace }}", pod=~"{{ .podSelector }}"}) > 1
for: 5m
labels:
severity: warning
{{- end -}}
@@ -0,0 +1,22 @@
{{- $alert := "CNPGClusterLowDiskSpaceCritical" -}}
{{- if not (has $alert .excludeRules) -}}
alert: {{ $alert }}
annotations:
summary: CNPG Instance is running out of disk space!
description: |-
CloudNativePG Cluster "{{ .cluster }}" is running extremely low on disk space. Check attached PVCs!
runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterLowDiskSpaceCritical.md
expr: |
max(max by(persistentvolumeclaim) (1 - kubelet_volume_stats_available_bytes{namespace="{{ .namespace }}", persistentvolumeclaim=~"{{ .podSelector }}"} / kubelet_volume_stats_capacity_bytes{namespace="{{ .namespace }}", persistentvolumeclaim=~"{{ .podSelector }}"})) > 0.9 OR
max(max by(persistentvolumeclaim) (1 - kubelet_volume_stats_available_bytes{namespace="{{ .namespace }}", persistentvolumeclaim=~"{{ .podSelector }}-wal"} / kubelet_volume_stats_capacity_bytes{namespace="{{ .namespace }}", persistentvolumeclaim=~"{{ .podSelector }}-wal"})) > 0.9 OR
max(sum by (namespace,persistentvolumeclaim) (kubelet_volume_stats_used_bytes{namespace="{{ .namespace }}", persistentvolumeclaim=~"{{ .podSelector }}-tbs.*"})
/
sum by (namespace,persistentvolumeclaim) (kubelet_volume_stats_capacity_bytes{namespace="{{ .namespace }}", persistentvolumeclaim=~"{{ .podSelector }}-tbs.*"})
*
on(namespace, persistentvolumeclaim) group_left(volume)
kube_pod_spec_volumes_persistentvolumeclaims_info{pod=~"{{ .podSelector }}"}
) > 0.9
for: 5m
labels:
severity: critical
{{- end -}}
@@ -0,0 +1,22 @@
{{- $alert := "CNPGClusterLowDiskSpaceWarning" -}}
{{- if not (has $alert .excludeRules) -}}
alert: {{ $alert }}
annotations:
summary: CNPG Instance is running out of disk space.
description: |-
CloudNativePG Cluster "{{ .cluster }}" is running low on disk space. Check attached PVCs.
runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterLowDiskSpaceWarning.md
expr: |
max(max by(persistentvolumeclaim) (1 - kubelet_volume_stats_available_bytes{namespace="{{ .namespace }}", persistentvolumeclaim=~"{{ .podSelector }}"} / kubelet_volume_stats_capacity_bytes{namespace="{{ .namespace }}", persistentvolumeclaim=~"{{ .podSelector }}"})) > 0.7 OR
max(max by(persistentvolumeclaim) (1 - kubelet_volume_stats_available_bytes{namespace="{{ .namespace }}", persistentvolumeclaim=~"{{ .podSelector }}-wal"} / kubelet_volume_stats_capacity_bytes{namespace="{{ .namespace }}", persistentvolumeclaim=~"{{ .podSelector }}-wal"})) > 0.7 OR
max(sum by (namespace,persistentvolumeclaim) (kubelet_volume_stats_used_bytes{namespace="{{ .namespace }}", persistentvolumeclaim=~"{{ .podSelector }}-tbs.*"})
/
sum by (namespace,persistentvolumeclaim) (kubelet_volume_stats_capacity_bytes{namespace="{{ .namespace }}", persistentvolumeclaim=~"{{ .podSelector }}-tbs.*"})
*
on(namespace, persistentvolumeclaim) group_left(volume)
kube_pod_spec_volumes_persistentvolumeclaims_info{pod=~"{{ .podSelector }}"}
) > 0.7
for: 5m
labels:
severity: warning
{{- end -}}
17 changes: 17 additions & 0 deletions charts/cluster/prometheus_rules/cluster-offline.yaml
@@ -0,0 +1,17 @@
{{- $alert := "CNPGClusterOffline" -}}
{{- if not (has $alert .excludeRules) -}}
alert: {{ $alert }}
annotations:
summary: CNPG Cluster has no running instances!
description: |-
CloudNativePG Cluster "{{ .labels.job }}" has no ready instances.
Having an offline cluster means your applications will not be able to access the database, leading to
potential service disruption and/or data loss.
runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterOffline.md
expr: |
({{ .Values.cluster.instances }} - count(cnpg_collector_up{namespace=~"{{ .namespace }}",pod=~"{{ .podSelector }}"}) OR vector(0)) > 0
for: 5m
labels:
severity: critical
{{- end -}}
16 changes: 16 additions & 0 deletions charts/cluster/prometheus_rules/cluster-zone_spread-warning.yaml
@@ -0,0 +1,16 @@
{{- $alert := "CNPGClusterZoneSpreadWarning" -}}
{{- if not (has $alert .excludeRules) -}}
alert: {{ $alert }}
annotations:
summary: CNPG Cluster instances in the same zone.
description: |-
CloudNativePG Cluster "{{ .cluster }}" has instances in the same availability zone.
A disaster in one availability zone will lead to a potential service disruption and/or data loss.
runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterZoneSpreadWarning.md
expr: |
{{ .Values.cluster.instances }} > count(count by (label_topology_kubernetes_io_zone) (kube_pod_info{namespace=~"{{ .namespace }}", pod=~"{{ .podSelector }}"} * on(node,instance) group_left(label_topology_kubernetes_io_zone) kube_node_labels)) < 3
for: 5m
labels:
severity: warning
{{- end -}}
9 changes: 7 additions & 2 deletions charts/cluster/templates/_barman_object_store.tpl
@@ -4,9 +4,14 @@
endpointURL: {{ . }}
{{- end }}

{{- with .scope.endpointCA }}
{{- if or (.scope.endpointCA.create) (.scope.endpointCA.name) }}
endpointCA:
{{- . | toYaml | nindent 4 }}
name: {{ .chartFullname }}-ca-bundle
key: ca-bundle.crt
{{- end }}

{{- if .scope.destinationPath }}
destinationPath: {{ .scope.destinationPath }}
{{- end }}

{{- with .scope.destinationPath }}
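With the new guard, the helper only emits an `endpointCA` stanza when a secret is being created or explicitly named. An illustrative fragment of the rendered output, assuming a release whose full name is `mydb-cluster` (a placeholder):

```yaml
# Illustrative output of the helper (names are placeholders)
endpointCA:
  name: mydb-cluster-ca-bundle   # "<chart fullname>-ca-bundle"
  key: ca-bundle.crt
```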
9 changes: 9 additions & 0 deletions charts/cluster/templates/ca-bundle.yaml
@@ -0,0 +1,9 @@
{{- if .Values.backups.endpointCA.create }}
apiVersion: v1
kind: Secret
metadata:
name: {{ .Values.backups.endpointCA.name | default (printf "%s-ca-bundle" (include "cluster.fullname" .)) | quote }}
data:
{{ .Values.backups.endpointCA.key | default "ca-bundle.crt" | quote }}: {{ .Values.backups.endpointCA.value }}

{{- end }}
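For reference, with `backups.endpointCA.create: true` and the defaults above, this template would render roughly the following Secret; the name and data are placeholders:

```yaml
# Illustrative rendered Secret (placeholder name and data)
apiVersion: v1
kind: Secret
metadata:
  name: mydb-cluster-ca-bundle
data:
  ca-bundle.crt: ""   # base64-encoded CA bundle supplied via backups.endpointCA.value
```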