feat(cluster): Recovery using pg_basebackup (#252)

---------

Signed-off-by: Pieter van der Giessen <[email protected]>
Signed-off-by: Itay Grudev <[email protected]>
Co-authored-by: Itay Grudev <[email protected]>
Pionerd and itay-grudev authored Aug 28, 2024
1 parent 17cb83c commit c14ed18
Showing 18 changed files with 409 additions and 24 deletions.
18 changes: 18 additions & 0 deletions charts/cluster/README.md
@@ -207,6 +207,24 @@ refer to the [CloudNativePG Documentation](https://cloudnative-pg.io/documentat
| recovery.google.gkeEnvironment | bool | `false` | |
| recovery.google.path | string | `"/"` | |
| recovery.method | string | `"backup"` | Available recovery methods: * `backup` - Recovers a CNPG cluster from a CNPG backup (PITR supported). Needs to be on the same cluster in the same namespace. * `object_store` - Recovers a CNPG cluster from a barman object store (PITR supported). * `pg_basebackup` - Recovers a CNPG cluster via the streaming replication protocol. Useful if you want to migrate databases to CloudNativePG, even from outside Kubernetes. # TODO |
| recovery.pgBaseBackup.database | string | `"app"` | |
| recovery.pgBaseBackup.owner | string | `""` | |
| recovery.pgBaseBackup.secret | string | `""` | |
| recovery.pgBaseBackup.source.database | string | `"app"` | |
| recovery.pgBaseBackup.source.host | string | `""` | |
| recovery.pgBaseBackup.source.passwordSecret.create | bool | `false` | Whether to create a secret for the password |
| recovery.pgBaseBackup.source.passwordSecret.key | string | `"password"` | The key in the secret containing the password |
| recovery.pgBaseBackup.source.passwordSecret.name | string | `""` | Name of the secret containing the password |
| recovery.pgBaseBackup.source.passwordSecret.value | string | `""` | The password value to use when creating the secret |
| recovery.pgBaseBackup.source.port | int | `5432` | |
| recovery.pgBaseBackup.source.sslCertSecret.key | string | `""` | |
| recovery.pgBaseBackup.source.sslCertSecret.name | string | `""` | |
| recovery.pgBaseBackup.source.sslKeySecret.key | string | `""` | |
| recovery.pgBaseBackup.source.sslKeySecret.name | string | `""` | |
| recovery.pgBaseBackup.source.sslMode | string | `"verify-full"` | |
| recovery.pgBaseBackup.source.sslRootCertSecret.key | string | `""` | |
| recovery.pgBaseBackup.source.sslRootCertSecret.name | string | `""` | |
| recovery.pgBaseBackup.source.username | string | `""` | |
| recovery.pitrTarget.time | string | `""` | Time in RFC3339 format |
| recovery.provider | string | `"s3"` | One of `s3`, `azure` or `google` |
| recovery.s3.accessKey | string | `""` | |
10 changes: 5 additions & 5 deletions charts/cluster/docs/Recovery.md
@@ -10,18 +10,18 @@ You can find more information about the recovery process in the [CNPG documentat
There are 3 types of recovery possible with CNPG:
* Recovery from a backup object in the same Kubernetes namespace.
* Recovery from a Barman Object Store, that could be located anywhere.
* Streaming replication from an operating cluster using `pg_basebackup` (not supported by the chart yet).
* Streaming replication from an operating cluster using `pg_basebackup`.

When performing a recovery you are strongly advised to use the same configuration and PostgreSQL version as the original cluster.

To begin, create a `values.yaml` that contains the following:

1. Set `mode: recovery` to indicate that you want to bootstrap the new cluster from an existing one.
2. Set the `recovery.method` to the type of recovery you want to perform.
3. Set either the `recovery.backupName` or the Barman Object Store configuration - i.e. `recovery.provider` and appropriate S3, Azure or GCS configuration.
4. Optionally set the `recovery.pitrTarget.time` in RFC3339 format to perform a point-in-time recovery.
4. Retain the identical PostgreSQL version and configuration as the original cluster.
5. Make sure you don't use the same backup section name as the original cluster. We advise you change the `path` within the storage location if you want to reuse the same storage location/bucket.
3. Set either the `recovery.backupName` or the Barman Object Store configuration - i.e. `recovery.provider` and appropriate S3, Azure or GCS configuration. For `pg_basebackup`, complete the `recovery.pgBaseBackup` section instead.
4. Optionally set the `recovery.pitrTarget.time` in RFC3339 format to perform a point-in-time recovery (not applicable to `pg_basebackup`).
5. Retain the identical PostgreSQL version and configuration as the original cluster.
6. Make sure you don't use the same backup section name as the original cluster. We advise you change the `path` within the storage location if you want to reuse the same storage location/bucket.
One pattern is adding a version number at the end of the path, e.g. `/v1` or `/v2` after each recovery procedure.

Example recovery configurations can be found in the [examples](../examples) directory.
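
As an illustration of the steps above for the `pg_basebackup` method, here is a minimal `values.yaml` sketch. The key names follow the `recovery.pgBaseBackup.*` entries in the chart README. The host and secret names are hypothetical placeholders, and the root certificate secret is an assumption: `verify-full` generally needs a CA certificate to validate the source server.

```yaml
# values.yaml (sketch): bootstrap a new cluster from a running PostgreSQL instance
mode: "recovery"

recovery:
  method: "pg_basebackup"
  pgBaseBackup:
    source:
      host: "source-db.foo.com"            # hypothetical source host
      port: 5432
      username: "streaming_replica"
      database: "app"
      sslMode: "verify-full"               # chart default
      passwordSecret:
        name: "source-db-replica-password" # hypothetical, pre-existing Secret
        key: "password"
      sslRootCertSecret:
        name: "source-db-ca"               # hypothetical Secret holding the CA certificate
        key: "ca.crt"

cluster:
  instances: 1

backups:
  enabled: false
```

`recovery.pitrTarget.time` is left unset because point-in-time recovery does not apply to this method.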
14 changes: 14 additions & 0 deletions charts/cluster/examples/recovery-pg_basebackup.yaml
@@ -0,0 +1,14 @@
mode: "recovery"

recovery:
  method: "pg_basebackup"
  pgBaseBackup:
    sourceHost: "source-db.foo.com"
    sourceUsername: "streaming_replica"
    existingPasswordSecret: "source-db-replica-password"

cluster:
  instances: 1

backups:
  enabled: false
47 changes: 32 additions & 15 deletions charts/cluster/templates/NOTES.txt
@@ -41,22 +41,39 @@ Configuration
{{- range (rest .Values.backups.scheduledBackups) -}}
{{ $scheduledBackups = printf "%s, %s" $scheduledBackups .name }}
{{- end -}}
{{- if eq (len .Values.backups.scheduledBackups) 0 }}
{{- $scheduledBackups = "None" -}}
{{- end -}}

{{- $mode := .Values.mode -}}
{{- $source := "" -}}
{{- if eq .Values.mode "recovery" }}
{{- $mode = printf "%s (%s)" .Values.mode .Values.recovery.method -}}
{{- if eq .Values.recovery.method "pg_basebackup" }}
{{- $source = printf "postgresql://%s@%s:%.0f/%s" .Values.recovery.pgBaseBackup.source.username .Values.recovery.pgBaseBackup.source.host .Values.recovery.pgBaseBackup.source.port .Values.recovery.pgBaseBackup.source.database -}}
{{- end -}}
{{- end -}}

╭───────────────────┬────────────────────────────────────────────────────────╮
│ Configuration │ Value │
┝━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ Cluster mode │ {{ (printf "%-54s" .Values.mode) }} │
│ Type │ {{ (printf "%-54s" .Values.type) }} │
│ Image │ {{ include "cluster.color-info" (printf "%-54s" (include "cluster.imageName" .)) }} │
│ Instances │ {{ include (printf "%s%s" "cluster.color-" $redundancyColor) (printf "%-54s" (toString .Values.cluster.instances)) }} │
│ Backups │ {{ include (printf "%s%s" "cluster.color-" (ternary "ok" "error" .Values.backups.enabled)) (printf "%-54s" (ternary "Enabled" "Disabled" .Values.backups.enabled)) }} │
│ Backup Provider │ {{ (printf "%-54s" (title .Values.backups.provider)) }} │
│ Scheduled Backups │ {{ (printf "%-54s" $scheduledBackups) }} │
│ Storage │ {{ (printf "%-54s" .Values.cluster.storage.size) }} │
│ Storage Class │ {{ (printf "%-54s" (default "Default" .Values.cluster.storage.storageClass)) }} │
│ PGBouncer │ {{ (printf "%-54s" (ternary "Enabled" "Disabled" .Values.pooler.enabled)) }} │
│ Monitoring │ {{ include (printf "%s%s" "cluster.color-" (ternary "ok" "error" .Values.cluster.monitoring.enabled)) (printf "%-54s" (ternary "Enabled" "Disabled" .Values.cluster.monitoring.enabled)) }} │
╰───────────────────┴────────────────────────────────────────────────────────╯
╭───────────────────┬──────────────────────────────────────────────────────────╮
│ Configuration │ Value │
┝━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ Cluster mode │ {{ printf "%-56s" $mode }} │
│ Type │ {{ printf "%-56s" .Values.type }} │
│ Image │ {{ include "cluster.color-info" (printf "%-56s" (include "cluster.imageName" .)) }} │
{{- if eq .Values.mode "recovery" }}
│ Source │ {{ printf "%-56s" $source }} │
{{- end }}
│ Instances │ {{ include (printf "%s%s" "cluster.color-" $redundancyColor) (printf "%-56s" (toString .Values.cluster.instances)) }} │
│ Backups │ {{ include (printf "%s%s" "cluster.color-" (ternary "ok" "error" .Values.backups.enabled)) (printf "%-56s" (ternary "Enabled" "Disabled" .Values.backups.enabled)) }} │
{{- if .Values.backups.enabled }}
│ Backup Provider │ {{ printf "%-56s" (title .Values.backups.provider) }} │
│ Scheduled Backups │ {{ printf "%-56s" $scheduledBackups }} │
{{- end }}
│ Storage │ {{ printf "%-56s" .Values.cluster.storage.size }} │
│ Storage Class │ {{ printf "%-56s" (default "Default" .Values.cluster.storage.storageClass) }} │
│ PGBouncer │ {{ printf "%-56s" (ternary "Enabled" "Disabled" .Values.pooler.enabled) }} │
│ Monitoring │ {{ include (printf "%s%s" "cluster.color-" (ternary "ok" "error" .Values.cluster.monitoring.enabled)) (printf "%-56s" (ternary "Enabled" "Disabled" .Values.cluster.monitoring.enabled)) }} │
╰───────────────────┴──────────────────────────────────────────────────────────╯

{{ if not .Values.backups.enabled }}
{{- include "cluster.color-error" "Warning! Backups not enabled. Recovery will not be possible! Do not use this configuration in production.\n" }}
47 changes: 46 additions & 1 deletion charts/cluster/templates/_bootstrap.tpl
@@ -23,6 +23,50 @@ bootstrap:
{{- end -}}
{{- else if eq .Values.mode "recovery" -}}
bootstrap:
{{- if eq .Values.recovery.method "pg_basebackup" }}
  pg_basebackup:
    source: pgBaseBackupSource
    {{ with .Values.recovery.pgBaseBackup.database }}
    database: {{ . }}
    {{- end }}
    {{ with .Values.recovery.pgBaseBackup.owner }}
    owner: {{ . }}
    {{- end }}
    {{ with .Values.recovery.pgBaseBackup.secret }}
    secret:
      {{- toYaml . | nindent 6 }}
    {{- end }}

externalClusters:
  - name: pgBaseBackupSource
    connectionParameters:
      host: {{ .Values.recovery.pgBaseBackup.source.host | quote }}
      port: {{ .Values.recovery.pgBaseBackup.source.port | quote }}
      user: {{ .Values.recovery.pgBaseBackup.source.username | quote }}
      dbname: {{ .Values.recovery.pgBaseBackup.source.database | quote }}
      sslmode: {{ .Values.recovery.pgBaseBackup.source.sslMode | quote }}
    {{- if .Values.recovery.pgBaseBackup.source.passwordSecret.name }}
    password:
      name: {{ default (printf "%s-pg-basebackup-password" (include "cluster.fullname" .)) .Values.recovery.pgBaseBackup.source.passwordSecret.name }}
      key: {{ .Values.recovery.pgBaseBackup.source.passwordSecret.key }}
    {{- end }}
    {{- if .Values.recovery.pgBaseBackup.source.sslKeySecret.name }}
    sslKey:
      name: {{ .Values.recovery.pgBaseBackup.source.sslKeySecret.name }}
      key: {{ .Values.recovery.pgBaseBackup.source.sslKeySecret.key }}
    {{- end }}
    {{- if .Values.recovery.pgBaseBackup.source.sslCertSecret.name }}
    sslCert:
      name: {{ .Values.recovery.pgBaseBackup.source.sslCertSecret.name }}
      key: {{ .Values.recovery.pgBaseBackup.source.sslCertSecret.key }}
    {{- end }}
    {{- if .Values.recovery.pgBaseBackup.source.sslRootCertSecret.name }}
    sslRootCert:
      name: {{ .Values.recovery.pgBaseBackup.source.sslRootCertSecret.name }}
      key: {{ .Values.recovery.pgBaseBackup.source.sslRootCertSecret.key }}
    {{- end }}

{{- else }}
  recovery:
    {{- with .Values.recovery.pitrTarget.time }}
    recoveryTarget:
@@ -38,9 +82,10 @@ bootstrap:
externalClusters:
  - name: objectStoreRecoveryCluster
    barmanObjectStore:
      serverName: {{ default (include "cluster.fullname" .) .Values.recovery.clusterName }}
      serverName: {{ .Values.recovery.clusterName }}
      {{- $d := dict "chartFullname" (include "cluster.fullname" .) "scope" .Values.recovery "secretPrefix" "recovery" -}}
      {{- include "cluster.barmanObjectStoreConfig" $d | nindent 4 }}
{{- end }}
{{- else }}
{{ fail "Invalid cluster mode!" }}
{{- end }}
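
For orientation, this is roughly what the `pg_basebackup` branch of this template renders to when given the test values from this commit (`source.host: source-cluster-rw`, `source.database: mygooddb`, TLS key and certificate read from the `source-cluster-replication` secret). It is a hand-rendered sketch, not captured `helm template` output. It assumes the chart defaults listed in the README (`recovery.pgBaseBackup.database: app`, `source.port: 5432`) and leaves out whitespace-only lines.

```yaml
# Hand-rendered fragment of the Cluster spec (approximate)
bootstrap:
  pg_basebackup:
    source: pgBaseBackupSource
    database: app            # README default; owner and secret are empty, so they are skipped

externalClusters:
  - name: pgBaseBackupSource
    connectionParameters:
      host: "source-cluster-rw"
      port: "5432"           # quoted because the template pipes every parameter through `quote`
      user: "streaming_replica"
      dbname: "mygooddb"
      sslmode: "require"
    sslKey:
      name: source-cluster-replication
      key: tls.key
    sslCert:
      name: source-cluster-replication
      key: tls.crt
```

No `password` or `sslRootCert` entry appears because the corresponding `*.name` values are empty in the test configuration.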
8 changes: 8 additions & 0 deletions charts/cluster/templates/recovery-pg_basebackup-password.yaml
@@ -0,0 +1,8 @@
{{- if and (eq .Values.mode "recovery") (eq .Values.recovery.method "pg_basebackup") .Values.recovery.pgBaseBackup.source.passwordSecret.create }}
apiVersion: v1
kind: Secret
metadata:
  name: {{ default (printf "%s-pg-basebackup-password" (include "cluster.fullname" .)) .Values.recovery.pgBaseBackup.source.passwordSecret.name }}
data:
  {{ .Values.recovery.pgBaseBackup.source.passwordSecret.key }}: {{ required ".Values.recovery.pgBaseBackup.source.passwordSecret.value required when creating a password secret." .Values.recovery.pgBaseBackup.source.passwordSecret.value | b64enc | quote }}
{{- end }}
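
A minimal sketch of values that would enable this template, followed by the Secret it should produce. The secret name and the password are hypothetical. `name` is set explicitly here because the `password` reference in `_bootstrap.tpl` is only rendered when `passwordSecret.name` is non-empty.

```yaml
# values.yaml (hypothetical values)
mode: "recovery"
recovery:
  method: "pg_basebackup"
  pgBaseBackup:
    source:
      passwordSecret:
        create: true
        name: "source-db-replica-password"
        key: "password"
        value: "s3cr3t"
---
# Expected manifest, rendered by hand: `b64enc` base64-encodes the value
apiVersion: v1
kind: Secret
metadata:
  name: source-db-replica-password
data:
  password: "czNjcjN0"
```

Leaving `value` empty while `create` is `true` should abort rendering with the `required` message from the template.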
5 changes: 2 additions & 3 deletions charts/cluster/test/monitoring/chainsaw-test.yaml
@@ -1,6 +1,5 @@
##
# This is a test that verifies that non-default configuration options are correctly propagated to the CNPG cluster.
# P.S. This test is not designed to have a good running configuration, it is designed to test the configuration propagation!
# This is a test that checks if PodMonitors, ConfigMaps and PrometheusRules are correctly provisioned when requested.
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
@@ -11,7 +10,7 @@ spec:
    assert: 20s
    cleanup: 30s
  steps:
    - name: Install the non-default configuration cluster
    - name: Install the monitoring cluster
      try:
        - script:
            content: |
@@ -0,0 +1,6 @@
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: source-cluster
status:
  readyInstances: 1
@@ -0,0 +1,5 @@
mode: "standalone"
cluster:
  instances: 1
backups:
  enabled: false
@@ -0,0 +1,6 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: data-write
status:
  succeeded: 1
30 changes: 30 additions & 0 deletions charts/cluster/test/postgresql-pg_basebackup/01-data_write.yaml
@@ -0,0 +1,30 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: data-write
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: data-write
          env:
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: source-cluster-superuser
                  key: username
            - name: DB_PASS
              valueFrom:
                secretKeyRef:
                  name: source-cluster-superuser
                  key: password
            - name: DB_URI
              value: postgres://$(DB_USER):$(DB_PASS)@source-cluster-rw:5432
          image: alpine:3.19
          command: ['sh', '-c']
          args:
            - |
              apk --no-cache add postgresql-client kubectl
              psql "$DB_URI" -c "CREATE DATABASE mygooddb;"
              psql "$DB_URI/mygooddb" -c "CREATE TABLE mygoodtable (id serial PRIMARY KEY);"
@@ -0,0 +1,6 @@
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-basebackup-cluster
status:
  readyInstances: 2
@@ -0,0 +1,21 @@
mode: "recovery"
recovery:
  method: "pg_basebackup"
  pgBaseBackup:
    source:
      host: "source-cluster-rw"
      database: "mygooddb"
      username: "streaming_replica"
      sslMode: "require"
      sslKeySecret:
        name: source-cluster-replication
        key: tls.key
      sslCertSecret:
        name: source-cluster-replication
        key: tls.crt

cluster:
  instances: 2

backups:
  enabled: false
@@ -0,0 +1,6 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: data-test
status:
  succeeded: 1
23 changes: 23 additions & 0 deletions charts/cluster/test/postgresql-pg_basebackup/03-data_test.yaml
@@ -0,0 +1,23 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: data-test
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: data-test
          env:
            - name: DB_URI
              valueFrom:
                secretKeyRef:
                  name: pg-basebackup-cluster-superuser
                  key: uri
          image: alpine:3.19
          command: ['sh', '-c']
          args:
            - |
              apk --no-cache add postgresql-client
              DB_URI=$(echo $DB_URI | sed "s|/\*|/|" )
              test "$(psql "${DB_URI}mygooddb" -t -c 'SELECT EXISTS (SELECT FROM information_schema.tables WHERE table_name = $$mygoodtable$$)' --csv -q 2>/dev/null)" = "t"
64 changes: 64 additions & 0 deletions charts/cluster/test/postgresql-pg_basebackup/chainsaw-test.yaml
@@ -0,0 +1,64 @@
##
# This is a test that provisions a regular (non CNPG) PostgreSQL cluster and attempts to perform a pg_basebackup recovery.
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: postgresql-pg-basebackup
spec:
  timeouts:
    apply: 1s
    assert: 2m
    cleanup: 1m
  steps:
    - name: Install the external PostgreSQL cluster
      try:
        - script:
            content: |
              helm upgrade \
                --install \
                --namespace $NAMESPACE \
                --values ./00-source-cluster.yaml \
                --wait \
                source ../../
        - assert:
            file: ./00-source-cluster-assert.yaml
        - apply:
            file: ./01-data_write.yaml
        - assert:
            file: ./01-data_write-assert.yaml
    - name: Install the pg_basebackup cluster
      timeouts:
        assert: 5m
      try:
        - script:
            content: |
              helm upgrade \
                --install \
                --namespace $NAMESPACE \
                --values ./02-pg_basebackup-cluster.yaml \
                --wait \
                pg-basebackup ../../
        - assert:
            file: ./02-pg_basebackup-cluster-assert.yaml
      catch:
        - describe:
            apiVersion: postgresql.cnpg.io/v1
            kind: Cluster
    - name: Verify the data from step 1 exists
      try:
        - apply:
            file: ./03-data_test.yaml
        - assert:
            file: ./03-data_test-assert.yaml
      catch:
        - describe:
            apiVersion: batch/v1
            kind: Job
        - podLogs:
            selector: batch.kubernetes.io/job-name=data-test
    - name: Cleanup
      try:
        - script:
            content: |
              helm uninstall --namespace $NAMESPACE source
              helm uninstall --namespace $NAMESPACE pg-basebackup