
PG4K PGD 1.0.0 release #4710

Merged · 43 commits · Apr 23, 2024
Commits
7bdf43e
test import for 1.0
josh-heyer Aug 28, 2023
a4a0ef1
Update index.mdx:
kelpoole Sep 7, 2023
0883316
sync with 0.7.1; add to homepage
josh-heyer Sep 8, 2023
bdcfd14
Update release_notes.mdx
kelpoole Sep 8, 2023
1ad7922
First set of edits to pgd for kubernetes doc
ebgitelman Sep 12, 2023
d6894ec
Fix URLs in api doc
josh-heyer Apr 3, 2024
3f0e6a0
v1 import (April 22nd 2024)
josh-heyer Feb 16, 2024
3c227ec
Second set of edits on pgd for kubernetes
ebgitelman Sep 12, 2023
41aa8ec
Standardize release notes
josh-heyer Feb 16, 2024
6bcd0f6
DOCS-192: link fixes
josh-heyer Mar 20, 2024
27f4da9
Known Issues
gvasquezvargas Apr 8, 2024
1dd17d3
fixing format
gvasquezvargas Apr 8, 2024
638fd17
Update known_issues.mdx
gvasquezvargas Apr 8, 2024
949838c
Update known_issues.mdx
gvasquezvargas Apr 10, 2024
a61b6ea
adding new known issues for DOCS-363 and DOCS-364
gvasquezvargas Apr 15, 2024
9336583
tech review feedback
gvasquezvargas Apr 16, 2024
ae1624a
adding section for DOCS-371 + feedback implementation
gvasquezvargas Apr 16, 2024
f428776
technical review
gvasquezvargas Apr 16, 2024
6c5f814
Corrected typos on known_issues.mdx
gvasquezvargas Apr 22, 2024
f2664c4
Update known_issues.mdx
gvasquezvargas Apr 22, 2024
df0d1ec
Update known_issues.mdx
gvasquezvargas Apr 22, 2024
2319b7a
Implementing feedback on upgrades from Jaime
gvasquezvargas Apr 22, 2024
c9bc2b3
initial draft of RN notes with teams style guide
gvasquezvargas Apr 22, 2024
5fe80de
Clarification of app services
gvasquezvargas Apr 22, 2024
12b4935
implementing feedback from review and updating release date on index …
gvasquezvargas Apr 22, 2024
bc13cd1
Submariner for distros with multiple AZ
gvasquezvargas Apr 3, 2024
4fdfc8c
Update connectivity.mdx
gvasquezvargas Apr 3, 2024
1d6b99d
Apply suggestions from code review
gvasquezvargas Apr 3, 2024
36f936d
spaces in note titles
gvasquezvargas Apr 4, 2024
03f4a77
Apply suggestions from code review
gvasquezvargas Apr 5, 2024
926e6e0
implemented Djs feedback
gvasquezvargas Apr 5, 2024
450a470
Merge pull request #5482 from EnterpriseDB/docs/pg4k-pgd/known-issues
josh-heyer Apr 22, 2024
ce5d84a
Merge pull request #5524 from EnterpriseDB/docs432/release-notes/1.0.0
josh-heyer Apr 22, 2024
84e3da5
Merge pull request #5467 from EnterpriseDB/pg4k-pgd/submariner
josh-heyer Apr 22, 2024
20c3815
Editorial review of new pgd4pk content
ebgitelman Apr 2, 2024
bd82643
Additional editorial changes
ebgitelman Apr 4, 2024
c42e734
Second read of pgd4k content after rebase
ebgitelman Apr 4, 2024
d84f8aa
Apply suggestions from code review
josh-heyer Apr 22, 2024
a0ce2ee
Merge pull request #5474 from EnterpriseDB/docs/editorial_markup
josh-heyer Apr 22, 2024
c1d0965
Fix links to PG4K API docs
josh-heyer Apr 23, 2024
a893c0a
Correct the type of node in definition
josh-heyer Apr 23, 2024
688e0a7
Allow searching!
josh-heyer Apr 23, 2024
934fd3f
PGD4K approved abbreviation by Stephen
gvasquezvargas Apr 23, 2024
@@ -3,31 +3,196 @@ title: 'Architecture'
originalFilePath: 'src/architecture.md'
---

Consider these main architectural aspects
when deploying EDB Postgres Distributed in Kubernetes.

EDB Postgres Distributed for Kubernetes is a
[Kubernetes operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/)
designed to deploy and manage EDB Postgres Distributed clusters
running in private, public, hybrid, or multi-cloud environments.

## Relationship with EDB Postgres Distributed

[EDB Postgres Distributed (PGD)](https://www.enterprisedb.com/docs/pgd/latest/)
is a multi-master implementation of Postgres designed for high performance and
availability.
PGD generally requires deployment using
[Trusted Postgres Architect (TPA)](/pgd/latest/tpa/),
a tool that uses [Ansible](https://www.ansible.com) to provision and
deploy PGD clusters.

EDB Postgres Distributed for Kubernetes offers a different way of deploying PGD clusters, leveraging containers
and Kubernetes. The advantages are that the resulting architecture:

- Is self-healing and robust.
- Is managed through declarative configuration.
- Takes advantage of the vast and growing Kubernetes ecosystem.

## Relationship with EDB Postgres for Kubernetes

A PGD cluster consists of one or more *PGD groups*, each having one or more *PGD
nodes*. A PGD node is a Postgres database. EDB Postgres Distributed for Kubernetes internally
manages each PGD node using the `Cluster` resource as defined by EDB Postgres
for Kubernetes, specifically a cluster with a single instance (that is, no
replicas).

You can configure the single PostgreSQL instance created by each `Cluster` in the
[`.spec.cnp` section](pg4k-pgd.v1beta1.md#pgd-k8s-enterprisedb-io-v1beta1-CnpConfiguration)
of the PGD Group spec.
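
As an illustration, here's a minimal sketch of a `PGDGroup` that tunes the
underlying instances through `.spec.cnp`. The `postgresql.parameters` and
`storage` fields are assumed to mirror the EDB Postgres for Kubernetes `Cluster`
spec, and the values are illustrative only:

```yaml
apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  [...]
  cnp:
    # Passed to each underlying single-instance Cluster (illustrative values)
    postgresql:
      parameters:
        max_connections: "200"
    storage:
      size: 1Gi
```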

In EDB Postgres Distributed for Kubernetes, as in EDB Postgres for Kubernetes, the underlying database implementation is responsible
for data replication. However, it's important to note that failover and
switchover work differently, entailing Raft election and nominating new
write leaders. EDB Postgres for Kubernetes handles only the deployment and healing of data nodes.

## Managing PGD using EDB Postgres Distributed for Kubernetes

The EDB Postgres Distributed for Kubernetes operator can manage the complete lifecycle of PGD clusters. As
such, in addition to PGD nodes (represented as single-instance `Clusters`), it
needs to manage other objects associated with PGD.

PGD relies on the Raft algorithm for distributed consensus to manage node
metadata, specifically agreement on a *write leader*. Consensus among data
nodes is also required for operations such as generating new global sequences
or performing distributed DDL.

These requirements introduce additional actors in PGD beyond the database nodes.

EDB Postgres Distributed for Kubernetes manages the following:

- Data nodes. A node is a database and is managed
by EDB Postgres for Kubernetes, creating a `Cluster` with a single instance.
- [Witness nodes](https://www.enterprisedb.com/docs/pgd/latest/nodes/#witness-nodes)
are basic database instances that don't participate in data
replication. Their function is to guarantee that consensus is possible in
groups with an even number of data nodes or after network partitions. Witness
nodes are also managed using a single-instance `Cluster` resource.
- [PGD proxies](https://www.enterprisedb.com/docs/pgd/latest/routing/proxy/)
act as Postgres proxies with knowledge of the write leader. PGD proxies need
information from Raft to route writes to the current write leader.
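
The sketch below shows how these three kinds of actors might be requested in a
`PGDGroup`. The field names (`instances`, `witnessInstances`, `proxyInstances`)
are assumptions based on typical examples; check the API reference for your
operator version:

```yaml
apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  instances: 2         # data nodes, each a single-instance Cluster
  witnessInstances: 1  # witness node that guarantees consensus
  proxyInstances: 2    # PGD proxies that route writes to the write leader
```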

### Proxies and routing

PGD groups assume full mesh connectivity of PGD nodes. Each node must be able to
connect to every other node using the appropriate connection string (a
`libpq`-style DSN). Write operations don't need to be sent to every node. PGD
takes care of replicating data after it's committed to one node.

For performance, we often recommend sending write operations mostly to a
single node, the *write leader*. Raft is used to identify which node is the
write leader and to hold metadata about the PGD nodes. PGD proxies are used to
transparently route writes to write leaders and to quickly pivot to the new
write leader in case of switchover or failover.

It's possible to configure *Raft subgroups*, each of which can maintain a
separate write leader. In EDB Postgres Distributed for Kubernetes, a PGD group containing a PGD proxy
comprises a Raft subgroup.

Two kinds of routing are available with PGD proxies:

- Global routing uses the top-level Raft group and maintains one global write
leader.
- Local routing uses subgroups to maintain separate write leaders. Local
routing is often used to achieve geographical separation of writes.

In EDB Postgres Distributed for Kubernetes, local routing is used by default, and a configuration option is
available to select global routing.
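
As a hedged sketch, the selection might look like the following. The
`pgd.globalRouting` field name is an assumption for illustration only; consult
the API reference for the actual option:

```yaml
apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  pgd:
    # Hypothetical flag: switch from the default local (subgroup) routing
    # to a single global write leader.
    globalRouting: true
```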

For more information, see the
[PGD documentation of routing with Raft](https://www.enterprisedb.com/docs/pgd/latest/routing/raft/).

### PGD architectures and high availability

EDB proposes several recommended architectures to make good use of PGD's
distributed multi-master capabilities and to offer high availability.

The Always On architectures are built from either one group in a single location
or two groups in two separate locations.
See [Choosing your architecture](/pgd/latest/architectures/) in the PGD documentation
for more information.

## Deploying PGD on Kubernetes

EDB Postgres Distributed for Kubernetes leverages Kubernetes to deploy and manage PGD clusters. As such, some
adaptations are necessary to translate PGD into the Kubernetes ecosystem.

### Images and operands

You can configure PGD to run one of three Postgres distributions. See the
[PGD documentation](/pgd/latest/choosing_server/)
to understand the features of each distribution.

Container images are provided for each Postgres distribution so that it can run
in Kubernetes. These images are the *operands*.
The operator images are kept in those same repositories.

See [EDB private image registries](private_registries.md)
for details on accessing the images.

### Kubernetes architecture

Some of the points of the
[EDB Postgres for Kubernetes document on Kubernetes architecture](/postgres_for_kubernetes/latest/architecture/)
are reproduced here. See the EDB Postgres for Kubernetes documentation for details.

Kubernetes natively supports spanning separate physical locations.
These physical locations are also known as data centers, failure zones, or, more frequently, *availability
zones*. They're connected to each other by way of redundant, low-latency, private network
connectivity.

Because Kubernetes is a distributed system, the recommended minimum number of
availability zones for a *Kubernetes cluster* is three. This minimum makes the
control plane resilient to the failure of a single zone and means that each data
center is active at any time and can run workloads simultaneously.

You can install EDB Postgres Distributed for Kubernetes in a
[single Kubernetes cluster](#single-kubernetes-cluster)
or across
[multiple Kubernetes clusters](#multiple-kubernetes-clusters).

### Single Kubernetes cluster

A multi-availability-zone Kubernetes architecture is typical of Kubernetes
services managed by cloud providers. Such an architecture enables the EDB Postgres Distributed for Kubernetes
and the EDB Postgres for Kubernetes operators to schedule workloads and nodes across availability
zones, considering all zones active.

![Kubernetes cluster spanning over 3 independent data centers](./images/k8s-architecture-3-az.png)

PGD clusters can be deployed in a single Kubernetes cluster and take advantage
of Kubernetes availability zones to enable high-availability architectures,
including the Always On recommended architectures.

You can realize the *Always On Single Location* architecture shown in
[Choosing your architecture](/pgd/latest/architectures/) in the PGD documentation on
a single Kubernetes cluster with three availability zones.

![Always On Single Region](./images/always_on_1x3_updated.png)

The EDB Postgres Distributed for Kubernetes operator can control the scheduling of pods (that is, which pods go
to which data center) using affinity, tolerations, and node selectors, as is the
case with EDB Postgres for Kubernetes. Individual scheduling controls are available for proxies as well
as nodes.

See the
[Kubernetes documentation on scheduling](https://kubernetes.io/docs/concepts/scheduling-eviction/),
and [Scheduling](/postgres_for_kubernetes/latest/scheduling/) in the EDB Postgres for Kubernetes documentation
for more information.
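
As a sketch, node-level scheduling constraints might be expressed through the
`.spec.cnp.affinity` stanza, assuming it mirrors the EDB Postgres for Kubernetes
affinity configuration; the zone label and toleration below are illustrative:

```yaml
apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  cnp:
    affinity:
      # Illustrative constraints: pin data nodes to one availability zone
      # and tolerate a hypothetical dedicated-node taint.
      nodeSelector:
        topology.kubernetes.io/zone: eu-west-1a
      tolerations:
      - key: dedicated
        operator: Equal
        value: postgres
        effect: NoSchedule
```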

### Multiple Kubernetes clusters

PGD clusters can also be deployed in multiple Kubernetes clusters that can
reliably communicate with each other.

![Multiple Kubernetes clusters](./images/k8s-architecture-multi.png)

[Always On multi-location PGD architectures](https://www.enterprisedb.com/docs/pgd/latest/architectures/)
can be realized on multiple Kubernetes clusters that meet the connectivity
requirements.

For more information, see ["Connectivity"](connectivity.md).
!!! Note Regions and availability zones
When creating Kubernetes clusters in different regions or availability zones for cross-regional replication,
ensure the clusters can communicate with each other by enabling network connectivity. Specifically, every service created with a `-node` or `-group` suffix must be discoverable by all other `-node` and `-group` services. You can achieve this by deploying a network connectivity application like
[Submariner](https://submariner.io/) on every cluster.
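
For example, with Submariner's Lighthouse component (which implements the
Kubernetes Multi-Cluster Services API), each `-node` and `-group` service can be
exported so it becomes resolvable from the other clusters. The service and
namespace names below are hypothetical:

```yaml
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  # One export per -node/-group service that must be reachable cross-cluster
  name: region-a-group
  namespace: my-namespace
```
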
144 changes: 144 additions & 0 deletions product_docs/docs/postgres_distributed_for_kubernetes/1/backup.mdx
@@ -0,0 +1,144 @@
---
title: 'Backup on object stores'
originalFilePath: 'src/backup.md'
---

EDB Postgres Distributed for Kubernetes supports *online/hot backup* of
PGD clusters through physical backup and WAL archiving on an object store.
This means that the database is always up (no downtime required) and that
point-in-time recovery (PITR) is available.

## Common object stores

Multiple object stores are supported, such as AWS S3, Microsoft Azure Blob Storage,
Google Cloud Storage, MinIO Gateway, or any S3-compatible provider.
Given that EDB Postgres Distributed for Kubernetes configures the connection with object stores by relying on
EDB Postgres for Kubernetes, see the [EDB Postgres for Kubernetes cloud provider support](/postgres_for_kubernetes/latest/backup_recovery/#cloud-provider-support)
documentation for more information.

!!! Important
The cloud provider configuration examples in the EDB Postgres for Kubernetes documentation
use the `spec.backup.barmanObjectStore` path. In EDB Postgres Distributed for Kubernetes, the object store section is at a
different path: `spec.backup.configuration.barmanObjectStore`.
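
For example, here's a sketch of an AWS S3 configuration at the EDB Postgres
Distributed for Kubernetes path. The bucket name, secret name, and key names are
assumptions for illustration:

```yaml
apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  backup:
    configuration:
      barmanObjectStore:
        # Illustrative bucket and credentials; adjust to your environment
        destinationPath: "s3://my-pgd-backups/"
        s3Credentials:
          accessKeyId:
            name: aws-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: aws-creds
            key: ACCESS_SECRET_KEY
```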

## WAL archive

WAL archiving is the process that sends WAL files to the object store, and it's a prerequisite for
online/hot backups and PITR.
In EDB Postgres Distributed for Kubernetes, each PGD node is set up to archive WAL files in the object store independently.

The WAL archive is defined in the PGD Group `spec.backup.configuration.barmanObjectStore` stanza,
and is enabled as soon as a destination path and cloud credentials are set.
You can choose to compress WAL files before they're uploaded and you can encrypt them.
You can also enable parallel WAL archiving:

```yaml
apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  backup:
    configuration:
      barmanObjectStore:
        [...]
        wal:
          compression: gzip
          encryption: AES256
          maxParallel: 8
```

For more information, see the [EDB Postgres for Kubernetes WAL archiving](/postgres_for_kubernetes/latest/backup_recovery/#wal-archiving) documentation.

## Scheduled backups

Scheduled backups are the recommended way to configure your backup strategy in EDB Postgres Distributed for Kubernetes.
When the PGD group `spec.backup.configuration.barmanObjectStore` stanza is configured, the operator selects one of the
PGD data nodes as the elected backup node, for which it creates a `ScheduledBackup` resource.

The `.spec.backup.cron.schedule` field allows you to define a cron schedule specification, expressed
in the [Go `cron` package format](https://pkg.go.dev/github.com/robfig/cron#hdr-CRON_Expression_Format).

```yaml
apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  backup:
    cron:
      schedule: "0 0 0 * * *"
      backupOwnerReference: self
      suspend: false
      immediate: true
```

You can suspend scheduled backups, if necessary, by setting `.spec.backup.cron.suspend` to `true`. This setting
prevents any new backup from being scheduled.

If you want to execute a backup as soon as the `ScheduledBackup` resource is created,
set `.spec.backup.cron.immediate` to `true`.

`.spec.backupOwnerReference` indicates the `ownerReference` to use
in the created backup resources. The choices are:

- **none** — No owner reference for created backup objects.
- **self** — Sets the `ScheduledBackup` object as owner of the backup.
- **cluster** — Sets the cluster as owner of the backup.

!!! Note
The EDB Postgres for Kubernetes `ScheduledBackup` object contains the `cluster` option to specify the
cluster to back up. This option is currently not supported by EDB Postgres Distributed for Kubernetes and is
ignored if specified.

If an elected backup node is deleted, the operator transparently elects a new backup node
and reconciles the `ScheduledBackup` resource accordingly.

## Retention policies

EDB Postgres Distributed for Kubernetes can manage the automated deletion of backup files from the backup
object store using retention policies based on the recovery window.
This process also takes care of removing unused WAL files and WALs associated with backups
that are scheduled for deletion.

You can define your backups with a retention policy of 30 days:

```yaml
apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  backup:
    configuration:
      retentionPolicy: "30d"
```

For more information, see the [EDB Postgres for Kubernetes retention policies](/postgres_for_kubernetes/latest/backup_recovery/#retention-policies) in the EDB Postgres for Kubernetes documentation.

!!! Important
Currently, the retention policy is applied only to the backups and WAL files
of the elected backup node. Because every other PGD node also archives its own WALs
independently, it's your responsibility to manage the lifecycle of those WAL files,
for example by leveraging the object storage data retention policy.
Also, if you have an object storage data retention policy set up on every PGD node
directory, make sure it's not overlapping or interfering with the retention policy managed
by the operator.

## Compression algorithms

Backups and WAL files are uncompressed by default. However, multiple compression algorithms are
supported. For more information, see the [EDB Postgres for Kubernetes compression algorithms](/postgres_for_kubernetes/latest/backup_recovery/#compression-algorithms) documentation.
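
For example, assuming the `data` and `wal` compression options from EDB Postgres
for Kubernetes are available at the same place in the `barmanObjectStore` stanza,
compression could be enabled for both base backups and WAL files like this:

```yaml
apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  backup:
    configuration:
      barmanObjectStore:
        [...]
        data:
          compression: gzip
        wal:
          compression: gzip
```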

## Tagging of backup objects

It's possible to specify tags as key-value pairs for the backup objects, namely base backups, WAL files, and history files.
For more information, see the EDB Postgres for Kubernetes documentation about [tagging of backup objects](/postgres_for_kubernetes/latest/backup_recovery/#tagging-of-backup-objects).
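
Here's a sketch, assuming the `tags` and `historyTags` fields from EDB Postgres
for Kubernetes apply unchanged in the `barmanObjectStore` stanza; the key-value
pairs are illustrative:

```yaml
apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  backup:
    configuration:
      barmanObjectStore:
        [...]
        tags:
          environment: "production"
        historyTags:
          environment: "production"
```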

## On-demand backups of a PGD node

A PGD node is represented as a single-instance EDB Postgres for Kubernetes `Cluster` object.
As such, if you need to, you can request an on-demand backup
of a specific PGD node by creating an EDB Postgres for Kubernetes `Backup` resource.
To do that, see [EDB Postgres for Kubernetes on-demand backups](/postgres_for_kubernetes/latest/backup_recovery/#on-demand-backups) in the EDB Postgres for Kubernetes documentation.
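
For example, a minimal EDB Postgres for Kubernetes `Backup` resource targeting
one of those single-instance clusters might look like this; the cluster and
namespace names are hypothetical:

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Backup
metadata:
  name: on-demand-backup
  namespace: my-namespace
spec:
  cluster:
    # Hypothetical name of one single-instance Cluster in the PGD group
    name: my-pgd-group-1
```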

!!! Hint
You can retrieve the list of EDB Postgres for Kubernetes clusters that make up your PGD group
by running `kubectl get cluster -l k8s.pgd.enterprisedb.io/group=my-pgd-group -n my-namespace`.