Node management - Refresh #4849

Merged Nov 3, 2023 (29 commits)
Commits
1044702
First pass reorg into sections with intro
djw-m Sep 26, 2023
2025bf8
Only link apparently to Nodes fixes.
djw-m Sep 27, 2023
3895701
Link fix
djw-m Sep 27, 2023
933d2b8
Supporting edits and tweaks for links
djw-m Sep 27, 2023
cd28a56
Remove todo, make link to pgd-cli clearer
djw-m Oct 5, 2023
2fed4b1
Update product_docs/docs/pgd/5/node_management/index.mdx
djw-m Oct 5, 2023
f5fef38
Update product_docs/docs/pgd/5/node_management/index.mdx
djw-m Oct 6, 2023
dff13da
Update product_docs/docs/pgd/5/node_management/decoding_worker.mdx
djw-m Oct 6, 2023
d0f14a8
Update product_docs/docs/pgd/5/node_management/logical_standby_nodes.mdx
djw-m Oct 6, 2023
e79b4ad
Update product_docs/docs/pgd/5/node_management/node_recovery.mdx
djw-m Oct 6, 2023
269e8ee
Update product_docs/docs/pgd/5/node_management/node_recovery.mdx
djw-m Oct 6, 2023
c7a7d19
Update product_docs/docs/pgd/5/node_management/node_recovery.mdx
djw-m Oct 6, 2023
e608bac
Update product_docs/docs/pgd/5/node_management/physical_standby_nodes…
djw-m Oct 6, 2023
b61ab42
Update product_docs/docs/pgd/5/node_management/viewing_topology.mdx
djw-m Oct 6, 2023
1991cc6
fix cli links viewing_topology.mdx
djw-m Oct 6, 2023
0ffddbe
Update product_docs/docs/pgd/5/node_management/connections_dsns_and_s…
djw-m Oct 6, 2023
6b55321
Update product_docs/docs/pgd/5/node_management/connections_dsns_and_s…
djw-m Oct 6, 2023
096fd70
Removed space from redirects
djw-m Oct 6, 2023
682b920
Read through reorged topics
ebgitelman Oct 6, 2023
caf2a06
Merged the groups-subgroups into one. Witness TBD.
djw-m Oct 12, 2023
1d78d7d
Fix a few remaining links
josh-heyer Oct 17, 2023
8347a13
Remove subscriber only groups post groups merge
djw-m Oct 17, 2023
117b683
More sections to add clarity
djw-m Oct 17, 2023
271415c
small tweaks to language
djw-m Oct 17, 2023
29405d8
WIP commit
djw-m Oct 17, 2023
a95f042
Expanded witness nodes and updated types
djw-m Oct 18, 2023
a0a842f
More edits
djw-m Oct 18, 2023
a1b130c
Fixed link to subscriber-only content
djw-m Nov 2, 2023
6a41f3b
Fix links to subscriber_only topic (again) - underscores!!
josh-heyer Nov 3, 2023
4 changes: 2 additions & 2 deletions product_docs/docs/pgd/5/choosing_server.mdx
@@ -14,7 +14,7 @@ The following table lists features of EDB Postgres Distributed that are dependen
| [Granular DDL Locking](ddl/#ddl-locking-details) | Y | Y | Y |
| [Streaming of large transactions](transaction-streaming/) | v14+ | v13+ | v14+ |
| [Distributed sequences](sequences/#pgd-global-sequences) | Y | Y | Y |
| [Subscribe-only nodes](nodes/#physical-standby-nodes) | Y | Y | Y |
| [Subscribe-only nodes](node_management/subscriber_only/) | Y | Y | Y |
| [Monitoring](monitoring/) | Y | Y | Y |
| [OpenTelemetry support](monitoring/otel/) | Y | Y | Y |
| [Parallel apply](parallelapply) | Y | Y | Y |
@@ -28,7 +28,7 @@ The following table lists features of EDB Postgres Distributed that are dependen
| [Commit At Most Once (CAMO)](durability/camo/) | N | Y | 14+ |
| [Eager Conflict Resolution](consistency/eager/) | N | Y | 14+ |
| [Lag Control](durability/lag-control/) | N | Y | 14+ |
| [Decoding Worker](nodes/#decoding-worker) | N | 13+ | 14+ |
| [Decoding Worker](node_management/decoding_worker) | N | 13+ | 14+ |
| [Lag tracker](monitoring/sql/#monitoring-outgoing-replication) | N | Y | 14+ |
| Missing partition conflict | N | Y | 14+ |
| No need for UPDATE Trigger on tables with TOAST | N | Y | 14+ |
2 changes: 1 addition & 1 deletion product_docs/docs/pgd/5/consistency/conflicts.mdx
@@ -475,7 +475,7 @@ Origin info is available only up to the point where a row is frozen. Updates arr
A node that was offline that reconnects and begins sending data changes can cause divergent
errors if the newly arrived updates are older than the frozen rows that they update. Inserts and deletes aren't affected by this situation.

We suggest that you don't leave down nodes for extended outages, as discussed in [Node restart and down node recovery](../nodes).
We suggest that you don't leave down nodes for extended outages, as discussed in [Node restart and down node recovery](../node_management).

On EDB Postgres Extended Server and EDB Postgres Advanced Server, PGD holds back the freezing of rows while a node is down. This mechanism handles this situation gracefully so you don't need to change parameter settings.

2 changes: 1 addition & 1 deletion product_docs/docs/pgd/5/index.mdx
@@ -22,8 +22,8 @@ navigation:
- upgrades
- "#Using"
- appusage
- node_management
- postgres-configuration
- nodes
- ddl
- security
- sequences
6 changes: 3 additions & 3 deletions product_docs/docs/pgd/5/monitoring/sql.mdx
@@ -105,7 +105,7 @@ and

Each node has one PGD group slot that must never have a connection to it
and is very rarely marked as active. This is normal and doesn't imply
something is down or disconnected. See [Replication slots created by PGD`](../nodes/#replication-slots-created-by-pgd).
something is down or disconnected. See [Replication slots](../node_management/replication_slots) in Node Management.

### Monitoring outgoing replication

@@ -271,7 +271,7 @@ subscription_status | replicating

### Monitoring WAL senders using LCR

If the [decoding worker](../nodes#decoding-worker) is enabled, you can monitor information about the
If the [decoding worker](../node_management/decoding_worker/) is enabled, you can monitor information about the
current logical change record (LCR) file for each WAL sender
using the function [`bdr.wal_sender_stats()`](/pgd/latest/reference/functions/#bdrwal_sender_stats). For example:

@@ -287,7 +287,7 @@ postgres=# SELECT * FROM bdr.wal_sender_stats();

If `is_using_lcr` is `FALSE`, `decoder_slot_name`/`lcr_file_name` is `NULL`.
This is the case if the decoding worker isn't enabled or the WAL sender is
serving a [logical standby](../nodes#logical-standby-nodes).
serving a [logical standby](../node_management/logical_standby_nodes/).

Also, you can monitor information about the decoding worker using the function
[`bdr.get_decoding_worker_stat()`](/pgd/latest/reference/functions/#bdrget_decoding_worker_stat). For example:
@@ -0,0 +1,55 @@
---
title: Connection DSNs and SSL (TLS)
---

Because nodes connect
using `libpq`, the DSN of a node is a [`libpq`](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNECT-SSLMODE) connection string. As such, the connection string can contain any permitted `libpq` connection
parameter, including those for SSL. The DSN must work as the
connection string from the client connecting to the node in which it's
specified. An example of such a set of parameters using a client certificate is:

```ini
sslmode=verify-full sslcert=bdr_client.crt sslkey=bdr_client.key
sslrootcert=root.crt
```

With this setup, the files `bdr_client.crt`, `bdr_client.key`, and
`root.crt` must be present in the data directory on each node, with the
appropriate permissions.
For `verify-full` mode, the server's SSL certificate is checked to
ensure that it's directly or indirectly signed with the `root.crt` certificate
authority and that the host name or address used in the connection matches the
contents of the certificate. In the case of a name, this can match a subject's
alternative name or, if there are no such names in the certificate, the
subject's common name (CN) field.
Postgres doesn't currently support subject alternative names for IP
addresses, so if the connection is made by address rather than name, it must
match the CN field.

The CN of the client certificate must be the name of the user making the
PGD connection,
which is usually the user postgres. Each node requires matching
lines permitting the connection in the `pg_hba.conf` file. For example:

```ini
hostssl all postgres 10.1.2.3/24 cert
hostssl replication postgres 10.1.2.3/24 cert
```

Another setup might be to use `SCRAM-SHA-256` passwords instead of client
certificates and not verify the server identity as long as
the certificate is properly signed. Here the DSN parameters might be:

```ini
sslmode=verify-ca sslrootcert=root.crt
```

The corresponding `pg_hba.conf` lines are:

```ini
hostssl all postgres 10.1.2.3/24 scram-sha-256
hostssl replication postgres 10.1.2.3/24 scram-sha-256
```

In such a scenario, the postgres user needs a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) file
containing the correct password.
@@ -0,0 +1,95 @@
---
title: Creating and joining PGD groups
navTitle: Creating and joining PGD groups
---

## Creating and joining PGD groups

For PGD, every node must connect to every other node. To make
configuration easy, when a new node joins, it configures all
existing nodes to connect to it. For this reason, every node, including
the first PGD node created, must know the [PostgreSQL connection string](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING) that other nodes
can use to connect to it. This connection string is
sometimes referred to as a data source name (DSN).

Both formats of connection string are supported.
So you can use either key-value format, like `host=myhost port=5432 dbname=mydb`,
or URI format, like `postgresql://myhost:5432/mydb`.

The SQL function [`bdr.create_node_group()`](/pgd/latest/reference/nodes-management-interfaces#bdrcreate_node_group) creates the PGD group
from the local node. Doing so activates PGD on that node and allows other
nodes to join the PGD group, which consists of only one node at that point.
At the time of creation, you must specify the connection string for other
nodes to use to connect to this node.

Once the node group is created, every further node can join the PGD
group using the [`bdr.join_node_group()`](/pgd/latest/reference/nodes-management-interfaces#bdrjoin_node_group) function.
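As a rough sketch, creating a group on the first node and then joining it from a second node looks something like the following. The node names, group name, and DSNs are placeholders, and the local node is assumed to be registered with `bdr.create_node()` before the group functions are called.

```sql
-- A minimal sketch (names and DSNs are placeholders).
-- On the first node: register the local node, then create the PGD group.
SELECT bdr.create_node('node_one', 'host=host-one dbname=bdrdb port=5432');
SELECT bdr.create_node_group('group_a');

-- On each node that joins later: register the local node, then join the
-- group through a node that's already a member.
SELECT bdr.create_node('node_two', 'host=host-two dbname=bdrdb port=5432');
SELECT bdr.join_node_group('host=host-one dbname=bdrdb port=5432', 'group_a');
```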

Alternatively, use the command-line utility [bdr_init_physical](/pgd/latest/reference/nodes/#bdr_init_physical) to
create a new node, using `pg_basebackup` or a physical standby of an existing
node. When using `pg_basebackup`, the bdr_init_physical utility can optionally
take the base backup of only the target database. The earlier
behavior was to back up the entire database cluster. Backing up only the target
database completes faster and also uses less space because it excludes
unwanted databases. If you specify only the target database, then the excluded
databases get cleaned up and removed on the new node.

When a new PGD node is joined to an existing PGD group or a node subscribes
to an upstream peer, before replication can begin, the system must copy the
existing data from the peer nodes to the local node. This copy must be
carefully coordinated so that the local and remote data starts out
identical. It's not enough to use pg_dump yourself. The BDR
extension provides built-in facilities for making this initial copy.

During the join process, the BDR extension synchronizes existing data
using the provided source node as the basis and creates all metadata
information needed for establishing itself in the mesh topology in the PGD
group. If the connection between the source and the new node disconnects during
this initial copy, restart the join process from the
beginning.

The node that's joining the cluster must not contain any schema or data
that already exists on databases in the PGD group. We recommend that the
newly joining database be empty except for the BDR extension. However,
it's important that all required database users and roles are created.

Optionally, you can skip the schema synchronization using the `synchronize_structure`
parameter of the [`bdr.join_node_group`](/pgd/latest/reference/nodes-management-interfaces#bdrjoin_node_group) function. In this case, the schema must
already exist on the newly joining node.
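For illustration only, a join that skips schema synchronization might look like the following sketch. It assumes the `synchronize_structure` parameter accepts `'none'`; check the [`bdr.join_node_group`](/pgd/latest/reference/nodes-management-interfaces#bdrjoin_node_group) reference for the exact accepted values.

```sql
-- Sketch: join without copying the schema. The schema must already exist
-- on the joining node.
SELECT bdr.join_node_group(
    join_target_dsn := 'host=host-one dbname=bdrdb port=5432',
    node_group_name := 'group_a',
    synchronize_structure := 'none'
);
```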

We recommend that you select the node that has the best connection (the
closest) as the source node for joining. Doing so lowers the time
needed for the join to finish.

The join procedure is coordinated using the Raft consensus algorithm, which
requires a majority of the existing nodes to be online and reachable.

The logical join procedure (which uses the [`bdr.join_node_group`](/pgd/latest/reference/nodes-management-interfaces#bdrjoin_node_group) function)
performs data sync doing `COPY` operations and uses multiple writers
(parallel apply) if those are enabled.

Node join can execute concurrently with other node joins for the majority of
the time taken to join. However, only one regular node at a time can be in
either of the states `PROMOTE` or `PROMOTING`. These states are typically short if
all other nodes are up and running. Otherwise the join is serialized at
this stage. Subscriber-only nodes are an exception to this rule: they
can be in the `PROMOTE` and `PROMOTING` states concurrently, so their join
process is fully concurrent.

The join process uses only one node as the source, so it can be
executed when nodes are down if a majority of nodes are available.
This approach can introduce a complication when running a logical join.
During logical join, the commit timestamp of rows copied from the source
node is set to the latest commit timestamp on the source node.
Committed changes on nodes that have a commit timestamp earlier than this
(because nodes are down or have significant lag) can conflict with changes
from other nodes. In this case, conflicts on the newly joined node can be resolved
differently from those on other nodes, causing a divergence. As a result, we recommend
not running a node join when significant replication lag exists between nodes.
If joining during such lag is unavoidable, run LiveCompare on the newly joined node to
correct any data divergence once all nodes are available and caught up.

`pg_dump` can fail because of cache-lookup failures when there's concurrent DDL
activity on the source node. Since [`bdr.join_node_group`](/pgd/latest/reference/nodes-management-interfaces#bdrjoin_node_group) uses `pg_dump`
internally, the join can fail for the same reason.
Retrying the join works in that case.
79 changes: 79 additions & 0 deletions product_docs/docs/pgd/5/node_management/decoding_worker.mdx
@@ -0,0 +1,79 @@
---
title: Decoding worker
---

PGD provides an option to enable a decoding worker process that performs
decoding once, no matter how many nodes are sent data. This option introduces a
new process, the WAL decoder, on each PGD node. One WAL sender process still
exists for each connection, but these processes now just perform the task of
sending and receiving data. Taken together, these changes reduce the CPU
overhead of larger PGD groups and also allow higher replication throughput
since the WAL sender process now spends more time on communication.

## Enabling

`enable_wal_decoder` is an option for each PGD group, which is currently
disabled by default. You can use [`bdr.alter_node_group_config()`](../reference/nodes-management-interfaces/#bdralter_node_group_config) to enable or
disable the decoding worker for a PGD group.
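For example, a sketch of toggling the option for a group named `group_a` (a placeholder name) might look like this:

```sql
-- Enable the decoding worker for the whole PGD group.
SELECT bdr.alter_node_group_config('group_a', enable_wal_decoder := true);

-- Disable it again later if needed.
SELECT bdr.alter_node_group_config('group_a', enable_wal_decoder := false);
```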

When the decoding worker is enabled, PGD stores logical change record (LCR)
files to allow buffering of changes between decoding and the point when all
subscribing nodes have received the data. LCR files are stored under the
`pg_logical` directory in each local node's data directory. The number and
size of the LCR files vary as replication lag increases, so this process also
needs monitoring. The LCRs that aren't required by any of the PGD nodes are cleaned
periodically. The interval between two consecutive cleanups is controlled by
[`bdr.lcr_cleanup_interval`](/pgd/latest/reference/pgd-settings#bdrlcr_cleanup_interval), which defaults to 3 minutes. The cleanup is
disabled when [`bdr.lcr_cleanup_interval`](/pgd/latest/reference/pgd-settings#bdrlcr_cleanup_interval) is 0.
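As an illustration, and assuming the setting can be changed like any other GUC with `ALTER SYSTEM` plus a reload, inspecting or adjusting the cleanup interval might look like:

```sql
-- Check the current interval between LCR cleanups.
SHOW bdr.lcr_cleanup_interval;

-- Example value only: clean up unneeded LCR files every 5 minutes.
ALTER SYSTEM SET bdr.lcr_cleanup_interval = '5min';
SELECT pg_reload_conf();
```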

## Disabling

When disabled, logical decoding is performed by the WAL sender process for each
node subscribing to each node. In this case, no LCR files are written.

Even when the decoding worker is enabled for a PGD group, the following
GUCs control the production and use of LCRs on each node. By default
these are `false`. To produce and use LCRs, enable the
decoding worker for the PGD group and set these GUCs to `true` on each of the nodes in the PGD group. A sketch of setting them on a node follows the list.

- [`bdr.enable_wal_decoder`](/pgd/latest/reference/pgd-settings#bdrenable_wal_decoder) — When `false`, all WAL
senders using LCRs restart to use WAL directly. When `true`,
and the decoding worker is also enabled in the PGD group configuration, a decoding worker process is
started to produce LCRs, and WAL senders use them.
- [`bdr.receive_lcr`](/pgd/latest/reference/pgd-settings#bdrreceive_lcr) — When `true` on the subscribing node, it requests that the WAL
sender on the publisher node use LCRs if they're available.
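A minimal sketch of enabling both settings on one node follows. It assumes the GUCs can be set with `ALTER SYSTEM` and picked up on reload; they can equally be set in `postgresql.conf`.

```sql
-- Run on each node in the PGD group.
ALTER SYSTEM SET bdr.enable_wal_decoder = on;  -- produce LCRs on this node
ALTER SYSTEM SET bdr.receive_lcr = on;         -- request LCRs when subscribing to publishers
SELECT pg_reload_conf();
```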


!!! Note Notes
As of now, a decoding worker decodes changes corresponding to the node where it's
running. A logical standby is sent changes from all the nodes in the PGD group
through a single source. Hence a WAL sender serving a logical standby currently can't
use LCRs.

A subscriber-only node receives changes from respective nodes directly. Hence
a WAL sender serving a subscriber-only node can use LCRs.

Even though LCRs are produced, the corresponding WAL is still retained, similar
to the case when a decoding worker isn't enabled. In the future, it might be possible
to remove WAL corresponding to the LCRs if it isn't otherwise required.
!!!

## LCR file names

For reference, the first 24 characters of an LCR file name are similar to those
in a WAL file name. The first 8 characters of the name are currently all '0'.
In the future, they're expected to represent the TimeLineId similar to the first 8
characters of a WAL segment file name. The following sequence of 16 characters
of the name is similar to the WAL segment number, which is used to track LCR
changes against the WAL stream.

However, logical changes are
reordered according to the commit order of the transactions they belong to.
Hence their placement in the LCR segments doesn't match the placement of
corresponding WAL in the WAL segments.

The set of the last 16 characters represents the
subsegment number in an LCR segment. Each LCR file corresponds to a
subsegment. LCR files are binary and variable sized. The maximum size of an
LCR file can be controlled by `bdr.max_lcr_segment_file_size`, which
defaults to 1 GB.
@@ -0,0 +1,22 @@
---
title: Groups and subgroups
---

## Groups

A PGD cluster's nodes are gathered in groups. A "top level" group always exists and is the group
to which all data nodes automatically belong. The "top level" group can also be
the direct parent of subgroups.

## Subgroups

A group can also contain zero or more subgroups. Subgroups can be used to
represent data centers or locations, allowing commit scopes to refer to the nodes
in a particular region as a whole. PGD Proxy can also make use of subgroups
to delineate the nodes available to be write leader.

The `node_group_type` value specifies the type when the subgroup is created.
Some subgroup types change the behavior of the nodes within the group. For
example, a [subscriber-only](subscriber_only) subgroup makes all the nodes
within the group subscriber-only nodes.
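For illustration, creating a subscriber-only subgroup under a top-level group might look like the following sketch. The group names are placeholders, and it assumes [`bdr.create_node_group()`](/pgd/latest/reference/nodes-management-interfaces#bdrcreate_node_group) accepts the `parent_group_name`, `join_node_group`, and `node_group_type` arguments as described in its reference.

```sql
-- Sketch: create a subgroup whose member nodes are all subscriber-only.
SELECT bdr.create_node_group(
    node_group_name := 'so_group',
    parent_group_name := 'group_a',
    join_node_group := false,          -- assumption: the creating node stays in its own group
    node_group_type := 'subscriber-only'
);
```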

@@ -0,0 +1,32 @@
---
title: Joining a heterogeneous cluster
---


A PGD 4.0 node can join an EDB Postgres Distributed cluster running 3.7.x at a specific
minimum maintenance release (such as 3.7.6) or a mix of 3.7 and 4.0 nodes.
This procedure is useful when you want to upgrade not just the PGD
major version but also the underlying PostgreSQL major
version. You can achieve this by joining a 3.7 node running on
PostgreSQL 12 or 13 to an EDB Postgres Distributed cluster running 3.6.x on
PostgreSQL 11. The new node can also
run on the same PostgreSQL major release as all of the nodes in the
existing cluster.

PGD ensures that the replication works correctly in all directions
even when some nodes are running 3.6 on one PostgreSQL major release and
other nodes are running 3.7 on another PostgreSQL major release. However,
we recommend that you quickly bring the cluster into a
homogeneous state by parting the older nodes once enough new nodes
join the cluster. Don't run any DDL that might
not be available on the older versions, or vice versa.

A node joining with a different major PostgreSQL release can't use
a physical backup taken with [`bdr_init_physical`](/pgd/latest/reference/nodes#bdr_init_physical), and the node must join
using the logical join method. Using this method is necessary because the major
PostgreSQL releases aren't on-disk compatible with each other.

When a 3.7 node joins the cluster using a 3.6 node as a
source, certain configurations, such as conflict resolution,
aren't copied from the source node. The node must be configured
after it joins the cluster.