Skip to content

Commit

Permalink
Merge pull request #2757 from EnterpriseDB/release/2022-06-07
Browse files Browse the repository at this point in the history
Release: 2022-06-07
  • Loading branch information
drothery-edb authored Jun 7, 2022
2 parents fe2d1ff + c469574 commit 8cc5fd5
Show file tree
Hide file tree
Showing 149 changed files with 4,050 additions and 3,255 deletions.
2 changes: 1 addition & 1 deletion product_docs/docs/bdr/4/camo.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ No designated CAMO partner must be configured in this mode.
!!! Warning
CAMO requires changes to the user's application
to take advantage of the advanced error handling. Enabling a parameter isn't enough to gain protection. Reference client implementations
are provided to customers upon request.
are provided to customers on request.

## Requirements

Expand Down
415 changes: 209 additions & 206 deletions product_docs/docs/bdr/4/crdt.mdx

Large diffs are not rendered by default.

791 changes: 396 additions & 395 deletions product_docs/docs/bdr/4/ddl.mdx

Large diffs are not rendered by default.

205 changes: 100 additions & 105 deletions product_docs/docs/bdr/4/durability.mdx

Large diffs are not rendered by default.

141 changes: 70 additions & 71 deletions product_docs/docs/bdr/4/eager.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -3,47 +3,47 @@ title: Eager Replication
---

To prevent conflicts after a commit, set the `bdr.commit_scope`
parameter to `global`. The default setting of `local` disables eager
replication, so BDR will apply changes and resolve potential conflicts
post-commit, as described in the [Conflicts chapter](conflicts).
parameter to `global`. The default setting of `local` disables Eager
Replication, so BDR applies changes and resolves potential conflicts
post-commit, as described in the [Conflicts](conflicts).

In this mode, BDR uses two-phase commit (2PC) internally to detect and
resolve conflicts prior to the local commit. It turns a normal
resolve conflicts prior to the local commit. It turns a normal
`COMMIT` of the application into an implicit two-phase commit,
requiring all peer nodes to prepare the transaction, before the origin
node continues to commit it. If at least one node is down or
unreachable during the prepare phase, the commit will time out after
requiring all peer nodes to prepare the transaction before the origin
node continues to commit it. If at least one node is down or
unreachable during the prepare phase, the commit times out after
`bdr.global_commit_timeout`, leading to an abort of the
transaction. Note that there is no restriction on the use of
temporary tables, as exists in explicit 2PC in PostgreSQL.
transaction. There's no restriction on the use of
temporary tables as exists in explicit 2PC in PostgreSQL.

Once prepared, Eager All-Node Replication employs Raft to reach a
commit decision. In case of failures, this allows a remaining
commit decision. In case of failures, this allows a remaining
majority of nodes to reach a congruent commit or abort decision so
they can finish the transaction. This unblocks the objects and
they can finish the transaction. This unblocks the objects and
resources locked by the transaction and allows the cluster to proceed.

In case all nodes remain operational, the origin will confirm the
commit to the client only after all nodes have committed, to ensure that the
In case all nodes remain operational, the origin confirms the
commit to the client only after all nodes have committed to ensure that the
transaction is immediately visible on all nodes after the commit.

## Requirements

Eager All-Node Replication uses prepared transactions internally;
therefore all replica nodes need to have a `max_prepared_transactions`
configured high enough to be able to handle all incoming transactions
(possibly in addition to local two-phase commit and CAMO transactions;
see [Configuration: Max Prepared Transactions](configuration#max-prepared-transactions)).
We recommend to configure it the same on all nodes, and high enough to
Eager All-Node Replication uses prepared transactions internally.
Therefore all replica nodes need to have a `max_prepared_transactions`
configured high enough to handle all incoming transactions,
possibly in addition to local two-phase commit and CAMO transactions.
(See [Configuration: Max-prepared transactions](configuration#max-prepared-transactions)).
We recommend configuring it the same on all nodes and high enough to
cover the maximum number of concurrent transactions across the cluster
for which CAMO or Eager All-Node Replication is used. Other than
for which CAMO or Eager All-Node Replication is used. Other than
that, no special configuration is required, and every BDR cluster can
run Eager All-Node transactions.

## Usage

To enable Eager All-Node Replication, the client needs to switch to
global commit scope at session level, or for individual transactions
global commit scope at session level or for individual transactions
as shown here:

```sql
Expand All @@ -56,7 +56,7 @@ SET LOCAL bdr.commit_scope = 'global';
... other commands possible...
```

The client can continue to simply issue a `COMMIT` at the end of the
The client can continue to issue a `COMMIT` at the end of the
transaction and let BDR manage the two phases:

```sql
Expand All @@ -65,97 +65,96 @@ COMMIT;

## Error handling

Given that BDR manages the transaction, the client only needs to check the
result of the `COMMIT` (as is advisable in any case, including single-node
Postgres).
Given that BDR manages the transaction, the client needs to check only the
result of the `COMMIT`. (This is advisable in any case, including single-node
Postgres.)

In case of an origin node failure, the remaining nodes will eventually
(after at least `bdr.global_commit_timeout`) decide to rollback the
globally prepared transaction. Raft prevents inconsistent commit vs.
rollback decisions. This, however, requires a majority of connected
nodes. Disconnected nodes keep the transactions prepared to be able
to eventually commit them (or rollback) as needed to reconcile with the
majority of nodes that may have decided and made further progress.
In case of an origin node failure, the remaining nodes eventually
(after at least `bdr.global_commit_timeout`) decide to roll back the
globally prepared transaction. Raft prevents inconsistent commit versus
rollback decisions. However, this requires a majority of connected
nodes. Disconnected nodes keep the transactions prepared
to eventually commit them (or roll back) as needed to reconcile with the
majority of nodes that might have decided and made further progress.

### Eager All-Node Replication with CAMO

Eager All-Node Replication goes beyond CAMO and implies it; there is no
need to additionally enable `bdr.enable_camo`, if `bdr.commit_scope`
is set to `global`. Nor does a CAMO pair need to be
configured via `bdr.add_camo_pair()`.
Eager All-Node Replication goes beyond CAMO and implies it. There's no
need to also enable `bdr.enable_camo` if `bdr.commit_scope`
is set to `global`. Nor does a CAMO pair need to be
configured with `bdr.add_camo_pair()`.

Any other active BDR node may be used in the role of a CAMO partner to
query a transaction's status'. However, this non-CAMO usage needs to
be indicated to the `bdr.logical_transaction_status` function with a
third argument of `require_camo_partner = false`. Otherwise, it may
complain about a missing CAMO configuration (which is not required for
You can use any other active BDR node in the role of a CAMO partner to
query a transaction's status. However, you must indicate this non-CAMO usage
to the `bdr.logical_transaction_status` function with a
third argument of `require_camo_partner = false`. Otherwise, it might
complain about a missing CAMO configuration (which isn't required for
Eager transactions).

Other than this difference in configuration and invocation of that
function, the client needs to adhere to the protocol
described for [CAMO](camo). See the [reference client implementations](camo_clients).
described for [CAMO](camo). Reference client implementations are provided to customers on request .

### Limitations

Transactions using Eager Replication cannot yet execute DDL,
Transactions using Eager Replication can't yet execute DDL,
nor do they support explicit two-phase commit.
These may be allowed in later releases.
Note that the TRUNCATE command is allowed.
These might be allowed in later releases.
The `TRUNCATE` command is allowed.

Replacing a crashed and unrecoverable BDR node with its physical
standby is not currently supported in combination with Eager All Node
standby isn't currently supported in combination with Eager All-Node
transactions.

BDR currently offers a global commit scope only; later releases
BDR currently offers a global commit scope only. Later releases
will support Eager Replication with fewer nodes for increased
availability.

It is not possible for Eager All Node replication to be combined with
`synchronous_replication_availability = 'async'`. Trying to configure
both will yield an error.
You can't combine Eager All-Node Replication with
`synchronous_replication_availability = 'async'`. Trying to configure
both causes an error.

The Decoding Worker feature is not currently supported in combination
with Eager All Node transactions. Installations using Eager must keep
`enable_wal_decoder` disabled for the BDR node group using Eager All
Node transactions.
The Decoding Worker feature isn't currently supported in combination
with Eager All-Node transactions. Installations using Eager must keep
`enable_wal_decoder` disabled for the BDR node group using Eager All-Node transactions.

Synchronous replication uses a mechanism for transaction confirmation
different from Eager. The two are not compatible and must not be used
together. Therefore, whenever using Eager All Node transactions,
please make sure none of the BDR nodes are configured in
`synchronous_standby_names`. Using synchronous replication to a
non-BDR node acting as a physical standby is well possible.
different from Eager. The two aren't compatible, and you must not use them
together. Therefore, whenever using Eager All-Node transactions,
make sure none of the BDR nodes are configured in
`synchronous_standby_names`. Using synchronous replication to a
non-BDR node acting as a physical standby is possible.

## Effects of Eager Replication in General
## Effects of Eager Replication in general

#### Increased Commit Latency
#### Increased commit latency

Adding a synchronization step means additional communication between
the nodes, resulting in additional latency at commit time. Eager All
Node Replication adds roughly two network round trips (to the furthest
peer node in the worst case). Logical standby nodes and nodes still
in the process of joining or catching up are not included, but will
the nodes, resulting in additional latency at commit time. Eager All-Node
Replication adds roughly two network roundtrips (to the furthest
peer node in the worst case). Logical standby nodes and nodes still
in the process of joining or catching up aren't included but
eventually receive changes.

Before a peer node can confirm its local preparation of the
transaction, it also needs to apply it locally. This further adds to
the commit latency, depending on the size of the transaction. Note
that this is independent of the `synchronous_commit` setting and
transaction, it also needs to apply it locally. This further adds to
the commit latency, depending on the size of the transaction.
This is independent of the `synchronous_commit` setting and
applies whenever `bdr.commit_scope` is set to `global`.

#### Increased Abort Rate
#### Increased abort rate

!!! Note
The performance of Eager Replication is currently known to be
unexpectedly slow (around 10 TPS only). This is expected to be
improved in the next release.

With single-node Postgres, or even with BDR in its default asynchronous
replication mode, errors at `COMMIT` time are rare. The additional
replication mode, errors at `COMMIT` time are rare. The additional
synchronization step adds a source of errors, so applications need to
be prepared to properly handle such errors (usually by applying a
retry loop).

The rate of aborts depends solely on the workload. Large transactions
The rate of aborts depends solely on the workload. Large transactions
changing many rows are much more likely to conflict with other
concurrent transactions.
Loading

0 comments on commit 8cc5fd5

Please sign in to comment.