Commit 8b64321: copy edits to PGD terminology rewrites

ebgitelman authored and djw-m committed Oct 10, 2023 (1 parent: e7c76bb)

Showing 1 changed file with 31 additions and 36 deletions: product_docs/docs/pgd/5/terminology.mdx

---
title: Terminology
---

This terminology list includes terms associated with EDB Postgres Distributed that you might be unfamiliar with.

#### Asynchronous replication

A type of replication that copies data to other PGD cluster members after the transaction completes on the origin node. Asynchronous replication can provide higher performance and lower latency than [synchronous replication](#synchronous-replication). However, asynchronous replication can see a lag in how long changes take to appear in the various cluster members. While the cluster will be [eventually consistent](#eventual-consistency), there's potential for nodes to be apparently out of sync with each other.

#### Commit scopes

Rules for managing how transactions are committed between the nodes and groups of a PGD cluster. Used to configure [synchronous replication](#synchronous-replication), [Group Commit](#group-commit), [CAMO](#camo-or-commit-at-most-once), [Eager](#eager), lag control, and other PGD features.
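
A sketch of defining one, using PGD's `bdr.add_commit_scope` function; the scope name, group name, and rule here are made-up examples, and the exact rule grammar is in the commit scopes reference:

```sql
-- Illustrative only: a commit scope for a hypothetical node group 'dc1'
-- that requires any two nodes in the group to confirm each transaction
-- at commit time.
SELECT bdr.add_commit_scope(
    commit_scope_name := 'two_in_dc1',
    origin_node_group := 'dc1',
    rule := 'ANY 2 (dc1) GROUP COMMIT'
);
```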

#### CAMO or commit-at-most-once

High-value transactions in some applications require that the application successfully commits exactly once, and in the event of failover and retrying, only once. To ensure this happens in PGD, CAMO can be enabled, allowing the application to actively participate in the transaction.

#### Conflicts

As data is replicated across the nodes of a PGD cluster, there might be occasions when changes from one source clash with changes from another source. This is a conflict and can be handled with conflict resolution. (Conflict resolution is a set of rules that decide which source is correct or preferred.) Conflicts can also be avoided with conflict-free data types.

#### Consensus

How [Raft](#raft) makes group-wide decisions. Given a number of nodes in a group, Raft looks for a consensus of the majority (number of nodes divided by 2 plus 1) voting for a decision. For example, when a write leader is being selected, a Raft consensus is sought over which node in the group will be the write leader. Consensus can be reached only if there's a quorum of voting members.

#### Cluster

Generically, a cluster is a group of multiple redundant systems arranged to appear to end users as one system. See also [PGD cluster](#pgd-cluster) and [Postgres cluster](#postgres-cluster).

#### Data definition language (DDL)

The subset of SQL commands that deal with defining and managing the structure of a database. DDL statements can create, modify, and delete objects (that is, schemas, tables, and indexes) in the database. Common DDL commands are CREATE, ALTER, and DROP.
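
For example, applied to a hypothetical `inventory` table:

```sql
-- CREATE defines a new object
CREATE TABLE inventory (
    id       serial PRIMARY KEY,
    name     text NOT NULL,
    quantity integer DEFAULT 0
);

-- ALTER changes the structure of an existing object
ALTER TABLE inventory ADD COLUMN price numeric(10,2);

-- DROP removes the object entirely
DROP TABLE inventory;
```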

#### Data manipulation language (DML)

The subset of SQL commands that deal with manipulating the data held in a database. DML statements can create, modify, and delete rows in tables in the database. Common DML commands are INSERT, UPDATE, and DELETE.
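
For example, against the same hypothetical `inventory` table:

```sql
-- INSERT creates rows
INSERT INTO inventory (name, quantity) VALUES ('widget', 10);

-- UPDATE modifies existing rows
UPDATE inventory SET quantity = 9 WHERE name = 'widget';

-- DELETE removes rows
DELETE FROM inventory WHERE name = 'widget';
```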

#### Eager

A synchronous commit mode that avoids conflicts by detecting incoming, potentially conflicting transactions and “eagerly” aborting one of them to maintain consistency.

#### Eventual consistency

A distributed computing consistency model in which updates made on one node propagate to the other cluster members over time, with all members eventually converging on the same value. Until they converge, reads on different nodes can return different values.

#### Failover

The automated process that recognizes a failure in a highly available database cluster and moves the active workload to a surviving node.

#### Group commit

A synchronous commit mode that requires more than one PGD node to successfully receive and confirm a transaction at commit time.

#### Immediate consistency

A distributed computing model where all replicas are updated synchronously and simultaneously. This model ensures that all reads after a write completes will see the same value on all nodes. The downside of this approach is its negative impact on performance.

#### Logical replication

A more efficient method of replicating changes in the database. While physical streaming replication duplicates the originating database’s disk blocks, logical replication instead takes the changes made, independent of the underlying physical storage format, and publishes them to all systems that have subscribed to see the changes. Each subscriber then applies the changes locally. Logical replication can't support most DDL commands.
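
PGD's logical replication is implemented by the BDR extension, but the publish-and-subscribe model it builds on is easy to see in core Postgres's built-in logical replication. A minimal sketch, with invented table and connection names:

```sql
-- On the origin (publishing) node:
CREATE PUBLICATION inventory_pub FOR TABLE inventory;

-- On each subscriber, which then applies changes locally:
CREATE SUBSCRIPTION inventory_sub
    CONNECTION 'host=origin.example.com dbname=appdb'
    PUBLICATION inventory_pub;
```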

#### Node

A general term for an element of a distributed system. A node can play host to any service. In PGD, [PGD nodes](#pgd-node) run a Postgres database, the BDR extension, and optionally a PGD Proxy service.

Typically, for high availability, each node runs on separate physical hardware, but that's not always the case. For example, a proxy might share a hardware node with a database.

#### Node groups

PGD nodes in PGD clusters can be organized into groups to reflect the logical operation of the cluster. For example, the data nodes in a particular physical location can be part of a dedicated node group for the location.
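
A sketch of creating one, assuming PGD's `bdr.create_node_group` function and made-up group names (check the node management reference for the exact parameters):

```sql
-- Illustrative only: create a subgroup for one location under the
-- cluster's hypothetical top-level group.
SELECT bdr.create_node_group(
    node_group_name := 'dc1_subgroup',
    parent_group_name := 'topgroup'
);
```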

#### PGD cluster

A group of multiple redundant database systems and proxies arranged to avoid single points of failure while appearing to end users as one system. PGD clusters can run on Docker instances, cloud instances, or “bare” Linux hosts, or a combination of those platforms. A PGD cluster can also include backup and proxy nodes. The data nodes in a cluster are grouped together in a top-level group and into various local [node groups](#node-groups).

#### PGD node

A node that runs a database and participates in the PGD cluster. A typical PGD node runs a Postgres database, the BDR extension, and optionally a PGD Proxy service. PGD nodes are also referred to as *data nodes*, which suggests they store data. However, some PGD nodes, specifically [witness nodes](#witness-nodes), don't do that.

#### Physical replication

By making an exact copy of database disk blocks as they're modified and shipping them to one or more standby cluster members, physical replication provides an easily implemented method to replicate servers. But there are restrictions on how it can be used. For example, only one master node can run write transactions. Also, the method requires that all cluster members are on the same major version of the database software with the same operating system and CPU architecture.
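
A minimal sketch of the primary-side plumbing in core Postgres, with an invented slot name (the standby itself is typically seeded from a base backup and pointed at the primary):

```sql
-- On the primary: create a slot the standby can stream WAL from
SELECT pg_create_physical_replication_slot('standby1_slot');

-- Later, confirm the standby is connected and streaming
SELECT client_addr, state, sync_state FROM pg_stat_replication;
```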

#### Postgres cluster

Traditionally, in PostgreSQL, a number of databases running on a single server is referred to as a cluster (of databases). This kind of Postgres cluster isn't highly available. To get high availability and redundancy, you need a [PGD cluster](#pgd-cluster).

#### Quorum

When a [Raft](#raft) [consensus](#consensus) is needed by a PGD cluster, a minimum number of voting nodes must participate in the vote. This number is called a quorum. For example, with a 5-node cluster, the quorum is 3 nodes in the cluster voting. A consensus is 5/2+1 nodes, 3 nodes voting the same way. If there are only 2 voting nodes, then a consensus is never established.
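
The majority arithmetic is integer division plus one, which you can check directly in SQL because Postgres rounds integer division down:

```sql
-- Majority sizes for groups of 5 and 7 voting nodes
SELECT 5/2 + 1 AS majority_of_5,   -- 3 of 5 nodes
       7/2 + 1 AS majority_of_7;   -- 4 of 7 nodes
```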

#### Replicated available fault tolerance (Raft)

A consensus algorithm that uses votes from a quorum of machines in a distributed cluster to establish a consensus. PGD uses Raft within groups (top-level or local) to establish the node that's the write leader.

#### Read scalability

The ability of a system to handle increasing read workloads. For example, PGD can introduce one or more read replica nodes to a cluster and have the application direct writes to the primary node and reads to the replica nodes. As the read workload grows, you can increase the number of read replica nodes to maintain performance.

#### Switchover

A planned change in connection between the application or proxies and the active database node, typically done for maintenance.

#### Synchronous replication

When changes are updated at all participating nodes at the same time, typically leveraging a two-phase commit. While this approach replicates changes and resolves conflicts before committing, a performance cost in latency occurs due to the coordination required across nodes.
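
In PGD, synchronous behavior is configured with [commit scopes](#commit-scopes). For comparison, core Postgres requests synchronous replication like this, assuming a standby named `standby1`:

```sql
-- Commits wait until the first listed standby confirms them
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby1)';
SELECT pg_reload_conf();
```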

#### Subscriber-only nodes

A PGD cluster is based around bidirectional replication. But in some use cases, such as needing a read-only server, bidirectional replication isn't needed. A subscriber-only node is used in this case. It subscribes only to changes in the database to keep itself up to date and provide correct results to any queries run directly on the node. This feature can be used to enable horizontal read scalability in a PGD cluster.

#### Two-phase commit (2PC)

A multi-step process for achieving consistency across multiple database nodes. The first phase sees a transaction prepared on an originating node and sent to all participating nodes. Each participating node validates that it can apply the transaction and signals its readiness to the originating node. This is the precommit or prepare phase. In the second phase, if all the participating nodes signal they're ready, the originating node proceeds to commit the transaction and signals the participating nodes to commit, too. This is the commit phase. If, in the precommit phase, any node signals it isn't ready, the entire transaction is aborted. This process ensures all nodes get the same changes.
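
Postgres exposes the two phases directly as `PREPARE TRANSACTION` and `COMMIT PREPARED`. A minimal sketch, assuming a hypothetical `accounts` table and `max_prepared_transactions` set above zero:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Phase 1: make the transaction durable without making it visible
PREPARE TRANSACTION 'txn_42';

-- Phase 2: once every participant has signaled readiness
COMMIT PREPARED 'txn_42';

-- If any participant signaled it wasn't ready:
-- ROLLBACK PREPARED 'txn_42';
```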

#### Vertical scaling or scale up

A traditional computing approach of increasing a resource (CPU, memory, storage, network) to support a given workload until the physical limits of that architecture are reached, for example, Oracle Exadata.

#### Witness nodes

Witness nodes primarily serve to help the cluster establish a consensus. An odd number of data nodes are needed to establish a consensus and, where resources are limited, a witness node can be used to participate in cluster decisions but not replicate the data. Not holding the data means it can't operate as a standby server or provide majorities in synchronous commits.

#### Write leader

In an Always-On architecture, a node is selected as the correct connection endpoint for applications. This node is called the write leader. Once selected, proxy nodes route queries and updates to it. With only one node receiving writes, unintended multi-node writes can be avoided. The write leader is selected by consensus of a quorum of data nodes. If the write leader becomes unavailable, the data nodes select another node to become write leader. Nodes that aren't the write leader are referred to as *shadow nodes*.
