---
navTitle: Overview
title: HARP Functionality Overview
---

HARP is a new approach to High Availability primarily designed for BDR
clusters, though it can also manage standard Primary plus physical streaming
replication (PSR) topologies. It leverages consensus-driven Quorum to
determine the correct connection endpoint in a semi-exclusive manner to
prevent unintended multi-node writes from an application.

## The Importance of Quorum

The central purpose of HARP is to enforce full Quorum on any Postgres cluster
it manages. Quorum is simply a term for a voting body that requires a certain
minimum number of members to be present before a decision can be made. Or
perhaps even more simply: Majority Rules.

For any vote to end in a result other than a tie, an odd number of nodes must
constitute the full cluster membership. Quorum, however, does not strictly
demand this restriction; a simple majority suffices. This means that in a
cluster of N nodes, Quorum requires a minimum of N/2+1 nodes to hold a
meaningful vote: a five-node cluster needs at least three participants, and
can therefore survive the loss of two nodes.
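
As a purely illustrative sketch (not part of HARP itself), the same rule can
be expressed in a few lines of Go, where integer division handles even and
odd memberships alike:

```go
package main

import "fmt"

// quorum returns the minimum number of nodes required for a
// meaningful vote in a cluster of n members: N/2+1.
func quorum(n int) int {
	return n/2 + 1
}

func main() {
	for _, n := range []int{3, 4, 5} {
		// 3 nodes -> 2, 4 nodes -> 3, 5 nodes -> 3
		fmt.Printf("%d-node cluster: quorum %d, tolerates %d failure(s)\n",
			n, quorum(n), n-quorum(n))
	}
}
```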

All of this ensures the cluster is always in agreement regarding which node
should be "in charge". For a BDR cluster consisting of multiple nodes, this
determines which node is the primary write target. HARP designates this node
as the Lead Master.

## Reducing Write Targets

Ignoring the concept of Quorum, or applying it insufficiently, can lead to a
Split Brain scenario where the "correct" write target is ambiguous or
unknowable. In a standard Postgres cluster, it is important that only a
single node is ever writable and sending replication traffic to the
remaining nodes.

Even in Multi-Master-capable approaches such as BDR, it can be beneficial to
reduce the amount of conflict management necessary to derive identical data
across the cluster. In clusters that consist of multiple BDR nodes per
physical location or region, this usually means a single BDR node acts as a
"Leader" while the remaining nodes are "Shadows". These Shadow nodes are
still writable, but doing so is discouraged unless absolutely necessary.

By leveraging Quorum, it's possible for all nodes to agree on exactly which
Postgres node should represent the entire cluster, or a local BDR region.
Any nodes that lose contact with the remainder of the Quorum, or are
overruled by it, by definition cannot become the cluster Leader.

This prevents Split Brain situations where writes unintentionally reach two
Postgres nodes. Unlike technologies such as VPNs, proxies, load balancers,
or DNS, a Quorum-derived consensus cannot be circumvented by
misconfiguration or network partitions. So long as it's possible to contact
the Consensus Layer to determine the state of the Quorum maintained by HARP,
only one target is ever valid.

## Basic Architecture

The design of HARP essentially consists of two parts: a Manager and a Proxy.
The following diagram describes how these interact with a single Postgres
instance:

![HARP Unit](images/ha-unit.png)

The Consensus Layer is an external entity where HARP Manager maintains
information it learns about its assigned Postgres node, and HARP Proxy
translates this information to a valid Postgres node target. Because Proxy
obtains the node target from the Consensus Layer, several such instances may
exist independently.

When using BDR itself as the Consensus Layer, each server node resembles
this variant instead:

![HARP Unit w/BDR Consensus](images/ha-unit-bdr.png)

In either case, each unit consists of the following elements:

* A Postgres or EDB instance
* A Consensus Layer resource, meant to track various attributes of the
  Postgres instance
* A HARP Manager process to convey the state of the Postgres node to the
  Consensus Layer
* A HARP Proxy service that directs traffic to the proper Lead Master node,
  as derived from the Consensus Layer

Not every application stack has access to additional node resources
specifically for the Proxy component, so it can be combined with the
application server to simplify the stack itself.

This is a typical design using two BDR nodes in a single Data Center,
organized in a Lead Master / Shadow Master configuration:

![HARP Cluster](images/ha-ao.png)

Note that when using BDR itself as the HARP Consensus Layer, at least three
fully qualified BDR nodes must be present to ensure a Quorum majority.

![HARP Cluster w/BDR Consensus](images/ha-ao-bdr.png)

(Not shown in the above diagram are connections between BDR nodes.)
## How it Works

When managing a BDR cluster, HARP maintains at most one "Leader" node per
defined Location. Canonically this is referred to as the Lead Master. Other
BDR nodes eligible to take this position remain in Shadow Master state until
such time as they assume the Leader role.

Applications may contact the current Leader only through the Proxy service.
Since the Consensus Layer requires Quorum agreement before conveying Leader
state, any and all Proxy services will direct traffic to that node.

At a high level, this is ultimately what prevents application interaction
with multiple nodes simultaneously.

### Determining a Leader

As an example, consider the role of Lead Master within a locally subdivided
BDR Always-On group as may exist within a single data center. When any
Postgres or Manager resource is started, and after a configurable refresh
interval, the following must occur:

1. The Manager checks the status of its assigned Postgres resource.
    - If Postgres is not running, try again after a configurable timeout.
    - If Postgres is running, continue.
2. The Manager checks the status of the Leader lease in the Consensus Layer.
    - If the lease is unclaimed, acquire it and assign it the identity of
      the Postgres instance assigned to this Manager. The lease duration is
      configurable, but setting it too low may result in unexpected
      leadership transitions.
    - If the lease is already claimed by us, renew the lease TTL.
    - Otherwise do nothing.

Obviously a lot more happens here, but this simplified version should
explain what's happening. The Leader lease can only be held by one node, and
if it's held elsewhere, HARP Manager gives up and tries again later, as
illustrated in the sketch below.
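
The following Go sketch models this loop against an etcd Consensus Layer.
The key name, lease TTL, node identity, and intervals are all hypothetical
illustrations; HARP's actual implementation is considerably more involved:

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

const (
	leaderKey = "/harp/dca/leader" // hypothetical lease key for Location dca
	self      = "bdr-node-1"       // identity of the managed Postgres instance
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints: []string{"host1:2379", "host2:2379", "host3:2379"},
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	for {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		tick(ctx, cli)
		cancel()
		time.Sleep(10 * time.Second) // configurable refresh interval
	}
}

// tick performs one iteration of the simplified Manager loop: claim the
// Leader lease if unclaimed, renew it if it is ours, otherwise do nothing.
func tick(ctx context.Context, cli *clientv3.Client) {
	// Step 1, checking the local Postgres instance, is omitted here.
	resp, err := cli.Get(ctx, leaderKey)
	if err != nil {
		log.Print(err) // Consensus Layer unreachable: no decision possible
		return
	}

	switch {
	case len(resp.Kvs) == 0:
		// Unclaimed: attach our identity to a short-lived lease, claiming
		// the key atomically only if it still does not exist.
		lease, err := cli.Grant(ctx, 30) // lease TTL in seconds
		if err != nil {
			return
		}
		txnResp, err := cli.Txn(ctx).
			If(clientv3.Compare(clientv3.CreateRevision(leaderKey), "=", 0)).
			Then(clientv3.OpPut(leaderKey, self, clientv3.WithLease(lease.ID))).
			Commit()
		if err == nil && txnResp.Succeeded {
			log.Printf("%s acquired the Leader lease", self)
		}
	case string(resp.Kvs[0].Value) == self:
		// Already ours: renew the lease TTL.
		cli.KeepAliveOnce(ctx, clientv3.LeaseID(resp.Kvs[0].Lease))
	default:
		// Held elsewhere: give up and try again on the next interval.
	}
}
```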

!!! Note
    Depending on the chosen Consensus Layer, rather than repeatedly looping
    to check the status of the Leader lease, HARP will subscribe to
    notifications instead. In this case, it can respond immediately any time
    the state of the lease changes, rather than polling. Currently this
    functionality is restricted to the etcd Consensus Layer.
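
A minimal sketch of that subscription model, reusing the client, imports,
and hypothetical `leaderKey` from the sketch above, could use etcd's watch
API:

```go
// watchLeader reacts to Leader-lease changes pushed by etcd instead of
// polling for them. cli and leaderKey are as in the previous sketch.
func watchLeader(ctx context.Context, cli *clientv3.Client) {
	for watchResp := range cli.Watch(ctx, leaderKey) {
		for _, ev := range watchResp.Events {
			switch ev.Type {
			case clientv3.EventTypePut:
				log.Printf("leader claimed or renewed by %s", ev.Kv.Value)
			case clientv3.EventTypeDelete:
				log.Print("leader lease released or expired")
			}
		}
	}
}
```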

This means HARP itself does not hold elections or manage Quorum; this is
delegated to the Consensus Layer. The act of obtaining the lease must be
acknowledged by a Quorum of the Consensus Layer, so if the request succeeds,
that node leads the cluster in that Location.

### Connection Routing

Once the role of the Lead Master is established, connections are handled
with a similarly deterministic result as reflected by HARP Proxy. Consider a
case where HARP Proxy needs to determine the connection target for a
particular backend resource:

1. HARP Proxy interrogates the Consensus Layer for the current Lead Master
   in its configured Location.
2. If this is unset or in transition:
    - New client connections to Postgres are barred, but clients accumulate
      and remain in a paused state until a Lead Master appears.
    - Existing client connections are allowed to complete the current
      transaction, and are then reverted to the same pending state as new
      connections.
3. Client connections are forwarded to the Lead Master.

Note that the interplay demonstrated in this case does not require any
interaction with either HARP Manager or Postgres. The Consensus Layer itself
is the source of all truth from the Proxy's perspective, as the sketch below
makes concrete.
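
To make that flow concrete, here is a hedged Go sketch of the routing
decision, again assuming the etcd layer, imports, and key naming of the
earlier sketches; the pause/resume mechanics of a real proxy are reduced to
a simple wait loop:

```go
// routeTarget resolves the connection target for a client. It consults only
// the Consensus Layer: while leadership is unset or in transition, the
// caller is held in a paused state until a Lead Master appears.
func routeTarget(ctx context.Context, cli *clientv3.Client) (string, error) {
	for {
		resp, err := cli.Get(ctx, leaderKey)
		if err != nil {
			return "", err // Consensus Layer unreachable: no valid target
		}
		if len(resp.Kvs) > 0 {
			return string(resp.Kvs[0].Value), nil // forward to the Lead Master
		}
		select { // leadership unset: keep the client pending, then re-check
		case <-ctx.Done():
			return "", ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```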

### Colocation

The arrangement of the work units is such that their organization must
follow these principles:

1. The Manager and Postgres units must exist concomitantly within the same
   node.
2. The contents of the Consensus Layer dictate the prescriptive role of all
   operational work units.

This delegates cluster Quorum responsibilities to the Consensus Layer
itself, while HARP leverages it for critical role assignments and key/value
storage. Neither storage nor retrieval will succeed if the Consensus Layer
is inoperable or unreachable, thus preventing rogue Postgres nodes from
accepting connections.

As a result, the Consensus Layer should generally exist outside of HARP or
HARP-managed nodes for maximum safety. Our reference diagrams reflect this
in order to encourage such separation, though it is not required.

!!! Note
    In order to operate and manage cluster state, BDR contains its own
    implementation of the Raft Consensus model. HARP may be configured to
    leverage this same layer to reduce reliance on external dependencies and
    to preserve server resources. However, there are certain drawbacks to
    this approach that are discussed in further depth in the section on the
    [Consensus Layer](09_consensus-layer).

## Recommended Architecture and Use

HARP was primarily designed to represent a BDR Always-On architecture which
resides within two (or more) Data Centers and consists of at least five BDR
nodes. This count does not include any Logical Standby nodes.

The current and standard representation of this can be seen in the following
diagram:

![BDR Always-On Reference Architecture](images/bdr-ao-spec.png)

In this diagram, HARP Manager would exist on BDR Nodes 1-4. The initial
state of the cluster would be that BDR Node 1 is the Lead Master of DC A,
and BDR Node 3 is the Lead Master of DC B.

This would result in any HARP Proxy resource in DC A connecting to BDR Node
1, and likewise the HARP Proxy resource in DC B connecting to BDR Node 3.

!!! Note
    While this diagram only shows a single HARP Proxy per DC, this is merely
    illustrative and should not be considered a Single Point of Failure. Any
    number of HARP Proxy nodes may exist, and they will all direct
    application traffic to the same node.

### Location Configuration

In order for multiple BDR nodes to be eligible to take the Lead Master lock
in a Location, a Location must be defined within the `config.yml`
configuration file.

To reproduce the diagram above, we would have these lines in the
`config.yml` configuration for BDR Nodes 1 and 2:

```yaml
location: dca
```

And for BDR Nodes 3 and 4:

```yaml
location: dcb
```

This applies to any HARP Proxy nodes which are designated in those
respective data centers as well.
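
Putting the documented pieces together, a node's `config.yml` for DC A might
pair the `location` setting with the `dcs` section covered in the
Installation chapter. This is only a minimal sketch; real configurations
include additional fields:

```yaml
# Minimal illustrative sketch combining two documented settings;
# an actual HARP config.yml carries further fields.
location: dca
dcs:
  driver: etcd
  endpoints:
    - host1:2379
    - host2:2379
    - host3:2379
```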

### BDR 3.7 Compatibility

BDR 3.7 and above offers more direct Location definition by assigning a
Location to the BDR node itself. This is done by calling the following SQL
API function while connected to the BDR node. So for BDR Nodes 1 and 2, we
might do this:

```sql
SELECT bdr.set_node_location('dca');
```

And for BDR Nodes 3 and 4:

```sql
SELECT bdr.set_node_location('dcb');
```

Afterwards, future versions of HARP Manager would derive the `location`
field directly from BDR itself. This HARP functionality is not available
yet, so we recommend using both this function and the `location` setting in
`config.yml` until HARP reports compatibility with this BDR API method.

---
navTitle: Installation
title: Installation
---

A standard installation of HARP includes two system services:

* HARP Manager (`harp_manager`) on the node being managed
* HARP Proxy (`harp_router`) elsewhere

There are generally two ways to install and configure these services to
manage Postgres for proper Quorum-based connection routing.

## Software Versions

HARP depends on external software, which must meet the minimum versions
listed here.

| Software  | Minimum Version |
|-----------|-----------------|
| etcd      | 3.4             |
| PgBouncer | 1.14            |

## TPAexec

The easiest way to install and configure HARP is to use EDB's TPAexec
utility for cluster deployment and management. For details on this software,
see the [TPAexec product page](https://access.2ndquadrant.com/customer_portal/sw/tpa/).

!!! Note
    TPAexec is currently only available through an EULA specifically
    dedicated to BDR cluster deployments. If you are unable to access the
    above URL, please contact your sales or account representative for more
    information.

TPAexec itself must be configured to recognize that cluster routing should
be managed through HARP by ensuring the TPA `config.yml` file contains these
attributes:

```yaml
cluster_vars:
  failover_manager: harp
```

!!! Note
    Versions of TPAexec prior to 21.1 require a slightly different approach:

    ```yaml
    cluster_vars:
      enable_harp: true
    ```

After this, HARP will be installed by invoking the regular `tpaexec`
commands for making cluster modifications:

```bash
tpaexec provision ${CLUSTER_DIR}
tpaexec deploy ${CLUSTER_DIR}
```

No other modifications should be necessary, barring cluster-specific
considerations.

## Package Installation

Currently CentOS/RHEL packages are provided via the EDB packaging
infrastructure. For details, see the [HARP product page](https://access.2ndquadrant.com/customer_portal/sw/harp/).

### etcd Packages

Currently `etcd` packages for many popular Linux distributions are not
available via their standard public repositories. EDB has therefore packaged
`etcd` for RHEL and CentOS versions 7 and 8, Debian, and variants such as
Ubuntu LTS. Again, access to our HARP package repository is necessary to use
these packages.

## Consensus Layer

HARP requires a distributed consensus layer to operate. Currently this must
be either `bdr` or `etcd`. If using fewer than three BDR nodes, it may
become necessary to rely on `etcd`; otherwise, any BDR service outage would
reduce the consensus layer to a single node, preventing node consensus and
disabling Postgres routing.

### etcd

If using `etcd` as the consensus layer, `etcd` must be installed either
directly on the Postgres nodes, or in some separate location they can
access.

To set `etcd` as the consensus layer, include this in the HARP `config.yml`
configuration file:

```yaml
dcs:
  driver: etcd
  endpoints:
    - host1:2379
    - host2:2379
    - host3:2379
```

When using TPAexec, all configured etcd endpoints will be entered here
automatically.

### BDR

The `bdr` native consensus layer is available from BDR 3.6.21 and 3.7.3
onward. This Consensus Layer model requires no supplementary software when
managing routing for a BDR cluster.

As previously mentioned, to ensure Quorum is possible in the cluster, always
use more than two nodes so that BDR's consensus layer remains responsive
during node maintenance or outages.

To set BDR as the consensus layer, include this in the `config.yml`
configuration file:

```yaml
dcs:
  driver: bdr
  endpoints:
    - host=host1 dbname=bdrdb user=harp_user
    - host=host2 dbname=bdrdb user=harp_user
    - host=host3 dbname=bdrdb user=harp_user
```

As shown here, the endpoints for a BDR consensus layer follow the standard
Postgres DSN connection format.