Content/harp/2.0/test import (#1914)
Final import before publication.
jericson-edb authored Oct 13, 2021
1 parent 502848e commit cf9a0bf
Showing 23 changed files with 2,211 additions and 0 deletions.
263 changes: 263 additions & 0 deletions product_docs/docs/harp/2.0/02_overview.mdx
@@ -0,0 +1,263 @@
---
navTitle: Overview
title: HARP Functionality Overview
---

HARP is a new approach to High Availability primarily designed for BDR
clusters, though it can also manage standard Primary + PSR topologies. It
leverages consensus-driven Quorum to determine the correct connection endpoint
in a semi-exclusive manner, preventing unintended multi-node writes from an
application.

## The Importance of Quorum

The central purpose of HARP is to enforce full Quorum on any Postgres cluster
it manages. Quorum is a term applied to a voting body that requires a certain
minimum of attendees to be available before a decision can be made. Or perhaps
even more simply: Majority Rules.

In order for any vote to end in a result other than a tie, an odd number of
nodes must constitute the full cluster membership. Quorum, however, does not
strictly demand this restriction; a simple majority will suffice. This means
that in a cluster of N nodes, Quorum requires a minimum of N/2+1 nodes to hold
a meaningful vote.
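
As a minimal illustration (not part of HARP itself), the following Python
sketch computes that majority threshold:

```python
def quorum_size(n_nodes: int) -> int:
    """Minimum number of nodes required for a majority (N/2 + 1)."""
    return n_nodes // 2 + 1

# A three-node cluster needs two voters and a five-node cluster needs three.
# A four-node cluster still needs three, which is why odd-sized memberships
# make better use of each node.
for n in (3, 4, 5):
    print(f"{n} nodes -> quorum of {quorum_size(n)}")
```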

All of this ensures the cluster is always in agreement regarding which node
should be "in charge". For a BDR cluster consisting of multiple nodes, this
determines which node is the primary write target. HARP designates this node
as the Lead Master.

## Reducing Write Targets

Ignoring the concept of Quorum, or applying it insufficiently, can lead to a
Split Brain scenario where the "correct" write target is ambiguous or
unknowable. In a standard Postgres cluster, it is important that only a single
node is ever writable and sending replication traffic to the remaining nodes.

Even in Multi-Master capable approaches such as BDR, it can be beneficial to
reduce the amount of necessary conflict management to derive identical data
across the cluster. In clusters that consist of multiple BDR nodes per physical
location or region, this usually means a single BDR node acts as a "Leader" and
the remaining nodes are "Shadows". These Shadow nodes are still writable, but
writing to them is discouraged unless absolutely necessary.

By leveraging Quorum, it's possible for all nodes to agree on exactly which
Postgres node should represent the entire cluster, or a local BDR region. Any
nodes that lose contact with the remainder of the Quorum, or are overruled by
it, by definition cannot become the cluster Leader.

This prevents Split Brain situations where writes unintentionally reach two
Postgres nodes. Unlike technologies such as VPNs, Proxies, load balancers, or
DNS, a Quorum-derived consensus cannot be circumvented by misconfiguration or
network partitions. So long as it's possible to contact the Consensus Layer to
determine the state of the Quorum maintained by HARP, only one target is ever
valid.

## Basic Architecture

HARP consists of two essential parts: a Manager and a Proxy. The following
diagram describes how these interact with a single Postgres instance:

![HARP Unit](images/ha-unit.png)

The Consensus Layer is an external entity where HARP Manager maintains
information it learns about its assigned Postgres node, and HARP Proxy
translates this information to a valid Postgres node target. Because the Proxy
obtains the node target from the Consensus Layer, several such instances may
exist independently.

When using BDR itself as the Consensus Layer, each server node resembles this
variant instead:

![HARP Unit w/BDR Consensus](images/ha-unit-bdr.png)

In either case, each unit consists of the following elements:

* A Postgres or EDB instance
* A Consensus Layer resource, meant to track various attributes of the Postgres
instance
* A HARP Manager process to convey the state of the Postgres node to the
Consensus Layer
* A HARP Proxy service that directs traffic to the proper Lead Master node,
as derived from the Consensus Layer

Not every application stack has access to additional node resources
specifically for the Proxy component, so it can be combined with the
application server to simplify the stack itself.

This is a typical design using two BDR nodes in a single Data Center organized in a Lead Master / Shadow Master configuration:

![HARP Cluster](images/ha-ao.png)

Note that when using BDR itself as the HARP Consensus Layer, at least three
fully qualified BDR nodes must be present to ensure a quorum majority.

![HARP Cluster w/BDR Consensus](images/ha-ao-bdr.png)

(Not shown in the above diagram are connections between BDR nodes.)

## How it Works

When managing a BDR cluster, HARP maintains at most one "Leader" node per
defined Location. Canonically this is referred to as the Lead Master. Other BDR
nodes that are eligible to take this position remain in the Shadow Master state
until such time as they take the Leader role.

Applications may contact the current Leader only through the Proxy service.
Since the Consensus Layer requires Quorum agreement before conveying Leader
state, any and all Proxy services will direct traffic to that node.

At a high level, this is ultimately what prevents application interaction with
multiple nodes simultaneously.

### Determining a Leader

As an example, consider the role of Lead Master within a locally subdivided
BDR Always-On group, as might exist within a single data center. When any
Postgres or Manager resource is started, and after a configurable refresh
interval, the following must occur:

1. The Manager checks the status of its assigned Postgres resource.
   - If Postgres is not running, try again after a configurable timeout.
   - If Postgres is running, continue.
2. The Manager checks the status of the Leader lease in the Consensus Layer.
   - If the lease is unclaimed, acquire it and record the identity of the
     Postgres instance assigned to this Manager. The lease duration is
     configurable, but setting it too low may result in unexpected leadership
     transitions.
   - If the lease is already claimed by us, renew the lease TTL.
   - Otherwise do nothing.

A lot more happens here, of course, but this simplified version captures the
essential flow. The Leader lease can be held by only one node, and if it's held
elsewhere, HARP Manager gives up and tries again later.
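
The decision loop above can be sketched roughly as follows. This is a minimal
illustration only: the in-memory `consensus` dictionary and helper names are
hypothetical stand-ins, not HARP's actual implementation or API.

```python
import time

# Stand-in for the real Consensus Layer: maps a lease name to holder and expiry.
consensus = {}

def postgres_is_running() -> bool:
    return True  # placeholder for a real health check of the local instance

def manager_tick(my_node: str, lease_ttl: float = 6.0) -> None:
    """One pass of the simplified Lead Master decision loop."""
    if not postgres_is_running():
        return  # Postgres is down; try again after the configurable timeout

    now = time.monotonic()
    lease = consensus.get("leader")
    if lease is None or lease["expires"] < now:
        # Lease is unclaimed (or expired): claim it for our Postgres node.
        consensus["leader"] = {"holder": my_node, "expires": now + lease_ttl}
    elif lease["holder"] == my_node:
        # We already hold the lease: renew its TTL.
        lease["expires"] = now + lease_ttl
    # Otherwise another node holds the lease; do nothing and try again later.

manager_tick("bdr-node-1")
print(consensus["leader"]["holder"])  # -> bdr-node-1
```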

!!! Note
    Depending on the chosen Consensus Layer, HARP may subscribe to
    notifications rather than repeatedly polling the status of the Leader
    lease. In that case, it can respond immediately whenever the state of the
    lease changes. Currently this functionality is restricted to the etcd
    Consensus Layer.

This means HARP itself does not hold elections or manage Quorum; this is
delegated to the Consensus Layer. The act of obtaining the lease must be
acknowledged by a Quorum of the Consensus Layer, so if the request succeeds,
that node leads the cluster in that Location.

### Connection Routing

Once the role of the Lead Master is established, HARP Proxy handles
connections with a similarly deterministic result. Consider a case where HARP
Proxy needs to determine the connection target for a particular backend
resource:

1. HARP Proxy interrogates the Consensus Layer for the current Lead Master in
   its configured Location.
2. If this is unset or in transition:
   - New client connections to Postgres are barred, but clients accumulate in
     a paused state until a Lead Master appears.
   - Existing client connections are allowed to complete the current
     transaction, and are then reverted to a pending state similar to that of
     new connections.
3. Client connections are forwarded to the Lead Master.

Note that the interplay demonstrated in this case does not require any
interaction with either HARP Manager or Postgres. The Consensus Layer itself
is the source of all truth from the Proxy's perspective.
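
As a rough illustration of this lookup, the following Python sketch assumes a
hypothetical mapping of Locations to Lead Masters as reported by the Consensus
Layer; it is not HARP Proxy's actual implementation.

```python
from typing import Optional

def route_connection(consensus: dict, location: str) -> Optional[str]:
    """Return the Postgres target for a new connection, or None to pause it."""
    target = consensus.get(location)  # ask the consensus layer for the Lead Master
    if target is None:
        # Lead Master unset or in transition: hold new connections in a
        # paused state until a leader appears, rather than failing them.
        return None
    return target

state = {"dca": "bdr-node-1"}          # what the consensus layer currently reports
print(route_connection(state, "dca"))  # -> bdr-node-1
print(route_connection(state, "dcb"))  # -> None: pause until a leader exists
```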

### Colocation

The arrangement of the work units must follow these principles:

1. The Manager and Postgres units must exist together on the same node.
2. The contents of the Consensus Layer dictate the prescriptive role of all
   operational work units.

This delegates cluster Quorum responsibilities to the Consensus Layer itself,
while HARP leverages it for critical role assignments and key/value storage.
Neither storage nor retrieval will succeed if the Consensus Layer is inoperable
or unreachable, thus preventing rogue Postgres nodes from accepting
connections.

As a result, the Consensus Layer should generally exist outside of HARP or
HARP-managed nodes for maximum safety. Our reference diagrams reflect this in
order to encourage such separation, though it is not required.

!!! Note
In order to operate and manage cluster state, BDR contains its own
implementation of the Raft Consensus model. HARP may be configured to
leverage this same layer to reduce reliance on external dependencies and
to preserve server resources. However, there are certain drawbacks to this
approach that are discussed in further depth in the section on the
[Consensus Layer](09_consensus-layer).

## Recommended Architecture and Use

HARP was primarily designed to represent a BDR Always-On architecture that
resides within two (or more) Data Centers and consists of at least five BDR
nodes. This count does not include any Logical Standby nodes.

The current and standard representation of this can be seen in the following
diagram:

![BDR Always-On Reference Architecture](images/bdr-ao-spec.png)

In this diagram, HARP Manager would exist on BDR Nodes 1-4. The initial state
of the cluster would be that BDR Node 1 is the Lead Master of DC A, and BDR
Node 3 is the Lead Master of DC B.

This would result in any HARP Proxy resource in DC A connecting to BDR Node 1,
and likewise the HARP Proxy resource in DC B connecting to BDR Node 3.

!!! Note
    While this diagram shows only a single HARP Proxy per DC, this is merely
    illustrative and should not be read as a Single Point of Failure. Any
    number of HARP Proxy nodes may exist, and they all direct application
    traffic to the same node.

### Location Configuration

In order for multiple BDR nodes to be eligible to take the Lead Master lock in
a location, a Location must be defined within the `config.yml` configuration
file.

To reproduce the diagram above, we would have these lines in the `config.yml`
configuration for BDR Nodes 1 and 2:

```yaml
location: dca
```

And for BDR Nodes 3 and 4:

```yaml
location: dcb
```

This applies to any HARP Proxy nodes that are designated in those respective
data centers as well.

### BDR 3.7 Compatibility

BDR 3.7 and above offers a more direct way to define a Location by assigning it
to the BDR node itself. This is done by calling the following SQL API function
while connected to the BDR node. So for BDR Nodes 1 and 2, we might do this:

```sql
SELECT bdr.set_node_location('dca');
```

And for BDR Nodes 3 and 4:

```sql
SELECT bdr.set_node_location('dcb');
```

Afterwards, future versions of HARP Manager will be able to derive the
`location` field directly from BDR itself. Since this HARP functionality is not
yet available, we recommend setting the Location both with this function and
with the `location` setting in `config.yml` until HARP reports compatibility
with this BDR API method.
129 changes: 129 additions & 0 deletions product_docs/docs/harp/2.0/03_installation.mdx
@@ -0,0 +1,129 @@
---
navTitle: Installation
title: Installation
---

A standard installation of HARP includes two system services:

* HARP Manager (`harp_manager`) on the node being managed
* HARP Proxy (`harp_router`) elsewhere

There are generally two ways to install and configure these services to manage
Postgres for proper Quorum-based connection routing.

## Software Versions

HARP has dependencies on external software, which must meet the minimum
versions listed here.

| Software  | Minimum Version |
|-----------|-----------------|
| etcd      | 3.4             |
| PgBouncer | 1.14            |

## TPAexec

The easiest way to install and configure HARP is to use EDB's TPAexec utility
for cluster deployment and management. For details on this software, see the
[TPAexec product page](https://access.2ndquadrant.com/customer_portal/sw/tpa/).

!!! Note
    TPAexec is currently available only through an EULA specifically dedicated
    to BDR cluster deployments. If you are unable to access the above URL,
    please contact your sales or account representative for more information.

TPAexec itself must be configured to recognize that cluster routing should be
managed through HARP by ensuring the TPA `config.yml` file contains these
attributes:

```yaml
cluster_vars:
failover_manager: harp
```

!!! Note
    Versions of TPAexec prior to 21.1 require a slightly different approach:

    ```yaml
    cluster_vars:
      enable_harp: true
    ```

After this, HARP will be installed by invoking the regular `tpaexec` commands
for making cluster modifications:

```bash
tpaexec provision ${CLUSTER_DIR}
tpaexec deploy ${CLUSTER_DIR}
```

No other modifications should be necessary, barring cluster-specific
considerations.


## Package Installation

Currently CentOS/RHEL packages are provided via the EDB packaging
infrastructure. For details, see the [HARP product
page](https://access.2ndquadrant.com/customer_portal/sw/harp/).

### etcd Packages

Currently `etcd` packages for many popular Linux distributions are not
available via their standard public repositories. EDB has therefore packaged
`etcd` for RHEL and CentOS versions 7 and 8, Debian, and variants such as
Ubuntu LTS. Again, access to our HARP package repository is necessary to use
these packages.

## Consensus Layer

HARP requires a distributed consensus layer to operate. Currently this must be
either `bdr` or `etcd`. If using fewer than 3 BDR nodes, it may become
necessary to rely on `etcd`; otherwise, any BDR service outage would reduce the
consensus layer to a single node, preventing node consensus and disabling
Postgres routing.

### etcd

If using `etcd` as the consensus layer, `etcd` must be installed either
directly on the Postgres nodes, or in some separate location they can access.

To set `etcd` as the consensus layer, include this in the HARP `config.yml`
configuration file:

```yaml
dcs:
driver: etcd
endpoints:
- host1:2379
- host2:2379
- host3:2379
```

When using TPAexec, all configured etcd endpoints are entered here
automatically.

### BDR

The `bdr` native consensus layer is available starting with BDR 3.6.21 and
3.7.3. This Consensus Layer model requires no supplementary software when
managing routing for a BDR cluster.

As previously mentioned, to ensure Quorum is possible in the cluster, always
use more than two nodes so BDR's consensus layer remains responsive during node
maintenance or outages.

To set BDR as the consensus layer, include this in the `config.yml`
configuration file:

```yaml
dcs:
driver: bdr
endpoints:
- host=host1 dbname=bdrdb user=harp_user
- host=host2 dbname=bdrdb user=harp_user
- host=host3 dbname=bdrdb user=harp_user
```

As can be seen here, the endpoints for a BDR consensus layer follow the
standard Postgres DSN connection format.