diff --git a/product_docs/docs/efm/4/04_configuring_efm/01_cluster_properties.mdx b/product_docs/docs/efm/4/04_configuring_efm/01_cluster_properties.mdx index f1956762c6b..166785e6172 100644 --- a/product_docs/docs/efm/4/04_configuring_efm/01_cluster_properties.mdx +++ b/product_docs/docs/efm/4/04_configuring_efm/01_cluster_properties.mdx @@ -15,7 +15,7 @@ Each node in a Failover Manager cluster has a properties file (by default, named After completing the Failover Manager installation, make a working copy of the template before modifying the file contents: ```text -# cp /etc/edb/efm-4.4/efm.properties.in /etc/edb/efm-4.4/efm.properties +# cp /etc/edb/efm-4.5/efm.properties.in /etc/edb/efm-4.5/efm.properties ``` After copying the template file, change the owner of the file to efm: @@ -438,10 +438,10 @@ Set the `is.witness` property to `true` to indicate that the current node is a w is.witness= ``` -The EDB Postgres Advanced Server `pg_is_in_recovery()` function is a Boolean function that reports the recovery state of a database. The function returns `true` if the database is in recovery or `false` if the database isn't in recovery. When an agent starts, it connects to the local database and invokes the `pg_is_in_recovery()` function. +The EDB Postgres Advanced Server `pg_is_in_recovery()` function is a Boolean function that reports the recovery state of a database. The function returns `true` if the database is in recovery or `false` if the database isn't in recovery. When an agent starts, it connects to the local database and invokes the `pg_is_in_recovery()` function. -- If the server responds true, the agent assumes the role of standby. -- If the server responds false, the agent assumes the role of primary. +- If the server responds true, the agent assumes the role of standby. +- If the server responds false, the agent assumes the role of primary. - If there's no local database, the agent assumes an idle state. !!! 
Note @@ -904,7 +904,7 @@ restart.connection.timeout=60 -Use the `auto.resume.period` property to specify the number of seconds for an agent to attempt to resume monitoring that database. This property applies after a monitored database fails and an agent has assumed an idle state or when starting in IDLE mode. +Use the `auto.resume.period` property to specify the number of seconds for an agent to attempt to resume monitoring that database. This property applies after a monitored database fails and an agent has assumed an idle state or when starting in IDLE mode. ```ini # Period in seconds for IDLE agents to try to resume monitoring @@ -972,7 +972,7 @@ check.vip.before.promotion=true -Use the `pgpool.enable` property to specify if you want to enable the Failover Manager and Pgpool integration for high availability. If you want to enable Pgpool integration in a non-sudo mode (running as the DB owner), the PCPPASS file must be owned by the DB owner operating system user and you must set the file permissions to 600. +Use the `pgpool.enable` property to specify if you want to enable the Failover Manager and Pgpool integration for high availability. If you want to enable Pgpool integration in a non-sudo mode (running as the DB owner), the PCPPASS file must be owned by the DB owner operating system user and you must set the file permissions to 600. ```ini @@ -1036,13 +1036,13 @@ script.load.balancer.detach= Use the `detach.on.agent.failure` property to indicate that you don't want to detach a node from the load balancer in a scenario where the primary agent fails but the database is still reachable. The default value is `true.` ```ini -# If set to true, Failover Manager will detach the node from load -# balancer if the primary agent fails but the database is still -# reachable. In most scenarios this is NOT the desired situation. 
In -# scenarios where the detach script should run with a failed primary -# agent, even when the primary database is still healthy this parameter -# should be set to true. If no value specified it defaults to true (for -# backwards compatibility). +# If set to true, Failover Manager will detach the node from load +# balancer if the primary agent fails but the database is still +# reachable. In most scenarios this is NOT the desired situation. In +# scenarios where the detach script should run with a failed primary +# agent, even when the primary database is still healthy this parameter +# should be set to true. If no value specified it defaults to true (for +# backwards compatibility). # This is not applicable for standbys. detach.on.agent.failure= ``` diff --git a/product_docs/docs/efm/4/04_configuring_efm/02_encrypting_database_password.mdx b/product_docs/docs/efm/4/04_configuring_efm/02_encrypting_database_password.mdx index 42dd338364c..2aba26c608e 100644 --- a/product_docs/docs/efm/4/04_configuring_efm/02_encrypting_database_password.mdx +++ b/product_docs/docs/efm/4/04_configuring_efm/02_encrypting_database_password.mdx @@ -35,7 +35,7 @@ This example shows using the `encrypt` utility to encrypt a password for the `ac # efm encrypt acctg This utility will generate an encrypted password for you to place in your Failover Manager cluster property file: -/etc/edb/efm-4.4/acctg.properties +/etc/edb/efm-4.5/acctg.properties Please enter the password and hit enter: Please enter the password again to confirm: The encrypted password is: 516b36fb8031da17cfbc010f7d09359c @@ -49,16 +49,16 @@ db.password.encrypted=516b36fb8031da17cfbc010f7d09359c After receiving your encrypted password, paste the password into the properties file and start the Failover Manager service. 
If there's a problem with the encrypted password, the Failover Manager service doesn't start: ```text -[witness@localhost ~]# systemctl start edb-efm-4.4 -Job for edb-efm-4.4.service failed because the control process exited with error code. See "systemctl status edb-efm-4.4.service" and "journalctl -xe" for details. +[witness@localhost ~]# systemctl start edb-efm-4.5 +Job for edb-efm-4.5.service failed because the control process exited with error code. See "systemctl status edb-efm-4.5.service" and "journalctl -xe" for details. ``` -If you receive this message when starting the Failover Manager service, see the startup log `/var/log/efm-4.4/startup-efm.log` for more information. +If you receive this message when starting the Failover Manager service, see the startup log `/var/log/efm-4.5/startup-efm.log` for more information. If you are using RHEL/CentOS 7.x or RHEL/Rocky Linux/AlmaLinux 8.x, startup information is also available with the following command: ```shell -systemctl status edb-efm-4.4 +systemctl status edb-efm-4.5 ``` To prevent a cluster from inadvertently connecting to the database of another cluster, the cluster name is incorporated into the encrypted password. If you modify the cluster name, you must re-encrypt the database password and update the cluster properties file. 
diff --git a/product_docs/docs/efm/4/04_configuring_efm/03_cluster_members.mdx b/product_docs/docs/efm/4/04_configuring_efm/03_cluster_members.mdx index 9948c81a5b2..ece05ca706c 100644 --- a/product_docs/docs/efm/4/04_configuring_efm/03_cluster_members.mdx +++ b/product_docs/docs/efm/4/04_configuring_efm/03_cluster_members.mdx @@ -15,7 +15,7 @@ Each node in a Failover Manager cluster has a cluster members file (by default n After completing the Failover Manager installation, make a working copy of the template: ```shell -cp /etc/edb/efm-4.4/efm.nodes.in /etc/edb/efm-4.4/efm.nodes +cp /etc/edb/efm-4.5/efm.nodes.in /etc/edb/efm-4.5/efm.nodes ``` After copying the template file, change the owner of the file to efm: diff --git a/product_docs/docs/efm/4/04_configuring_efm/04_extending_efm_permissions.mdx b/product_docs/docs/efm/4/04_configuring_efm/04_extending_efm_permissions.mdx index b4b81c97bd3..f30a9cfd9df 100644 --- a/product_docs/docs/efm/4/04_configuring_efm/04_extending_efm_permissions.mdx +++ b/product_docs/docs/efm/4/04_configuring_efm/04_extending_efm_permissions.mdx @@ -36,18 +36,18 @@ The `efm-42` file is located in `/etc/sudoers.d` and contains the following entr # If you run your db service under a non-default account, you will need to copy # this file to grant the proper permissions and specify the account in your efm # cluster properties file by changing the 'db.service.owner' property. -efm ALL=(postgres) NOPASSWD: /usr/edb/efm-4.4/bin/efm_db_functions -efm ALL=(enterprisedb) NOPASSWD: /usr/edb/efm-4.4/bin/efm_db_functions +efm ALL=(postgres) NOPASSWD: /usr/edb/efm-4.5/bin/efm_db_functions +efm ALL=(enterprisedb) NOPASSWD: /usr/edb/efm-4.5/bin/efm_db_functions # Allow user 'efm' to sudo efm_root_functions as 'root' to write/delete the PID file, # validate the db.service.owner property, etc. 
-efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.4/bin/efm_root_functions +efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.5/bin/efm_root_functions # Allow user 'efm' to sudo efm_address as root for VIP tasks. -efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.4/bin/efm_address +efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.5/bin/efm_address # Allow user 'efm' to sudo efm_pgpool_functions as root for pgpool tasks. -efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.4/bin/efm_pgpool_functions +efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.5/bin/efm_pgpool_functions # relax tty requirement for user 'efm' Defaults:efm !requiretty @@ -89,9 +89,9 @@ To run Failover Manager without sudo, you must select a database process owner w ```shell su - enterprisedb - cp /etc/edb/efm-4.4/efm.properties.in .properties + cp /etc/edb/efm-4.5/efm.properties.in .properties - cp /etc/edb/efm-4.4/efm.nodes.in /.nodes + cp /etc/edb/efm-4.5/efm.nodes.in /.nodes ``` Then, modify the cluster properties file, providing the name of the user in the `db.service.owner` property. Also make sure that the `db.service.name` property is blank. Without sudo, you can't run services without root access. @@ -99,7 +99,7 @@ Then, modify the cluster properties file, providing the name of the user in the After modifying the configuration, the new user can control Failover Manager with the following command: ```shell -/usr/edb/efm-4.4/bin/runefm.sh start|stop .properties +/usr/edb/efm-4.5/bin/runefm.sh start|stop .properties ``` Where `` specifies the full path of the cluster properties file. The user provides the full path to the properties file whenever the nondefault user is controlling agents or using the `efm` script. 
diff --git a/product_docs/docs/efm/4/04_configuring_efm/06_configuring_for_eager_failover.mdx b/product_docs/docs/efm/4/04_configuring_efm/06_configuring_for_eager_failover.mdx index 108cbb41657..a39bc31b1f2 100644 --- a/product_docs/docs/efm/4/04_configuring_efm/06_configuring_for_eager_failover.mdx +++ b/product_docs/docs/efm/4/04_configuring_efm/06_configuring_for_eager_failover.mdx @@ -6,9 +6,9 @@ redirects: -In default run mode, if a primary Failover Manager process fails, there's no failover protection until the agent restarts. To avoid this case, you can set up the primary node through `systemd` to cause a failover when the primary agent exits, which is called Eager Failover. +In default run mode, if a primary Failover Manager process fails, there's no failover protection until the agent restarts. To avoid this case, you can set up the primary node through `systemd` to cause a failover when the primary agent exits, which is called Eager Failover. -You can set up Eager Failover by performing the following steps. The example uses EDB Postgres Advanced Server version 12 and Failover Manager version 4.2. +You can set up Eager Failover by performing the following steps. The example uses EDB Postgres Advanced Server version 12 and Failover Manager version 4.5. @@ -20,29 +20,29 @@ You can set up Eager Failover by performing the following steps. The example us ``` If you don't set this property before starting Failover Manager, shutting down a Failover Manager agent shuts down the database without failover. - + - With Eager Failover enabled, using the `efm stop-cluster` command stops all of the Failover Manager agents and shuts down the primary database. Since the agents aren't running, there's no failover. To avoid thihs scenario, you can disable the command using the `enable.stop.cluster` property. - + ```ini enable.stop.cluster=false ``` - Ensure that the database server and the local Failover Manager agent are running. 
- + - As root, create `/etc/systemd/system/edb-as-12.service` file and include: - + ```ini .include /lib/systemd/system/edb-as-12.service [Unit] - BindsTo=edb-efm-4.2.service + BindsTo=edb-efm-4.5.service ``` - Run the following command to reload the configuration files: - + ```shell systemctl daemon-reload ``` - + With these changes, when the Failover Manager agent is stopped or ended, the rest of the cluster treats this situation as a failure and attempts a failover. @@ -51,7 +51,7 @@ With these changes, when the Failover Manager agent is stopped or ended, the res - If you want to stop Failover Manager without stopping the database, comment out the following line in `/etc/systemd/system/edb-as-12.service`: ```ini - BindsTo=edb-efm-4.2.service + BindsTo=edb-efm-4.5.service ``` - Run the following command to reload the configuration files: ```shell @@ -60,10 +60,10 @@ With these changes, when the Failover Manager agent is stopped or ended, the res ## Upgrading Failover Manager in Eager Failover mode -To upgrade Failover Manager without stopping EDB Postgres Advanced Server, temporarily disable the Eager Failover mode. +To upgrade Failover Manager without stopping EDB Postgres Advanced Server, temporarily disable the Eager Failover mode. + +1. [Disable Eager Failover](#disabling_the_eager_failover) -1. [Disable Eager Failover](#disabling_the_eager_failover) - 2. [Stop and upgrade Failover Manager](../12_upgrading_existing_cluster/#upgrading_existing_cluster) 3. [Enable Eager Failover](#enabling_the_eager_failover) @@ -72,27 +72,27 @@ To upgrade Failover Manager without stopping EDB Postgres Advanced Server, tempo - Since the `systemd` command isn't used to manage the database while running Failover Manager with a non-sudo setup, Eager Failover is supported only in sudo mode. It isn't supported in a non-sudo mode. -- Eager Failover isn't suitable for situations in which a VIP wouldn't be released by the old primary. 
+- Eager Failover isn't suitable for situations in which a VIP wouldn't be released by the old primary. - Eager Failover is suitable in the following situations: - With the EDB Postgres Advanced Server high-availability setup. - In a setup using client connection failover with [jdbc](https://jdbc.postgresql.org/documentation/head/connect.html#connection-failover) or libpq [(target-session-attrs)](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS). - - When custom scripting triggered by `script.fence` would fence the old primary server (STONITH). Some examples are to shut down the VM with VMWare vCenter integration, openstack integration, or lights-out management. + - When custom scripting triggered by `script.fence` would fence the old primary server (STONITH). Some examples are to shut down the VM with VMWare vCenter integration, openstack integration, or lights-out management. - When custom scripting triggered by `script.fence` would use ssh to deactivate the VIP. !!! Note - Setting `check.vip.before.promotion=false` is required to allow the new primary to attach the VIP before the old primary releases it. + Setting `check.vip.before.promotion=false` is required to allow the new primary to attach the VIP before the old primary releases it. - Use care when using `primary.shutdown.as.failure=true`. See the description of the [primary.shutdown.as.failure](01_cluster_properties/#primary_shutdown_as_failure) property for information on how to safely bring down the database if needed. - With every failover, a primary ends up being a failed primary, which doesn't automatically recover as an operational standby. Therefore, make sure the cluster contains multiple promotable standbys, and the total number of standbys is at least two more than the value specified for the `minimum.standbys` property. This is a general recommendation, but it becomes more pressing when using Eager Failover. 
-- If the database server is stopped, restarting the database also starts Failover Manager. - +- If the database server is stopped, restarting the database also starts Failover Manager. + !!! Note - If there's a problem starting Failover Manager, such as a bad property value, the database server starts and shuts down again without displaying any warning that it isn't running. - If the Failover Manager process was previously ended, the lock file still exists, and the agent can't restart automatically. - If problems occur when starting the database server or the Failover Manager agent, check the Failover Manager startup log for information. -- As a result of running the `stop-cluster` command, Failover Manager stops on all the nodes. In Eager Failover mode, the `stop-cluster` command also stops EDB Postgres Advanced Server without a failover. Set `enable.stop.cluster=false` to make sure the `stop-cluster` command can't be invoked unintentionally. - +- As a result of running the `stop-cluster` command, Failover Manager stops on all the nodes. In Eager Failover mode, the `stop-cluster` command also stops EDB Postgres Advanced Server without a failover. Set `enable.stop.cluster=false` to make sure the `stop-cluster` command can't be invoked unintentionally. + diff --git a/product_docs/docs/efm/4/05_using_efm.mdx b/product_docs/docs/efm/4/05_using_efm.mdx index b4f25e00172..f2ef88aee40 100644 --- a/product_docs/docs/efm/4/05_using_efm.mdx +++ b/product_docs/docs/efm/4/05_using_efm.mdx @@ -1,6 +1,6 @@ --- title: "Using Failover Manager" -redirects: +redirects: - ../efm_user/05_using_efm legacyRedirectsGenerated: # This list is generated by a script. If you need add entries, use the `legacyRedirects` key. @@ -54,7 +54,7 @@ If a new primary or standby node joins a cluster, all of the existing nodes also ### Adding nodes to a cluster -You can add a node to a Failover Manager cluster at any time. 
When you add a node to a cluster, you must modify the cluster to allow the new node, and then tell the new node how to find the cluster. +You can add a node to a Failover Manager cluster at any time. When you add a node to a cluster, you must modify the cluster to allow the new node, and then tell the new node how to find the cluster. 1. Unless `auto.allow.hosts` is set to `true`, use the `efm allow-node` command to add the address of the new node to the Failover Manager allowed node host list. When invoking the command, specify the cluster name and the address of the new node: @@ -221,7 +221,7 @@ The following parameters must be unique in each cluster properties file: `db.data.dir` `virtual.ip` (if used) - + `db.service.name` (if used) In each cluster properties file, the `db.port` parameter specifies a unique value for each cluster. The `db.user` and `db.database` parameter can have the same value or a unique value. For example, the `acctg.properties` file can specify: @@ -282,7 +282,7 @@ Environment=CLUSTER=acctg Also update the value of the `PIDfile` parameter to specify the new cluster name. For example: ```ini -PIDFile=/var/run/efm-4.4/acctg.pid +PIDFile=/var/run/efm-4.5/acctg.pid ``` After copying the service scripts, enable the services: diff --git a/product_docs/docs/efm/4/08_controlling_efm_service.mdx b/product_docs/docs/efm/4/08_controlling_efm_service.mdx index dbe704d82d0..bddd4715be7 100644 --- a/product_docs/docs/efm/4/08_controlling_efm_service.mdx +++ b/product_docs/docs/efm/4/08_controlling_efm_service.mdx @@ -1,6 +1,6 @@ --- title: "Controlling the Failover Manager service" -redirects: +redirects: - ../efm_user/08_controlling_efm_service legacyRedirectsGenerated: # This list is generated by a script. If you need add entries, use the `legacyRedirects` key. @@ -40,12 +40,12 @@ Stop the Failover Manager on the current node. 
This command must be invoked by r The `status` command returns the status of the Failover Manager agent on which it is invoked. You can invoke the status command on any node to instruct Failover Manager to return status and server startup information. ```text -[root@ONE ~]}> systemctl status edb-efm-4.4 - edb-efm-4.4.service - EnterpriseDB Failover Manager 4.4 - Loaded: loaded (/usr/lib/systemd/system/edb-efm-4.4.service; disabled; vendor preset: disabled) +[root@ONE ~]}> systemctl status edb-efm-4.5 + edb-efm-4.5.service - EnterpriseDB Failover Manager 4.5 + Loaded: loaded (/usr/lib/systemd/system/edb-efm-4.5.service; disabled; vendor preset: disabled) Active: active (running) since Wed 2013-02-14 14:02:16 EST; 4s ago - Process: 58125 ExecStart=/bin/bash -c /usr/edb/edb-efm-4.4/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS) + Process: 58125 ExecStart=/bin/bash -c /usr/edb/edb-efm-4.5/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS) Main PID: 58180 (java) - CGroup: /system.slice/edb-efm-4.4.service - └─58180 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/java -cp /usr/edb/edb-efm-4.4/lib/EFM-4.4.0.jar -Xmx128m... + CGroup: /system.slice/edb-efm-4.5.service + └─58180 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/java -cp /usr/edb/edb-efm-4.5/lib/EFM-4.5.0.jar -Xmx128m... ``` diff --git a/product_docs/docs/efm/4/12_upgrading_existing_cluster.mdx b/product_docs/docs/efm/4/12_upgrading_existing_cluster.mdx index 2569a252e69..0a2f8face50 100644 --- a/product_docs/docs/efm/4/12_upgrading_existing_cluster.mdx +++ b/product_docs/docs/efm/4/12_upgrading_existing_cluster.mdx @@ -12,43 +12,43 @@ legacyRedirectsGenerated: Failover Manager provides a utility to assist you when upgrading a cluster managed by Failover Manager. To upgrade an existing cluster, you must: -1. Install Failover Manager 4.4 on each node of the cluster. 
For detailed information about installing Failover Manager, see [Installing Failover Manager](03_installing_efm/#installing_efm). +1. Install Failover Manager 4.5 on each node of the cluster. For detailed information about installing Failover Manager, see [Installing Failover Manager](03_installing_efm/#installing_efm). -2. After installing Failover Manager, invoke the `efm upgrade-conf` utility to create the `.properties` and `.nodes` files for Failover Manager 4.4. The Failover Manager installer installs the upgrade utility ([efm upgrade-conf](07_using_efm_utility/#efm_upgrade_conf)) to the `/usr/edb/efm-4.4/bin directory`. To invoke the utility, assume root privileges, and invoke the command: +2. After installing Failover Manager, invoke the `efm upgrade-conf` utility to create the `.properties` and `.nodes` files for Failover Manager 4.5. The Failover Manager installer installs the upgrade utility ([efm upgrade-conf](07_using_efm_utility/#efm_upgrade_conf)) to the `/usr/edb/efm-4.5/bin directory`. To invoke the utility, assume root privileges, and invoke the command: -```shell -efm upgrade-conf -``` + ```shell + efm upgrade-conf + ``` -The efm `upgrade-conf` utility locates the `.properties` and `.nodes` files of preexisting clusters and copies the parameter values to a new configuration file for use by Failover Manager. The utility saves the updated copy of the configuration files in the `/etc/edb/efm-4.4` directory. + The efm `upgrade-conf` utility locates the `.properties` and `.nodes` files of preexisting clusters and copies the parameter values to a new configuration file for use by Failover Manager. The utility saves the updated copy of the configuration files in the `/etc/edb/efm-4.5` directory. -3. Modify the `.properties` and `.nodes` files for Failover Manager 4.4, specifying any new preferences. 
Use your choice of editor to modify any additional properties in the properties file (located in the `/etc/edb/efm-4.4` directory) before starting the service for that node. For detailed information about property settings, see [The cluster properties file](04_configuring_efm/01_cluster_properties/#cluster_properties). +3. Modify the `.properties` and `.nodes` files for Failover Manager 4.5, specifying any new preferences. Use your choice of editor to modify any additional properties in the properties file (located in the `/etc/edb/efm-4.5` directory) before starting the service for that node. For detailed information about property settings, see [The cluster properties file](04_configuring_efm/01_cluster_properties/#cluster_properties). !!! Note `db.bin` is a required property. When modifying the properties file, ensure that the `db.bin` property specifies the location of the Postgres `bin` directory. 4. If you're using Eager Failover, you must disable it before stopping the Failover Manager cluster. For more information, see [Disabling Eager Failover](04_configuring_efm/06_configuring_for_eager_failover/#disabling-eager-failover). -5. Use a version-specific command to stop the old Failover Manager cluster. For example, you can use the following command to stop a version 4.1 cluster: +5. Use a version-specific command to stop the old Failover Manager cluster. For example, you can use the following command to stop a version 4.4 cluster: -```shell -/usr/efm-4.1/bin/efm stop-cluster efm -``` + ```shell + /usr/efm-4.4/bin/efm stop-cluster efm + ``` -1. Start the new [Failover Manager service](08_controlling_efm_service/#controlling_efm_service) (`edb-efm-4.4`) on each node of the cluster. +6. Start the new [Failover Manager service](08_controlling_efm_service/#controlling_efm_service) (`edb-efm-4.5`) on each node of the cluster. 
The following example shows invoking the upgrade utility to create the `.properties` and `.nodes` files for a Failover Manager installation: ```text -[root@k8s-worker ~]# /usr/edb/efm-4.4/bin/efm upgrade-conf efm -Checking directory /etc/edb/efm-4.1 +[root@k8s-worker ~]# /usr/edb/efm-4.5/bin/efm upgrade-conf efm +Checking directory /etc/edb/efm-4.4 Processing efm.properties file The following properties were added in addition to those in the previous installed version: priority.standbys detach.on.agent.failure -Checking directory /etc/edb/efm-4.1 +Checking directory /etc/edb/efm-4.4 Processing efm.nodes file Upgrade of files is finished. The owner and group for properties and nodes files have been set as 'efm'. @@ -57,38 +57,38 @@ Upgrade of files is finished. The owner and group for properties and nodes files If you're [using a Failover Manager configuration without sudo](04_configuring_efm/04_extending_efm_permissions/#running_efm_without_sudo), include the `-source` flag and specify the name of the directory in which the configuration files reside when invoking `upgrade-conf`. If the directory isn't the configuration default directory, the upgraded files are created in the directory from which the `upgrade-conf` command was invoked. -!!! Note +!!! Note If you're using a unit file, manually update the file to reflect the new Failover Manager service name when you perform an upgrade. ## Uninstalling Failover Manager !!! Note If you are using custom scripts, check to see if they are calling any Failover Manager scripts. For example, a script that runs after promotion to perform various tasks and then calls Failover Manager's `efm_address` script to acquire a virtual IP address. If you have any custom scripts calling Failover Manager scripts, update the custom scripts to use the newly installed version of the Failover Manager script before uninstalling the older version of the Failover Manager script. 
- -After upgrading to Failover Manager 4.4, you can use your native package manager to remove previous installations of Failover Manager. For example, use the following command to remove Failover Manager 4.1 and any unneeded dependencies: + +After upgrading to Failover Manager 4.5, you can use your native package manager to remove previous installations of Failover Manager. For example, use the following command to remove Failover Manager 4.4 and any unneeded dependencies: - On RHEL or CentOS 7.x: ```shell -yum remove edb-efm41 +yum remove edb-efm44 ``` - On RHEL or Rocky Linux or AlmaLinux 8.x: ```shell -dnf remove edb-efm41 +dnf remove edb-efm44 ``` - On Debian or Ubuntu: ```shell -apt-get remove edb-efm41 +apt-get remove edb-efm44 ``` - On SLES: ```shell -zypper remove edb-efm41 +zypper remove edb-efm44 ``` ## Performing a database update (minor version) diff --git a/product_docs/docs/efm/4/efm_quick_start/index.mdx b/product_docs/docs/efm/4/efm_quick_start/index.mdx index 8f09c5d6d88..4bb0365661c 100644 --- a/product_docs/docs/efm/4/efm_quick_start/index.mdx +++ b/product_docs/docs/efm/4/efm_quick_start/index.mdx @@ -21,7 +21,7 @@ Using EDB Postgres Advanced Server as an example (Failover Manager also works wi - Install Failover Manager on each primary and standby node. During EDB Postgres Advanced Server installation, you configured an EDB repository on each database host. You can use the EDB repository and the `yum install` command to install Failover Manager on each node of the cluster: ```shell - yum install edb-efm44 + yum install edb-efm45 ``` During the installation process, the installer creates a user named efm that has privileges to invoke scripts that control the Failover Manager service for clusters owned by enterprisedb or postgres. The example that follows creates a cluster named `efm`. @@ -31,7 +31,7 @@ Start the configuration process on a primary or standby node. Then, copy the con 1. Create working configuration files. 
Copy the provided sample files to create Failover Manager configuration files, and correct the ownership and version number if you are installing a different version: ```shell - cd /etc/edb/efm-4.4 + cd /etc/edb/efm-4.5 cp efm.properties.in efm.properties @@ -45,7 +45,7 @@ Start the configuration process on a primary or standby node. Then, copy the con 1. Create an [encrypted password](/efm/latest/efm_user/04_configuring_efm/02_encrypting_database_password/) needed for the properties file: ```shell - /usr/edb/efm-4.4/bin/efm encrypt efm + /usr/edb/efm-4.5/bin/efm encrypt efm ``` Follow the onscreen instructions to produce the encrypted version of your database password. @@ -83,22 +83,22 @@ Start the configuration process on a primary or standby node. Then, copy the con The Failover Manager agent doesn't validate the addresses in the `efm.nodes` file. The agent expects that some of the addresses in the file can't be reached (for example, that another agent hasn’t been started yet). -1. Configure the other nodes. Copy the `efm.properties` and `efm.nodes` files to `/etc/edb/efm-4.4` on the other nodes in your sample cluster. After copying the files, change the file ownership so the files are owned by efm:efm. The `efm.properties` file can be the same on every node, except for the following properties: +1. Configure the other nodes. Copy the `efm.properties` and `efm.nodes` files to `/etc/edb/efm-4.5` on the other nodes in your sample cluster. After copying the files, change the file ownership so the files are owned by efm:efm. The `efm.properties` file can be the same on every node, except for the following properties: - Modify the `bind.address` property to use the node’s local address. - + - Set `is.witness` to `true` if the node is a witness node. If the node is a witness node, the properties relating to a local database installation are ignored. -1. Start the Failover Manager cluster. On any node, start the Failover Manager agent. 
The agent is named `edb-efm-4.4`; you can use your platform-specific service command to control the service. For example, on a RHEL 7.x or Rocky Linux/AlmaLinux/RHEL 8.x host, use the command: +1. Start the Failover Manager cluster. On any node, start the Failover Manager agent. The agent is named `edb-efm-4.5`; you can use your platform-specific service command to control the service. For example, on a RHEL 7.x or Rocky Linux/AlmaLinux/RHEL 8.x host, use the command: ```shell - systemctl start edb-efm-4.4 + systemctl start edb-efm-4.5 ``` 1. After the agent starts, run the following command to see the status of the single-node cluster. The addresses of the other nodes appear in the `Allowed node host` list. ```shell - /usr/edb/efm-4.4/bin/efm cluster-status efm + /usr/edb/efm-4.5/bin/efm cluster-status efm ``` 1. Start the agent on the other nodes. Run the `efm cluster-status efm` command on any node to see the cluster status. @@ -106,7 +106,7 @@ Start the configuration process on a primary or standby node. Then, copy the con If any agent fails to start, see the startup log for information about what went wrong: ```shell - cat /var/log/efm-4.4/startup-efm.log + cat /var/log/efm-4.5/startup-efm.log ``` ## Perform a switchover @@ -114,7 +114,7 @@ Start the configuration process on a primary or standby node. Then, copy the con If the cluster status output shows that the primary and standby nodes are in sync, you can perform a switchover: ```shell - /usr/edb/efm-4.4/bin/efm promote efm -switchover + /usr/edb/efm-4.5/bin/efm promote efm -switchover ``` The command promotes a standby and reconfigures the primary database as a new standby in the cluster. To switch back, run the command again. 
@@ -124,5 +124,5 @@ The command promotes a standby and reconfigures the primary database as a new st For quick access to online help, use: ```shell -/usr/edb/efm-4.4/bin/efm --help +/usr/edb/efm-4.5/bin/efm --help ``` diff --git a/product_docs/docs/epas/10/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx b/product_docs/docs/epas/10/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx index 10c9dc6f620..8b3194a27d6 100644 --- a/product_docs/docs/epas/10/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx +++ b/product_docs/docs/epas/10/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx @@ -32,9 +32,7 @@ However, certain data types have alternative names referred to as *aliases*, whi For example, there are fixed length character data types that can be specified as `CHAR` or `CHARACTER`. There are variable length character data types that can be specified as `CHAR VARYING, CHARACTER VARYING, VARCHAR,` or `VARCHAR2`. For integers, there are `BINARY_INTEGER, PLS_INTEGER,` and `INTEGER` data types. For numbers, there are `NUMBER, NUMERIC, DEC,` and `DECIMAL` data types. -For detailed information about the data types supported by Advanced Server, see the *Database Compatibility for Oracle Developers Reference Guide*, available from EnterpriseDB at: - -[https://www.enterprisedb.com/docs](/epas/10/epas_compat_reference/) +For detailed information about the data types supported by Advanced Server, see [Data types](/epas/10/epas_compat_reference/02_the_sql_language/02_data_types/). Thus, when attempting to create overloaded subprograms, the formal parameter data types are not considered different if the specified data types are aliases of each other. 
diff --git a/product_docs/docs/epas/11/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx b/product_docs/docs/epas/11/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx index c6e0d50927c..818bd16ccd3 100644 --- a/product_docs/docs/epas/11/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx +++ b/product_docs/docs/epas/11/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx @@ -32,9 +32,7 @@ However, certain data types have alternative names referred to as *aliases*, whi For example, there are fixed length character data types that can be specified as `CHAR` or `CHARACTER`. There are variable length character data types that can be specified as `CHAR VARYING, CHARACTER VARYING, VARCHAR,` or `VARCHAR2`. For integers, there are `BINARY_INTEGER, PLS_INTEGER,` and `INTEGER` data types. For numbers, there are `NUMBER, NUMERIC, DEC,` and `DECIMAL` data types. -For detailed information about the data types supported by Advanced Server, see the *Database Compatibility for Oracle Developers Reference Guide*, available from EnterpriseDB at: - -[https://www.enterprisedb.com/docs](/epas/11/epas_compat_reference/) +For detailed information about the data types supported by Advanced Server, see [Data types](/epas/11/epas_compat_reference/02_the_sql_language/02_data_types/). Thus, when attempting to create overloaded subprograms, the formal parameter data types are not considered different if the specified data types are aliases of each other. 
diff --git a/product_docs/docs/epas/12/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx b/product_docs/docs/epas/12/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx index 8e239d8bab2..779219fa001 100644 --- a/product_docs/docs/epas/12/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx +++ b/product_docs/docs/epas/12/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx @@ -32,9 +32,7 @@ However, certain data types have alternative names referred to as *aliases*, whi For example, there are fixed length character data types that can be specified as `CHAR` or `CHARACTER`. There are variable length character data types that can be specified as `CHAR VARYING, CHARACTER VARYING, VARCHAR,` or `VARCHAR2`. For integers, there are `BINARY_INTEGER, PLS_INTEGER,` and `INTEGER` data types. For numbers, there are `NUMBER, NUMERIC, DEC,` and `DECIMAL` data types. -For detailed information about the data types supported by Advanced Server, see the *Database Compatibility for Oracle Developers Reference Guide*, available from EnterpriseDB at: - -[https://www.enterprisedb.com/docs](/epas/latest/epas_compat_reference/) +For detailed information about the data types supported by Advanced Server, see [Data types](/epas/12/epas_compat_reference/02_the_sql_language/02_data_types/). Thus, when attempting to create overloaded subprograms, the formal parameter data types are not considered different if the specified data types are aliases of each other. 
diff --git a/product_docs/docs/epas/13/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx b/product_docs/docs/epas/13/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx index 50e1b1bec12..db351440acd 100644 --- a/product_docs/docs/epas/13/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx +++ b/product_docs/docs/epas/13/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx @@ -35,9 +35,7 @@ However, certain data types have alternative names referred to as *aliases*, whi For example, there are fixed length character data types that can be specified as `CHAR` or `CHARACTER`. There are variable length character data types that can be specified as `CHAR VARYING, CHARACTER VARYING, VARCHAR,` or `VARCHAR2`. For integers, there are `BINARY_INTEGER, PLS_INTEGER,` and `INTEGER` data types. For numbers, there are `NUMBER, NUMERIC, DEC,` and `DECIMAL` data types. -For detailed information about the data types supported by Advanced Server, see the *Database Compatibility for Oracle Developers Reference Guide*, available from EDB at: - -[https://www.enterprisedb.com/docs](/epas/latest/epas_compat_reference/) +For detailed information about the data types supported by Advanced Server, see [Data types](/epas/13/epas_compat_reference/02_the_sql_language/02_data_types/). Thus, when attempting to create overloaded subprograms, the formal parameter data types are not considered different if the specified data types are aliases of each other. 
diff --git a/product_docs/docs/epas/14/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx b/product_docs/docs/epas/14/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx index 8083f48edca..bd8bd10f401 100644 --- a/product_docs/docs/epas/14/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx +++ b/product_docs/docs/epas/14/epas_compat_spl/02_spl_programs/07_subprograms_subprocedures_and_subfunctions/06_overloading_subprograms.mdx @@ -31,9 +31,7 @@ However, certain data types have alternative names referred to as *aliases*, whi For example, there are fixed length character data types that can be specified as `CHAR` or `CHARACTER`. There are variable length character data types that can be specified as `CHAR VARYING, CHARACTER VARYING, VARCHAR,` or `VARCHAR2`. For integers, there are `BINARY_INTEGER, PLS_INTEGER,` and `INTEGER` data types. For numbers, there are `NUMBER, NUMERIC, DEC,` and `DECIMAL` data types. -For detailed information about the data types supported by EDB Postgres Advanced Server, see the *Database Compatibility for Oracle Developers Reference Guide,* available from EDB at: - -[https://www.enterprisedb.com/docs](/epas/latest/epas_compat_reference/) +For detailed information about the data types supported by EDB Postgres Advanced Server, see [Data types](/epas/latest/epas_compat_reference/02_the_sql_language/02_data_types/). Thus, when attempting to create overloaded subprograms, the formal parameter data types are not considered different if the specified data types are aliases of each other. 
diff --git a/product_docs/docs/pgd/4/bdr/index.mdx b/product_docs/docs/pgd/4/bdr/index.mdx index 43d88128cd7..1dcce6d955d 100644 --- a/product_docs/docs/pgd/4/bdr/index.mdx +++ b/product_docs/docs/pgd/4/bdr/index.mdx @@ -2,7 +2,6 @@ title: BDR (Bi-Directional Replication) navTitle: BDR navigation: - - overview - appusage - configuration - nodes @@ -25,19 +24,138 @@ navigation: - twophase - catalogs - functions +redirects: +- /pgd/4/bdr/overview + --- -## Overview BDR is a PostgreSQL extension providing multi-master replication and data distribution with advanced conflict management, data-loss protection, and throughput up to 5X faster than native logical replication, and enables -distributed PostgreSQL clusters with high availability up to five 9s. +distributed Postgres clusters with high availability up to five 9s. + +BDR provides loosely coupled, multi-master logical replication +using a mesh topology. This means that you can write to any server and the +changes are sent directly, row-by-row, to all the +other servers that are part of the same BDR group. + +By default, BDR uses asynchronous replication, applying changes on +the peer nodes only after the local commit. Multiple synchronous replication +options are also available. + +## Basic architecture + +### Multiple groups + +A BDR node is a member of at least one *node group*, and in the most +basic architecture there is a single node group for the whole BDR +cluster. + +### Multiple masters + +Each node (database) participating in a BDR group both receives +changes from other members and can be written to directly by the user. + +This is distinct from hot or warm standby, where only one master +server accepts writes, and all the other nodes are standbys that +replicate either from the master or from another standby. + +You don't have to write to all the masters all of the time. +A frequent configuration directs writes mostly to just one master. 
+ +### Asynchronous, by default + +Changes made on one BDR node aren't replicated to other nodes until +they're committed locally. As a result, the data isn't exactly the +same on all nodes at any given time. Some nodes have data that +hasn't yet arrived at other nodes. PostgreSQL's block-based replication +solutions default to asynchronous replication as well. In BDR, +because there are multiple masters and, as a result, multiple data streams, +data on different nodes might differ even when +`synchronous_commit` and `synchronous_standby_names` are used. + +### Mesh topology + +BDR is structured around a mesh network where every node connects to every +other node and all nodes exchange data directly with each other. There's no +forwarding of data in BDR except in special circumstances such as adding and removing nodes. +Data can arrive from outside the EDB Postgres Distributed cluster or +be sent onwards using native PostgreSQL logical replication. + +### Logical replication + +Logical replication is a method of replicating data rows and their changes +based on their replication identity (usually a primary key). +We use the term *logical* in contrast to *physical* replication, which uses +exact block addresses and byte-by-byte replication. Index changes aren't +replicated, thereby avoiding write amplification and reducing bandwidth. + +Logical replication starts by copying a snapshot of the data from the +source node. Once that is done, later commits are sent to other nodes as +they occur in real time. Changes are replicated without re-executing SQL, +so the exact data written is replicated quickly and accurately. + +Nodes apply data in the order in which commits were made on the source node, +ensuring transactional consistency is guaranteed for the changes from +any single node. Changes from different nodes are applied independently of +other nodes to ensure the rapid replication of changes. + +Replicated data is sent in binary form, when it's safe to do so. 
+ +### High availability + +Each master node can be protected by one or more standby nodes, so any node +that goes down can be quickly replaced and continue. Each standby node can +be either a logical or a physical standby node. + +Replication continues between currently connected nodes even if one or more +nodes are currently unavailable. When the node recovers, replication +can restart from where it left off without missing any changes. + +Nodes can run different release levels, negotiating the required protocols +to communicate. As a result, EDB Postgres Distributed clusters can use rolling upgrades, even +for major versions of database software. + +DDL is replicated across nodes by default. DDL execution can +be user controlled to allow rolling application upgrades, if desired. + +## Architectural options and performance + +### Always On architectures + +A number of different architectures can be configured, each of which has +different performance and scalability characteristics. + +The group is the basic building block consisting of 2+ nodes +(servers). In a group, each node is in a different availability zone, with dedicated router +and backup, giving immediate switchover and high availability. Each group has a +dedicated replication set defined on it. If the group loses a node, you can easily +repair or replace it by copying an existing node from the group. -Detailed overview about how BDR works is described in the -[Architectural Overview](overview) chapter. +The Always On architectures are built from either one group in a single location +or two groups in two separate locations. Each group provides HA and IS. When two +groups are leveraged in remote locations, they together also provide disaster recovery (DR). -## Supported PostgreSQL database servers +Tables are created across both groups, so any change goes to all nodes, not just to +nodes in the local group. + +One node in each group is the target for the main application.
All other nodes are described as +shadow nodes (or "read-write replica"), waiting to take over when needed. If a node +loses contact, we switch immediately to a shadow node to continue processing. If a +group fails, we can switch to the other group. Scalability isn't the goal of this +architecture. + +Since we write mainly to only one node, the possibility of contention between nodes is +reduced to almost zero. As a result, performance impact is much reduced. + +Secondary applications might execute against the shadow nodes, although these are +reduced or interrupted if the main application begins using that node. + +In the future, one node will be elected as the main replicator to other groups, limiting CPU +overhead of replication as the cluster grows and minimizing the bandwidth to other groups. + +### Supported PostgreSQL database servers BDR is compatible with Postgres, EDB Postgres Extended Server, and EDB Postgres Advanced Server distributions and can be deployed as a @@ -59,3 +177,85 @@ patterns don't necessarily work as well in multi-node setup as they do on a single instance. There are also some limitations in what can be safely replicated in multi-node setting. [Application usage](appusage) goes into detail on how BDR behaves from an application development perspective. + +### Characteristics affecting BDR performance + +By default, BDR keeps one copy of each table on each node in the group, and any +changes propagate to all nodes in the group. + +Since copies of data are everywhere, SELECTs need only ever access the local node. +On a read-only cluster, performance on any one node isn't affected by the +number of nodes. Thus, adding nodes increases linearly the total possible SELECT +throughput. + +If an INSERT, UPDATE, and DELETE (DML) is performed locally, then the changes +propagate to all nodes in the group.
The overhead of DML apply is less than the +original execution, so if you run a pure write workload on multiple nodes +concurrently, a multi-node cluster can handle more TPS than a single node. + +Conflict handling has a cost that acts to reduce the throughput. The throughput +then depends on how much contention the application displays in practice. +Applications with very low contention perform better than a single node. +Applications with high contention can perform worse than a single node. +These results are consistent with any multi-master technology. They aren't particular to BDR. + +Synchronous replication options can send changes concurrently to multiple nodes +so that the replication lag is minimized. Adding more nodes means using more CPU for +replication, so peak TPS reduces slightly as each node is added. + +If the workload tries to use all CPU resources, then this resource constrains +replication, which can then affect the replication lag. + +In summary, adding more master nodes to a BDR group doesn't result in significant write +throughput increase when most tables are replicated because all the writes will +be replayed on all nodes. Because BDR writes are in general more effective +than writes coming from Postgres clients by way of SQL, some performance increase +can be achieved. Read throughput generally scales linearly with the number of +nodes. + +## Deployment + +BDR is intended to be deployed in one of a small number of known-good configurations, +using either TPAexec or a configuration management approach +and deployment architecture approved by Technical Support. + +Manual deployment isn't recommended and might not be supported. + +Refer to the `TPAexec Architecture User Manual` for your architecture. + +Log messages and documentation are currently available only in English. + +## Clocks and timezones + +BDR is designed to operate with nodes in multiple timezones, allowing a +truly worldwide database cluster.
Individual servers don't need to be configured +with matching timezones, although we do recommend using `log_timezone = UTC` to +ensure the human-readable server log is more accessible and comparable. + +Synchronize server clocks using NTP or other solutions. + +Clock synchronization isn't critical to performance, as it is with some +other solutions. Clock skew can impact origin conflict detection, although +BDR provides controls to report and manage any skew that exists. BDR also +provides row-version conflict detection, as described in [Conflict detection](conflicts). + + +## Limits + +BDR can run hundreds of nodes on good-enough hardware and network. However, +for mesh-based deployments, we generally don't recommend running more than +32 nodes in one cluster. +Each master node can be protected by multiple physical or logical standby nodes. +There's no specific limit on the number of standby nodes, +but typical usage is to have 2–3 standbys per master. Standby nodes don't +add connections to the mesh network, so they aren't included in the +32-node recommendation. + +BDR currently has a hard limit of no more than 1000 active nodes, as this is the +current maximum Raft connections allowed. + +BDR places a limit that at most 10 databases in any one PostgreSQL instance +can be BDR nodes across different BDR node groups. However, BDR works best if +you use only one BDR database per PostgreSQL instance. + +The minimum recommended number of nodes in a group is three to provide fault tolerance for BDR's consensus mechanism. With just two nodes, consensus would fail if one of the nodes was unresponsive. Consensus is required for some BDR operations such as distributed sequence generation. For more information about the consensus mechanism used by EDB Postgres Distributed, see [Architectural details](/pgd/latest/architectures/#architecture-details). 
diff --git a/product_docs/docs/pgd/4/bdr/overview.mdx b/product_docs/docs/pgd/4/bdr/overview.mdx deleted file mode 100644 index 378ec54ab83..00000000000 --- a/product_docs/docs/pgd/4/bdr/overview.mdx +++ /dev/null @@ -1,223 +0,0 @@ ---- -navTitle: Overview -title: Architectural overview - - ---- - -BDR provides loosely coupled, multi-master logical replication -using a mesh topology. This means that you can write to any server and the -changes are sent directly, row-by-row, to all the -other servers that are part of the same BDR group. - -![node diagram](img/nodes.png) - -By default, BDR uses asynchronous replication, applying changes on -the peer nodes only after the local commit. An optional -[Eager All-Node Replication](eager) feature allows for committing -on all nodes using consensus. - -## Basic architecture - -### Multiple groups - -A BDR node is a member of at least one *node group*, and in the most -basic architecture there is a single node group for the whole BDR -cluster. - -### Multiple masters - -Each node (database) participating in a BDR group both receives -changes from other members and can be written to directly by the user. - -This is distinct from hot or warm standby, where only one master -server accepts writes, and all the other nodes are standbys that -replicate either from the master or from another standby. - -You don't have to write to all the masters all of the time. -A frequent configuration directs writes mostly to just one master. - -### Asynchronous, by default - -Changes made on one BDR node aren't replicated to other nodes until -they're committed locally. As a result, the data isn't exactly the -same on all nodes at any given time. Some nodes have data that -hasn't yet arrived at other nodes. PostgreSQL's block-based replication -solutions default to asynchronous replication as well. 
In BDR, -because there are multiple masters and, as a result, multiple data streams, -data on different nodes might differ even when -`synchronous_commit` and `synchronous_standby_names` are used. - -### Mesh topology - -BDR is structured around a mesh network where every node connects to every -other node and all nodes exchange data directly with each other. There's no -forwarding of data in BDR except in special circumstances such as adding and removing nodes. -Data can arrive from outside the EDB Postgres Distributed cluster or -be sent onwards using native PostgreSQL logical replication. - -### Logical replication - -Logical replication is a method of replicating data rows and their changes -based on their replication identity (usually a primary key). -We use the term *logical* in contrast to *physical* replication, which uses -exact block addresses and byte-by-byte replication. Index changes aren't -replicated, thereby avoiding write amplification and reducing bandwidth. - -Logical replication starts by copying a snapshot of the data from the -source node. Once that is done, later commits are sent to other nodes as -they occur in real time. Changes are replicated without re-executing SQL, -so the exact data written is replicated quickly and accurately. - -Nodes apply data in the order in which commits were made on the source node, -ensuring transactional consistency is guaranteed for the changes from -any single node. Changes from different nodes are applied independently of -other nodes to ensure the rapid replication of changes. - -Replicated data is sent in binary form, when it's safe to do so. - -### High availability - -Each master node can be protected by one or more standby nodes, so any node -that goes down can be quickly replaced and continue. Each standby node can -be either a logical or a physical standby node. - -Replication continues between currently connected nodes even if one or more -nodes are currently unavailable. 
When the node recovers, replication -can restart from where it left off without missing any changes. - -Nodes can run different release levels, negotiating the required protocols -to communicate. As a result, EDB Postgres Distributed clusters can use rolling upgrades, even -for major versions of database software. - -DDL is replicated across nodes by default. DDL execution can -be user controlled to allow rolling application upgrades, if desired. - -### Limits - -BDR can run hundreds of nodes on good-enough hardware and network. However, -for mesh-based deployments, we generally don't recommend running more than -32 nodes in one cluster. -Each master node can be protected by multiple physical or logical standby nodes. -There's no specific limit on the number of standby nodes, -but typical usage is to have 2–3 standbys per master. Standby nodes don't -add connections to the mesh network, so they aren't included in the -32-node recommendation. - -BDR currently has a hard limit of no more than 1000 active nodes, as this is the -current maximum Raft connections allowed. - -BDR places a limit that at most 10 databases in any one PostgreSQL instance -can be BDR nodes across different BDR node groups. However, BDR works best if -you use only one BDR database per PostgreSQL instance. - -The minimum recommended number of nodes in a EDB Postgres Distributed cluster is three to provide fault tolerance for BDR's consensus mechanism. With just two nodes, consensus would fail if one of the nodes was unresponsive. Consensus is required for some BDR operations such as distributed sequence generation. For more information about the consensus mechanism used by EDB Postgres Distributed, see [Architectural details](/pgd/latest/architectures/#architecture-details). - -## Architectural options and performance - -### Characterizing BDR performance - -BDR can be configured in a number of different architectures, each of which has -different performance and scalability characteristics. 
- -The group is the basic building block of a BDR group consisting of 2+ nodes -(servers). In a group, each node is in a different availability zone, with dedicated router -and backup, giving immediate switchover and high availability. Each group has a -dedicated replication set defined on it. If the group loses a node, you can easily -repair or replace it by copying an existing node from the group. - -Adding more master nodes to a BDR group doesn't result in significant write -throughput increase when most tables are replicated because BDR has to replay -all the writes on all nodes. Because BDR writes are in general more effective -than writes coming from Postgres clients by way of SQL, some performance increase -can be achieved. Read throughput generally scales linearly with the number of -nodes. - -The following architectures are available: - -- Multi-master/single group -- BDR AlwaysOn - -The simplest architecture is to have just one group. - -### BDR multi-master in one group - -By default, BDR keeps one copy of each table on each node in the group, and any -changes propagate to all nodes in the group. - -Since copies of data are everywhere, SELECTs need only ever access the local node. -On a read-only cluster, performance on any one node isn't affected by the -number of nodes. Thus, adding nodes increases linearly the total possible SELECT -throughput. - -If an INSERT, UPDATE, and DELETE (DML) is performed locally, then the changes -propagate to all nodes in the group. The overhead of DML apply is less than the -original execution, so if you run a pure write workload on multiple nodes -concurrently, a multi-node cluster can handle more TPS than a single node. - -Conflict handling has a cost that acts to reduce the throughput. The throughput -then depends on how much contention the application displays in practice. -Applications with very low contention perform better than a single node. -Applications with high contention can perform worse than a single node. 
-These results are consistent with any multi-master technology. They aren't particular to BDR. - -Eager Replication can avoid conflicts but is inherently more expensive. - -Changes are sent concurrently to all nodes so that the replication lag is minimized. -Adding more nodes means using more CPU for replication, so peak TPS reduces -slightly as each node is added. - -If the workload tries to use all CPU resources, then this resource constrains -replication, which can then affect the replication lag. - -### BDR AlwaysOn - -The AlwaysOn architecture is built from two groups in two separate regions. Each group -provides HA and IS, but together they also provide disaster recovery (DR), so we refer -to this architecture as AlwaysOn with very high availability. - -Tables are created across both groups, so any change goes to all nodes, not just to -nodes in the local group. - -One node is the target for the main application. All other nodes are described as -shadow nodes (or "read-write replica"), waiting to take over when needed. If a node -loses contact, we switch immediately to a shadow node to continue processing. If a -group fails, we can switch to the other group. Scalability isn't the goal of this -architecture. - -Since we write mainly to only one node, the possibility of contention between is -reduced to almost zero. As a result, performance impact is much reduced. - -CAMO is Eager Replication in the local group, lazy with regard to other groups. - -Secondary applications might execute against the shadow nodes, although these are -reduced or interrupted if the main application begins using that node. - -In the future, one node will be elected as the main replicator to other groups, limiting CPU -overhead of replication as the cluster grows and minimizing the bandwidth to other groups. 
- -## Deployment - -BDR is intended to be deployed in one of a small number of known-good configurations, -using either TPAexec or a configuration management approach -and deployment architecture approved by Technical Support. - -Manual deployment isn't recommended and might not be supported. - -Refer to the `TPAexec Architecture User Manual` for your architecture. - -Log messages and documentation are currently available only in English. - -## Clocks and timezones - -BDR is designed to operate with nodes in multiple timezones, allowing a -truly worldwide database cluster. Individual servers don't need to be configured -with matching timezones, although we do recommend using `log_timezone = UTC` to -ensure the human-readable server log is more accessible and comparable. - -Synchronize server clocks using NTP or other solutions. - -Clock synchronization isn't critical to performance, as it is with some -other solutions. Clock skew can impact origin conflict detection, although -BDR provides controls to report and manage any skew that exists. BDR also -provides row-version conflict detection, as described in [Conflict detection](conflicts). diff --git a/product_docs/docs/pgd/4/deployments/tpaexec/quick_start.mdx b/product_docs/docs/pgd/4/deployments/tpaexec/quick_start.mdx index a201991f10a..63a29ee4889 100644 --- a/product_docs/docs/pgd/4/deployments/tpaexec/quick_start.mdx +++ b/product_docs/docs/pgd/4/deployments/tpaexec/quick_start.mdx @@ -28,18 +28,21 @@ architecture using Amazon EC2. ``` tpaexec provision myedbdpcluster ``` - Since we specified AWS as the platform (the default platform), TCAexec provisions EC2 instances, VPCs, subnets, routing tables, internet gateways, security groups, EBS volumes, elastic IPs, and so on. + Since we specified AWS as the platform (the default platform), TPAexec provisions EC2 instances, VPCs, subnets, routing tables, internet gateways, security groups, EBS volumes, elastic IPs, and so on. -1. 
Deploy the needed packages, configuration and setup the actual EDB Postgres Distributed cluster: +1. Deploy the cluster: ``` tpaexec deploy myedbdpcluster ``` + TPAexec installs the needed packages, applies the configuration and sets up the actual EDB Postgres Distributed cluster. + +1. Test the cluster: + + After the successful run of the `deploy` command the cluster is ready to use. You can connect to it via `psql` or any other database client. -After the successful run of the `deploy` command the cluster is ready to use. You can connect to it via `psql` or any other database client. - -It's also possible to run a test that ensures the cluster is running as expected: -``` -tpaexec test myedbdpcluster -``` + It's also possible to run a test that ensures the cluster is running as expected: + ``` + tpaexec test myedbdpcluster + ``` diff --git a/product_docs/docs/pgd/4/harp/02_overview.mdx b/product_docs/docs/pgd/4/harp/02_overview.mdx deleted file mode 100644 index 766b525fed4..00000000000 --- a/product_docs/docs/pgd/4/harp/02_overview.mdx +++ /dev/null @@ -1,246 +0,0 @@ ---- -navTitle: Overview -title: HARP functionality overview ---- - -HARP is a new approach to high availability for BDR -clusters. It leverages a consensus-driven quorum to determine the correct connection endpoint -in a semi-exclusive manner to prevent unintended multi-node writes from an -application. - -## The importance of quorum - -The central purpose of HARP is to enforce full quorum on any Postgres cluster -it manages. Quorum is a term applied to a voting body that -mandates a certain minimum of attendees are available to make a decision. More simply: majority rules. - -For any vote to end in a result other than a tie, an odd number of -nodes must constitute the full cluster membership. Quorum, however, doesn't -strictly demand this restriction; a simple majority is enough. This means -that in a cluster of N nodes, quorum requires a minimum of N/2+1 nodes to hold -a meaningful vote.
- -All of this ensures the cluster is always in agreement regarding the node -that is "in charge." For a EDB Postgres Distributed cluster consisting of multiple nodes, this -determines the node that is the primary write target. HARP designates this node -as the lead master. - -## Reducing write targets - -The consequence of ignoring the concept of quorum, or not applying it -well enough, can lead to a "split brain" scenario where the "correct" write -target is ambiguous or unknowable. In a standard Postgres cluster, it's -important that only a single node is ever writable and sending replication -traffic to the remaining nodes. - -Even in multi-master-capable approaches such as BDR, it can be helpful to -reduce the amount of necessary conflict management to derive identical data -across the cluster. In clusters that consist of multiple BDR nodes per physical -location or region, this usually means a single BDR node acts as a "leader" and -remaining nodes are "shadow." These shadow nodes are still writable, but writing to them is discouraged unless absolutely necessary. - -By leveraging quorum, it's possible for all nodes to agree on the exact -Postgres node to represent the entire cluster or a local BDR region. Any -nodes that lose contact with the remainder of the quorum, or are overruled by -it, by definition can't become the cluster leader. - -This restriction prevents split-brain situations where writes unintentionally reach two -Postgres nodes. Unlike technologies such as VPNs, proxies, load balancers, or -DNS, you can't circumvent a quorum-derived consensus by misconfiguration or -network partitions. So long as it's possible to contact the consensus layer to -determine the state of the quorum maintained by HARP, only one target is ever -valid. - -## Basic architecture - -The design of HARP comes in essentially two parts, consisting of a manager and -a proxy. 
The following diagram describes how these interact with a single -Postgres instance: - -![HARP Unit](images/ha-unit.png) - -The consensus layer is an external entity where Harp Manager maintains -information it learns about its assigned Postgres node, and HARP Proxy -translates this information to a valid Postgres node target. Because Proxy -obtains the node target from the consensus layer, several such instances can -exist independently. - -While using BDR as the consensus layer, each server node resembles this -variant instead: - -![HARP Unit w/BDR Consensus](images/ha-unit-bdr.png) - -In either case, each unit consists of the following elements: - -* A Postgres or EDB instance -* A consensus layer resource, meant to track various attributes of the Postgres - instance -* A HARP Manager process to convey the state of the Postgres node to the - consensus layer -* A HARP Proxy service that directs traffic to the proper lead master node, - as derived from the consensus layer - -Not every application stack has access to additional node resources -specifically for the Proxy component, so it can be combined with the -application server to simplify the stack. - -This is a typical design using two BDR nodes in a single data center organized in a lead master/shadow master configuration: - -![HARP Cluster](images/ha-ao.png) - -When using BDR as the HARP consensus layer, at least three -fully qualified BDR nodes must be present to ensure a quorum majority. (Not shown in the diagram are connections between BDR nodes.) - -![HARP Cluster w/BDR Consensus](images/ha-ao-bdr.png) - -## How it works - -When managing a EDB Postgres Distributed cluster, HARP maintains at most one leader node per -defined location. This is referred to as the lead master. Other BDR -nodes that are eligible to take this position are shadow master state until they take the leader role. - -Applications can contact the current leader only through the proxy service. 
-Since the consensus layer requires quorum agreement before conveying leader -state, proxy services direct traffic to that node. - -At a high level, this mechanism prevents simultaneous application interaction with -multiple nodes. - -### Determining a leader - -As an example, consider the role of lead master in a locally subdivided -BDR Always-On group as can exist in a single data center. When any -Postgres or Manager resource is started, and after a configurable refresh -interval, the following must occur: - -1. The Manager checks the status of its assigned Postgres resource. - - If Postgres isn't running, try again after configurable timeout. - - If Postgres is running, continue. -2. The Manager checks the status of the leader lease in the consensus layer. - - If the lease is unclaimed, acquire it and assign the identity of - the Postgres instance assigned to this manager. This lease duration is - configurable, but setting it too low can result in unexpected leadership - transitions. - - If the lease is already claimed by us, renew the lease TTL. - - Otherwise do nothing. - -A lot more occurs, but this simplified version explains -what's happening. The leader lease can be held by only one node, and if it's -held elsewhere, HARP Manager gives up and tries again later. - -!!! Note - Depending on the chosen consensus layer, rather than repeatedly looping to - check the status of the leader lease, HARP subscribes to notifications. In this case, it can respond immediately any time the state of the - lease changes rather than polling. Currently this functionality is - restricted to the etcd consensus layer. - -This means HARP itself doesn't hold elections or manage quorum, which is -delegated to the consensus layer. A quorum of the consensus layer must acknowledge the act of obtaining the lease, so if the request succeeds, -that node leads the cluster in that location. 
- -### Connection routing - -Once the role of the lead master is established, connections are handled -with a similar deterministic result as reflected by HARP Proxy. Consider a case -where HARP Proxy needs to determine the connection target for a particular backend -resource: - -1. HARP Proxy interrogates the consensus layer for the current lead master in - its configured location. -2. If this is unset or in transition: - - New client connections to Postgres are barred, but clients - accumulate and are in a paused state until a lead master appears. - - Existing client connections are allowed to complete current transactions - and are then reverted to a similar pending state as new connections. -3. Client connections are forwarded to the lead master. - -The interplay shown in this case doesn't require any -interaction with either HARP Manager or Postgres. The consensus layer -is the source of all truth from the proxy's perspective. - -### Colocation - -The arrangement of the work units is such that their organization must follow these principles: - -1. The manager and Postgres units must exist concomitantly in the same - node. -2. The contents of the consensus layer dictate the prescriptive role of all - operational work units. - -This arrangement delegates cluster quorum responsibilities to the consensus layer, -while HARP leverages it for critical role assignments and key/value storage. -Neither storage nor retrieval succeeds if the consensus layer is inoperable -or unreachable, thus preventing rogue Postgres nodes from accepting -connections. - -As a result, the consensus layer generally exists outside of HARP or HARP-managed nodes for maximum safety. Our reference diagrams show this separation, although it isn't required. - -!!! Note - To operate and manage cluster state, BDR contains its own - implementation of the Raft Consensus model. 
You can configure HARP to - leverage this same layer to reduce reliance on external dependencies and - to preserve server resources. However, certain drawbacks to this - approach are discussed in - [Consensus layer](09_consensus-layer). - -## Recommended architecture and use - -HARP was primarily designed to represent a BDR Always-On architecture that -resides in two or more data centers and consists of at least five BDR -nodes. This configuration doesn't count any logical standby nodes. - -The following diagram shows the current and standard representation: - -![BDR Always-On Reference Architecture](images/bdr-ao-spec.png) - -In this diagram, HARP Manager exists on BDR Nodes 1-4. The initial state -of the cluster is that BDR Node 1 is the lead master of DC A, and BDR -Node 3 is the lead master of DC B. - -This configuration results in any HARP Proxy resource in DC A connecting to BDR Node 1 -and the HARP Proxy resource in DC B connecting to BDR Node 3. - -!!! Note - While this diagram shows only a single HARP Proxy per DC, this is - an example only and should not be considered a single point of failure. Any - number of HARP Proxy nodes can exist, and they all direct application - traffic to the same node. - -### Location configuration - -For multiple BDR nodes to be eligible to take the lead master lock in -a location, you must define a location in the `config.yml` configuration -file. - -To reproduce the BDR Always-On reference architecture shown in the diagram, include these lines in the `config.yml` -configuration for BDR Nodes 1 and 2: - -```yaml -location: dca -``` - -For BDR Nodes 3 and 4, add: - -```yaml -location: dcb -``` - -This applies to any HARP Proxy nodes that are designated in those respective -data centers as well. - -### BDR 3.7 compatibility - -BDR 3.7 and later offers more direct location definition by assigning a -location to the BDR node. This is done by calling the following SQL -API function while connected to the BDR node. 
So for BDR Nodes 1 and 2, you -might do this: - -```sql -SELECT bdr.set_node_location('dca'); -``` - -And for BDR Nodes 3 and 4: - -```sql -SELECT bdr.set_node_location('dcb'); -``` diff --git a/product_docs/docs/pgd/4/harp/index.mdx b/product_docs/docs/pgd/4/harp/index.mdx index acfd2e51864..72cc03969cf 100644 --- a/product_docs/docs/pgd/4/harp/index.mdx +++ b/product_docs/docs/pgd/4/harp/index.mdx @@ -2,21 +2,242 @@ navTitle: HARP title: "High Availability Routing for Postgres (HARP)" directoryDefaults: - description: "High Availability Routing for Postgres (HARP) is a cluster-management tool for Bi-directional Replication (BDR) clusters." + description: "High Availability Routing for Postgres (HARP) is a cluster-management tool for EDB Postgres Distributed clusters." +redirects: +- /pgd/4/harp/02_overview --- -High Availability Routing for Postgres (HARP) is a cluster-management tool for -[Bi-directional Replication (BDR)](../bdr/) clusters. The core design of -the tool is to route all application traffic in a single data center or -region to only one node at a time. This node, designated the lead master, acts -as the principle write target to reduce the potential for data conflicts. +High Availability Routing for Postgres (HARP) is a new approach for managing high availability for +EDB Postgres Distributed cluster versions 3.6 or later. All application traffic within a single location +(data center or region) is routed to only one BDR node at a time in a semi-exclusive manner. This node, +designated the lead master, acts as the principal write target to reduce the potential for data conflicts. -HARP leverages a distributed consensus model to determine availability of the -BDR nodes in the cluster. On failure or unavailability of the lead master, HARP -elects a new lead master and redirects application traffic. +HARP leverages a distributed consensus model to determine availability of the BDR nodes in the cluster.
+On failure or unavailability of the lead master, HARP elects a new lead master and redirects application traffic. -Together with the core capabilities of BDR, this mechanism of routing +Together with the core capabilities of the BDR extension, this mechanism of routing application traffic to the lead master node enables fast failover and switchover without risk of data loss. -HARP requires BDR versions 3.6 and later. + +## The importance of quorum + +The central purpose of HARP is to enforce full quorum on any EDB Postgres Distributed cluster +it manages. Quorum is a term applied to a voting body that mandates a certain minimum of attendees +are available to make a decision. More simply: majority rules. + +For any vote to end in a result other than a tie, an odd number of +nodes must constitute the full cluster membership. Quorum, however, doesn't +strictly demand this restriction; a simple majority is enough. This means +that in a cluster of N nodes, quorum requires a minimum of N/2+1 nodes to hold +a meaningful vote. + +All of this ensures the cluster is always in agreement regarding the node +that is "in charge." For an EDB Postgres Distributed cluster consisting of multiple nodes, this +determines the node that is the primary write target. HARP designates this node +as the lead master. + +## Reducing write targets + +The consequence of ignoring the concept of quorum, or not applying it +well enough, can lead to a "split brain" scenario where the "correct" write +target is ambiguous or unknowable. In a standard EDB Postgres Distributed cluster, it's +important that only a single node is ever writable and sending replication +traffic to the remaining nodes. + +Even in multi-master-capable approaches such as BDR, it can be helpful to +reduce the amount of necessary conflict management to derive identical data +across the cluster.
In clusters that consist of multiple BDR nodes per physical +location or region, this usually means a single BDR node acts as a "leader" and +remaining nodes are "shadow." These shadow nodes are still writable, but writing to +them is discouraged unless absolutely necessary. + +By leveraging quorum, it's possible for all nodes to agree on the exact +Postgres node to represent the entire cluster or a local BDR region. Any +nodes that lose contact with the remainder of the quorum, or are overruled by +it, by definition can't become the cluster leader. + +This restriction prevents split-brain situations where writes unintentionally reach two +Postgres nodes. Unlike technologies such as VPNs, proxies, load balancers, or +DNS, you can't circumvent a quorum-derived consensus by misconfiguration or +network partitions. So long as it's possible to contact the consensus layer to +determine the state of the quorum maintained by HARP, only one target is ever +valid. + +## Basic architecture + +The design of HARP comes in essentially two parts, consisting of a manager and +a proxy. The following diagram describes how these interact with a single +Postgres instance: + +![HARP Unit](images/ha-unit.png) + +The consensus layer is an external entity where HARP Manager maintains +information it learns about its assigned Postgres node, and HARP Proxy +translates this information to a valid Postgres node target. Because Proxy +obtains the node target from the consensus layer, several such instances can +exist independently.
+ +While using BDR as the consensus layer, each server node resembles this +variant instead: + +![HARP Unit w/BDR Consensus](images/ha-unit-bdr.png) + +In either case, each unit consists of the following elements: + +* A Postgres instance +* A consensus layer resource, meant to track various attributes of the Postgres + instance +* A HARP Manager process to convey the state of the Postgres node to the + consensus layer +* A HARP Proxy service that directs traffic to the proper lead master node, + as derived from the consensus layer + +Not every application stack has access to additional node resources +specifically for the Proxy component, so it can be combined with the +application server to simplify the stack. + +This is a typical design using two BDR nodes in a single data center organized in a lead master/shadow master configuration: + +![HARP Cluster](images/ha-ao.png) + +When using BDR as the HARP consensus layer, at least three +fully qualified BDR nodes must be present to ensure a quorum majority. (Not shown in the diagram are connections between BDR nodes.) + +![HARP Cluster w/BDR Consensus](images/ha-ao-bdr.png) + +## How it works + +When managing an EDB Postgres Distributed cluster, HARP maintains at most one leader node per +defined location. This is referred to as the lead master. Other BDR +nodes that are eligible to take this position are in shadow master state until they take the leader role. + +Applications can contact the current leader only through the proxy service. +Since the consensus layer requires quorum agreement before conveying leader +state, proxy services direct traffic to that node. + +At a high level, this mechanism prevents simultaneous application interaction with +multiple nodes. + +### Determining a leader + +As an example, consider the role of lead master in a locally subdivided +BDR Always-On group as can exist in a single data center.
When any +Postgres or Manager resource is started, and after a configurable refresh +interval, the following must occur: + +1. The Manager checks the status of its assigned Postgres resource. + - If Postgres isn't running, try again after a configurable timeout. + - If Postgres is running, continue. +2. The Manager checks the status of the leader lease in the consensus layer. + - If the lease is unclaimed, acquire it and assign the identity of + the Postgres instance assigned to this manager. This lease duration is + configurable, but setting it too low can result in unexpected leadership + transitions. + - If the lease is already claimed by us, renew the lease TTL. + - Otherwise do nothing. + +A lot more occurs, but this simplified version explains +what's happening. The leader lease can be held by only one node, and if it's +held elsewhere, HARP Manager gives up and tries again later. + +!!! Note + Depending on the chosen consensus layer, rather than repeatedly looping to + check the status of the leader lease, HARP subscribes to notifications. In this case, it can respond immediately any time the state of the + lease changes rather than polling. Currently this functionality is + restricted to the etcd consensus layer. + +This means HARP itself doesn't hold elections or manage quorum, which is +delegated to the consensus layer. A quorum of the consensus layer must acknowledge the act of obtaining the lease, so if the request succeeds, +that node leads the cluster in that location. + +### Connection routing + +Once the role of the lead master is established, connections are handled +with a similar deterministic result as reflected by HARP Proxy. Consider a case +where HARP Proxy needs to determine the connection target for a particular backend +resource: + +1. HARP Proxy interrogates the consensus layer for the current lead master in + its configured location. +2. 
If this is unset or in transition: + - New client connections to Postgres are barred, but clients + accumulate and are in a paused state until a lead master appears. + - Existing client connections are allowed to complete current transactions + and are then reverted to a similar pending state as new connections. +3. Client connections are forwarded to the lead master. + +The interplay shown in this case doesn't require any +interaction with either HARP Manager or Postgres. The consensus layer +is the source of all truth from the proxy's perspective. + +### Colocation + +The arrangement of the work units is such that their organization must follow these principles: + +1. The manager and Postgres units must exist concomitantly in the same + node. +2. The contents of the consensus layer dictate the prescriptive role of all + operational work units. + +This arrangement delegates cluster quorum responsibilities to the consensus layer, +while HARP leverages it for critical role assignments and key/value storage. +Neither storage nor retrieval succeeds if the consensus layer is inoperable +or unreachable, thus preventing rogue Postgres nodes from accepting +connections. + +As a result, the consensus layer generally exists outside of HARP or HARP-managed nodes for maximum safety. Our reference diagrams show this separation, although it isn't required. + +!!! Note + To operate and manage cluster state, BDR contains its own + implementation of the Raft Consensus model. You can configure HARP to + leverage this same layer to reduce reliance on external dependencies and + to preserve server resources. However, certain drawbacks to this + approach are discussed in + [Consensus layer](09_consensus-layer). + +## Recommended architecture and use + +HARP was primarily designed to represent a BDR Always On architecture that +resides in two or more data centers and consists of at least five BDR +nodes. This configuration doesn't count any logical standby nodes. 
+ +The following diagram shows the current and standard representation: + +![BDR Always On Reference Architecture](images/bdr-ao-spec.png) + +In this diagram, HARP Manager exists on BDR Nodes 1-4. The initial state +of the cluster is that BDR Node 1 is the lead master of DC A, and BDR +Node 3 is the lead master of DC B. + +This configuration results in any HARP Proxy resource in DC A connecting to BDR Node 1 +and the HARP Proxy resource in DC B connecting to BDR Node 3. + +!!! Note + While this diagram shows only a single HARP Proxy per DC, this is + an example only and should not be considered a single point of failure. Any + number of HARP Proxy nodes can exist, and they all direct application + traffic to the same node. + +### Location configuration + +For multiple BDR nodes to be eligible to take the lead master lock in +a location, you must define a location in the `config.yml` configuration +file. + +To reproduce the BDR Always-On reference architecture shown in the diagram, include these lines in the `config.yml` +configuration for BDR Nodes 1 and 2: + +```yaml +location: dca +``` + +For BDR Nodes 3 and 4, add: + +```yaml +location: dcb +``` + +This applies to any HARP Proxy nodes that are designated in those respective +data centers as well. +