Merge pull request #3144 from EnterpriseDB/release/2022-09-09
Release: 2022-09-09
drothery-edb authored Sep 9, 2022
2 parents ab03aa2 + b09ffd2 commit 1a901a6
Showing 19 changed files with 545 additions and 600 deletions.
@@ -15,7 +15,7 @@ Each node in a Failover Manager cluster has a properties file (by default, named
After completing the Failover Manager installation, make a working copy of the template before modifying the file contents:

```text
# cp /etc/edb/efm-4.4/efm.properties.in /etc/edb/efm-4.4/efm.properties
# cp /etc/edb/efm-4.5/efm.properties.in /etc/edb/efm-4.5/efm.properties
```

After copying the template file, change the owner of the file to efm:
@@ -438,10 +438,10 @@ Set the `is.witness` property to `true` to indicate that the current node is a w
is.witness=
```

The EDB Postgres Advanced Server `pg_is_in_recovery()` function is a Boolean function that reports the recovery state of a database. The function returns `true` if the database is in recovery or `false` if the database isn't in recovery. When an agent starts, it connects to the local database and invokes the `pg_is_in_recovery()` function.

- If the server responds true, the agent assumes the role of standby.
- If the server responds false, the agent assumes the role of primary.
- If there's no local database, the agent assumes an idle state.
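
The role selection described above can be sketched as a small shell function. This is only an illustration of the mapping, not the agent's actual implementation; `agent_role` is a hypothetical helper, and it assumes the input is the `t`/`f` text that `psql -At -c "SELECT pg_is_in_recovery();"` prints for a Boolean.

```shell
# Hypothetical helper (not part of Failover Manager): map the textual result
# of pg_is_in_recovery() to the role an agent would assume at startup.
agent_role() {
    case "$1" in
        t|true)  echo "standby" ;;   # database is in recovery
        f|false) echo "primary" ;;   # database isn't in recovery
        *)       echo "idle" ;;      # no local database, so no answer
    esac
}

agent_role t    # prints "standby"
agent_role f    # prints "primary"
agent_role ""   # prints "idle"
```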

!!! Note
@@ -904,7 +904,7 @@ restart.connection.timeout=60

<div id="auto_resume_period" class="registered_link"></div>

Use the `auto.resume.period` property to specify the number of seconds for an agent to attempt to resume monitoring that database. This property applies after a monitored database fails and an agent has assumed an idle state or when starting in IDLE mode.

```ini
# Period in seconds for IDLE agents to try to resume monitoring
@@ -972,7 +972,7 @@ check.vip.before.promotion=true

<div id="pgpool_enable" class="registered_link"></div>

Use the `pgpool.enable` property to specify if you want to enable the Failover Manager and Pgpool integration for high availability. If you want to enable Pgpool integration in a non-sudo mode (running as the DB owner), the PCPPASS file must be owned by the DB owner operating system user and you must set the file permissions to 600.
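
As a rough sketch, the ownership and permission requirement for the PCPPASS file could be verified before enabling the integration. The function name `check_pcppass` is hypothetical, and the sketch assumes GNU coreutils `stat` (the `-c` flag differs on BSD and macOS).

```shell
# check_pcppass FILE OWNER
# Succeeds only if FILE exists, is owned by OWNER, and has mode 600.
# Hypothetical helper; relies on GNU coreutils `stat -c`.
check_pcppass() {
    file=$1
    owner=$2
    [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
    mode=$(stat -c '%a' "$file")           # octal permissions, e.g. 600
    actual_owner=$(stat -c '%U' "$file")   # owning user name
    [ "$mode" = "600" ] && [ "$actual_owner" = "$owner" ]
}

# Example call (the path and the owner name are assumptions for illustration):
#   check_pcppass "$HOME/.pcppass" enterprisedb || echo "fix owner or mode"
```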


```ini
@@ -1036,13 +1036,13 @@ script.load.balancer.detach=
Use the `detach.on.agent.failure` property to control whether a node is detached from the load balancer when the primary agent fails but the database is still reachable. Set the property to `false` if you don't want the node detached in that scenario. The default value is `true`.

```ini
# If set to true, Failover Manager will detach the node from load
# balancer if the primary agent fails but the database is still
# reachable. In most scenarios this is NOT the desired situation. In
# scenarios where the detach script should run with a failed primary
# agent, even when the primary database is still healthy this parameter
# should be set to true. If no value specified it defaults to true (for
# backwards compatibility).
# This is not applicable for standbys.
detach.on.agent.failure=
```
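
To see which value is in effect when the property is left blank, a script can read the key and fall back to the documented default. The following is a minimal sketch, assuming a plain `key=value` file; `get_prop` is a hypothetical helper, and Failover Manager's own parsing may differ.

```shell
# get_prop FILE KEY DEFAULT
# Prints the value of KEY in a key=value properties file, or DEFAULT when the
# key is absent or its value is empty. Comment lines (starting with #) never
# match because the pattern is anchored at the start of the line.
get_prop() {
    key=$(printf '%s' "$2" | sed 's/\./\\./g')      # escape dots for the regex
    value=$(sed -n "s/^${key}=//p" "$1" | tail -n 1)
    echo "${value:-$3}"
}

# Example (the file path is an assumption for illustration):
#   get_prop /etc/edb/efm-4.5/efm.properties detach.on.agent.failure true
```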
@@ -35,7 +35,7 @@ This example shows using the `encrypt` utility to encrypt a password for the `ac
# efm encrypt acctg
This utility will generate an encrypted password for you to place in
your Failover Manager cluster property file:
/etc/edb/efm-4.4/acctg.properties
/etc/edb/efm-4.5/acctg.properties
Please enter the password and hit enter:
Please enter the password again to confirm:
The encrypted password is: 516b36fb8031da17cfbc010f7d09359c
@@ -49,16 +49,16 @@ db.password.encrypted=516b36fb8031da17cfbc010f7d09359c
After receiving your encrypted password, paste the password into the properties file and start the Failover Manager service. If there's a problem with the encrypted password, the Failover Manager service doesn't start:

```text
[witness@localhost ~]# systemctl start edb-efm-4.4
Job for edb-efm-4.4.service failed because the control process exited with error code. See "systemctl status edb-efm-4.4.service" and "journalctl -xe" for details.
[witness@localhost ~]# systemctl start edb-efm-4.5
Job for edb-efm-4.5.service failed because the control process exited with error code. See "systemctl status edb-efm-4.5.service" and "journalctl -xe" for details.
```

If you receive this message when starting the Failover Manager service, see the startup log `/var/log/efm-4.4/startup-efm.log` for more information.
If you receive this message when starting the Failover Manager service, see the startup log `/var/log/efm-4.5/startup-efm.log` for more information.

If you are using RHEL/CentOS 7.x or RHEL/Rocky Linux/AlmaLinux 8.x, startup information is also available with the following command:

```shell
systemctl status edb-efm-4.4
systemctl status edb-efm-4.5
```

To prevent a cluster from inadvertently connecting to the database of another cluster, the cluster name is incorporated into the encrypted password. If you modify the cluster name, you must re-encrypt the database password and update the cluster properties file.
@@ -15,7 +15,7 @@ Each node in a Failover Manager cluster has a cluster members file (by default n
After completing the Failover Manager installation, make a working copy of the template:

```shell
cp /etc/edb/efm-4.4/efm.nodes.in /etc/edb/efm-4.4/efm.nodes
cp /etc/edb/efm-4.5/efm.nodes.in /etc/edb/efm-4.5/efm.nodes
```

After copying the template file, change the owner of the file to efm:
@@ -36,18 +36,18 @@ The `efm-42` file is located in `/etc/sudoers.d` and contains the following entr
# If you run your db service under a non-default account, you will need to copy
# this file to grant the proper permissions and specify the account in your efm
# cluster properties file by changing the 'db.service.owner' property.
efm ALL=(postgres) NOPASSWD: /usr/edb/efm-4.4/bin/efm_db_functions
efm ALL=(enterprisedb) NOPASSWD: /usr/edb/efm-4.4/bin/efm_db_functions
efm ALL=(postgres) NOPASSWD: /usr/edb/efm-4.5/bin/efm_db_functions
efm ALL=(enterprisedb) NOPASSWD: /usr/edb/efm-4.5/bin/efm_db_functions
# Allow user 'efm' to sudo efm_root_functions as 'root' to write/delete the PID file,
# validate the db.service.owner property, etc.
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.4/bin/efm_root_functions
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.5/bin/efm_root_functions
# Allow user 'efm' to sudo efm_address as root for VIP tasks.
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.4/bin/efm_address
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.5/bin/efm_address
# Allow user 'efm' to sudo efm_pgpool_functions as root for pgpool tasks.
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.4/bin/efm_pgpool_functions
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.5/bin/efm_pgpool_functions
# relax tty requirement for user 'efm'
Defaults:efm !requiretty
@@ -89,17 +89,17 @@ To run Failover Manager without sudo, you must select a database process owner w
```shell
su - enterprisedb

cp /etc/edb/efm-4.4/efm.properties.in <directory/cluster_name>.properties
cp /etc/edb/efm-4.5/efm.properties.in <directory/cluster_name>.properties

cp /etc/edb/efm-4.4/efm.nodes.in <directory>/<cluster_name>.nodes
cp /etc/edb/efm-4.5/efm.nodes.in <directory>/<cluster_name>.nodes
```

Then, modify the cluster properties file, providing the name of the user in the `db.service.owner` property. Also make sure that the `db.service.name` property is blank. Without sudo, you can't run services without root access.

After modifying the configuration, the new user can control Failover Manager with the following command:

```shell
/usr/edb/efm-4.4/bin/runefm.sh start|stop <directory/cluster_name>.properties
/usr/edb/efm-4.5/bin/runefm.sh start|stop <directory/cluster_name>.properties
```

Where `<directory/cluster_name.properties>` specifies the full path of the cluster properties file. The user provides the full path to the properties file whenever the nondefault user is controlling agents or using the `efm` script.
@@ -6,9 +6,9 @@ redirects:
<div id="configuring_for_eager_failover" class="registered_link"></div>


In default run mode, if a primary Failover Manager process fails, there's no failover protection until the agent restarts. To avoid this case, you can set up the primary node through `systemd` to cause a failover when the primary agent exits, which is called Eager Failover.

You can set up Eager Failover by performing the following steps. The example uses EDB Postgres Advanced Server version 12 and Failover Manager version 4.2.
You can set up Eager Failover by performing the following steps. The example uses EDB Postgres Advanced Server version 12 and Failover Manager version 4.5.

<div id="enabling_the_eager_failover" class="registered_link"></div>

@@ -20,29 +20,29 @@ You can set up Eager Failover by performing the following steps. The example us
```

If you don't set this property before starting Failover Manager, shutting down a Failover Manager agent shuts down the database without failover.

- With Eager Failover enabled, using the `efm stop-cluster` command stops all of the Failover Manager agents and shuts down the primary database. Since the agents aren't running, there's no failover. To avoid this scenario, you can disable the command using the `enable.stop.cluster` property.

```ini
enable.stop.cluster=false
```

- Ensure that the database server and the local Failover Manager agent are running.

- As root, create `/etc/systemd/system/edb-as-12.service` file and include:

```ini
.include /lib/systemd/system/edb-as-12.service
[Unit]
BindsTo=edb-efm-4.2.service
BindsTo=edb-efm-4.5.service
```

- Run the following command to reload the configuration files:

```shell
systemctl daemon-reload
```

With these changes, when the Failover Manager agent is stopped or ended, the rest of the cluster treats this situation as a failure and attempts a failover.
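
The file-creation step above can be scripted. The sketch below writes the same three lines shown in the example; `write_eager_override` is a hypothetical helper, and the unit names passed to it are taken from the example rather than detected from the system.

```shell
# write_eager_override DIR DB_UNIT EFM_UNIT
# Writes DIR/DB_UNIT with the contents from the steps above, binding the
# database unit to the Failover Manager unit. Hypothetical helper.
write_eager_override() {
    dir=$1
    db_unit=$2
    efm_unit=$3
    cat > "$dir/$db_unit" <<EOF
.include /lib/systemd/system/$db_unit
[Unit]
BindsTo=$efm_unit
EOF
}

# Example (run as root), followed by the reload step:
#   write_eager_override /etc/systemd/system edb-as-12.service edb-efm-4.5.service
#   systemctl daemon-reload
```

Taking the target directory as a parameter makes it possible to generate the file in a scratch directory and review it before copying it into `/etc/systemd/system`.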

<div id="disabling_the_eager_failover" class="registered_link"></div>
@@ -51,7 +51,7 @@ With these changes, when the Failover Manager agent is stopped or ended, the res

- If you want to stop Failover Manager without stopping the database, comment out the following line in `/etc/systemd/system/edb-as-12.service`:
```ini
BindsTo=edb-efm-4.2.service
BindsTo=edb-efm-4.5.service
```
- Run the following command to reload the configuration files:
```shell
@@ -60,10 +60,10 @@ With these changes, when the Failover Manager agent is stopped or ended, the res

## Upgrading Failover Manager in Eager Failover mode

To upgrade Failover Manager without stopping EDB Postgres Advanced Server, temporarily disable the Eager Failover mode.

1. [Disable Eager Failover](#disabling_the_eager_failover)

2. [Stop and upgrade Failover Manager](../12_upgrading_existing_cluster/#upgrading_existing_cluster)

3. [Enable Eager Failover](#enabling_the_eager_failover)
@@ -72,27 +72,27 @@ To upgrade Failover Manager without stopping EDB Postgres Advanced Server, tempo

- Since the `systemd` command isn't used to manage the database while running Failover Manager with a non-sudo setup, Eager Failover is supported only in sudo mode. It isn't supported in a non-sudo mode.

- Eager Failover isn't suitable for situations in which a VIP wouldn't be released by the old primary.

- Eager Failover is suitable in the following situations:

- With the EDB Postgres Advanced Server high-availability setup.
- In a setup using client connection failover with [jdbc](https://jdbc.postgresql.org/documentation/head/connect.html#connection-failover) or libpq [(target-session-attrs)](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS).
- When custom scripting triggered by `script.fence` would fence the old primary server (STONITH). Some examples are to shut down the VM with VMWare vCenter integration, openstack integration, or lights-out management.
- When custom scripting triggered by `script.fence` would use ssh to deactivate the VIP.

!!! Note
Setting `check.vip.before.promotion=false` is required to allow the new primary to attach the VIP before the old primary releases it.

- Use care when using `primary.shutdown.as.failure=true`. See the description of the [primary.shutdown.as.failure](01_cluster_properties/#primary_shutdown_as_failure) property for information on how to safely bring down the database if needed.

- With every failover, a primary ends up being a failed primary, which doesn't automatically recover as an operational standby. Therefore, make sure the cluster contains multiple promotable standbys, and the total number of standbys is at least two more than the value specified for the `minimum.standbys` property. This is a general recommendation, but it becomes more pressing when using Eager Failover.
- If the database server is stopped, restarting the database also starts Failover Manager.

!!! Note
- If there's a problem starting Failover Manager, such as a bad property value, the database server starts and shuts down again without displaying any warning that it isn't running.
- If the Failover Manager process was previously ended, the lock file still exists, and the agent can't restart automatically.
- If problems occur when starting the database server or the Failover Manager agent, check the Failover Manager startup log for information.

- As a result of running the `stop-cluster` command, Failover Manager stops on all the nodes. In Eager Failover mode, the `stop-cluster` command also stops EDB Postgres Advanced Server without a failover. Set `enable.stop.cluster=false` to make sure the `stop-cluster` command can't be invoked unintentionally.
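
The standby-count recommendation in the list above (total standbys at least two more than `minimum.standbys`) is simple arithmetic and can be checked in a script. `enough_standbys` is a hypothetical helper; how the two numbers are obtained is left to the caller.

```shell
# enough_standbys TOTAL_STANDBYS MINIMUM_STANDBYS
# Succeeds when TOTAL_STANDBYS >= MINIMUM_STANDBYS + 2, per the recommendation.
enough_standbys() {
    [ "$1" -ge $(( $2 + 2 )) ]
}

# Example: three standbys with minimum.standbys=1 satisfies the rule (3 >= 3).
if enough_standbys 3 1; then
    echo "standby count OK"
fi
```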

8 changes: 4 additions & 4 deletions product_docs/docs/efm/4/05_using_efm.mdx
@@ -1,6 +1,6 @@
---
title: "Using Failover Manager"
redirects:
- ../efm_user/05_using_efm
legacyRedirectsGenerated:
# This list is generated by a script. If you need add entries, use the `legacyRedirects` key.
@@ -54,7 +54,7 @@ If a new primary or standby node joins a cluster, all of the existing nodes also

### Adding nodes to a cluster

You can add a node to a Failover Manager cluster at any time. When you add a node to a cluster, you must modify the cluster to allow the new node, and then tell the new node how to find the cluster.

1. Unless `auto.allow.hosts` is set to `true`, use the `efm allow-node` command to add the address of the new node to the Failover Manager allowed node host list. When invoking the command, specify the cluster name and the address of the new node:

@@ -221,7 +221,7 @@ The following parameters must be unique in each cluster properties file:
`db.data.dir`

`virtual.ip` (if used)

`db.service.name` (if used)

In each cluster properties file, the `db.port` parameter specifies a unique value for each cluster. The `db.user` and `db.database` parameter can have the same value or a unique value. For example, the `acctg.properties` file can specify:
@@ -282,7 +282,7 @@ Environment=CLUSTER=acctg
Also update the value of the `PIDFile` parameter to specify the new cluster name. For example:

```ini
PIDFile=/var/run/efm-4.4/acctg.pid
PIDFile=/var/run/efm-4.5/acctg.pid
```

After copying the service scripts, enable the services:
14 changes: 7 additions & 7 deletions product_docs/docs/efm/4/08_controlling_efm_service.mdx
@@ -1,6 +1,6 @@
---
title: "Controlling the Failover Manager service"
redirects:
- ../efm_user/08_controlling_efm_service
legacyRedirectsGenerated:
# This list is generated by a script. If you need add entries, use the `legacyRedirects` key.
@@ -40,12 +40,12 @@ Stop the Failover Manager on the current node. This command must be invoked by r
The `status` command returns the status of the Failover Manager agent on which it is invoked. You can invoke the status command on any node to instruct Failover Manager to return status and server startup information.

```text
[root@ONE ~]}> systemctl status edb-efm-4.4
edb-efm-4.4.service - EnterpriseDB Failover Manager 4.4
Loaded: loaded (/usr/lib/systemd/system/edb-efm-4.4.service; disabled; vendor preset: disabled)
[root@ONE ~]}> systemctl status edb-efm-4.5
edb-efm-4.5.service - EnterpriseDB Failover Manager 4.5
Loaded: loaded (/usr/lib/systemd/system/edb-efm-4.5.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2013-02-14 14:02:16 EST; 4s ago
Process: 58125 ExecStart=/bin/bash -c /usr/edb/edb-efm-4.4/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
Process: 58125 ExecStart=/bin/bash -c /usr/edb/edb-efm-4.5/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
Main PID: 58180 (java)
CGroup: /system.slice/edb-efm-4.4.service
└─58180 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/java -cp /usr/edb/edb-efm-4.4/lib/EFM-4.4.0.jar -Xmx128m...
CGroup: /system.slice/edb-efm-4.5.service
└─58180 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/java -cp /usr/edb/edb-efm-4.5/lib/EFM-4.5.0.jar -Xmx128m...
```