Initial updates for efm 4.8.
There are still some things marked "TODO" and more work needed on the release notes (including a release date).
EFM-Bobby committed Sep 18, 2023
1 parent 66efad8 commit d8c69e4
Showing 32 changed files with 124 additions and 86 deletions.
14 changes: 7 additions & 7 deletions install_template/templates/products/failover-manager/base.njk
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,10 @@
{% set packageName %}edb-efm<4x>{% endset %}
{% import "platformBase/_deploymentConstants.njk" as deploy %}
{% block frontmatter %}
{#
{#
If you modify deployment path here, please first copy the old expression
and add it to the list under "redirects:" below - this ensures we don't
break any existing links.
and add it to the list under "redirects:" below - this ensures we don't
break any existing links.
#}
deployPath: efm/{{ product.version }}/installing/linux_{{platform.arch}}/efm_{{deploy.map_platform[platform.name]}}.mdx
redirects:
Expand All @@ -16,12 +16,12 @@ redirects:
- Install Postgres on the same host (not needed for witness nodes).

- See [Installing EDB Postgres Advanced Server](/epas/latest/epas_inst_linux)

- See [PostgreSQL Downloads](https://www.postgresql.org/download/)
{{ super() }}
{% endblock product_prerequisites %}
{% block postinstall %}
Where `<4x>` is the version of Failover Manager that you are installing. For example, if you are installing version 4.7, the package name would be `edb-efm47`.
Where `<4x>` is the version of Failover Manager that you are installing. For example, if you are installing version 4.8, the package name would be `edb-efm48`.
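The naming rule above can be sketched as a small shell helper. This is only an illustration: `efm_package_name` is a hypothetical function, not part of Failover Manager, and it assumes the package name is always `edb-efm` followed by the version with the dot removed, as in the 4.8 example.

```shell
# Hypothetical helper (not shipped with Failover Manager): derive the
# package name from a version string by dropping the dot, e.g. 4.8 -> edb-efm48.
efm_package_name() {
  printf 'edb-efm%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

efm_package_name 4.8
```

A helper like this can be handy in provisioning scripts that install the same version on every node of the cluster.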

The installation process creates a user named efm that has privileges to invoke scripts that control the Failover Manager service for clusters owned by enterprisedb or postgres.

Expand All @@ -35,8 +35,8 @@ After installing on each node of the cluster:
2. Modify the [cluster members file](../../04_configuring_efm/03_cluster_members/#cluster_members) on each node.
3. If applicable, configure and test virtual IP address settings and any scripts that are identified in the cluster properties file.
4. Start the agent on each node of the cluster. For more information, see [Controlling the Failover Manager service](../../08_controlling_efm_service/).
{% endblock postinstall %}
{% endblock postinstall %}



Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ Each node in a Failover Manager cluster has a properties file (by default, named
After completing the Failover Manager installation, make a working copy of the template before modifying the file contents:

```text
# cp /etc/edb/efm-4.7/efm.properties.in /etc/edb/efm-4.7/efm.properties
# cp /etc/edb/efm-4.8/efm.properties.in /etc/edb/efm-4.8/efm.properties
```
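If the copy step is scripted, it may be worth guarding against overwriting a working copy that was already edited. The following is a minimal sketch under that assumption; `copy_template` is a hypothetical helper name, and the paths in the usage comment are the ones from the command above.

```shell
# Hypothetical helper: copy a template to a working copy, but refuse to
# overwrite a working copy that already exists.
copy_template() {
  src="$1"
  dst="$2"
  if [ -e "$dst" ]; then
    echo "refusing to overwrite existing file: $dst" >&2
    return 1
  fi
  cp "$src" "$dst"
}

# Example (run as root):
# copy_template /etc/edb/efm-4.8/efm.properties.in /etc/edb/efm-4.8/efm.properties
```

The same helper would apply to other templates that follow the `.in` naming pattern.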

After copying the template file, change the owner of the file to efm:
Expand Down Expand Up @@ -690,7 +690,7 @@ auto.failover=true

<div id="auto_reconfigure" class="registered_link"></div>

Use the `auto.reconfigure` property to instruct Failover Manager to enable or disable automatic reconfiguration of remaining standby servers after the primary standby is promoted to primary. Set the property to `true` (the default) to enable automatic reconfiguration or `false` to disable automatic reconfiguration. This property isn't required on a dedicated witness node. If you're using EDB Postgres Advanced Server or PostgreSQL version 11 or earlier, the `recovery.conf` file is backed up during the reconfiguration process.
Use the `auto.reconfigure` property to instruct Failover Manager to enable or disable automatic reconfiguration of remaining standby servers after the primary standby is promoted to primary. Set the property to `true` (the default) to enable automatic reconfiguration or `false` to disable automatic reconfiguration. This property isn't required on a dedicated witness node. If you're using EDB Postgres Advanced Server or PostgreSQL version 11, the `recovery.conf` file is backed up during the reconfiguration process.

```ini
# After a standby is promoted, Failover Manager will attempt to
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ This example shows using the `encrypt` utility to encrypt a password for the `ac
# efm encrypt acctg
This utility will generate an encrypted password for you to place in
your Failover Manager cluster property file:
/etc/edb/efm-4.7/acctg.properties
/etc/edb/efm-4.8/acctg.properties
Please enter the password and hit enter:
Please enter the password again to confirm:
The encrypted password is: 516b36fb8031da17cfbc010f7d09359c
Expand All @@ -49,16 +49,16 @@ db.password.encrypted=516b36fb8031da17cfbc010f7d09359c
After receiving your encrypted password, paste the password into the properties file and start the Failover Manager service. If there's a problem with the encrypted password, the Failover Manager service doesn't start:

```text
[witness@localhost ~]# systemctl start edb-efm-4.7
Job for edb-efm-4.7.service failed because the control process exited with error code. See "systemctl status edb-efm-4.7.service" and "journalctl -xe" for details.
[witness@localhost ~]# systemctl start edb-efm-4.8
Job for edb-efm-4.8.service failed because the control process exited with error code. See "systemctl status edb-efm-4.8.service" and "journalctl -xe" for details.
```

If you receive this message when starting the Failover Manager service, see the startup log `/var/log/efm-4.7/startup-efm.log` for more information.
If you receive this message when starting the Failover Manager service, see the startup log `/var/log/efm-4.8/startup-efm.log` for more information.

If you are using RHEL/CentOS 7.x or RHEL/Rocky Linux/AlmaLinux 8.x, startup information is also available with the following command:

```shell
systemctl status edb-efm-4.7
systemctl status edb-efm-4.8
```

To prevent a cluster from inadvertently connecting to the database of another cluster, the cluster name is incorporated into the encrypted password. If you modify the cluster name, you must re-encrypt the database password and update the cluster properties file.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ Each node in a Failover Manager cluster has a cluster members file (by default n
After completing the Failover Manager installation, make a working copy of the template:

```shell
cp /etc/edb/efm-4.7/efm.nodes.in /etc/edb/efm-4.7/efm.nodes
cp /etc/edb/efm-4.8/efm.nodes.in /etc/edb/efm-4.8/efm.nodes
```

After copying the template file, change the owner of the file to efm:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -36,18 +36,18 @@ The `efm-42` file is located in `/etc/sudoers.d` and contains the following entr
# If you run your db service under a non-default account, you will need to copy
# this file to grant the proper permissions and specify the account in your efm
# cluster properties file by changing the 'db.service.owner' property.
efm ALL=(postgres) NOPASSWD: /usr/edb/efm-4.7/bin/efm_db_functions
efm ALL=(enterprisedb) NOPASSWD: /usr/edb/efm-4.7/bin/efm_db_functions
efm ALL=(postgres) NOPASSWD: /usr/edb/efm-4.8/bin/efm_db_functions
efm ALL=(enterprisedb) NOPASSWD: /usr/edb/efm-4.8/bin/efm_db_functions
# Allow user 'efm' to sudo efm_root_functions as 'root' to write/delete the PID file,
# validate the db.service.owner property, etc.
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.7/bin/efm_root_functions
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.8/bin/efm_root_functions
# Allow user 'efm' to sudo efm_address as root for VIP tasks.
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.7/bin/efm_address
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.8/bin/efm_address
# Allow user 'efm' to sudo efm_pgpool_functions as root for pgpool tasks.
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.7/bin/efm_pgpool_functions
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-4.8/bin/efm_pgpool_functions
# relax tty requirement for user 'efm'
Defaults:efm !requiretty
Expand Down Expand Up @@ -89,17 +89,17 @@ To run Failover Manager without sudo, you must select a database process owner w
```shell
su - enterprisedb

cp /etc/edb/efm-4.7/efm.properties.in <directory/cluster_name>.properties
cp /etc/edb/efm-4.8/efm.properties.in <directory/cluster_name>.properties

cp /etc/edb/efm-4.7/efm.nodes.in <directory>/<cluster_name>.nodes
cp /etc/edb/efm-4.8/efm.nodes.in <directory>/<cluster_name>.nodes
```

Then, modify the cluster properties file, providing the name of the user in the `db.service.owner` property. Also make sure that the `db.service.name` property is blank. Without sudo, you can't run services without root access.

After modifying the configuration, the new user can control Failover Manager with the following command:

```shell
/usr/edb/efm-4.7/bin/runefm.sh start|stop <directory/cluster_name>.properties
/usr/edb/efm-4.8/bin/runefm.sh start|stop <directory/cluster_name>.properties
```

Where `<directory/cluster_name.properties>` specifies the full path of the cluster properties file. The user provides the full path to the properties file whenever the nondefault user is controlling agents or using the `efm` script.
Expand Down
15 changes: 8 additions & 7 deletions product_docs/docs/efm/4/05_using_efm.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ By default, [some of the efm commands](07_using_efm_utility/#using_efm_utility)
- [efm allow-node](07_using_efm_utility/#efm_allow_node)
- [efm disallow-node](07_using_efm_utility/#efm_disallow_node)
- [efm promote](07_using_efm_utility/#efm_promote)
- [efm reset-members](07_using_efm_utility/#efm_reset_members)
- [efm resume](07_using_efm_utility/#efm_resume)
- [efm set-priority](07_using_efm_utility/#efm_set_priority)
- [efm stop-cluster](07_using_efm_utility/#efm_stop_cluster)
Expand Down Expand Up @@ -129,7 +130,7 @@ Where:

During switchover:

- For server versions 11 and prior, the `recovery.conf` file is copied from an existing standby to the primary node. For server version 12 and later, the `primary_conninfo` and `restore_command` parameters are copied and stored in memory.
- For server version 11, the `recovery.conf` file is copied from an existing standby to the primary node. For server version 12 and later, the `primary_conninfo` and `restore_command` parameters are copied and stored in memory.
- The primary database is stopped.
- If you are using a VIP, the address is released from the primary node.
- A standby is promoted to replace the primary node and acquires the VIP.
Expand Down Expand Up @@ -166,7 +167,7 @@ To stop the Failover Manager agent on RHEL/CentOS 7.x or RHEL/Rocky Linux/AlmaLi

`systemctl stop edb-efm-4.<x>`

Until you invoke the `efm disallow-node` command (removing the node's address from the Allowed Node host list), you can use the `service edb-efm-4.<x> start` command to restart the node later without first running the `efm allow-node` command again.
Until you invoke the `efm disallow-node` or `efm reset-members` command (removing the node's address from the Allowed Node host list), you can use the `service edb-efm-4.<x> start` command to restart the agent later without first running the `efm allow-node` command again. If you aren't planning to add the agent back to the cluster, we recommend using the [efm reset-members](07_using_efm_utility/#efm_reset_members) command after this agent has stopped.

<div id="stopping_efm_cluster" class="registered_link"></div>
Stopping an agent doesn't signal the cluster that the agent has failed unless the [primary.shutdown.as.failure](04_configuring_efm/01_cluster_properties/#primary_shutdown_as_failure) property is set to `true`.
Expand Down Expand Up @@ -267,11 +268,11 @@ After creating the `acctg.properties` and `sales.properties` files, create a ser

If you're using RHEL/CentOS 7.x or RHEL/Rocky Linux/AlmaLinux 8.x, copy the service file `/usr/lib/systemd/system/edb-efm-4.<x>.service` to `/etc/systemd/system` with a new name that's unique for each cluster.

For example, if you have two clusters named `acctg` and `sales` managed by Failover Manager 4.7, the unit file names might be `efm-acctg.service` and `efm-sales.service`. You can create them with:
For example, if you have two clusters named `acctg` and `sales` managed by Failover Manager 4.8, the unit file names might be `efm-acctg.service` and `efm-sales.service`. You can create them with:

```shell
cp /usr/lib/systemd/system/edb-efm-4.7.service /etc/systemd/system/efm-acctg.service
cp /usr/lib/systemd/system/edb-efm-4.7.service /etc/systemd/system/efm-sales.service
cp /usr/lib/systemd/system/edb-efm-4.8.service /etc/systemd/system/efm-acctg.service
cp /usr/lib/systemd/system/edb-efm-4.8.service /etc/systemd/system/efm-sales.service
```

Then use `systemctl edit` to edit the `CLUSTER` variable in each unit file, changing the specified cluster name from `efm` to the new cluster name.
Expand All @@ -282,15 +283,15 @@ In this example, edit the `acctg` cluster by running `systemctl edit efm-acctg.s
```ini
[Service]
Environment=CLUSTER=acctg
PIDFile=/run/efm-4.7/acctg.pid
PIDFile=/run/efm-4.8/acctg.pid
```

Edit the `sales` cluster by running `systemctl edit efm-sales.service` and write:

```ini
[Service]
Environment=CLUSTER=sales
PIDFile=/run/efm-4.7/sales.pid
PIDFile=/run/efm-4.8/sales.pid
```
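With more than a couple of clusters, the override text can be generated rather than typed by hand. The sketch below only prints the text to paste into `systemctl edit` and doesn't touch systemd itself. `efm_override` is a hypothetical helper, and it assumes the PID file path follows the `/run/efm-<version>/<cluster>.pid` pattern shown in the two examples above.

```shell
# Hypothetical helper: print the [Service] override for one cluster.
# Arguments: cluster name, Failover Manager version (for example, 4.8).
efm_override() {
  cluster="$1"
  version="$2"
  printf '[Service]\nEnvironment=CLUSTER=%s\nPIDFile=/run/efm-%s/%s.pid\n' \
    "$cluster" "$version" "$cluster"
}

# Print the override text for each cluster in this example.
for cluster in acctg sales; do
  echo "# override for efm-${cluster}.service"
  efm_override "$cluster" 4.8
done
```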

!!!Note
Expand Down
11 changes: 11 additions & 0 deletions product_docs/docs/efm/4/07_using_efm_utility.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,9 @@ This command must be invoked by efm, a member of the efm group, or root.

Invoke the `efm disallow-node` command to remove the specified node from the allowed hosts list and prevent the node from joining a cluster. Provide the name of the cluster and the address of the node when calling the `efm disallow-node` command. This command must be invoked by efm, a member of the efm group, or root.

!!! Note
If you removed the node from the cluster and aren't planning to add it again, you can use the [efm reset-members](07_using_efm_utility/#efm_reset_members) command instead.

## efm cluster-status

<div id="efm_cluster_status" class="registered_link"></div>
Expand Down Expand Up @@ -117,6 +120,14 @@ This command must be invoked by efm, a member of the efm group, or root.
!!! Note
This command instructs the service to ignore the value specified in the `auto.failover` parameter in the cluster properties file.

## efm reset-members

<div id="efm_reset_members" class="registered_link"></div>

`efm reset-members <cluster_name>`

Invoke the `efm reset-members` command to TODO. update .nodes files, update allowed nodes. standby priority list may need to be updated.

## efm resume

<div id="efm_resume" class="registered_link"></div>
Expand Down
14 changes: 7 additions & 7 deletions product_docs/docs/efm/4/08_controlling_efm_service.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -40,12 +40,12 @@ Stop the Failover Manager on the current node. This command must be invoked by r
The `status` command returns the status of the Failover Manager agent on which it is invoked. You can invoke the status command on any node to instruct Failover Manager to return status and server startup information.

```text
[root@ONE ~]}> systemctl status edb-efm-4.7
edb-efm-4.7.service - EnterpriseDB Failover Manager 4.7
Loaded: loaded (/usr/lib/systemd/system/edb-efm-4.7.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2013-02-14 14:02:16 EST; 4s ago
Process: 58125 ExecStart=/bin/bash -c /usr/edb/edb-efm-4.7/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
[root@ONE ~]}> systemctl status edb-efm-4.8
edb-efm-4.8.service - EnterpriseDB Failover Manager 4.8
Loaded: loaded (/usr/lib/systemd/system/edb-efm-4.8.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2023-09-18 09:57:13 EDT; 1min 32s ago
Process: 58125 ExecStart=/bin/bash -c /usr/edb/edb-efm-4.8/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
Main PID: 58180 (java)
CGroup: /system.slice/edb-efm-4.7.service
└─58180 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/java -cp /usr/edb/edb-efm-4.7/lib/EFM-4.7.jar -Xmx128m...
CGroup: /system.slice/edb-efm-4.8.service
└─58180 /usr/lib/jvm/java-11-openjdk-11.0.20.0.8-1.el7_9.x86_64/bin/java -cp /usr/edb/efm-4.8/lib/EFM-4.8.jar -Xmx128m...
```
12 changes: 9 additions & 3 deletions product_docs/docs/efm/4/13_troubleshooting.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -43,13 +43,19 @@ Failover Manager is tested with OpenJDK. We strongly recommend using OpenJDK. Yo

```shell
# java -version
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
openjdk version "11.0.20" 2023-07-18 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.20.0.8-1.el7_9) (build 11.0.20+8-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.20.0.8-1.el7_9) (build 11.0.20+8-LTS, mixed mode, sharing)
```
!!! Note
There's a temporary issue with OpenJDK version 11 on RHEL and its derivatives. When starting Failover Manager, you might see an error like the following:

`java.lang.Error: java.io.FileNotFoundException: /usr/lib/jvm/java-11-openjdk-11.0.20.0.8-2.el8.x86_64/lib/tzdb.dat (No such file or directory)`

If you see this message, the workaround is to manually install the missing package using the command `sudo dnf install tzdata-java`.
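
Before starting Failover Manager, you could check whether the file named in the error is present. This is a minimal sketch: `has_tzdb` is a hypothetical helper, and the default `JAVA_DIR` value is an assumption, so point it at the JVM directory on your host.

```shell
# Hypothetical check: does a JVM directory contain lib/tzdb.dat?
has_tzdb() {
  [ -f "$1/lib/tzdb.dat" ]
}

# JAVA_DIR is an assumed location; set it to your JVM directory.
JAVA_DIR="${JAVA_DIR:-/usr/lib/jvm/jre-11}"
if has_tzdb "$JAVA_DIR"; then
  echo "tzdb.dat found"
else
  echo "tzdb.dat missing; consider: sudo dnf install tzdata-java"
fi
```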

## Unexpected connection attempts from outside the cluster

TODO: If something outside the cluster attempts to connect, Failover Manager will show the source address. Need example.

TODO: if from another running cluster, run reset-members there. need example output
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ Configuring a replication scenario can be complex. For detailed information abou
You might want to use a `.pgpass` file to enable md5 authentication for the replication user. This might not be the safest authentication method for your environment. For more information about the supported authentication options, see the [PostgreSQL core documentation](https://www.postgresql.org/docs/current/client-authentication.html).

!!! Note
From Version 3.10 onwards, Failover Manager uses `pg_ctl` utility for standby promotion. You don't need to set the `trigger_file` or `promote_trigger_file` parameter for promotion of a standby server.
Failover Manager uses the `pg_ctl` utility for standby promotion. You don't need to set the `trigger_file` or `promote_trigger_file` parameter for promotion of a standby server.

## Limited support for cascading replication

Expand Down