Merge pull request #1964 from EnterpriseDB/content/pem/8/ha_guide_update
Updated the HA guide according to Ashesh's Comments on the google doc
nidhibhammar authored Nov 18, 2021
2 parents 0225ccf + 729d3c3 commit f26c942
Showing 1 changed file with 176 additions and 99 deletions.
275 changes: 176 additions & 99 deletions product_docs/docs/pem/8/pem_ha_setup/setup_ha_using_efm.mdx
@@ -1,10 +1,10 @@
---
title: "High Availability Using EFM"
title: "High Availability Using Failover Manager"
---

Postgres Enterprise Manager (PEM) assists database administrators, system architects, and performance analysts in administering, monitoring, and tuning Postgres database servers.

EDB Postgres Failover Manager (EFM) is a high availability tool from EDB that enables a Postgres primary node to automatically failover to a standby node in the event of a software or hardware failure on the primary.
Failover Manager (EFM) is a high availability tool from EDB that enables a Postgres primary node to automatically fail over to a standby node in the event of a software or hardware failure on the primary.

The examples in the following sections use these IP addresses:

@@ -14,30 +14,42 @@ The examples in the following sections use these IP addresses:
- 172.16.161.203 - EFM Witness Node
- 172.16.161.245 - PEM VIP (used by agents and users to connect)

# Initial product installation and configuration
The following must use the VIP address:

1. Install the following on the primary and one or more standbys
- The PEM Agent binding of the monitored database servers
- Accessing the PEM Web Client
- Accessing the Webserver services

- [EDB Postgres Advanced Server 13](https://www.enterprisedb.com/docs/epas/latest/) (as backend database server for PEM)
- [PEM Server](https://www.enterprisedb.com/docs/pem/latest/)
# Initial Product Installation and Configuration

1. Install the following on the primary and one or more standbys:

- [EDB Postgres Advanced Server 13](https://www.enterprisedb.com/docs/epas/latest/epas_inst_linux/) (backend database for PEM Server)
- [PEM Server](https://www.enterprisedb.com/docs/pem/latest/pem_inst_guide_linux/)
- [EDB Failover Manager 4.1](https://www.enterprisedb.com/docs/efm/latest/efm_user/03_installing_efm/)

Refer to the installation instructions in the product documentation using the links above or refer to the instructions given on the [EDBrepos website](https://repos.enterprisedb.com). You need to replace the `USERNAME:PASSWORD` with your username and password in the instructions to access the EDB repositories.
Refer to the installation instructions in the product documentation using the links above, or to the instructions given on the [EDB repos website](https://repos.enterprisedb.com). To access the EDB repositories, replace `USERNAME:PASSWORD` in the instructions with your username and password.

Ensure that the database server is configured to use the `scram-sha-256` authentication method, as the PEM Server configuration script does not work with `trust` authentication.

You need to install the `java-1.8.0-openjdk` package to install EFM.

2. Configure the PEM Server on all the primary and one or more standbys independently. For more detail, see Configuring the PEM Server section in the [PEM Installation Guides](https://www.enterprisedb.com/docs/pem/latest/).
2. Configure the PEM Server on the primary server with initial configuration of type 1 (i.e. Web Services and Database):

```text
/usr/edb/pem/bin/configure-pem-server.sh -t 1
```
For more detail on configuration types, see [Configuring the PEM Server](https://www.enterprisedb.com/docs/pem/latest/pem_inst_guide_linux/04_installing_postgres_enterprise_manager/05_configuring_the_pem_server_on_linux/).

3. Add the following ports in the firewall on all the primary and standbys to allow the access:
3. Open the following ports in the firewall on the primary and all the standby servers to allow access:

- `8443` - for PEM Server (https)
- `5444` - for EPAS 13
- `7800` - for EFM
- `7809` - for EFM Admin

For example:

```text
$ sudo firewall-cmd --zone=public --add-port=5444/tcp --permanent
success
@@ -51,9 +63,9 @@ The examples in the following sections use these IP addresses:
success
```
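The firewall commands above follow the same pattern for each port, so they can be generated from a list. The following is a minimal sketch, not part of the PEM or EFM tooling: `print_firewall_cmds` is a hypothetical helper that only prints the `firewall-cmd` calls so you can review them before running anything. It assumes the `public` zone used in the example and adds a final `--reload`, because permanent firewalld rules are not active until the firewall is reloaded.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the firewall-cmd calls for a list of TCP ports.
print_firewall_cmds() {
    for port in "$@"; do
        echo "sudo firewall-cmd --zone=public --add-port=${port}/tcp --permanent"
    done
    # Permanent rules only become active after a reload.
    echo "sudo firewall-cmd --reload"
}

# Ports for PEM Server (https), EPAS 13, and EFM from the list above.
print_firewall_cmds 8443 5444 7800
```

Pass the EFM admin port in the same way, using the value of `admin.port` from your EFM properties file.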

# Set up the primary node for streaming replication.
# Set Up the Primary Node for Streaming Replication

1. Create the replication role using the following command
1. Create the replication role using the following command.

```text
$ /usr/edb/as13/bin/psql -h 172.16.161.200 -p 5444 -U enterprisedb edb -c "CREATE ROLE repl REPLICATION LOGIN PASSWORD 'password';"
@@ -73,19 +85,19 @@ The examples in the following sections use these IP addresses:

For more detailed information on configuring parameters for streaming replication refer to the [PostgreSQL documentation](https://www.postgresql.org/docs/13/warm-standby.html#STREAMING-REPLICATION).

!!! Note
The configuration parameters may be different for different versions of database server. You can email support at [[email protected]](mailto:[email protected]) for any help in setting up these parameters.
!!! Note
The configuration parameters may be different for different versions of database server. You can email support at [[email protected]](mailto:[email protected]) for any help in setting up these parameters.

3. Add the following entry in the host-based authentication (`/var/lib/edb/as13/data/pg_hba.conf`) file to allow replication user to connect from all the standbys:
3. Add the following entry in the host-based authentication (`/var/lib/edb/as13/data/pg_hba.conf`) file to allow the replication user to connect from all the standbys:

```text
hostssl replication repl 172.16.161.0/24 scram-sha-256
hostssl replication repl 172.16.161.201/24 scram-sha-256
```

!!! Note
You can provide ip address with a more appropriate cidr range.
You can change the CIDR range of the IP address, if needed.

4. Modify the host-based authentication (`/var/lib/edb/as13/data/pg_hba.conf`) file for the `pem_user` role to connect to all databases using the `scram-sha-256` authentication method.
4. Modify the host-based authentication (`/var/lib/edb/as13/data/pg_hba.conf`) file for the `pem_user` role to connect to all databases using the `scram-sha-256` authentication method as below:

```text
# Allow local PEM agents and admins to connect to PEM server
@@ -97,23 +109,24 @@ The examples in the following sections use these IP addresses:
hostssl pem +pem_agent 0.0.0.0/0 cert
```

5. Restart the EPAS 13 Server
5. Restart the EPAS 13 Server.

```text
systemctl restart edb-as-13.service
```
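The note in step 3 says the CIDR range of the replication entry can be changed. One possible narrowing is a single entry per standby with a `/32` mask, which matches exactly one IPv4 host. The sketch below assumes that approach. `hba_entries` is a hypothetical helper that only prints the lines. It does not edit `pg_hba.conf`, so you still add the output to the file yourself and restart the server as in step 5.

```shell
#!/bin/sh
# Hypothetical helper: print one hostssl replication entry per standby address.
# Usage: hba_entries ROLE IP [IP...]
hba_entries() {
    role="$1"
    shift
    for ip in "$@"; do
        # /32 restricts the entry to a single IPv4 host.
        printf 'hostssl replication %s %s/32 scram-sha-256\n' "$role" "$ip"
    done
}

# 172.16.161.201 is the address used in the example entry in step 3.
hba_entries repl 172.16.161.201
```

List every standby address as an additional argument to get one line per standby.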

# Set up the Standby Nodes for Streaming Replication
# Set Up the Standby Nodes for Streaming Replication

1. Stop the PEM backend database server (EPAS 13) service and the `pemagent` service on all the standby nodes:
1. Stop the service for EPAS 13 on all the standby nodes:

```text
$ systemctl stop edb-as-13.service
$ systemctl stop pemagent.service
```

2. Remove the data directory of the PEM backend database server on all the standby nodes:
!!! Note
This example uses the `pg_basebackup` utility to create the replicas of the PEM backend database server on the standby servers. When using `pg_basebackup`, you need to stop the existing database server and remove the existing data directories.

2. Remove the data directory of the database server on all the standby nodes:

```text
$ sudo su - enterprisedb
@@ -142,41 +155,100 @@ The examples in the following sections use these IP addresses:
-D /var/lib/edb/as13/data -U repl -v -P -Fp -R -p 5444
```

The backup command creates the `postgresql.auto.conf` and the `standby.signal` file on the standby nodes.
The backup command creates the `postgresql.auto.conf` and the `standby.signal` file on the standby nodes. The `postgresql.auto.conf` file content is as below:

```text
$ sudo su - enterprisedb -c "cat /var/lib/edb/as13/data/postgresql.auto.conf"
# Do not edit this file manually
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=repl passfile=''/var/lib/edb/.pgpass'' channel_binding=prefer host=172.16.161.200 port=5444 sslmode=prefer sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsvrname=postgres target_session_attrs=any'
```

5. Edit the following parameter in `postgresql.con` file on all the standby nodes:
5. Edit the following parameter in the `postgresql.conf` file on each of the standby nodes:

```text
hot_standby = on
```

6. Start the PEM backend database server (EPAS 13) on each of the standby nodes.
6. Start the EPAS 13 database server on each of the standby nodes:

```text
$ systemctl enable edb-as-13
$ systemctl start edb-as-13
```

7. Copy the PEM agent certificates and keys from `/root/.pem` from the primary node to the standby nodes at same location. Set the following permissions to the PEM Agent Certificate file:

```text
$ sudo chmod 600 /root/.pem/agent1.key
$ sudo chmod 640 /root/.pem/agent1.crt
```
7. Copy the following files from the primary node to the standby nodes at the same location, overwriting any existing files, and set the following permissions on the files:

- `/etc/httpd/conf.d/edb-pem.conf`
- `/etc/httpd/conf.d/edb-ssl-pem.conf`
- `/root/.pem/agent1.crt`
- `/root/.pem/agent1.key`
- `/usr/edb/pem/agent/etc/agent.cfg`
- `/usr/edb/pem/share/.install-config`
- `/usr/edb/pem/web/pem.wsgi`
- `/usr/edb/pem/web/config_setup.py`

For example:

```text
$ mkdir -p /root/.pem
$ chown root:root /root/.pem
$ chmod 0755 /root/.pem
$ mkdir -p /var/lib/pemhome/.pem
$ chown pem:pem /var/lib/pemhome/.pem
$ chmod 0700 /var/lib/pemhome/.pem
$ mkdir -p /usr/edb/pem/logs
$ chown root:root /usr/edb/pem/logs
$ chmod 0755 /usr/edb/pem/logs
$ for file in /etc/httpd/conf.d/edb-pem.conf \
/etc/httpd/conf.d/edb-ssl-pem.conf \
/root/.pem/agent1.crt \
/usr/edb/pem/agent/etc/agent.cfg \
/usr/edb/pem/share/.install-config \
/usr/edb/pem/web/pem.wsgi \
/usr/edb/pem/web/config_setup.py; do \
chown root:root ${file}; \
chmod 0644 ${file}; \
done;
$ chmod 0600 /root/.pem/agent1.key
$ chown root:root /root/.pem/agent1.key
```

This ensures that the web server is configured on the standby but disabled by default. EFM enables it automatically at switchover.

!!! Note
You need to manually keep the certificates in sync on the primary and the standbys whenever the certificates are updated.

# Set up EFM to manage failover on all hosts
8. Run the `configure-selinux.sh` script to configure the SELinux policy for PEM as follows:

```text
$ /usr/edb/pem/bin/configure-selinux.sh
getenforce found, now executing 'getenforce' command
Configure the httpd to work with the SELinux
Allow the httpd to connect the database (httpd_can_network_connect_db = on)
Allow the httpd to connect the network (httpd_can_network_connect = on)
Allow the httpd to work with cgi (httpd_enable_cgi = on)
Allow to read & write permission on the 'pem' user home directory
SELinux policy is configured for PEM
$ sudo chmod 640 /root/.pem/agent1.crt
```

!!! Note
At this point you should have a PEM Primary Server and two standbys that are ready to take over from the primary whenever needed.
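Step 7 depends on the copied files having the right permissions on each standby, so it can help to check them after copying. The following is a minimal sketch under the assumption that GNU `stat` is available (as on typical Linux hosts). `check_mode` is a hypothetical helper, not a PEM utility, that compares a file's octal mode with the expected value.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's octal permission bits with an expected value.
# Usage: check_mode EXPECTED_MODE FILE   (for example: check_mode 600 /root/.pem/agent1.key)
check_mode() {
    expected="$1"
    file="$2"
    # GNU stat prints the permission bits in octal with -c '%a' (for example 600 or 644).
    actual="$(stat -c '%a' "$file" 2>/dev/null)" || {
        echo "MISSING $file"
        return 1
    }
    if [ "$actual" = "$expected" ]; then
        echo "OK $file ($actual)"
    else
        echo "MISMATCH $file: expected $expected, found $actual"
        return 1
    fi
}
```

For example, running `check_mode 600 /root/.pem/agent1.key` and `check_mode 644 /usr/edb/pem/agent/etc/agent.cfg` as root on a standby reports whether those two files match the modes set in step 7.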


# Set Up EFM to Manage Failover on All Hosts

1. Prepare the Primary Node to support EFM:

- Create a database user `efm` to connect to the database servers.
- Grant the execute privileges on the functions related to WAL logs and the monitoring privileges to the user.
- Add entries in pg_hba.conf to allow the `efm` database user to connect to the database server from all nodes on all the hosts.
- Reload the configurations on all the database servers.

For example:

```text
$ cat > /tmp/efm-role.sql << _EOF_
@@ -273,7 +345,7 @@ The examples in the following sections use these IP addresses:
$ sudo chmod a+r /etc/edb/efm-4.1/efm.properties
```

7. Encrypt the `efm` user's password using efm utility:
7. Encrypt the `efm` user's password using the `efm` utility:

```text
$ export EFMPASS=password
@@ -284,70 +356,70 @@ The examples in the following sections use these IP addresses:
8. Edit the following parameters in the properties file:

```text
1. db.user=efm
2. db.password.encrypted=096666746b05b081d1a98e43d94c9dad
3. db.port=5444
4. db.database=edb
5. db.service.owner=enterprisedb
6. db.service.name=edb-as-13
7. db.bin=/usr/edb/as13/bin
8. db.data.dir=/var/lib/edb/as13/data
9. jdbc.sslmode=require
10. [email protected]
11. from.email=node1@efm-pem
12. notification.level=INFO
13. notification.text.prefix=[PEM/EFM]
14. bind.address=172.16.161.200:7800
15. admin.port=7809
16. is.witness=false
17. local.period=10
18. local.timeout=60
19. local.timeout.final=10
20. remote.timeout=10
21. node.timeout=50
22. encrypt.agent.messages=true
23. stop.isolated.primary=true
24. stop.failed.primary=true
25. primary.shutdown.as.failure=false
26. update.physical.slots.period=0
27. ping.server.ip=8.8.8.8
28. ping.server.command=/bin/ping -q -c3 -w5
29. auto.allow.hosts=false
30. stable.nodes.file=false
31. db.reuse.connection.count=0
32. auto.failover=true
33. auto.reconfigure=true
34. promotable=true
35. use.replay.tiebreaker=true
36. standby.restart.delay=0
37. reconfigure.num.sync=false
38. reconfigure.sync.primary=false
39. minimum.standbys=0
40. recovery.check.period=1
41. restart.connection.timeout=60
42. auto.resume.period=0
43. virtual.ip=172.16.161.245
44. virtual.ip.interface=ens33
45. virtual.ip.prefix=24
46. virtual.ip.single=true
47. check.vip.before.promotion=true
48. pgpool.enable=false
49. sudo.command=sudo
50. sudo.user.command=sudo -u %u
51. syslog.host=localhost
52. syslog.port=514
53. syslog.protocol=UDP
54. syslog.facility=LOCAL1
55. file.log.enabled=true
56. syslog.enabled=false
57. jgroups.loglevel=INFO
58. efm.loglevel=INFO
59. jvm.options=-Xmx128m
60. script.fence=/usr/local/bin/stop-pemagent.sh
61. script.post.promotion=/usr/local/bin/start-pemagent.sh
db.user=efm
db.password.encrypted=096666746b05b081d1a98e43d94c9dad
db.port=5444
db.database=edb
db.service.owner=enterprisedb
db.service.name=edb-as-13
db.bin=/usr/edb/as13/bin
db.data.dir=/var/lib/edb/as13/data
jdbc.sslmode=require
[email protected]
from.email=node1@efm-pem
notification.level=INFO
notification.text.prefix=[PEM/EFM]
bind.address=172.16.161.200:7800
admin.port=7809
is.witness=false
local.period=10
local.timeout=60
local.timeout.final=10
remote.timeout=10
node.timeout=50
encrypt.agent.messages=true
stop.isolated.primary=true
stop.failed.primary=true
primary.shutdown.as.failure=false
update.physical.slots.period=0
ping.server.ip=8.8.8.8
ping.server.command=/bin/ping -q -c3 -w5
auto.allow.hosts=false
stable.nodes.file=false
db.reuse.connection.count=0
auto.failover=true
auto.reconfigure=true
promotable=true
use.replay.tiebreaker=true
standby.restart.delay=0
reconfigure.num.sync=false
reconfigure.sync.primary=false
minimum.standbys=0
recovery.check.period=1
restart.connection.timeout=60
auto.resume.period=0
virtual.ip=172.16.161.245
virtual.ip.interface=ens33
virtual.ip.prefix=24
virtual.ip.single=true
check.vip.before.promotion=true
pgpool.enable=false
sudo.command=sudo
sudo.user.command=sudo -u %u
syslog.host=localhost
syslog.port=514
syslog.protocol=UDP
syslog.facility=LOCAL1
file.log.enabled=true
syslog.enabled=false
jgroups.loglevel=INFO
efm.loglevel=INFO
jvm.options=-Xmx128m
script.remote.post.promotion=/usr/local/bin/stop-pemagent.sh
script.post.promotion=/usr/local/bin/start-pemagent.sh
```

9. Set the value of `is.witness` configuration parameter on the witness node to `true`:
9. Set the value of the `is.witness` configuration parameter on the witness node to `true`:

```text
is.witness=true
@@ -408,6 +480,11 @@ The examples in the following sections use these IP addresses:

This status confirms that EFM is set up successfully and managing the failover for the PEM Server.
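Because the properties file is edited separately on every node, and `is.witness` must be `true` only on the witness, it can be useful to read individual values back when checking a node. The sketch below is one way to do that with standard tools. `get_prop` is a hypothetical helper, not an EFM command, and it assumes simple `key=value` lines like those shown in step 8.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from a key=value properties file.
# Usage: get_prop KEY FILE
get_prop() {
    key="$1"
    file="$2"
    # Match lines that start with "KEY=" literally and keep everything after the first "=".
    # If the key occurs more than once, the last occurrence is printed.
    awk -v k="$key" '
        index($0, k "=") == 1 { v = substr($0, length(k) + 2); found = 1 }
        END { if (found) print v }
    ' "$file"
}
```

For example, `get_prop is.witness /etc/edb/efm-4.1/efm.properties` should print `true` on the witness node and `false` on the primary and standby nodes.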

The monitored database servers should register their PEM Agents using the VIP address. Users should also access the PEM Web Client using the VIP address.
In case of failover, one of the standbys is promoted to primary. PEM Agents automatically connect to the new primary node. You can replace the failed primary node with a new standby using this procedure.

# Current Limitations

In case of failover, one of the standbys is promoted to primary. PEM Agents automatically connect to the new primary node. All user sessions are lost, and users must log in again. You can replace the failed primary node with a new standby using this procedure.
The current limitations include:
- Web console sessions for users are lost during the switchover.
- Per-user settings made in the `Preferences` dialog are lost because they’re stored in local configuration files on the file system.
- Background processes started from the `Backup`, `Restore`, and `Maintenance` dialogs, and their logs, are not shared between the systems and are lost during the switchover.
