Merge pull request #1964 from EnterpriseDB/content/pem/8/ha_guide_update
Updated the HA guide according to Ashesh's Comments on the google doc
Showing 1 changed file with 176 additions and 99 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,10 +1,10 @@ | ||
--- | ||
title: "High Availability Using EFM" | ||
title: "High Availability Using Failover Manager" | ||
--- | ||
|
||
Postgres Enterprise Manager (PEM) assists database administrators, system architects, and performance analysts in administering, monitoring, and tuning Postgres database servers. | ||
|
||
EDB Postgres Failover Manager (EFM) is a high availability tool from EDB that enables a Postgres primary node to automatically failover to a standby node in the event of a software or hardware failure on the primary. | ||
Failover Manager (EFM) is a high availability tool from EDB that enables a Postgres primary node to automatically fail over to a standby node in the event of a software or hardware failure on the primary. | ||
|
||
The examples in the following sections use these IP addresses: | ||
|
||
|
@@ -14,30 +14,42 @@ The examples in the following sections use these IP addresses: | |
- 172.16.161.203 - EFM Witness Node | ||
- 172.16.161.245 - PEM VIP (used by agents and users to connect) | ||
|
||
# Initial product installation and configuration | ||
The following must use the VIP address: | ||
|
||
1. Install the following on the primary and one or more standbys | ||
- The PEM Agent binding of the monitored database servers | ||
- Accessing the PEM Web Client | ||
- Accessing the Webserver services | ||
|
||
- [EDB Postgres Advanced Server 13](https://www.enterprisedb.com/docs/epas/latest/) (as backend database server for PEM) | ||
- [PEM Server](https://www.enterprisedb.com/docs/pem/latest/) | ||
# Initial Product Installation and Configuration | ||
|
||
1. Install the following on the primary and one or more standbys: | ||
|
||
- [EDB Postgres Advanced Server 13](https://www.enterprisedb.com/docs/epas/latest/epas_inst_linux/) (backend database for PEM Server) | ||
- [PEM Server](https://www.enterprisedb.com/docs/pem/latest/pem_inst_guide_linux/) | ||
- [EDB Failover Manager 4.1](https://www.enterprisedb.com/docs/efm/latest/efm_user/03_installing_efm/) | ||
|
||
Refer to the installation instructions in the product documentation using the links above or refer to the instructions given on the [EDBrepos website](https://repos.enterprisedb.com). You need to replace the `USERNAME:PASSWORD` with your username and password in the instructions to access the EDB repositories. | ||
Refer to the installation instructions in the product documentation using the links above or refer to the instructions given on the [EDB repos website](https://repos.enterprisedb.com). You need to replace the `USERNAME:PASSWORD` with your username and password in the instructions to access the EDB repositories. | ||
|
||
Ensure that the database server is configured to use the `scram-sha-256` authentication method, as the PEM Server configuration script does not work with `trust` authentication. | ||
|
||
You need to install the `java-1.8.0-openjdk` package to install EFM. | ||
|
||
2. Configure the PEM Server on all the primary and one or more standbys independently. For more detail, see Configuring the PEM Server section in the [PEM Installation Guides](https://www.enterprisedb.com/docs/pem/latest/). | ||
2. Configure the PEM Server on the primary server with initial configuration of type 1 (i.e. Web Services and Database): | ||
|
||
```text | ||
/usr/edb/pem/bin/configure-pem-server.sh -t 1 | ||
``` | ||
For more detail on configuration types, see [Configuring the PEM Server](https://www.enterprisedb.com/docs/pem/latest/pem_inst_guide_linux/04_installing_postgres_enterprise_manager/05_configuring_the_pem_server_on_linux/). | ||
|
||
3. Add the following ports in the firewall on all the primary and standbys to allow the access: | ||
3. Open the following ports in the firewall on the primary and all the standby servers to allow access: | ||
|
||
- `8443` - for PEM Server (https) | ||
- `5444` - for EPAS 13 | ||
- `7800` - for EFM | ||
- `7809` - for EFM Admin | ||
|
||
For example: | ||
|
||
```text | ||
$ sudo firewall-cmd --zone=public --add-port=5444/tcp --permanent | ||
success | ||
|
@@ -51,9 +63,9 @@ The examples in the following sections use these IP addresses: | |
success | ||
``` | ||
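
The port list in step 3 can be opened in one pass. The following is a minimal sketch and not part of this commit: the `firewall_cmds` helper name is hypothetical, and it only prints the `firewall-cmd` invocations so they can be reviewed before being run with `sudo`. The EFM admin port is assumed to match the `admin.port` value in the EFM properties listing in this guide (`7809`); adjust the list if your configuration differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) one firewall-cmd call per port.
firewall_cmds() {
  local zone="public" port
  for port in "$@"; do
    printf 'firewall-cmd --zone=%s --add-port=%s/tcp --permanent\n' "$zone" "$port"
  done
  # Permanent rules only take effect after a reload.
  printf 'firewall-cmd --reload\n'
}

# PEM Server (https), EPAS 13, EFM, EFM admin (assumed from admin.port)
firewall_cmds 8443 5444 7800 7809
```

Printing the commands rather than executing them keeps the sketch safe to run on any host; pipe the output to `sudo sh` only after checking it.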
|
||
# Set up the primary node for streaming replication. | ||
# Set Up the Primary Node for Streaming Replication | ||
|
||
1. Create the replication role using the following command | ||
1. Create the replication role using the following command. | ||
|
||
```text | ||
$ /usr/edb/as13/bin/psql -h 172.16.161.200 -p 5444 -U enterprisedb edb -c "CREATE ROLE repl REPLICATION LOGIN PASSWORD 'password';" | ||
|
@@ -73,19 +85,19 @@ The examples in the following sections use these IP addresses: | |
|
||
For more detailed information on configuring parameters for streaming replication refer to the [PostgreSQL documentation](https://www.postgresql.org/docs/13/warm-standby.html#STREAMING-REPLICATION). | ||
|
||
!!! Note | ||
The configuration parameters may be different for different versions of database server. You can email support at [[email protected]](mailto:[email protected]) for any help in setting up these parameters. | ||
!!! Note | ||
The configuration parameters may be different for different versions of database server. You can email support at [[email protected]](mailto:[email protected]) for any help in setting up these parameters. | ||
|
||
3. Add the following entry in the host-based authentication (`/var/lib/edb/as13/data/pg_hba.conf`) file to allow replication user to connect from all the standbys: | ||
3. Add the following entry in the host-based authentication (`/var/lib/edb/as13/data/pg_hba.conf`) file to allow the replication user to connect from all the standbys: | ||
|
||
```text | ||
hostssl replication repl 172.16.161.0/24 scram-sha-256 | ||
hostssl replication repl 172.16.161.201/24 scram-sha-256 | ||
``` | ||
|
||
!!! Note | ||
You can provide ip address with a more appropriate cidr range. | ||
You can change the CIDR range of the IP address, if needed. | ||
|
||
4. Modify the host-based authentication (`/var/lib/edb/as13/data/pg_hba.conf`) file for the `pem_user` role to connect to all databases using the `scram-sha-256` authentication method. | ||
4. Modify the host-based authentication (`/var/lib/edb/as13/data/pg_hba.conf`) file for the `pem_user` role to connect to all databases using the `scram-sha-256` authentication method, as shown below: | ||
|
||
```text | ||
# Allow local PEM agents and admins to connect to PEM server | ||
|
@@ -97,23 +109,24 @@ The examples in the following sections use these IP addresses: | |
hostssl pem +pem_agent 0.0.0.0/0 cert | ||
``` | ||
|
||
5. Restart the EPAS 13 Server | ||
5. Restart the EPAS 13 Server. | ||
|
||
```text | ||
systemctl restart edb-as-13.service | ||
``` | ||
|
||
# Set up the Standby Nodes for Streaming Replication | ||
# Set Up the Standby Nodes for Streaming Replication | ||
|
||
1. Stop the PEM backend database server (EPAS 13) service and the `pemagent` service on all the standby nodes: | ||
1. Stop the service for EPAS 13 on all the standby nodes: | ||
|
||
```text | ||
$ systemctl stop edb-as-13.service | ||
$ systemctl stop pemagent.service | ||
``` | ||
|
||
2. Remove the data directory of the PEM backend database server on all the standby nodes: | ||
!!! Note | ||
This example uses the `pg_basebackup` utility to create the replicas of the PEM backend database server on the standby servers. When using `pg_basebackup`, you need to stop the existing database server and remove the existing data directories. | ||
|
||
2. Remove the data directory of the database server on all the standby nodes: | ||
|
||
```text | ||
$ sudo su - enterprisedb | ||
|
@@ -142,41 +155,100 @@ The examples in the following sections use these IP addresses: | |
-D /var/lib/edb/as13/data -U repl -v -P -Fp -R -p 5444 | ||
``` | ||
|
||
The backup command creates the `postgresql.auto.conf` and the `standby.signal` file on the standby nodes. | ||
The backup command creates the `postgresql.auto.conf` and the `standby.signal` file on the standby nodes. The `postgresql.auto.conf` file content is as below: | ||
|
||
```text | ||
$ sudo su - enterprisedb -c "cat /var/lib/edb/as13/data/postgresql.auto.conf" | ||
# Do not edit this file manually | ||
# It will be overwritten by the ALTER SYSTEM command. | ||
primary_conninfo = 'user=repl passfile=''/var/lib/edb/.pgpass'' channel_binding=prefer host=172.16.161.200 port=5444 sslmode=prefer sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsvrname=postgres target_session_attrs=any' | ||
``` | ||
|
||
5. Edit the following parameter in `postgresql.con` file on all the standby nodes: | ||
5. Edit the following parameter in the `postgresql.conf` file on each of the standby nodes: | ||
|
||
```text | ||
hot_standby = on | ||
``` | ||
|
||
6. Start the PEM backend database server (EPAS 13) on each of the standby nodes. | ||
6. Start the EPAS 13 database server on each of the standby nodes: | ||
|
||
```text | ||
$ systemctl enable edb-as-13 | ||
$ systemctl start edb-as-13 | ||
``` | ||
|
||
7. Copy the PEM agent certificates and keys from `/root/.pem` from the primary node to the standby nodes at same location. Set the following permissions to the PEM Agent Certificate file: | ||
|
||
```text | ||
$ sudo chmod 600 /root/.pem/agent1.key | ||
$ sudo chmod 640 /root/.pem/agent1.crt | ||
``` | ||
7. Copy the following files from the primary node to the standby nodes at the same location, overwriting any existing files, and set the following permissions on the files: | ||
|
||
- `/etc/httpd/conf.d/edb-pem.conf` | ||
- `/etc/httpd/conf.d/edb-ssl-pem.conf` | ||
- `/root/.pem/agent1.crt` | ||
- `/root/.pem/agent1.key` | ||
- `/usr/edb/pem/agent/etc/agent.cfg` | ||
- `/usr/edb/pem/share/.install-config` | ||
- `/usr/edb/pem/web/pem.wsgi` | ||
- `/usr/edb/pem/web/config_setup.py` | ||
|
||
For example, | ||
|
||
```text | ||
$ mkdir -p /root/.pem | ||
$ chown root:root /root/.pem | ||
$ chmod 0755 /root/.pem | ||
$ mkdir -p /var/lib/pemhome/.pem | ||
$ chown pem:pem /var/lib/pemhome/.pem | ||
$ chmod 0700 /var/lib/pemhome/.pem | ||
$ mkdir -p /usr/edb/pem/logs | ||
$ chown root:root /usr/edb/pem/logs | ||
$ chmod 0755 /usr/edb/pem/logs | ||
$ for file in /etc/httpd/conf.d/edb-pem.conf \ | ||
/etc/httpd/conf.d/edb-ssl-pem.conf \ | ||
/root/.pem/agent1.crt \ | ||
/usr/edb/pem/agent/etc/agent.cfg \ | ||
/usr/edb/pem/share/.install-config \ | ||
/usr/edb/pem/web/pem.wsgi \ | ||
/usr/edb/pem/web/config_setup.py; do \ | ||
chown root:root ${file}; \ | ||
chmod 0644 ${file}; \ | ||
done; | ||
$ chmod 0600 /root/.pem/agent1.key | ||
$ chown root:root /root/.pem/agent1.key | ||
``` | ||
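
After copying the files, it can be useful to confirm that the modes match the values set above. The following is a minimal sketch and not part of this commit: the `has_mode` and `check_pem_files` helper names are hypothetical, and the check assumes GNU coreutils `stat` (as shipped on RHEL-family systems).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE has exactly the octal MODE.
# Assumes GNU coreutils `stat`, which supports -c '%a'.
has_mode() {
  local file="$1" want="$2" got
  got="$(stat -c '%a' "$file")" || return 2
  [ "$got" = "$want" ]
}

# Hypothetical helper: take FILE:MODE pairs and report every mismatch.
# Returns non-zero if any file is missing or has a different mode.
check_pem_files() {
  local rc=0 entry file mode
  for entry in "$@"; do
    file="${entry%%:*}"
    mode="${entry##*:}"
    if ! has_mode "$file" "$mode"; then
      echo "unexpected mode on $file (want $mode)" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, running `check_pem_files /root/.pem/agent1.key:600 /root/.pem/agent1.crt:644 /usr/edb/pem/agent/etc/agent.cfg:644` as root on a standby prints a line for each file whose mode differs from the guide's values.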
|
||
This ensures that the web server is configured on the standby and is disabled by default. EFM enables it automatically at switchover. | ||
|
||
!!! Note | ||
You need to keep the certificates in sync on the primary and standbys manually whenever the certificates are updated. | ||
|
||
# Set up EFM to manage failover on all hosts | ||
8. Run the `configure-selinux.sh` script to configure the SELinux policy for PEM as follows: | ||
|
||
```text | ||
$ /usr/edb/pem/bin/configure-selinux.sh | ||
getenforce found, now executing 'getenforce' command | ||
Configure the httpd to work with the SELinux | ||
Allow the httpd to connect the database (httpd_can_network_connect_db = on) | ||
Allow the httpd to connect the network (httpd_can_network_connect = on) | ||
Allow the httpd to work with cgi (httpd_enable_cgi = on) | ||
Allow to read & write permission on the 'pem' user home directory | ||
SELinux policy is configured for PEM | ||
$ sudo chmod 640 /root/.pem/agent1.crt | ||
``` | ||
|
||
!!! Note | ||
At this point you should have a PEM Primary Server and two standbys that are ready to take over from the primary whenever needed. | ||
|
||
|
||
# Set Up EFM to Manage Failover on All Hosts | ||
|
||
1. Prepare the Primary Node to support EFM: | ||
|
||
- Create a database user `efm` to connect to the database servers. | ||
- Grant the execute privileges on the functions related to WAL logs and the monitoring privileges to the user. | ||
- Add entries in `pg_hba.conf` to allow the `efm` database user to connect to the database server from all nodes on all the hosts. | ||
- Reload the configurations on all the database servers. | ||
|
||
For example, | ||
|
||
```text | ||
$ cat > /tmp/efm-role.sql << _EOF_ | ||
|
@@ -273,7 +345,7 @@ The examples in the following sections use these IP addresses: | |
$ sudo chmod a+r /etc/edb/efm-4.1/efm.properties | ||
``` | ||
|
||
7. Encrypt the `efm` user's password using efm utility: | ||
7. Encrypt the efm user's password using the efm utility: | ||
|
||
```text | ||
$ export EFMPASS=password | ||
|
@@ -284,70 +356,70 @@ The examples in the following sections use these IP addresses: | |
8. Edit the following parameters in the properties file: | ||
|
||
```text | ||
1. db.user=efm | ||
2. db.password.encrypted=096666746b05b081d1a98e43d94c9dad | ||
3. db.port=5444 | ||
4. db.database=edb | ||
5. db.service.owner=enterprisedb | ||
6. db.service.name=edb-as-13 | ||
7. db.bin=/usr/edb/as13/bin | ||
8. db.data.dir=/var/lib/edb/as13/data | ||
9. jdbc.sslmode=require | ||
10. [email protected] | ||
11. from.email=node1@efm-pem | ||
12. notification.level=INFO | ||
13. notification.text.prefix=[PEM/EFM] | ||
14. bind.address=172.16.161.200:7800 | ||
15. admin.port=7809 | ||
16. is.witness=false | ||
17. local.period=10 | ||
18. local.timeout=60 | ||
19. local.timeout.final=10 | ||
20. remote.timeout=10 | ||
21. node.timeout=50 | ||
22. encrypt.agent.messages=true | ||
23. stop.isolated.primary=true | ||
24. stop.failed.primary=true | ||
25. primary.shutdown.as.failure=false | ||
26. update.physical.slots.period=0 | ||
27. ping.server.ip=8.8.8.8 | ||
28. ping.server.command=/bin/ping -q -c3 -w5 | ||
29. auto.allow.hosts=false | ||
30. stable.nodes.file=false | ||
31. db.reuse.connection.count=0 | ||
32. auto.failover=true | ||
33. auto.reconfigure=true | ||
34. promotable=true | ||
35. use.replay.tiebreaker=true | ||
36. standby.restart.delay=0 | ||
37. reconfigure.num.sync=false | ||
38. reconfigure.sync.primary=false | ||
39. minimum.standbys=0 | ||
40. recovery.check.period=1 | ||
41. restart.connection.timeout=60 | ||
42. auto.resume.period=0 | ||
43. virtual.ip=172.16.161.245 | ||
44. virtual.ip.interface=ens33 | ||
45. virtual.ip.prefix=24 | ||
46. virtual.ip.single=true | ||
47. check.vip.before.promotion=true | ||
48. pgpool.enable=false | ||
49. sudo.command=sudo | ||
50. sudo.user.command=sudo -u %u | ||
51. syslog.host=localhost | ||
52. syslog.port=514 | ||
53. syslog.protocol=UDP | ||
54. syslog.facility=LOCAL1 | ||
55. file.log.enabled=true | ||
56. syslog.enabled=false | ||
57. jgroups.loglevel=INFO | ||
58. efm.loglevel=INFO | ||
59. jvm.options=-Xmx128m | ||
60. script.fence=/usr/local/bin/stop-pemagent.sh | ||
61. script.post.promotion=/usr/local/bin/start-pemagent.sh | ||
db.user=efm | ||
db.password.encrypted=096666746b05b081d1a98e43d94c9dad | ||
db.port=5444 | ||
db.database=edb | ||
db.service.owner=enterprisedb | ||
db.service.name=edb-as-13 | ||
db.bin=/usr/edb/as13/bin | ||
db.data.dir=/var/lib/edb/as13/data | ||
jdbc.sslmode=require | ||
[email protected] | ||
from.email=node1@efm-pem | ||
notification.level=INFO | ||
notification.text.prefix=[PEM/EFM] | ||
bind.address=172.16.161.200:7800 | ||
admin.port=7809 | ||
is.witness=false | ||
local.period=10 | ||
local.timeout=60 | ||
local.timeout.final=10 | ||
remote.timeout=10 | ||
node.timeout=50 | ||
encrypt.agent.messages=true | ||
stop.isolated.primary=true | ||
stop.failed.primary=true | ||
primary.shutdown.as.failure=false | ||
update.physical.slots.period=0 | ||
ping.server.ip=8.8.8.8 | ||
ping.server.command=/bin/ping -q -c3 -w5 | ||
auto.allow.hosts=false | ||
stable.nodes.file=false | ||
db.reuse.connection.count=0 | ||
auto.failover=true | ||
auto.reconfigure=true | ||
promotable=true | ||
use.replay.tiebreaker=true | ||
standby.restart.delay=0 | ||
reconfigure.num.sync=false | ||
reconfigure.sync.primary=false | ||
minimum.standbys=0 | ||
recovery.check.period=1 | ||
restart.connection.timeout=60 | ||
auto.resume.period=0 | ||
virtual.ip=172.16.161.245 | ||
virtual.ip.interface=ens33 | ||
virtual.ip.prefix=24 | ||
virtual.ip.single=true | ||
check.vip.before.promotion=true | ||
pgpool.enable=false | ||
sudo.command=sudo | ||
sudo.user.command=sudo -u %u | ||
syslog.host=localhost | ||
syslog.port=514 | ||
syslog.protocol=UDP | ||
syslog.facility=LOCAL1 | ||
file.log.enabled=true | ||
syslog.enabled=false | ||
jgroups.loglevel=INFO | ||
efm.loglevel=INFO | ||
jvm.options=-Xmx128m | ||
script.remote.post.promotion=/usr/local/bin/stop-pemagent.sh | ||
script.post.promotion=/usr/local/bin/start-pemagent.sh | ||
``` | ||
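
Because most of these values are identical on every node while a few (such as `bind.address` and `is.witness`) differ per node, it can help to read individual keys back from the file after editing. The following is a minimal sketch and not part of this commit: the `prop_get` helper name is hypothetical, and it assumes plain `key=value` lines with no continuation lines or escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a key=value properties FILE.
# If the key appears more than once, the last occurrence wins.
# Exits non-zero when the key is not present.
prop_get() {
  local file="$1" key="$2"
  awk -F= -v k="$key" '
    $1 == k { sub(/^[^=]*=/, ""); v = $0; found = 1 }
    END { if (found) print v; else exit 1 }
  ' "$file"
}
```

For example, `prop_get /etc/edb/efm-4.1/efm.properties is.witness` should print `true` on the witness node and `false` on the primary and standbys.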
|
||
9. Set the value of `is.witness` configuration parameter on the witness node to `true`: | ||
9. Set the value of the `is.witness` configuration parameter on the witness node to `true`: | ||
|
||
```text | ||
is.witness=true | ||
|
@@ -408,6 +480,11 @@ The examples in the following sections use these IP addresses: | |
|
||
This status confirms that EFM is set up successfully and managing the failover for the PEM Server. | ||
|
||
The monitored database servers should register their PEM Agents using the VIP address. Also, access the PEM Web Client using a VIP address. | ||
In case of failover, one of the standbys is promoted to primary. PEM Agents automatically connect to the new primary node. You can replace the failed primary node with a new standby using this procedure. | ||
|
||
# Current Limitations | ||
|
||
In case of failover, any of the standbys get promoted as the primary node. PEM Agents automatically connect to the new primary node. All the user sessions are lost, and they will have to log in again. You can replace the failed primary node with a new standby using this procedure. | ||
The current limitations include: | ||
- Web console sessions for the users are lost during the switchover.
- Per-user settings made in the `Preferences` dialog are lost because they are stored in local configuration files on the file system.
- Background processes started via the `Backup`, `Restore`, and `Maintenance` dialogs, and their logs, are not shared between the systems and are lost during switchover.