From 2f5120d36ce1061e4b7bfc060215caf60374582a Mon Sep 17 00:00:00 2001
From: Betsy Gitelman
Date: Mon, 9 Dec 2024 16:03:18 -0500
Subject: [PATCH] Edits to Added info about the Promote Status section of cluster status output #6317

---
 .../docs/efm/4/06_monitoring_efm_cluster.mdx | 27 ++++++++++---------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/product_docs/docs/efm/4/06_monitoring_efm_cluster.mdx b/product_docs/docs/efm/4/06_monitoring_efm_cluster.mdx
index 07e4873073a..5489963c12a 100644
--- a/product_docs/docs/efm/4/06_monitoring_efm_cluster.mdx
+++ b/product_docs/docs/efm/4/06_monitoring_efm_cluster.mdx
@@ -10,7 +10,7 @@ legacyRedirectsGenerated:

-You can use either the Failover Manager `efm cluster-status` command or the PEM Client interface to check the current status of a monitored node of a Failover Manager cluster.
+You can use either the Failover Manager `efm cluster-status` command or the PEM client interface to check the current status of a monitored node of a Failover Manager cluster.

 ## Reviewing the cluster status report

@@ -62,7 +62,7 @@ Standby 172.19.12.163 UP 192.168.225.190
 Primary 172.19.14.9 UP 192.168.225.190*
 ```

-The asterisk (\*) after the VIP address indicates that the address is available for connections. If a VIP address is not followed by an asterisk, the address was associated with the node in the properties file, but the address isn't currently in use.
+The asterisk (\*) after the VIP address indicates that the address is available for connections. If a VIP address isn't followed by an asterisk, the address was associated with the node in the properties file, but the address isn't currently in use.

 Failover Manager agents provide the information displayed in the Cluster Status section.

@@ -76,9 +76,10 @@ Standby priority host list:
 172.19.12.163 172.19.10.2
 ```

-The `Promote Status` section of the report includes information related to promotion in the cluster. The LSN information is used, along with the `Standby priority host list`, when choosing a standby to promote. If there is a mismatch in replay LSNs, Failover Manager will not allow a switchover (though the promotion of a standby is always allowed).
+The `Promote Status` section of the report includes information related to promotion in the cluster. When choosing a standby to promote, the LSN information is used, along with the `Standby priority host list`. If there's a mismatch in replay LSNs, Failover Manager doesn't allow a switchover. However, the promotion of a standby is always allowed.
+
+The LSN information is the result of a direct query from the node on which you're invoking the `cluster-status` command to each database in the cluster. The query also returns the transaction log location of each database. Because the queries to each database return at different times, the LSNs might not match even if streaming replication is working normally for the cluster. To get the latest view of replication, connect to the primary database, and execute SQL command `SELECT * FROM pg_stat_replication;`.

-The LSN information is the result of a direct query from the node on which you are invoking the `cluster-status` command to each database in the cluster. The query also returns the transaction log location of each database. Because the queries to each database return at different times, the LSNs might not match even if streaming replication is working normally for the cluster. To get the latest view of replication, connect to the primary database, and execute SQL command `SELECT * FROM pg_stat_replication;`.

 ```text
 Promote Status:
@@ -89,7 +90,7 @@ Standby 172.19.12.163 0/4000638 0/4000638
 Standby 172.19.10.2 0/4000638 0/4000638
 ```

-If a database is down or if the database was restarted, but the resume command was not yet invoked, the state of the agent that resides on that host is idle. If an agent is idle, the cluster status report includes a summary of the condition of the idle node. For example:
+If a database is down or if the database was restarted but the resume command wasn't yet invoked, the state of the agent that resides on that host is idle. If an agent is idle, the cluster status report includes a summary of the condition of the idle node. For example:

 ```text
 Agent Type Address DB VIP
@@ -103,23 +104,23 @@ The cluster status process returns an exit code based on the state of the cluste

 - An exit code of `0` indicates that all agents are running, and the databases on the primary and standby nodes are running and in sync.

-- A nonzero exit code indicates that there is a problem. The following problems can trigger a nonzero exit code:
+- A nonzero exit code indicates a problem. The following problems can trigger a nonzero exit code:

- A database is down, unknown, or has an idle agent.
+ - A database is down, unknown, or has an idle agent.

- Failover Manager can't decrypt the provided database password.
+ - Failover Manager can't decrypt the provided database password.

- There's a problem contacting the databases to get WAL locations.
+ - There's a problem contacting the databases to get WAL locations.

- There's no primary agent.
+ - There's no primary agent.

- There are no standby agents.
+ - There are no standby agents.

- One or more standby nodes aren't in sync with the primary.
+ - One or more standby nodes aren't in sync with the primary.

 ## Monitoring streaming replication with Postgres Enterprise Manager

-If you use Postgres Enterprise Manager (PEM) to monitor your servers, you can configure the Streaming Replication Analysis dashboard (part of the PEM interface) to display the state of a primary or standby node that is part of a Streaming Replication scenario.
+If you use Postgres Enterprise Manager (PEM) to monitor your servers, you can configure the Streaming Replication Analysis dashboard (part of the PEM interface) to display the state of a primary or standby node that's part of a Streaming Replication scenario.

 ![The Streaming Replication dashboard (Primary node)](images/str_replication_dashboard_master.png)