From 6e591abf11a94bd866fb1ad821fb531da3d58bc7 Mon Sep 17 00:00:00 2001 From: Gregory Bulloch Date: Tue, 6 Sep 2022 09:59:13 +1000 Subject: [PATCH 01/13] chore(UPM-10477): Adding new BDR metrics to documentation --- .../metrics/index.mdx | 569 ++++++++++-------- 1 file changed, 303 insertions(+), 266 deletions(-) diff --git a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx index 87a36453130..b76266e1888 100644 --- a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx +++ b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx @@ -28,418 +28,455 @@ removed or renamed. Where feasible, an effort will be made not to change the meaning or type of existing metrics without also changing the metric name. -[comment1]: # "Generated content see upm-substrate repo config monitoring dir" +[comment1]: # "Generated content see https://github.com/EnterpriseDB/starlight-scripts/blob/main/docs/metrics_to_markdown_txt.py" #### Group `cnp_backends` -Backend counts from `pg_stat_activity` aggregated by the listed label -dimensions. Useful for identifying busy applications, excessive idle -backends, etc. -Derived from the `pg_stat_activity` view. 
##### Metrics -| Metric | Usage | Description | -| -------------------------------------- | ----- | -------------------------------------------- | -| `cnp_backends_total` | GAUGE | Number of backends | -| `cnp_backends_max_tx_duration_seconds` | GAUGE | Maximum duration of a transaction in seconds | +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_backends_state` | MAPPEDMETRIC | State of the backend (pg\_stat\_activity.state) mapped to integer enum | +| `cnp_backends_n_backends` | GAUGE | Number of backends in this group | +| `cnp_backends_max_tx_duration_seconds` | GAUGE | Maximum duration of a transaction in seconds in this group | +| `cnp_backends_max_backend_xmin_age` | GAUGE | Maximum backend xmin age in this group | + ##### Labels -The metrics in this group can have these labels: +The above metrics may have these labels, represented +as dimensions in Azure Monitor: -| Label | Description | -| ------------------ | ----------------------- | -| `datname` | Name of the database | -| `usename` | Name of the user | -| `application_name` | Name of the application | -| `state` | State of the backend | +| Label | Description | +|-------|-------------| +| `datname` | Name of the database for this group of backends | +| `usename` | Name of the user in this group of backends | +| `application_name` | Name of the application for this group of backends | #### Group `cnp_backends_waiting` -Postgres-instance-level aggregate information on backends that are blocked -waiting for locks. Does not count I/O waits or other reasons backends might -wait or be blocked. -Derived from the `pg_locks` view. 
##### Metrics -| Metric | Usage | Description | -| ---------------------------- | ----- | -------------------------------------------------------------------- | +| Metric | Usage | Description | +|----------|-------|-------------| | `cnp_backends_waiting_total` | GAUGE | Total number of backends that are currently waiting on other queries | #### Group `cnp_pg_database` -Per-database metrics for each database in the postgres instance. -Includes per-database vacuum progress information. - -Derived from the `pg_database` catalog. -See also `cnp_pg_stat_database`. ##### Metrics -| Metric | Usage | Description | -| ---------------------------- | ----- | ---------------------------------------------------------------------------------- | -| `cnp_pg_database_size_bytes` | GAUGE | Disk space used by the database | -| `cnp_pg_database_xid_age` | GAUGE | Number of transactions from the frozen XID to the current one | -| `cnp_pg_database_mxid_age` | GAUGE | Number of multiple transactions (Multixact) from the frozen XID to the current one | +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_database_size_bytes` | GAUGE | Disk space used by the database | +| `cnp_pg_database_xid_age` | GAUGE | Number of transactions from the frozen XID to the current one | +| `cnp_pg_database_mxid_age` | GAUGE | Number of multiple transactions (Multixact) from the frozen XID to the current one | + ##### Labels -The metrics in this group can have these labels: +The above metrics may have these labels, represented +as dimensions in Azure Monitor: -| Label | Description | -| --------- | -------------------- | +| Label | Description | +|-------|-------------| | `datname` | Name of the database | #### Group `cnp_pg_postmaster` -Data on the postgres instance's managing "postmaster" process. -Derived from the `pg_postmaster_start_time()` function. 
##### Metrics -| Metric | Usage | Description | -| ------------------------------ | ----- | ------------------------------------------- | -| `cnp_pg_postmaster_start_time` | GAUGE | Time when postgres started (based on epoch) | +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_postmaster_start_time` | GAUGE | Time at which postgres started (based on epoch) | #### Group `cnp_pg_replication` -Physical replication details for a standby replica postgres instance -as captured from the standby replica. - -Derived from the `pg_last_xact_replay_timestamp()` function. - -Relevant only on standby replicas. -See also `cnp_pg_stat_replication`, `cnp_pg_replication_slots`. ##### Metrics -| Metric | Usage | Description | -| -------------------------------- | ----- | ----------------------------------------- | -| `cnp_pg_replication_lag` | GAUGE | Replication lag behind primary in seconds | -| `cnp_pg_replication_in_recovery` | GAUGE | Whether the instance is in recovery | +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_replication_lag` | GAUGE | Replication lag behind primary in seconds | +| `cnp_pg_replication_in_recovery` | GAUGE | Whether the instance is in recovery | #### Group `cnp_pg_replication_slots` -Details about replication slots on a postgres instance. In most -configurations, only the primary server has active replication clients, -but other nodes can still have replication slots. -Logical replication slots are specific to a database, whereas -physical replication slots have an empty "database" label as they -apply to the postgres instance as a whole. - -Derived from the `pg_replication_slots` view. - -See also `cnp_pg_stat_replication`, `cnp_pg_replication`. 
##### Metrics -| Metric | Usage | Description | -| ------------------------------------------ | ----- | ------------------------------------- | -| `cnp_pg_replication_slots_active` | GAUGE | Flag indicating if the slot is active | -| `cnp_pg_replication_slots_pg_wal_lsn_diff` | GAUGE | Replication lag in bytes | +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_replication_slots_active` | GAUGE | Flag indicating if the slot is active | +| `cnp_pg_replication_slots_pg_wal_lsn_diff` | GAUGE | Replication lag in bytes | + ##### Labels -The metrics in this group can have these labels: +The above metrics may have these labels, represented +as dimensions in Azure Monitor: -| Label | Description | -| ----------- | ---------------------------- | +| Label | Description | +|-------|-------------| | `slot_name` | Name of the replication slot | -| `database` | Name of the database | +| `database` | Name of the database | #### Group `cnp_pg_stat_archiver` -Progress information about WAL archiving. Only the currently active primary -server generally performs WAL archiving. -WAL archiving is important for backup and restore. If WAL archiving is -delayed or failing for too long, the point-in-time recovery backups for -a postgres cluster will not be up to date. This condition has disaster recovery -implications and can potentially also affect failover. -Occasional WAL archiving failures are normal, but a growing delay in the time -since the last successful WAL archiving operation should be taken seriously. 
+##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_stat_archiver_archived_count` | COUNTER | Number of WAL files that have been successfully archived | +| `cnp_pg_stat_archiver_failed_count` | COUNTER | Number of failed attempts for archiving WAL files | +| `cnp_pg_stat_archiver_seconds_since_last_archival` | GAUGE | Seconds since the last successful archival operation | +| `cnp_pg_stat_archiver_seconds_since_last_failure` | GAUGE | Seconds since the last failed archival operation | +| `cnp_pg_stat_archiver_last_archived_time` | GAUGE | Epoch of the last time WAL archiving succeeded | +| `cnp_pg_stat_archiver_last_failed_time` | GAUGE | Epoch of the last time WAL archiving failed | +| `cnp_pg_stat_archiver_last_archived_wal_start_lsn` | GAUGE | Archived WAL start LSN | +| `cnp_pg_stat_archiver_last_failed_wal_start_lsn` | GAUGE | Last failed WAL LSN | +| `cnp_pg_stat_archiver_stats_reset_time` | GAUGE | Time at which these statistics were last reset | + +#### Group `cnp_pg_stat_bgwriter` -Metrics in this section are reset when a postgres stats reset is issued -on the db server. -Derived from the `pg_stat_archiver` view. 
##### Metrics -| Metric | Usage | Description | -| -------------------------------------------------- | ------- | ---------------------------------------------------- | -| `cnp_pg_stat_archiver_archived_count` | COUNTER | Number of WAL files that were successfully archived | -| `cnp_pg_stat_archiver_failed_count` | COUNTER | Number of failed attempts for archiving WAL files | -| `cnp_pg_stat_archiver_seconds_since_last_archival` | GAUGE | Seconds since the last successful archival operation | -| `cnp_pg_stat_archiver_seconds_since_last_failure` | GAUGE | Seconds since the last failed archival operation | -| `cnp_pg_stat_archiver_last_archived_time` | GAUGE | Epoch of the last time WAL archiving succeeded | -| `cnp_pg_stat_archiver_last_failed_time` | GAUGE | Epoch of the last time WAL archiving failed | -| `cnp_pg_stat_archiver_last_archived_wal_start_lsn` | GAUGE | Archived WAL start LSN | -| `cnp_pg_stat_archiver_last_failed_wal_start_lsn` | GAUGE | Last failed WAL LSN | -| `cnp_pg_stat_archiver_stats_reset_time` | GAUGE | Time when these statistics were last reset | +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_stat_bgwriter_checkpoints_timed` | COUNTER | Number of scheduled checkpoints that have been performed | +| `cnp_pg_stat_bgwriter_checkpoints_req` | COUNTER | Number of requested checkpoints that have been performed | +| `cnp_pg_stat_bgwriter_checkpoint_write_time` | COUNTER | Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in milliseconds | +| `cnp_pg_stat_bgwriter_checkpoint_sync_time` | COUNTER | Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk, in milliseconds | +| `cnp_pg_stat_bgwriter_buffers_checkpoint` | COUNTER | Number of buffers written during checkpoints | +| `cnp_pg_stat_bgwriter_buffers_clean` | COUNTER | Number of buffers written by the background writer | +| 
`cnp_pg_stat_bgwriter_maxwritten_clean` | COUNTER | Number of times the background writer stopped a cleaning scan because it had written too many buffers | +| `cnp_pg_stat_bgwriter_buffers_backend` | COUNTER | Number of buffers written directly by a backend | +| `cnp_pg_stat_bgwriter_buffers_backend_fsync` | COUNTER | Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write) | +| `cnp_pg_stat_bgwriter_buffers_alloc` | COUNTER | Number of buffers allocated | -#### Group `cnp_pg_stat_bgwriter` +#### Group `cnp_pg_stat_database` + + + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_stat_database_xact_commit` | COUNTER | Number of transactions in this database that have been committed | +| `cnp_pg_stat_database_xact_rollback` | COUNTER | Number of transactions in this database that have been rolled back | +| `cnp_pg_stat_database_blks_read` | COUNTER | Number of disk blocks read in this database | +| `cnp_pg_stat_database_blks_hit` | COUNTER | Number of times disk blocks were found already in the buffer cache, so that a read was not necessary (this only includes hits in the PostgreSQL buffer cache, not the operating system's file system cache) | +| `cnp_pg_stat_database_tup_returned` | COUNTER | Number of rows returned by queries in this database | +| `cnp_pg_stat_database_tup_fetched` | COUNTER | Number of rows fetched by queries in this database | +| `cnp_pg_stat_database_tup_inserted` | COUNTER | Number of rows inserted by queries in this database | +| `cnp_pg_stat_database_tup_updated` | COUNTER | Number of rows updated by queries in this database | +| `cnp_pg_stat_database_tup_deleted` | COUNTER | Number of rows deleted by queries in this database | +| `cnp_pg_stat_database_conflicts` | COUNTER | Number of queries canceled due to conflicts with recovery in this database | +| `cnp_pg_stat_database_temp_files` | COUNTER | Number 
of temporary files created by queries in this database | +| `cnp_pg_stat_database_temp_bytes` | COUNTER | Total amount of data written to temporary files by queries in this database | +| `cnp_pg_stat_database_deadlocks` | COUNTER | Number of deadlocks detected in this database | +| `cnp_pg_stat_database_blk_read_time` | COUNTER | Time spent reading data file blocks by backends in this database, in milliseconds | +| `cnp_pg_stat_database_blk_write_time` | COUNTER | Time spent writing data file blocks by backends in this database, in milliseconds | + + +##### Labels -Stats for the postgres background writer and checkpointer processes, which -are instance-wide and shared across all databases in a postgres instance. +The above metrics may have these labels, represented +as dimensions in Azure Monitor: -Very long delays between checkpoints on a busy system increase the time -taken for it to return to read/write availability if crash recovery is -required. Excessively frequent checkpoints can increase I/O load and the size -of the WAL stream for backup and replication. +| Label | Description | +|-------|-------------| +| `datname` | Name of this database | -The postgres documentation discusses checkpoints, dirty writeback, and -checkpoint tuning in detail. +#### Group `cnp_pg_stat_database_conflicts` -Metrics in this section are reset when a postgres stats reset is issued -on the db server. -Derived from the `pg_stat_bgwriter` catalog. 
##### Metrics -| Metric | Usage | Description | -| -------------------------------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | -| `cnp_pg_stat_bgwriter_checkpoints_timed` | COUNTER | Number of scheduled checkpoints that were performed | -| `cnp_pg_stat_bgwriter_checkpoints_req` | COUNTER | Number of requested checkpoints that were performed | -| `cnp_pg_stat_bgwriter_checkpoint_write_time` | COUNTER | Total amount of time that was spent in the portion of checkpoint processing where files are written to disk, in milliseconds | -| `cnp_pg_stat_bgwriter_checkpoint_sync_time` | COUNTER | Total amount of time that was spent in the portion of checkpoint processing where files are synchronized to disk, in milliseconds | -| `cnp_pg_stat_bgwriter_buffers_checkpoint` | COUNTER | Number of buffers written during checkpoints | -| `cnp_pg_stat_bgwriter_buffers_clean` | COUNTER | Number of buffers written by the background writer | -| `cnp_pg_stat_bgwriter_maxwritten_clean` | COUNTER | Number of times the background writer stopped a cleaning scan because it wrote too many buffers | -| `cnp_pg_stat_bgwriter_buffers_backend` | COUNTER | Number of buffers written directly by a backend | -| `cnp_pg_stat_bgwriter_buffers_backend_fsync` | COUNTER | Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write) | -| `cnp_pg_stat_bgwriter_buffers_alloc` | COUNTER | Number of buffers allocated | +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_stat_database_conflicts_confl_tablespace` | COUNTER | Number of queries in this database that have been canceled due to dropped tablespaces | +| `cnp_pg_stat_database_conflicts_confl_lock` | COUNTER | Number of queries in this database that have been canceled due to lock timeouts | +| 
`cnp_pg_stat_database_conflicts_confl_snapshot` | COUNTER | Number of queries in this database that have been canceled due to old snapshots | +| `cnp_pg_stat_database_conflicts_confl_bufferpin` | COUNTER | Number of queries in this database that have been canceled due to pinned buffers | +| `cnp_pg_stat_database_conflicts_confl_deadlock` | COUNTER | Number of queries in this database that have been canceled due to deadlocks | -#### Group `cnp_pg_stat_database` -This metrics group directly exposes the summary data postgres collects in its -own `pg_stat_database` view. It contains statistical counters maintained by -postgres for database activity. +##### Labels + +The above metrics may have these labels, represented +as dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `datname` | Name of the database | -Metrics in this section are reset when a postgres stats reset is issued -on the db server. +#### Group `cnp_pg_stat_user_tables` -Derived from the `pg_stat_database` catalog. -See also `cnp_pg_database`. 
##### Metrics -| Metric | Usage | Description | -| ------------------------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `cnp_pg_stat_database_xact_commit` | COUNTER | Number of transactions in this database that were committed | -| `cnp_pg_stat_database_xact_rollback` | COUNTER | Number of transactions in this database that were rolled back | -| `cnp_pg_stat_database_blks_read` | COUNTER | Number of disk blocks read in this database | -| `cnp_pg_stat_database_blks_hit` | COUNTER | Number of times disk blocks were found already in the buffer cache, so that a read was not necessary (this only includes hits in the Postgres buffer cache, not the operating system's file system cache) | -| `cnp_pg_stat_database_tup_returned` | COUNTER | Number of rows returned by queries in this database | -| `cnp_pg_stat_database_tup_fetched` | COUNTER | Number of rows fetched by queries in this database | -| `cnp_pg_stat_database_tup_inserted` | COUNTER | Number of rows inserted by queries in this database | -| `cnp_pg_stat_database_tup_updated` | COUNTER | Number of rows updated by queries in this database | -| `cnp_pg_stat_database_tup_deleted` | COUNTER | Number of rows deleted by queries in this database | -| `cnp_pg_stat_database_conflicts` | COUNTER | Number of queries canceled due to conflicts with recovery in this database | -| `cnp_pg_stat_database_temp_files` | COUNTER | Number of temporary files created by queries in this database | -| `cnp_pg_stat_database_temp_bytes` | COUNTER | Total amount of data written to temporary files by queries in this database | -| `cnp_pg_stat_database_deadlocks` | COUNTER | Number of deadlocks detected in this database | -| `cnp_pg_stat_database_blk_read_time` | COUNTER | Time spent reading data file blocks by backends in this database, in milliseconds | -| 
`cnp_pg_stat_database_blk_write_time` | COUNTER | Time spent writing data file blocks by backends in this database, in milliseconds | +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_stat_user_tables_seq_scan` | COUNTER | Number of sequential scans initiated on this table | +| `cnp_pg_stat_user_tables_seq_tup_read` | COUNTER | Number of live rows fetched by sequential scans | +| `cnp_pg_stat_user_tables_idx_scan` | COUNTER | Number of index scans initiated on this table | +| `cnp_pg_stat_user_tables_idx_tup_fetch` | COUNTER | Number of live rows fetched by index scans | +| `cnp_pg_stat_user_tables_n_tup_ins` | COUNTER | Number of rows inserted | +| `cnp_pg_stat_user_tables_n_tup_upd` | COUNTER | Number of rows updated | +| `cnp_pg_stat_user_tables_n_tup_del` | COUNTER | Number of rows deleted | +| `cnp_pg_stat_user_tables_n_tup_hot_upd` | COUNTER | Number of rows HOT updated (i.e., with no separate index update required) | +| `cnp_pg_stat_user_tables_n_live_tup` | GAUGE | Estimated number of live rows | +| `cnp_pg_stat_user_tables_n_dead_tup` | GAUGE | Estimated number of dead rows | +| `cnp_pg_stat_user_tables_n_mod_since_analyze` | GAUGE | Estimated number of rows changed since last analyze | +| `cnp_pg_stat_user_tables_last_vacuum` | GAUGE | Last time at which this table was manually vacuumed (not counting VACUUM FULL) | +| `cnp_pg_stat_user_tables_last_autovacuum` | GAUGE | Last time at which this table was vacuumed by the autovacuum daemon | +| `cnp_pg_stat_user_tables_last_analyze` | GAUGE | Last time at which this table was manually analyzed | +| `cnp_pg_stat_user_tables_last_autoanalyze` | GAUGE | Last time at which this table was analyzed by the autovacuum daemon | +| `cnp_pg_stat_user_tables_vacuum_count` | COUNTER | Number of times this table has been manually vacuumed (not counting VACUUM FULL) | +| `cnp_pg_stat_user_tables_autovacuum_count` | COUNTER | Number of times this table has been vacuumed by the autovacuum daemon 
| +| `cnp_pg_stat_user_tables_analyze_count` | COUNTER | Number of times this table has been manually analyzed | +| `cnp_pg_stat_user_tables_autoanalyze_count` | COUNTER | Number of times this table has been analyzed by the autovacuum daemon | + ##### Labels -This group of metrics can have these labels: +The above metrics may have these labels, represented +as dimensions in Azure Monitor: -| Label | Description | -| --------- | --------------------- | -| `datname` | Name of this database | +| Label | Description | +|-------|-------------| +| `datname` | Name of current database | +| `schemaname` | Name of the schema that this table is in | +| `relname` | Name of this table | -#### Group `cnp_pg_stat_database_conflicts` +#### Group `cnp_pg_stat_replication` + + + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_stat_replication_backend_start_age` | GAUGE | How long ago in seconds this process was started | +| `cnp_pg_stat_replication_backend_xmin_age` | COUNTER | The age of this standby's xmin horizon | +| `cnp_pg_stat_replication_sent_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location sent on this connection | +| `cnp_pg_stat_replication_write_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location written to disk by this standby server | +| `cnp_pg_stat_replication_flush_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location flushed to disk by this standby server | +| `cnp_pg_stat_replication_replay_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location replayed into the database on this standby server | +| `cnp_pg_stat_replication_write_lag_seconds` | GAUGE | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written it | +| `cnp_pg_stat_replication_flush_lag_seconds` | GAUGE | Time elapsed between flushing recent WAL locally and receiving notification that 
this standby server has written and flushed it | +| `cnp_pg_stat_replication_replay_lag_seconds` | GAUGE | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written, flushed and applied it | + + +##### Labels -These metrics provide information on conflicts between queries on a standby replica -and the standby replica's replay of the change-stream from the primary. These are -called recovery conflicts. +The above metrics may have these labels, represented +as dimensions in Azure Monitor: -These metrics are unrelated to "INSERT ... ON CONFLICT" conflicts or -multi-master replication row conflicts. They are relevant only on standby -replicas. +| Label | Description | +|-------|-------------| +| `usename` | Name of the replication user | +| `application_name` | Name of the application | -Metrics in this section are reset when a postgres stats reset is issued -on the db server. +#### Group `cnp_pg_statio_user_tables` -Only defined on standby replicas. -Derived from the `pg_stat_database_conflicts` view. 
##### Metrics -| Metric | Usage | Description | -| ------------------------------------------------- | ------- | -------------------------------------------------------------------------------- | -| `cnp_pg_stat_database_conflicts_confl_tablespace` | COUNTER | Number of queries in this database that were canceled due to dropped tablespaces | -| `cnp_pg_stat_database_conflicts_confl_lock` | COUNTER | Number of queries in this database that were canceled due to lock timeouts | -| `cnp_pg_stat_database_conflicts_confl_snapshot` | COUNTER | Number of queries in this database were canceled due to old snapshots | -| `cnp_pg_stat_database_conflicts_confl_bufferpin` | COUNTER | Number of queries in this database that were canceled due to pinned buffers | -| `cnp_pg_stat_database_conflicts_confl_deadlock` | COUNTER | Number of queries in this database that were canceled due to deadlocks | +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_statio_user_tables_heap_blks_read` | COUNTER | Number of disk blocks read from this table | +| `cnp_pg_statio_user_tables_heap_blks_hit` | COUNTER | Number of buffer hits in this table | +| `cnp_pg_statio_user_tables_idx_blks_read` | COUNTER | Number of disk blocks read from all indexes on this table | +| `cnp_pg_statio_user_tables_idx_blks_hit` | COUNTER | Number of buffer hits in all indexes on this table | +| `cnp_pg_statio_user_tables_toast_blks_read` | COUNTER | Number of disk blocks read from this table's TOAST table (if any) | +| `cnp_pg_statio_user_tables_toast_blks_hit` | COUNTER | Number of buffer hits in this table's TOAST table (if any) | +| `cnp_pg_statio_user_tables_tidx_blks_read` | COUNTER | Number of disk blocks read from this table's TOAST table indexes (if any) | +| `cnp_pg_statio_user_tables_tidx_blks_hit` | COUNTER | Number of buffer hits in this table's TOAST table indexes (if any) | + ##### Labels -This group of metrics can have these labels: +The above metrics may have these labels, 
represented +as dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `datname` | Name of current database | +| `schemaname` | Name of the schema that this table is in | +| `relname` | Name of this table | + +#### Group `cnp_pg_settings` -| Label | Description | -| --------- | -------------------- | -| `datname` | Name of the database | -#### Group `cnp_pg_stat_user_tables` -Access and usage statistics maintained by postgres on nonsystem tables. +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_settings_setting` | GAUGE | Setting value. Note that settings are only reported when they were changed via Cloud Native PostgreSQL. | -Metrics in this section are reset when a postgres stats reset is issued -on the db server. -Derived from the `pg_stat_user_tables` view. +##### Labels + +The above metrics may have these labels, represented +as dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `name` | Name of the setting | + +#### Group `cnp_xlog_insert` + -See also `cnp_pg_statio_user_tables`. 
##### Metrics -| Metric | Usage | Description | -| --------------------------------------------- | ------- | --------------------------------------------------------------------------- | -| `cnp_pg_stat_user_tables_seq_scan` | COUNTER | Number of sequential scans initiated on this table | -| `cnp_pg_stat_user_tables_seq_tup_read` | COUNTER | Number of live rows fetched by sequential scans | -| `cnp_pg_stat_user_tables_idx_scan` | COUNTER | Number of index scans initiated on this table | -| `cnp_pg_stat_user_tables_idx_tup_fetch` | COUNTER | Number of live rows fetched by index scans | -| `cnp_pg_stat_user_tables_n_tup_ins` | COUNTER | Number of rows inserted | -| `cnp_pg_stat_user_tables_n_tup_upd` | COUNTER | Number of rows updated | -| `cnp_pg_stat_user_tables_n_tup_del` | COUNTER | Number of rows deleted | -| `cnp_pg_stat_user_tables_n_tup_hot_upd` | COUNTER | Number of rows HOT updated (i.e., with no separate index update required) | -| `cnp_pg_stat_user_tables_n_live_tup` | GAUGE | Estimated number of live rows | -| `cnp_pg_stat_user_tables_n_dead_tup` | GAUGE | Estimated number of dead rows | -| `cnp_pg_stat_user_tables_n_mod_since_analyze` | GAUGE | Estimated number of rows changed since last analyze | -| `cnp_pg_stat_user_tables_last_vacuum` | GAUGE | Last time when this table was manually vacuumed (not counting VACUUM FULL) | -| `cnp_pg_stat_user_tables_last_autovacuum` | GAUGE | Last time when this table was vacuumed by the autovacuum daemon | -| `cnp_pg_stat_user_tables_last_analyze` | GAUGE | Last time when this table was manually analyzed | -| `cnp_pg_stat_user_tables_last_autoanalyze` | GAUGE | Last time when this table was analyzed by the autovacuum daemon | -| `cnp_pg_stat_user_tables_vacuum_count` | COUNTER | Number of times this table was manually vacuumed (not counting VACUUM FULL) | -| `cnp_pg_stat_user_tables_autovacuum_count` | COUNTER | Number of times this table was vacuumed by the autovacuum daemon | -| 
`cnp_pg_stat_user_tables_analyze_count` | COUNTER | Number of times this table was manually analyzed | -| `cnp_pg_stat_user_tables_autoanalyze_count` | COUNTER | Number of times this table was analyzed by the autovacuum daemon | +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_xlog_insert_lsn` | GAUGE | Node xlog insert position (lsn) | -##### Labels +#### Group `cnp_bdr_raft_mon` -This group of metrics can have these labels: -| Label | Description | -| ------------ | ---------------------------------------- | -| `datname` | Name of current database | -| `schemaname` | Name of the schema that this table is in | -| `relname` | Name of this table | -#### Group `cnp_pg_stat_replication` +##### Metrics -Realtime information about replication connections to this postgres instance, -their progress and activity. +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_bdr_raft_mon_raftstatus` | GAUGE | Raft health status; 0 for unhealthy, 1 for healthy | -Metrics in this section are not reset when a postgres stats reset is issued -on the db server. The "stat" in the name is a historic artifact from postgres -development. +#### Group `cnp_bdr_lag_mon` -Derived from the `pg_stat_replication` view. -See also `cnp_pg_replication_slots`, `cnp_pg_replication`. 
##### Metrics -| Metric | Usage | Description | -| -------------------------------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------- | -| `cnp_pg_stat_replication_backend_start` | COUNTER | Time when this process started | -| `cnp_pg_stat_replication_backend_xmin_age` | COUNTER | The age of this standby replica's xmin horizon | -| `cnp_pg_stat_replication_sent_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location sent on this connection | -| `cnp_pg_stat_replication_write_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location written to disk by this standby replica | -| `cnp_pg_stat_replication_flush_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location flushed to disk by this standby replica | -| `cnp_pg_stat_replication_replay_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location replayed into the database on this standby replica | -| `cnp_pg_stat_replication_write_lag_seconds` | GAUGE | Time elapsed between flushing recent WAL locally and receiving notification that this standby replica wrote it | -| `cnp_pg_stat_replication_flush_lag_seconds` | GAUGE | Time elapsed between flushing recent WAL locally and receiving notification that this standby replica wrote and flushed it | -| `cnp_pg_stat_replication_replay_lag_seconds` | GAUGE | Time elapsed between flushing recent WAL locally and receiving notification that this standby replica wrote, flushed, and applied it | +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_bdr_lag_mon_sent_lag` | GAUGE | node slot sent lag | +| `cnp_bdr_lag_mon_write_lag` | GAUGE | node slot write lag | +| `cnp_bdr_lag_mon_flush_lag` | GAUGE | node slot flush lag | +| `cnp_bdr_lag_mon_replay_lag` | GAUGE | node slot replay lag | -##### Labels +#### Group `cnp_bdr_rep_slot_stats` -This group of 
metrics can have these labels: -| Label | Description | -| ------------------ | ---------------------------- | -| `usename` | Name of the replication user | -| `application_name` | Name of the application | -| `client_addr` | Client IP address | -#### Group `cnp_pg_statio_user_tables` +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_bdr_rep_slot_stats_spill_txns` | COUNTER | spill\_txns | +| `cnp_bdr_rep_slot_stats_spill_count` | COUNTER | spill\_count | +| `cnp_bdr_rep_slot_stats_spill_bytes` | COUNTER | spill\_bytes | +| `cnp_bdr_rep_slot_stats_stream_txns` | COUNTER | stream\_txns | +| `cnp_bdr_rep_slot_stats_stream_count` | COUNTER | stream\_count | +| `cnp_bdr_rep_slot_stats_stream_bytes` | COUNTER | stream\_bytes | +| `cnp_bdr_rep_slot_stats_total_txns` | COUNTER | total\_txns | +| `cnp_bdr_rep_slot_stats_total_bytes` | COUNTER | total\_bytes | + + +##### Labels + +The above metrics may have these labels, represented +as dimensions in Azure Monitor: -I/O activity statistics maintained by postgres on nonsystem tables. +| Label | Description | +|-------|-------------| +| `peer_name` | peer\_name | +| `slot_name` | slot\_name | -Metrics in this section are reset when a postgres stats reset is issued -on the db server. +#### Group `cnp_bdr_rep_lag` -Derived from the `pg_statio_user_tables` view. -See also `cnp_pg_stat_user_tables`. 
##### Metrics -| Metric | Usage | Description | -| ------------------------------------------- | ------- | ------------------------------------------------------------------------- | -| `cnp_pg_statio_user_tables_heap_blks_read` | COUNTER | Number of disk blocks read from this table | -| `cnp_pg_statio_user_tables_heap_blks_hit` | COUNTER | Number of buffer hits in this table | -| `cnp_pg_statio_user_tables_idx_blks_read` | COUNTER | Number of disk blocks read from all indexes on this table | -| `cnp_pg_statio_user_tables_idx_blks_hit` | COUNTER | Number of buffer hits in all indexes on this table | -| `cnp_pg_statio_user_tables_toast_blks_read` | COUNTER | Number of disk blocks read from this table's TOAST table (if any) | -| `cnp_pg_statio_user_tables_toast_blks_hit` | COUNTER | Number of buffer hits in this table's TOAST table (if any) | -| `cnp_pg_statio_user_tables_tidx_blks_read` | COUNTER | Number of disk blocks read from this table's TOAST table indexes (if any) | -| `cnp_pg_statio_user_tables_tidx_blks_hit` | COUNTER | Number of buffer hits in this table's TOAST table indexes (if any) | +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_bdr_rep_lag_replay_lag_s` | GAUGE | replay\_lag\_s | +| `cnp_bdr_rep_lag_replay_lag_bytes` | GAUGE | replay\_lag\_bytes | +| `cnp_bdr_rep_lag_apply_rate` | GAUGE | apply\_rate | +| `cnp_bdr_rep_lag_catchup_interval_s` | GAUGE | catchup\_interval\_s | + ##### Labels -This group of metrics can have these labels: +The above metrics may have these labels, represented +as dimensions in Azure Monitor: -| Label | Description | -| ------------ | ---------------------------------------- | -| `datname` | Name of current database | -| `schemaname` | Name of the schema that this table is in | -| `relname` | Name of this table | +| Label | Description | +|-------|-------------| +| `peer_name` | peer\_name | -#### Group `cnp_pg_settings` +#### Group `cnp_bdr_node_slots` -Expose the subset of postgres server 
settings that can be represented as -Prometheus compatible metrics–any integer, boolean, or real number. -Text-format settings, list-valued settings, and enumeration-typed settings are -not captured or reported. -This set of metrics does not expose per-database settings assigned with -`ALTER DATABASE ... SET ...`, per-user settings assigned with `ALTER USER ... -SET ...`, or per-session values. It shows only the database-system-wide -global values. You can explore other settings interactively using postgres -system views. -Derived from the `pg_settings` view. +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_bdr_node_slots_active_pid` | GAUGE | active\_pid | +| `cnp_bdr_node_slots_xmin_age` | GAUGE | xmin age | +| `cnp_bdr_node_slots_catalog_xmin_age` | GAUGE | catalog\_xmin age | +| `cnp_bdr_node_slots_restart_lsn_age` | GAUGE | restart\_lsn age | +| `cnp_bdr_node_slots_confirmed_flush_lsn_age` | GAUGE | confirmed\_flush\_lsn age | +| `cnp_bdr_node_slots_flush_lag_bytes` | GAUGE | flush\_lag in bytes | +| `cnp_bdr_node_slots_replay_lag_bytes` | GAUGE | replay\_lag in bytes | +| `cnp_bdr_node_slots_slot_state` | MAPPEDMETRIC | slot\_state | + + +##### Labels + +The above metrics may have these labels, represented +as dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `peer_name` | peer\_name | +| `slot_name` | slot\_name | + +#### Group `cnp_bdr_global_locking` + + ##### Metrics -| Metric | Usage | Description | -| ------------------------- | ----- | ------------- | -| `cnp_pg_settings_setting` | GAUGE | Setting value | +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_bdr_global_locking_since_locally_requested_s` | GAUGE | since\_locally\_requested\_s | +| `cnp_bdr_global_locking_since_local_granted_s` | GAUGE | since\_local\_granted\_s | + ##### Labels -This group of metrics can have these labels: +The above metrics may have these labels, represented +as 
dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `lock_type` | lock\_type | + -| Label | Description | -| ------ | ------------------- | -| `name` | Name of the setting | [comment2]: # "End generated content" From 6f9ff6cbd36a97e12410927d2ad162397d454141 Mon Sep 17 00:00:00 2001 From: Gregory Bulloch Date: Tue, 13 Sep 2022 16:28:53 +1000 Subject: [PATCH 02/13] chore(UPM-10477): Added section descriptions --- .../metrics/index.mdx | 215 +++++++++++------- 1 file changed, 138 insertions(+), 77 deletions(-) diff --git a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx index b76266e1888..f741a056993 100644 --- a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx +++ b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx @@ -1,44 +1,16 @@ ---- -title: "Metrics details" -redirects: -- /biganimal/latest/using_cluster/05_monitoring_and_logging/06_metrics ---- - -BigAnimal collects a wide set of metrics about Postgres instances and makes them available -in your Cloud Provider. Most of these metrics are acquired directly from Postgres system tables, -views, and functions. The Postgres documentation serves as the main reference for these metrics. - -Some data from Postgres monitoring system views, tables, and functions are -transformed to be easier to consume in Prometheus metrics format. For example, -timestamp fields are generally converted to Unix epoch time and can be accompanied -by a relative time-interval metric. Other metrics are aggregated into -categories by label dimensions to limit the number of very specific and -narrowly scoped individual metrics emitted. 
It is not very useful to -report the inactivity period of every single backend, for example, so backend -statistics are aggregated by database, user, `application_name`, and backend -state. - -Prometheus [labels](https://prometheus.io/docs/practices/naming/#labels) -are included in the $.Message.labels JSON object. -Dimensions vary depending on the individual metric and are documented -separately for each group of related metrics. - -The available set of metrics is subject to change. Metrics might be added, -removed or renamed. Where feasible, an effort will be made not to change the -meaning or type of existing metrics without also changing the metric name. - - -[comment1]: # "Generated content see https://github.com/EnterpriseDB/starlight-scripts/blob/main/docs/metrics_to_markdown_txt.py" - #### Group `cnp_backends` +Backend counts from `pg_stat_activity` aggregated by the listed label +dimensions. Useful for identifying busy applications, excessive idle +backends, etc. +Derived from the `pg_stat_activity` view. ##### Metrics | Metric | Usage | Description | |----------|-------|-------------| -| `cnp_backends_state` | MAPPEDMETRIC | State of the backend (pg\_stat\_activity.state) mapped to integer enum | +| `cnp_backends_state` | GAUGE | State of the backend (pg\_stat\_activity.state) mapped to integer enum. active = 1, idle = 2, idle in transaction = 3, idle in transaction (aborted) = 4, fastpath function call = 5, disabled = 6, and -1 = other/unrecognised | | `cnp_backends_n_backends` | GAUGE | Number of backends in this group | | `cnp_backends_max_tx_duration_seconds` | GAUGE | Maximum duration of a transaction in seconds in this group | | `cnp_backends_max_backend_xmin_age` | GAUGE | Maximum duration of a transaction in seconds in this group | @@ -57,7 +29,11 @@ as dimensions in Azure Monitor: #### Group `cnp_backends_waiting` +Postgres-instance-level aggregate information on backends that are blocked +waiting for locks. 
Does not count I/O waits or other reasons backends might +wait or be blocked. +Derived from the `pg_locks` view. ##### Metrics @@ -67,7 +43,12 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_database` +Per-database metrics for each database in the postgres instance. +Includes per-database vacuum progress information. + +Derived from the `pg_database` catalog. +See also `cnp_pg_stat_database`. ##### Metrics @@ -89,7 +70,9 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_postmaster` +Data on the postgres instance's managing "postmaster" process. +Derived from the `pg_postmaster_start_time()` function. ##### Metrics @@ -99,7 +82,14 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_replication` +Physical replication details for a standby postgres instance +as captured from the standby itself. + +Derived from the `pg_last_xact_replay_timestamp()` function. +Only relevant on standby servers. + +See also `cnp_pg_stat_replication`, `cnp_pg_replication_slots`. ##### Metrics @@ -110,7 +100,17 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_replication_slots` +Details about replication slots on a postgres instance. In most +configurations only the primary server will have active replication clients, +but other nodes may still have replication slots. + +Note that logical replication slots are specific to a database, whereas +physical replication slots will have an empty "database" label as they +apply to the postgres instance as a whole. + +Derived from the `pg_replication_slots` view. +See also `cnp_pg_stat_replication`, `cnp_pg_replication`. ##### Metrics @@ -132,7 +132,21 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_stat_archiver` +Progress information about WAL archiving. Only the currently active primary +server will generally be performing WAL archiving. +WAL archiving is important for backup and restore. If WAL archiving is +delayed or failing for too long, the point-in-time recovery backups for +a postgres cluster will not be up to date. 
This has disaster recovery +implications and can potentially also affect failover. + +Occasional WAL archiving failures are normal, but a growing delay in the time +since the last successful WAL archiving operation should be taken seriously. + +Metrics in this section are reset when a postgres stats reset is issued +on the db server. + +Derived from the `pg_stat_archiver` view. ##### Metrics @@ -150,7 +164,21 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_stat_bgwriter` +Stats for the postgres background writer and checkpointer processes, which +are instance-wide and shared across all databases in a postgres instance. + +Very long delays between checkpoints on a busy system will increase the time +taken for it to return to read/write availability if crash recovery is +required. Excessively frequent checkpoints can increase I/O load and the size +of the WAL stream for backup and replication. + +The postgres documentation discusses checkpoints, dirty writeback, and +checkpoint tuning in detail. + +Metrics in this section are reset when a postgres stats reset is issued +on the db server. +Derived from the `pg_stat_bgwriter` catalog. ##### Metrics @@ -169,7 +197,16 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_stat_database` +This metrics group directly exposes the summary data postgres collects in its +own `pg_stat_database` view. It contains statistical counters maintained by +postgres itself for database activity. +Metrics in this section are reset when a postgres stats reset is issued +on the db server. + +Derived from the `pg_stat_database` catalog. + +See also `cnp_pg_database`. ##### Metrics @@ -203,7 +240,20 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_stat_database_conflicts` +These metrics provide information on conflicts between queries on a standby +and the standby's replay of the change-stream from the primary. These are +called recovery conflicts. + +These metrics are unrelated to "INSERT ... 
ON CONFLICT" conflicts, or +multi-master replication row conflicts. They are only relevant on standby +servers. +Metrics in this section are reset when a postgres stats reset is issued +on the db server. + +Only defined on standby servers. + +Derived from the `pg_stat_database_conflicts` view. ##### Metrics @@ -227,7 +277,14 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_stat_user_tables` +Access and usage statistics maintained by postgres on non-system tables. + +Metrics in this section are reset when a postgres stats reset is issued +on the db server. +Derived from the `pg_stat_user_tables` view. + +See also `cnp_pg_statio_user_tables`. ##### Metrics @@ -267,7 +324,16 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_stat_replication` +Realtime information about replication connections to this postgres instance, +their progress and activity. + +Metrics in this section are not reset when a postgres stats reset is issued +on the db server. The "stat" in the name is a historic artefact from postgres +development. + +Derived from the `pg_stat_replication` view. +See also `cnp_pg_replication_slots`, `cnp_pg_replication`. ##### Metrics @@ -296,7 +362,14 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_statio_user_tables` +I/O activity statistics maintained by postgres on non-system tables. +Metrics in this section are reset when a postgres stats reset is issued +on the db server. + +Derived from the `pg_statio_user_tables` view. + +See also `cnp_pg_stat_user_tables`. ##### Metrics @@ -325,7 +398,18 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_settings` +Expose the subset of postgres server settings that can be represented as +Prometheus compatible metrics - any integer, boolean or real number. +Text-format settings, list-valued settings and enumeration-typed settings are +not captured or reported. + +This set of metrics does not expose per-database settings assigned with +`ALTER DATABASE ... 
SET ...`, per-user settings assigned with `ALTER USER ... +SET ...`, or per-session values. It only shows the database-system-wide +global values. You can explore other settings interactively using postgres +system views. +Derived from the `pg_settings` view. ##### Metrics @@ -345,7 +429,9 @@ as dimensions in Azure Monitor: #### Group `cnp_xlog_insert` - +Reports the postgres instance's transaction log insert position in bytes. +Useful to compare one postgres instance's WAL insert position with other +instances' replication replay positions in monitoring. ##### Metrics @@ -355,7 +441,7 @@ as dimensions in Azure Monitor: #### Group `cnp_bdr_raft_mon` - +Expose the raft status per CNP node of a BDR cluster ##### Metrics @@ -363,22 +449,13 @@ as dimensions in Azure Monitor: |----------|-------|-------------| | `cnp_bdr_raft_mon_raftstatus` | GAUGE | Raft health status; 0 for unhealthy, 1 for healthy | -#### Group `cnp_bdr_lag_mon` - - - -##### Metrics - -| Metric | Usage | Description | -|----------|-------|-------------| -| `cnp_bdr_lag_mon_sent_lag` | GAUGE | node slot sent lag | -| `cnp_bdr_lag_mon_write_lag` | GAUGE | node slot write lag | -| `cnp_bdr_lag_mon_flush_lag` | GAUGE | node slot flush lag | -| `cnp_bdr_lag_mon_replay_lag` | GAUGE | node slot replay lag | - #### Group `cnp_bdr_rep_slot_stats` - +Metrics from pg_catalog.pg_stat_replication_slots for each BDR replication slot. +These metrics can be used to monitor logical decoding activity and performance on the +sending (upstream) side of a logical replication connection. +See https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-SLOTS-VIEW +for details. ##### Metrics @@ -406,7 +483,10 @@ as dimensions in Azure Monitor: #### Group `cnp_bdr_rep_lag` - +Metrics based on the bdr.node_replication_rates monitoring catalog for monitoring +BDR replication performance and replication lag.
See +https://www.enterprisedb.com/docs/pgd/latest/monitoring/#monitoring-outgoing-replication +and https://www.enterprisedb.com/docs/pgd/latest/bdr/catalogs/#bdrnode_replication_rates ##### Metrics @@ -429,7 +509,9 @@ as dimensions in Azure Monitor: #### Group `cnp_bdr_node_slots` - +Metrics derived from the bdr.node_slots view. These metrics provide lower level insight +into the progress of outbound BDR replication, including transaction ID limits and +WAL retention and the connection status of replication sessions. ##### Metrics @@ -442,7 +524,7 @@ as dimensions in Azure Monitor: | `cnp_bdr_node_slots_confirmed_flush_lsn_age` | GAUGE | confirmed\_flush\_lsn age | | `cnp_bdr_node_slots_flush_lag_bytes` | GAUGE | flush\_lag in bytes | | `cnp_bdr_node_slots_replay_lag_bytes` | GAUGE | replay\_lag in bytes | -| `cnp_bdr_node_slots_slot_state` | MAPPEDMETRIC | slot\_state | +| `cnp_bdr_node_slots_slot_state` | GAUGE | slot\_state enumeration. disconnected = 0, streaming = 1, catchup = 2, unknown/unrecognised -1 | ##### Labels @@ -457,7 +539,10 @@ as dimensions in Azure Monitor: #### Group `cnp_bdr_global_locking` - +metrics for bdr global lock acquire and hold durations for both DDL and DML lock types. +Useful for detection of long global lock waits or frequent global locks that may impact performance. +These metrics are not fine grained and do not expose information about individual tables, etc. +Details are available in the bdr.global_locks view. ##### Metrics @@ -475,27 +560,3 @@ as dimensions in Azure Monitor: | Label | Description | |-------|-------------| | `lock_type` | lock\_type | - - - -[comment2]: # "End generated content" - - -### Other metrics streams - -In addition to postgres metrics from the Cloud Native PostgreSQL operator that -manages databases in BigAnimal, you can stream additional metrics about Kubernetes cluster -state and other details to your Cloud Platform. 
Any such metrics -are generally well-known metrics from widely used tools, documented by the -upstream vendor of the component. - -Details on individual metrics from such sources are not listed in this -document. Refer to the documentation of the tool or project that defines the -metrics. - -See also: - -- [Kubernetes cluster metrics](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/). - -The cloud platform can supply additional streams of metrics -directly to your metrics, analytics, and dashboarding endpoint. From be76dce3579f883edabd1f4840d38ed8ca47cd7d Mon Sep 17 00:00:00 2001 From: Gregory Bulloch Date: Tue, 13 Sep 2022 16:45:50 +1000 Subject: [PATCH 03/13] chore(UPM-10477): Added section descriptions --- .../metrics/index.mdx | 54 +++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx index f741a056993..ccf326f325b 100644 --- a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx +++ b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx @@ -1,3 +1,35 @@ +--- +title: "Metrics details" +redirects: +- /biganimal/latest/using_cluster/05_monitoring_and_logging/06_metrics +--- + +BigAnimal collects a wide set of metrics about Postgres instances and makes them available +in your Cloud Provider. Most of these metrics are acquired directly from Postgres system tables, +views, and functions. The Postgres documentation serves as the main reference for these metrics. + +Some data from Postgres monitoring system views, tables, and functions are +transformed to be easier to consume in Prometheus metrics format. For example, +timestamp fields are generally converted to Unix epoch time and can be accompanied +by a relative time-interval metric. 
Other metrics are aggregated into +categories by label dimensions to limit the number of very specific and +narrowly scoped individual metrics emitted. It is not very useful to +report the inactivity period of every single backend, for example, so backend +statistics are aggregated by database, user, `application_name`, and backend +state. + +Prometheus [labels](https://prometheus.io/docs/practices/naming/#labels) +are included in the $.Message.labels JSON object. +Dimensions vary depending on the individual metric and are documented +separately for each group of related metrics. + +The available set of metrics is subject to change. Metrics might be added, +removed or renamed. Where feasible, an effort will be made not to change the +meaning or type of existing metrics without also changing the metric name. + + +[comment1]: # "Generated content see upm-substrate repo config monitoring dir" + #### Group `cnp_backends` Backend counts from `pg_stat_activity` aggregated by the listed label @@ -560,3 +592,25 @@ as dimensions in Azure Monitor: | Label | Description | |-------|-------------| | `lock_type` | lock\_type | + +[comment2]: # "End generated content" + + +### Other metrics streams + +In addition to postgres metrics from the Cloud Native PostgreSQL operator that +manages databases in BigAnimal, you can stream additional metrics about Kubernetes cluster +state and other details to your Cloud Platform. Any such metrics +are generally well-known metrics from widely used tools, documented by the +upstream vendor of the component. + +Details on individual metrics from such sources are not listed in this +document. Refer to the documentation of the tool or project that defines the +metrics. + +See also: + +- [Kubernetes cluster metrics](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/). + +The cloud platform can supply additional streams of metrics +directly to your metrics, analytics, and dashboarding endpoint. 
From 04aa5294942517c05b01853c2fc96533aaa39929 Mon Sep 17 00:00:00 2001 From: Gregory Bulloch Date: Wed, 14 Sep 2022 09:32:29 +1000 Subject: [PATCH 04/13] chore(UPM-10477): Added section descriptions --- .../metrics/index.mdx | 46 +++++++++---------- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx index ccf326f325b..0dbfaf03782 100644 --- a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx +++ b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx @@ -28,7 +28,7 @@ removed or renamed. Where feasible, an effort will be made not to change the meaning or type of existing metrics without also changing the metric name. -[comment1]: # "Generated content see upm-substrate repo config monitoring dir" +[comment1]: # "Generated content using script https://github.com/EnterpriseDB/starlight-scripts/blob/main/docs/metrics_to_markdown_txt.py" #### Group `cnp_backends` @@ -114,12 +114,12 @@ Derived from the `pg_postmaster_start_time()` function. #### Group `cnp_pg_replication` -Physical replication details for a standby postgres instance -as captured from the standby itself. +Physical replication details for a standby replica postgres instance +as captured from the standby replica. Derived from the `pg_last_xact_replay_timestamp()` function. -Only relevant on standby servers. +Relevant only on standby replicas. See also `cnp_pg_stat_replication`, `cnp_pg_replication_slots`. @@ -133,11 +133,11 @@ See also `cnp_pg_stat_replication`, `cnp_pg_replication_slots`. #### Group `cnp_pg_replication_slots` Details about replication slots on a postgres instance. In most -configurations only the primary server will have active replication clients, -but other nodes may still have replication slots. 
+configurations, only the primary server has active replication clients, +but other nodes can still have replication slots. -Note that logical replication slots are specific to a database, whereas -physical replication slots will have an empty "database" label as they +Logical replication slots are specific to a database, whereas +physical replication slots have an empty "database" label as they apply to the postgres instance as a whole. Derived from the `pg_replication_slots` view. @@ -165,11 +165,11 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_stat_archiver` Progress information about WAL archiving. Only the currently active primary -server will generally be performing WAL archiving. +server generally performs WAL archiving. WAL archiving is important for backup and restore. If WAL archiving is delayed or failing for too long, the point-in-time recovery backups for -a postgres cluster will not be up to date. This has disaster recovery +a postgres cluster will not be up to date. This condition has disaster recovery implications and can potentially also affect failover. Occasional WAL archiving failures are normal, but a growing delay in the time @@ -199,7 +199,7 @@ Derived from the `pg_stat_archiver` view. Stats for the postgres background writer and checkpointer processes, which are instance-wide and shared across all databases in a postgres instance. -Very long delays between checkpoints on a busy system will increase the time +Very long delays between checkpoints on a busy system increase the time taken for it to return to read/write availability if crash recovery is required. Excessively frequent checkpoints can increase I/O load and the size of the WAL stream for backup and replication. @@ -231,7 +231,7 @@ Derived from the `pg_stat_bgwriter` catalog. This metrics group directly exposes the summary data postgres collects in its own `pg_stat_database` view. It contains statistical counters maintained by -postgres itself for database activity. 
+postgres for database activity. Metrics in this section are reset when a postgres stats reset is issued on the db server. @@ -272,18 +272,18 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_stat_database_conflicts` -These metrics provide information on conflicts between queries on a standby -and the standby's replay of the change-stream from the primary. These are +These metrics provide information on conflicts between queries on a standby replica +and the standby replica's replay of the change-stream from the primary. These are called recovery conflicts. -These metrics are unrelated to "INSERT ... ON CONFLICT" conflicts, or -multi-master replication row conflicts. They are only relevant on standby -servers. +These metrics are unrelated to "INSERT ... ON CONFLICT" conflicts or +multi-master replication row conflicts. They are relevant only on standby +replicas. Metrics in this section are reset when a postgres stats reset is issued on the db server. -Only defined on standby servers. +Only defined on standby replicas. Derived from the `pg_stat_database_conflicts` view. @@ -309,7 +309,7 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_stat_user_tables` -Access and usage statistics maintained by postgres on non-system tables. +Access and usage statistics maintained by postgres on nonsystem tables. Metrics in this section are reset when a postgres stats reset is issued on the db server. @@ -394,7 +394,7 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_statio_user_tables` -I/O activity statistics maintained by postgres on non-system tables. +I/O activity statistics maintained by postgres on nonsystem tables. Metrics in this section are reset when a postgres stats reset is issued on the db server. @@ -431,13 +431,13 @@ as dimensions in Azure Monitor: #### Group `cnp_pg_settings` Expose the subset of postgres server settings that can be represented as -Prometheus compatible metrics - any integer, boolean or real number. 
-Text-format settings, list-valued settings and enumeration-typed settings are +Prometheus compatible metrics–any integer, boolean, or real number. +Text-format settings, list-valued settings, and enumeration-typed settings are not captured or reported. This set of metrics does not expose per-database settings assigned with `ALTER DATABASE ... SET ...`, per-user settings assigned with `ALTER USER ... -SET ...`, or per-session values. It only shows the database-system-wide +SET ...`, or per-session values. It shows only the database-system-wide global values. You can explore other settings interactively using postgres system views. From 8649e2841935b6c102f569d6bcb01fe9676fb1e2 Mon Sep 17 00:00:00 2001 From: Josh Heyer <63653723+josh-heyer@users.noreply.github.com> Date: Tue, 13 Sep 2022 22:40:20 -0600 Subject: [PATCH 05/13] A few more relative links that need updated ...due to BDR and HARP being moved out of overview --- product_docs/docs/pgd/4/overview/index.mdx | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/product_docs/docs/pgd/4/overview/index.mdx b/product_docs/docs/pgd/4/overview/index.mdx index 35b22dafda7..cb56598b016 100644 --- a/product_docs/docs/pgd/4/overview/index.mdx +++ b/product_docs/docs/pgd/4/overview/index.mdx @@ -21,17 +21,17 @@ Three different Postgres distributions can be used: What Postgres distribution and version is right for you depends on the features you need. See the feature matrix in [Choosing a Postgres distribution](/pgd/latest/choosing_server) for detailed comparison. -## [BDR](bdr) +## [BDR](../bdr) -A Postgres server with the [BDR](bdr) extension installed is referred to as a BDR +A Postgres server with the [BDR](../bdr) extension installed is referred to as a BDR node. BDR nodes can be either data nodes or witness nodes. Witness nodes don't participate in data replication and are only used as a tie-breaker for consensus. 
-## [HARP](harp) +## [HARP](../harp) -[HARP](harp) is connection management tool for a EDB Postgres Distributed cluster. +[HARP](../harp) is connection management tool for a EDB Postgres Distributed cluster. It leverages consensus-driven quorum to determine the correct connection end-point in a semi-exclusive manner to prevent unintended multi-node writes from an From 64e55e1bc681e6dd8493f60becad9063a00895ce Mon Sep 17 00:00:00 2001 From: Dee Dee Rothery <83650384+drothery-edb@users.noreply.github.com> Date: Wed, 14 Sep 2022 05:43:18 -0400 Subject: [PATCH 06/13] copy edits --- .../05_monitoring_and_logging/metrics/index.mdx | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx index 0dbfaf03782..73435709d88 100644 --- a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx +++ b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx @@ -5,7 +5,7 @@ redirects: --- BigAnimal collects a wide set of metrics about Postgres instances and makes them available -in your Cloud Provider. Most of these metrics are acquired directly from Postgres system tables, +in your cloud platform. Most of these metrics are acquired directly from Postgres system tables, views, and functions. The Postgres documentation serves as the main reference for these metrics. 
Some data from Postgres monitoring system views, tables, and functions are @@ -596,11 +596,11 @@ as dimensions in Azure Monitor: [comment2]: # "End generated content" -### Other metrics streams +## Other metrics streams -In addition to postgres metrics from the Cloud Native PostgreSQL operator that +In addition to Postgres metrics from the Cloud Native PostgreSQL operator that manages databases in BigAnimal, you can stream additional metrics about Kubernetes cluster -state and other details to your Cloud Platform. Any such metrics +state and other details to your cloud platform. Any such metrics are generally well-known metrics from widely used tools, documented by the upstream vendor of the component. @@ -610,7 +610,7 @@ metrics. See also: -- [Kubernetes cluster metrics](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/). +- [Kubernetes cluster metrics](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/) The cloud platform can supply additional streams of metrics directly to your metrics, analytics, and dashboarding endpoint. 
From b2d484f583e4fb0e642a4a8efcd75bf23370de85 Mon Sep 17 00:00:00 2001 From: nataliawojcik27 <90423028+nataliawojcik27@users.noreply.github.com> Date: Wed, 14 Sep 2022 12:54:25 -0400 Subject: [PATCH 07/13] natalia-mispelling-update removed extra 'a' and misspelling of 'privileges' as 'priviledges' --- .../biganimal/release/using_cluster/01_postgres_access.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/product_docs/docs/biganimal/release/using_cluster/01_postgres_access.mdx b/product_docs/docs/biganimal/release/using_cluster/01_postgres_access.mdx index ae008d67202..00911022afa 100644 --- a/product_docs/docs/biganimal/release/using_cluster/01_postgres_access.mdx +++ b/product_docs/docs/biganimal/release/using_cluster/01_postgres_access.mdx @@ -2,7 +2,7 @@ title: "Managing Postgres access" --- -Don't use the `edb_admin` database role and `edb_admin` database created when creating your cluster in your application. Instead, create a new database role and a new database, which provides a high level of isolation in Postgres. If multiple applications are using the same cluster, each database can also contain multiple schemas, essentially a namespace in the database. If strict isolation is needed, use a dedicated cluster or dedicated database. If that strict isolation level isn't required, a you can deploy a single database with multiple schemas. Refer to [Privileges](https://www.postgresql.org/docs/current/ddl-priv.html) in the PostgreSQL documentation to further customize ownership and roles to your requirements. +Don't use the `edb_admin` database role and `edb_admin` database created when creating your cluster in your application. Instead, create a new database role and a new database, which provides a high level of isolation in Postgres. If multiple applications are using the same cluster, each database can also contain multiple schemas, essentially a namespace in the database. 
If strict isolation is needed, use a dedicated cluster or dedicated database. If that strict isolation level isn't required, you can deploy a single database with multiple schemas. Refer to [Privileges](https://www.postgresql.org/docs/current/ddl-priv.html) in the PostgreSQL documentation to further customize ownership and roles to your requirements. To create a new role and database, first connect using `psql`: @@ -12,7 +12,7 @@ psql -W "postgres://edb_admin@xxxxxxxxx.xxxxx.biganimal.io:5432/edb_admin?sslmod ## Notes on the edb_admin role -- The `edb_admin` role does not have superuser priviledges by default. Contact [Support](../overview/support) to request superuser priviledges for `edb_admin`. If you request superuser privileges, you **must** take care to limit the number of connections used by superusers to avoid degraded service and/or compromising availability. +- The `edb_admin` role does not have superuser privileges by default. Contact [Support](../overview/support) to request superuser priviledges for `edb_admin`. If you request superuser privileges, you **must** take care to limit the number of connections used by superusers to avoid degraded service and/or compromising availability. - Changes to system configuration (GUCs) made by edb_admin or other Postgres users are not persisted though a reboot or maintenance. Use the BigAnimal portal to modify system configuration. 
From 372656f6d866c74a22219ed6ba7ceaebc1f01bac Mon Sep 17 00:00:00 2001 From: Josh Heyer Date: Wed, 14 Sep 2022 16:24:52 +0000 Subject: [PATCH 08/13] correct monitoring links --- product_docs/docs/pgd/4/bdr/functions.mdx | 4 ++-- product_docs/docs/pgd/4/bdr/transaction-streaming.mdx | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/product_docs/docs/pgd/4/bdr/functions.mdx b/product_docs/docs/pgd/4/bdr/functions.mdx index 009609fa5e9..cbec090d6ed 100644 --- a/product_docs/docs/pgd/4/bdr/functions.mdx +++ b/product_docs/docs/pgd/4/bdr/functions.mdx @@ -705,7 +705,7 @@ bdr.monitor_group_versions() #### Notes This function returns a record with fields `status` and `message`, -as explained in [Monitoring](../../monitoring/#monitoring-bdr-versions). +as explained in [Monitoring](../monitoring/#monitoring-bdr-versions). This function calls `bdr.run_on_all_nodes()`. @@ -724,7 +724,7 @@ bdr.monitor_group_raft() #### Notes This function returns a record with fields `status` and `message`, -as explained in [Monitoring](../../monitoring/#monitoring-raft-consensus). +as explained in [Monitoring](../monitoring/#monitoring-raft-consensus). This function calls `bdr.run_on_all_nodes()`. diff --git a/product_docs/docs/pgd/4/bdr/transaction-streaming.mdx b/product_docs/docs/pgd/4/bdr/transaction-streaming.mdx index 8e1550748c1..8b637a17832 100644 --- a/product_docs/docs/pgd/4/bdr/transaction-streaming.mdx +++ b/product_docs/docs/pgd/4/bdr/transaction-streaming.mdx @@ -117,7 +117,7 @@ either a writer or a file. The decision is based on several factors: (writer 0 is always reserved for non-streamed transactions.) - If parallel apply is on but all writers are already busy handling streamed transactions, then the new transaction is streamed to a file. See - [bdr.writers](../../monitoring#monitoring-bdr-writers) to check BDR + [bdr.writers](../monitoring#monitoring-bdr-writers) to check BDR writer status. 
If streaming to a writer is possible (that is, a free writer is available), then the From b10d78546c3a6061d0f6ba187fe70a75743da6ef Mon Sep 17 00:00:00 2001 From: Josh Heyer Date: Wed, 14 Sep 2022 17:44:08 +0000 Subject: [PATCH 09/13] correct cli usage url, compatibility matrix --- product_docs/docs/pgd/4/bdr/index.mdx | 2 +- product_docs/docs/pgd/4/cli/installing_cli.mdx | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/product_docs/docs/pgd/4/bdr/index.mdx b/product_docs/docs/pgd/4/bdr/index.mdx index 1dcce6d955d..ae91a502c04 100644 --- a/product_docs/docs/pgd/4/bdr/index.mdx +++ b/product_docs/docs/pgd/4/bdr/index.mdx @@ -159,7 +159,7 @@ overhead of replication as the cluster grows and minimizing the bandwidth to oth BDR is compatible with Postgres, EDB Postgres Extended Server, and EDB Postgres Advanced Server distributions and can be deployed as a -standard Postgres extension. See [Compatibility matrix](/pgd/latest/compatibility_matrix/) +standard Postgres extension. See [Compatibility matrix](/pgd/latest/#compatibility-matrix) for details of supported version combinations. Some key BDR features depend on certain core diff --git a/product_docs/docs/pgd/4/cli/installing_cli.mdx b/product_docs/docs/pgd/4/cli/installing_cli.mdx index b118caefc67..6d2b46ef940 100644 --- a/product_docs/docs/pgd/4/cli/installing_cli.mdx +++ b/product_docs/docs/pgd/4/cli/installing_cli.mdx @@ -43,5 +43,5 @@ The `pgd-config.yml`, is located in the `/etc/edb` directory, by default. The PG 2. `$HOME/.edb` 3. `.` (working directory) -If you rename the file or move it to another location, specify the new name and location using the optional `-f` or `--config-file` flag when entering a command. See the [sample use case](/pgd_cli/using_cli/#passing-a-database-connection-string-directly-to-a-command). +If you rename the file or move it to another location, specify the new name and location using the optional `-f` or `--config-file` flag when entering a command. 
See the [sample use case](/pgd/latest/cli/#passing-a-database-connection-string). From 93d0bcf52ea59da04980e640c090e2df4d28b149 Mon Sep 17 00:00:00 2001 From: Gregory Bulloch Date: Thu, 15 Sep 2022 09:22:43 +1000 Subject: [PATCH 10/13] chore(UPM-10477): Updated formating in script as per Dee Dee's suggestions and regenerated doc --- .../metrics/index.mdx | 119 +++++------------- 1 file changed, 33 insertions(+), 86 deletions(-) diff --git a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx index 73435709d88..ca524e122a5 100644 --- a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx +++ b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging/metrics/index.mdx @@ -30,7 +30,7 @@ meaning or type of existing metrics without also changing the metric name. [comment1]: # "Generated content using script https://github.com/EnterpriseDB/starlight-scripts/blob/main/docs/metrics_to_markdown_txt.py" -#### Group `cnp_backends` +## `cnp_backends` Backend counts from `pg_stat_activity` aggregated by the listed label dimensions. Useful for identifying busy applications, excessive idle @@ -38,8 +38,6 @@ backends, etc. Derived from the `pg_stat_activity` view. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_backends_state` | GAUGE | State of the backend (pg\_stat\_activity.state) mapped to integer enum. active = 1, idle = 2, idle in transaction = 3, idle in transaction (aborted) = 4, fastpath function call = 5, disabled = 6, and -1 = other/unrecognised | @@ -48,9 +46,8 @@ Derived from the `pg_stat_activity` view. 
| `cnp_backends_max_backend_xmin_age` | GAUGE | Maximum duration of a transaction in seconds in this group | -##### Labels -The above metrics may have these labels, represented +The metrics may have these labels, represented as dimensions in Azure Monitor: | Label | Description | @@ -59,7 +56,7 @@ as dimensions in Azure Monitor: | `usename` | Name of the user in this group of backends | | `application_name` | Name of the application for this group of backends | -#### Group `cnp_backends_waiting` +## `cnp_backends_waiting` Postgres-instance-level aggregate information on backends that are blocked waiting for locks. Does not count I/O waits or other reasons backends might @@ -67,13 +64,11 @@ wait or be blocked. Derived from the `pg_locks` view. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_backends_waiting_total` | GAUGE | Total number of backends that are currently waiting on other queries | -#### Group `cnp_pg_database` +## `cnp_pg_database` Per-database metrics for each database in the postgres instance. Includes per-database vacuum progress information. @@ -82,8 +77,6 @@ Derived from the `pg_database` catalog. See also `cnp_pg_stat_database`. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_pg_database_size_bytes` | GAUGE | Disk space used by the database | @@ -91,28 +84,25 @@ See also `cnp_pg_stat_database`. | `cnp_pg_database_mxid_age` | GAUGE | Number of multiple transactions (Multixact) from the frozen XID to the current one | -##### Labels -The above metrics may have these labels, represented +The metrics may have these labels, represented as dimensions in Azure Monitor: | Label | Description | |-------|-------------| | `datname` | Name of the database | -#### Group `cnp_pg_postmaster` +## `cnp_pg_postmaster` Data on the postgres instance's managing "postmaster" process. Derived from the `pg_postmaster_start_time()` function. 
-##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_pg_postmaster_start_time` | GAUGE | Time at which postgres started (based on epoch) | -#### Group `cnp_pg_replication` +## `cnp_pg_replication` Physical replication details for a standby replica postgres instance as captured from the standby replica. @@ -123,14 +113,12 @@ Relevant only on standby replicas. See also `cnp_pg_stat_replication`, `cnp_pg_replication_slots`. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_pg_replication_lag` | GAUGE | Replication lag behind primary in seconds | | `cnp_pg_replication_in_recovery` | GAUGE | Whether the instance is in recovery | -#### Group `cnp_pg_replication_slots` +## `cnp_pg_replication_slots` Details about replication slots on a postgres instance. In most configurations, only the primary server has active replication clients, @@ -144,17 +132,14 @@ Derived from the `pg_replication_slots` view. See also `cnp_pg_stat_replication`, `cnp_pg_replication`. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_pg_replication_slots_active` | GAUGE | Flag indicating if the slot is active | | `cnp_pg_replication_slots_pg_wal_lsn_diff` | GAUGE | Replication lag in bytes | -##### Labels -The above metrics may have these labels, represented +The metrics may have these labels, represented as dimensions in Azure Monitor: | Label | Description | @@ -162,7 +147,7 @@ as dimensions in Azure Monitor: | `slot_name` | Name of the replication slot | | `database` | Name of the database | -#### Group `cnp_pg_stat_archiver` +## `cnp_pg_stat_archiver` Progress information about WAL archiving. Only the currently active primary server generally performs WAL archiving. @@ -180,8 +165,6 @@ on the db server. Derived from the `pg_stat_archiver` view. 
-##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_pg_stat_archiver_archived_count` | COUNTER | Number of WAL files that have been successfully archived | @@ -194,7 +177,7 @@ Derived from the `pg_stat_archiver` view. | `cnp_pg_stat_archiver_last_failed_wal_start_lsn` | GAUGE | Last failed WAL LSN | | `cnp_pg_stat_archiver_stats_reset_time` | GAUGE | Time at which these statistics were last reset | -#### Group `cnp_pg_stat_bgwriter` +## `cnp_pg_stat_bgwriter` Stats for the postgres background writer and checkpointer processes, which are instance-wide and shared across all databases in a postgres instance. @@ -212,8 +195,6 @@ on the db server. Derived from the `pg_stat_bgwriter` catalog. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_pg_stat_bgwriter_checkpoints_timed` | COUNTER | Number of scheduled checkpoints that have been performed | @@ -227,7 +208,7 @@ Derived from the `pg_stat_bgwriter` catalog. | `cnp_pg_stat_bgwriter_buffers_backend_fsync` | COUNTER | Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write) | | `cnp_pg_stat_bgwriter_buffers_alloc` | COUNTER | Number of buffers allocated | -#### Group `cnp_pg_stat_database` +## `cnp_pg_stat_database` This metrics group directly exposes the summary data postgres collects in its own `pg_stat_database` view. It contains statistical counters maintained by @@ -240,8 +221,6 @@ Derived from the `pg_stat_database` catalog. See also `cnp_pg_database`. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_pg_stat_database_xact_commit` | COUNTER | Number of transactions in this database that have been committed | @@ -261,16 +240,15 @@ See also `cnp_pg_database`. 
| `cnp_pg_stat_database_blk_write_time` | COUNTER | Time spent writing data file blocks by backends in this database, in milliseconds | -##### Labels -The above metrics may have these labels, represented +The metrics may have these labels, represented as dimensions in Azure Monitor: | Label | Description | |-------|-------------| | `datname` | Name of this database | -#### Group `cnp_pg_stat_database_conflicts` +## `cnp_pg_stat_database_conflicts` These metrics provide information on conflicts between queries on a standby replica and the standby replica's replay of the change-stream from the primary. These are @@ -287,8 +265,6 @@ Only defined on standby replicas. Derived from the `pg_stat_database_conflicts` view. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_pg_stat_database_conflicts_confl_tablespace` | COUNTER | Number of queries in this database that have been canceled due to dropped tablespaces | @@ -298,16 +274,15 @@ Derived from the `pg_stat_database_conflicts` view. | `cnp_pg_stat_database_conflicts_confl_deadlock` | COUNTER | Number of queries in this database that have been canceled due to deadlocks | -##### Labels -The above metrics may have these labels, represented +The metrics may have these labels, represented as dimensions in Azure Monitor: | Label | Description | |-------|-------------| | `datname` | Name of the database | -#### Group `cnp_pg_stat_user_tables` +## `cnp_pg_stat_user_tables` Access and usage statistics maintained by postgres on nonsystem tables. @@ -318,8 +293,6 @@ Derived from the `pg_stat_user_tables` view. See also `cnp_pg_statio_user_tables`. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_pg_stat_user_tables_seq_scan` | COUNTER | Number of sequential scans initiated on this table | @@ -343,9 +316,8 @@ See also `cnp_pg_statio_user_tables`. 
| `cnp_pg_stat_user_tables_autoanalyze_count` | COUNTER | Number of times this table has been analyzed by the autovacuum daemon | -##### Labels -The above metrics may have these labels, represented +The metrics may have these labels, represented as dimensions in Azure Monitor: | Label | Description | @@ -354,7 +326,7 @@ as dimensions in Azure Monitor: | `schemaname` | Name of the schema that this table is in | | `relname` | Name of this table | -#### Group `cnp_pg_stat_replication` +## `cnp_pg_stat_replication` Realtime information about replication connections to this postgres instance, their progress and activity. @@ -367,8 +339,6 @@ Derived from the `pg_stat_replication` view. See also `cnp_pg_replication_slots`, `cnp_pg_replication`. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_pg_stat_replication_backend_start_age` | GAUGE | How long ago in seconds this process was started | @@ -382,9 +352,8 @@ See also `cnp_pg_replication_slots`, `cnp_pg_replication`. | `cnp_pg_stat_replication_replay_lag_seconds` | GAUGE | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written, flushed and applied it | -##### Labels -The above metrics may have these labels, represented +The metrics may have these labels, represented as dimensions in Azure Monitor: | Label | Description | @@ -392,7 +361,7 @@ as dimensions in Azure Monitor: | `usename` | Name of the replication user | | `application_name` | Name of the application | -#### Group `cnp_pg_statio_user_tables` +## `cnp_pg_statio_user_tables` I/O activity statistics maintained by postgres on nonsystem tables. @@ -403,8 +372,6 @@ Derived from the `pg_statio_user_tables` view. See also `cnp_pg_stat_user_tables`. 
-##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_pg_statio_user_tables_heap_blks_read` | COUNTER | Number of disk blocks read from this table | @@ -417,9 +384,8 @@ See also `cnp_pg_stat_user_tables`. | `cnp_pg_statio_user_tables_tidx_blks_hit` | COUNTER | Number of buffer hits in this table's TOAST table indexes (if any) | -##### Labels -The above metrics may have these labels, represented +The metrics may have these labels, represented as dimensions in Azure Monitor: | Label | Description | @@ -428,7 +394,7 @@ as dimensions in Azure Monitor: | `schemaname` | Name of the schema that this table is in | | `relname` | Name of this table | -#### Group `cnp_pg_settings` +## `cnp_pg_settings` Expose the subset of postgres server settings that can be represented as Prometheus compatible metrics–any integer, boolean, or real number. @@ -443,45 +409,38 @@ system views. Derived from the `pg_settings` view. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_pg_settings_setting` | GAUGE | Setting value. Note that settings are only reported when they were changed via Cloud Native PostgreSQL. | -##### Labels -The above metrics may have these labels, represented +The metrics may have these labels, represented as dimensions in Azure Monitor: | Label | Description | |-------|-------------| | `name` | Name of the setting | -#### Group `cnp_xlog_insert` +## `cnp_xlog_insert` Reports the postgres instance's transaction log insert position in bytes. Useful to compare one postgres instance's WAL insert position with other instances' replication replay positions in monitoring. 
-##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_xlog_insert_lsn` | GAUGE | Node xlog insert position (lsn) | -#### Group `cnp_bdr_raft_mon` +## `cnp_bdr_raft_mon` Expose the raft status per CNP node of a BDR cluster -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_bdr_raft_mon_raftstatus` | GAUGE | Raft health status; 0 for unhealthy, 1 for healthy | -#### Group `cnp_bdr_rep_slot_stats` +## `cnp_bdr_rep_slot_stats` Metrics from pg_catalog.pg_stat_replication_slots for each BDR replication slot. These metrics can be used to monitor logical decoding activity and performance the @@ -489,8 +448,6 @@ sending (upstream) side of a logical replication connection. See https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-SLOTS-VIEW for details. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_bdr_rep_slot_stats_spill_txns` | COUNTER | spill\_txns | @@ -503,9 +460,8 @@ for details. | `cnp_bdr_rep_slot_stats_total_bytes` | COUNTER | total\_bytes | -##### Labels -The above metrics may have these labels, represented +The metrics may have these labels, represented as dimensions in Azure Monitor: | Label | Description | @@ -513,15 +469,13 @@ as dimensions in Azure Monitor: | `peer_name` | peer\_name | | `slot_name` | slot\_name | -#### Group `cnp_bdr_rep_lag` +## `cnp_bdr_rep_lag` Metrics based on the bdr.node_replication_rates monitoring catalog for monitoring BDR replication performance and repliation lag. 
See https://www.enterprisedb.com/docs/pgd/latest/monitoring/#monitoring-outgoing-replication and https://www.enterprisedb.com/docs/pgd/latest/bdr/catalogs/#bdrnode_replication_rates -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_bdr_rep_lag_replay_lag_s` | GAUGE | replay\_lag\_s | @@ -530,23 +484,20 @@ and https://www.enterprisedb.com/docs/pgd/latest/bdr/catalogs/#bdrnode_replicati | `cnp_bdr_rep_lag_catchup_interval_s` | GAUGE | catchup\_interval\_s | -##### Labels -The above metrics may have these labels, represented +The metrics may have these labels, represented as dimensions in Azure Monitor: | Label | Description | |-------|-------------| | `peer_name` | peer\_name | -#### Group `cnp_bdr_node_slots` +## `cnp_bdr_node_slots` Metrics derived from the bdr.node_slots view. These metrics provide lower level insight into the progress of outbound BDR replication, including transaction ID limits and WAL retention and the connection status of replication sessions. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_bdr_node_slots_active_pid` | GAUGE | active\_pid | @@ -559,9 +510,8 @@ WAL retention and the connection status of replication sessions. | `cnp_bdr_node_slots_slot_state` | GAUGE | slot\_state enumeration. disconnected = 0, streaming = 1, catchup = 2, unknown/unrecognised -1 | -##### Labels -The above metrics may have these labels, represented +The metrics may have these labels, represented as dimensions in Azure Monitor: | Label | Description | @@ -569,24 +519,21 @@ as dimensions in Azure Monitor: | `peer_name` | peer\_name | | `slot_name` | slot\_name | -#### Group `cnp_bdr_global_locking` +## `cnp_bdr_global_locking` metrics for bdr global lock acquire and hold durations for both DDL and DML lock types. Useful for detection of long global lock waits or frequent global locks that may impact performance. 
These metrics are not fine grained and do not expose information about individual tables, etc. Details are available in the bdr.global_locks view. -##### Metrics - | Metric | Usage | Description | |----------|-------|-------------| | `cnp_bdr_global_locking_since_locally_requested_s` | GAUGE | since\_locally\_requested\_s | | `cnp_bdr_global_locking_since_local_granted_s` | GAUGE | since\_local\_granted\_s | -##### Labels -The above metrics may have these labels, represented +The metrics may have these labels, represented as dimensions in Azure Monitor: | Label | Description | From 061e22c92cd8b224b716d9e8dd9064b9b98f81d6 Mon Sep 17 00:00:00 2001 From: drothery-edb Date: Fri, 16 Sep 2022 13:04:45 -0400 Subject: [PATCH 11/13] PGD: soften language against using legacy options --- product_docs/docs/pgd/4/bdr/durability.mdx | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/product_docs/docs/pgd/4/bdr/durability.mdx b/product_docs/docs/pgd/4/bdr/durability.mdx index a3130b71629..77767abbb70 100644 --- a/product_docs/docs/pgd/4/bdr/durability.mdx +++ b/product_docs/docs/pgd/4/bdr/durability.mdx @@ -45,9 +45,7 @@ Postgres provides [Physical Streaming Replication](https://www.postgresql.org/do For backward compatibility, BDR still supports configuring synchronous replication with `synchronous_commit` and `synchronous_standby_names`. See [Legacy synchronous replication](durability#legacy-synchronous-replication-using-bdr), -but the use of [Group Commit](group-commit) is recommended instead -in all cases. - +but consider using [Group Commit](group-commit) instead. ## Terms and definitions BDR nodes can take different @@ -108,7 +106,7 @@ Postgres crashes.* `synchronous_replication_availability` to `async'`, otherwise the values for the asynchronous BDR default apply.* -*(3) Not recommended. 
Consider using Group Commit instead.* +*(3) Consider using Group Commit instead.* Reception ensures the peer operating normally can eventually apply the transaction without requiring any further @@ -208,8 +206,7 @@ required synchronization level and prevents loss of data. ## Legacy synchronous replication using BDR !!! Note - We don't recommend this approach. Consider using - [Group Commit](group-commit) instead. + Consider using [Group Commit](group-commit) instead. ### Usage From a2487505c72f160b0b266dd209f19424c12fd55e Mon Sep 17 00:00:00 2001 From: drothery-edb Date: Sat, 17 Sep 2022 13:22:11 -0400 Subject: [PATCH 12/13] PGD: updated harp redirect --- static/_redirects | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/static/_redirects b/static/_redirects index a42e5f0b6df..ffaa8f5cb6a 100644 --- a/static/_redirects +++ b/static/_redirects @@ -44,7 +44,7 @@ /docs/bdr/3.6/* /docs/pgd/3.6/ 302 /docs/pgd/3.6/pglogical/* /docs/pgd/3.6/ 302 /docs/harp/latest/* /docs/pgd/latest/overview/harp/:splat 302 -/docs/harp/2/* /docs/pgd/4/overview/harp/:splat 301 +/docs/harp/2/* /docs/pgd/4/harp/:splat 301 /docs/pglogical/latest/* /docs/pgd/3.7/pglogical/:splat 302 # BART From 2f26cb4e4439c954b7d12f359c4aac93798492af Mon Sep 17 00:00:00 2001 From: drothery-edb Date: Sun, 18 Sep 2022 09:09:54 -0400 Subject: [PATCH 13/13] Updated other redirects Josh pointed out --- static/_redirects | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/static/_redirects b/static/_redirects index ffaa8f5cb6a..39f318290d4 100644 --- a/static/_redirects +++ b/static/_redirects @@ -38,12 +38,12 @@ # PGD /docs/pgd/latest/overview/bdr/* /docs/pgd/latest/bdr/:splat 302 /docs/pgd/latest/overview/harp/* /docs/pgd/latest/harp/:splat 302 -/docs/bdr/latest/* /docs/pgd/latest/overview/bdr/:splat 302 -/docs/bdr/4/* /docs/pgd/4/overview/bdr/:splat 301 +/docs/bdr/latest/* /docs/pgd/latest/bdr/:splat 302 +/docs/bdr/4/* /docs/pgd/4/bdr/:splat 301 /docs/bdr/3.7/* 
/docs/pgd/3.7/bdr/:splat 301 /docs/bdr/3.6/* /docs/pgd/3.6/ 302 /docs/pgd/3.6/pglogical/* /docs/pgd/3.6/ 302 -/docs/harp/latest/* /docs/pgd/latest/overview/harp/:splat 302 +/docs/harp/latest/* /docs/pgd/latest/harp/:splat 302 /docs/harp/2/* /docs/pgd/4/harp/:splat 301 /docs/pglogical/latest/* /docs/pgd/3.7/pglogical/:splat 302