diff --git a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging.mdx b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging.mdx index 0dadd3f074c..679b2670707 100644 --- a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging.mdx +++ b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging.mdx @@ -2,46 +2,96 @@ title: "Monitoring and logging" --- -You can monitor your Postgres clusters by viewing the metrics and logs from Azure. For existing Postgres Enterprise Manager (PEM) users who wish to monitor EDB Cloud clusters alongside self-managed Postgres clusters, you can use the remote Remote Monitoring capability of PEM. For more information on using PEM to monitor your clusters see [Remote Monitoring](../../../../../pem/latest/pem_admin/02a_pem_remote_monitoring). +You can monitor your Postgres clusters by viewing the metrics and logs from +Azure. -The following sections describe viewing metrics and logs directly from Azure. +For existing Postgres Enterprise Manager (PEM) users who wish to monitor +BigAnimal clusters alongside self-managed Postgres clusters, you can use the +remote Remote Monitoring capability of PEM. For more information on using PEM +to monitor your clusters see +[Remote Monitoring](../../../../../pem/latest/pem_admin/02a_pem_remote_monitoring). -## Viewing metrics and logs from Azure +The following sections describes how to access logs and metrics directly in the +Azure portal. -EDB Cloud sends all metrics and logs from PostgreSQL clusters to Azure. The following describes what metrics and logs are sent and how to view them. +As every customer's needs are different, it is anticipated that applying Azure +Monitor features to the supplied data streams will enable customers to create +tailored insights into their workloads in order to better meet their business +goals. -### Azure log analytics +Pre-defined dashboards and metrics queries are provided in Azure Monitor as a +starting point for exploring the available data. -When BigAnimal deploys workloads on Azure, the logs from the PostgreSQL clusters are forwarded to the Azure Log Workspace. -To query BigAnimal logs, you must use [Azure Log Analytics](https://docs.microsoft.com/en-us/azure/azure-monitor/logs/log-analytics-overview) and [Kusto Query language](https://azure-training.com/azure-data-science/the-kusto-query-language/). +## Viewing metrics and logs from Azure +BigAnimal sends all metrics and logs from Postgres clusters to Azure. The +following describes what metrics and logs are sent and how to view them. +### Azure Log Analytics -### Querying PostgreSQL cluster logs +When BigAnimal deploys workloads on Azure, the logs from the postgres +clusters are forwarded to the Azure Log Workspace. +A pre-defined shared dashboard panel in the Azure Portal shows recent postgres +logs. To query BigAnimal logs in more detail you must use +[Azure Log Analytics](https://docs.microsoft.com/en-us/azure/azure-monitor/logs/log-analytics-overview) +and the +[Kusto Query language](https://azure-training.com/azure-data-science/the-kusto-query-language/). -All logs from your PostgreSQL clusters are stored in the _Customer Log Analytics workspace_. To find your _Customer Log Analytics workspace_: +### Using shared dashboards to view PostgreSQL cluster logs and metrics -1. Sign in to the [Azure portal](https://portal.azure.com). +To view logs and selected metrics summaries from your PostgreSQL clusters using +Shared Dashboard: +1. Sign in to the [Azure portal](https://portal.azure.com). 2. Select **Resource Groups**. +2. Select the Resource Group corresponding to the region where you choose to + deploy your BigAnimal cluster. You will see resources included in that + Resource Group. +3. Select the resource of type _Shared Dashboard_ with the suffix -customer. +4. Select the **Go to dashboard** link located at the top of the page. -2. Select the Resource Group corresponding to the region where you choose to deploy your BigAnimal cluster. You will see resources included in that Resource Group. +The default shared dashboard provided by BigAnimal will be extended and +enhanced over time. It includes panels for monitoring and diagnostic +information like: -3. Select the resource of type _Log Analytics workspace_ with the suffix -customer. +* Recent log entries for all clusters +* Row insert, update, and delete rates per database +* Query deadlock rates +* Connection counts over time +* Temporary storage use trend +* Replication lag +* Age of longest running transaction + +!!! Important +Changes you make to the shared dashboard will be overwritten when +BigAnimal updates are deployed. Create your own custom dashboard if you wish to +modify or extend the provided dashboard. You can start with the BigAnimal dashboard +by using the "Clone" button in the dashboard view. +!!! + +### Querying PostgreSQL cluster logs and metrics +All logs from your PostgreSQL clusters are stored in the _Customer Log +Analytics workspace_. To find your _Customer Log Analytics workspace_: + +1. Sign in to the [Azure portal](https://portal.azure.com). +2. Select **Resource Groups**. +2. Select the Resource Group corresponding to the region where you choose to + deploy your BigAnimal cluster. You will see resources included in that Resource + Group. +3. Select the resource of type _Log Analytics workspace_ with the suffix -customer. 4. Select the Logs in the menu on the left in the General section. +5. Close the dashboard with pre-built queries. This will bring you to the KQL Editor. -5. Close the dashboard with prebuilt queries. This will bring you to the KQL Editor. +#### Available Logs and Metrics -The following tables are available in the _Customer Log Analytic workspace_. +See the next section [Metrics Details](#metrics-details-list) for a listing of +available metrics and details on the structure of log entries. -| Table name | Description | Logger | -| ---------- | ----------- | ------ | -| PostgresLogs_CL | Logs of the Customer clusters databases (all postgres related logs) | `logger = postgres` | -| PostgresAuditLogs_CL | Audit Logs of the Customer clusters databases | `logger = pgaudit or edb_audit` | +#### Example Log Queries -You can use the KQL Query editor to compose your queries over these tables. For example, +For example, ``` PostgresLogs_CL @@ -59,18 +109,20 @@ PostgresAuditLogs_CL | sort by record_log_time_s desc ``` -### Using shared dashboards to view PostgreSQL cluster logs - -To view logs from your PostgreSQL clusters using Shared Dashboard: - -1. Sign in to the [Azure portal](https://portal.azure.com). -2. Select **Resource Groups**. +#### Example Metrics Queries -2. Select the Resource Group corresponding to the region where you choose to deploy your BigAnimal cluster. You will see resources included in that Resource Group. +To list the metrics from BigAnimal presently available in the `InsightsMetrics` +table use this query: -3. Select the resource of type _Shared Dashboard_ with the suffix -customer. +``` +InsightsMetrics +| where Namespace == "prometheus" +| distinct Name +``` -4. Select the **Go to dashboard** link located at the top of the page. +(Or just use Metrics Explorer). +### See also +* [Azure Monitor Metrics Overview](https://docs.microsoft.com/en-us/azure/azure-monitor/essentials/data-platform-metrics) diff --git a/product_docs/docs/biganimal/release/using_cluster/06_metrics.mdx b/product_docs/docs/biganimal/release/using_cluster/06_metrics.mdx new file mode 100644 index 00000000000..dce8e228c51 --- /dev/null +++ b/product_docs/docs/biganimal/release/using_cluster/06_metrics.mdx @@ -0,0 +1,598 @@ +--- +title: "Metrics Details" +--- + +A variety of metrics are collected by the BigAnimal instance and made available +to the customer's Azure subscription for dashboarding, alerting, querying and +other analytics. + +See [Monitoring and Logging](#monitoring-and-logging) for an introduction to +the available monitoring capabilities. + +This section explains how to find and interpret the available metrics and logs. +It also lists and describes the individual metrics provided. + +## Understanding BigAnimal Logs and Metrics + +You can see example queries over these metrics by editing the predefined +dashboard panels in the default shared dashboard. Some pre-defined queries +and/or functions may also be available in the Log Analytics queries panel. +The Azure Monitor Metrics Explorer provides a useful entry point for +discovering the available metrics. + +In-depth advice on the details of querying these metrics is beyond the scope of +this documentation. Refer to The Azure Log Analytics and Azure Monitor +documentation and to the documentation on the Kusto query language used by +Azure Monitor. A wide variety of analytics capabilities are available including +time-series functions, seasonally adjusted statistics, alert generation and +more. + +## Available Logs and Metrics + +The following tables in the _Customer Log Analytic workspace_ contain entries +specific to BigAnimal: + +| Table name | Description | +| ---------- | ----------- | +| PostgresLogs_CL | Logs of the Customer clusters databases (all postgres related logs) | +| PostgresAuditLogs_CL | Audit Logs of the Customer clusters databases, if enabled | +| InsightsMetrics | Metrics streams from BigAnimal Prometheus and Azure Monitor. BigAnimal metrics have `namespace == "prometheus"` | + +You can use the KQL Query editor in the Log Workspace view to compose queries +over these tables. + +## Logs + +Postgres logs are added to the `PostgresLogs_CL` table. + +Logs are split into structured fields matching those of the Postgres +[csvlog format](https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG) +with a `record_` prefix and a type-suffix. For example the `application_name` +is in the `record_application_name_s` log field. + +The `pg_cluster_id_s` field identifies the specific postgres cluster +that originated the log message. + +## Metrics Overview + +BigAnimal collects a wide set of metrics about postgres instances into the +`InsightsMetrics` log analytics table. Most of these metrics are acquired +directly from postgres system tables, views, and functions. The postgres +documentation serves as the main reference for these metrics. + +KQL can be used to analyze time-series metrics, report latest samples of +metrics, etc by querying the `InsightsMetrics` table. + +Some data from postgres monitoring system views, tables and functions are +transformed to be easier to consume in Prometheus metrics format. For example, +timestamp fields are generally converted to unix epoch time and/or accompanied +by a relative time-interval metric. Other metrics are aggregated into +categories by label dimensions to limit the number of very specific and +narrowly scoped individual metrics emitted. It would be not be very useful to +report the inactivity period of every single backend, for example, so backend +statistics are aggregated by database, user, `application_name` and backend +state. + +Prometheus [Labels](https://prometheus.io/docs/practices/naming/#labels) +are mapped to Azure metrics +[Dimensions](https://docs.microsoft.com/en-us/azure/azure-monitor/essentials/data-platform-metrics#multi-dimensional-metrics). +Dimensions vary depending on the individual metric, and are documented +separately for each group of related metrics. + +The forwarded Prometheus metrics use structured json fields, particularly for +the `Tags` field. Effective use of them will require use of the +[`todynamic()`](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/parsejsonfunction) +function in queries. + +The available set of metrics is subject to change. Metrics may be added, +removed or renamed. Where feasible an effort will be made not to change the +meaning or type of existing metrics without also changing the metric name. + +At time of writing all metrics forwarded from Prometheus are in the +`prometheus` namespace. This may change in a future release. + +Effective use of the available metrics will require an understanding of Azure +time-series data, metrics dimensions, and of the tagging conventions used in +the metrics streams. + +### Metrics tags + +All postgres metrics share a common tagging scheme. Entries will generally +have at least the following tags: + +| Name | Description | +|--------------------------------|-------------| +| address | IP address of the host the metric originated from | +| postgresql | BigAnimal postgres cluster identifier e.g. `p-abcdef012345` | +| role | Postgres instance role, "primary" or "replica" | +| datname | Postgres database name (where applicable) | +| pod_name | k8s pod name | +| hostName | AKS node host name | +| container.azm.ms/clusterName | AKS cluster name | + +When querying for tags best performance is achieved when any filters that do not +require inspection of tags (e.g. filters by metric name) are applied before any +tag-based filters. + +The `Tags` field of a metrics entry is a json-typed field that may be queried +for individual values with `todynamic(Tags).keyname` in KQL. Some uses of values +may require explicit casts to another type e.g. `tostring(...)`. + +Example usage: + +``` +InsightsMetrics +| where Namespace == "prometheus" and Name startswith "cnp_" +| extend t = todynamic(Tags) +| where t.role == "primary" +| project postgres_cluster_id = tostring(t.postgresql), dbname = tostring(t.datname) +| where not (dbname has_any("template0", "template1")) +| distinct postgres_cluster_id, dbname +``` + +[comment1]: # (Generated content see upm-substrate repo config monitoring dir) + +#### Group `cnp_backends` + +Backend counts from `pg_stat_activity` aggregated by the listed label +dimensions. Useful for identifying busy applications, excessive idle +backends, etc. + +Derived from the `pg_stat_activity` view. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_backends_total` | GAUGE | Number of backends | +| `cnp_backends_max_tx_duration_seconds` | GAUGE | Maximum duration of a transaction in seconds | + + +##### Labels + +The above metrics may have these labels, represented +as dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `datname` | Name of the database | +| `usename` | Name of the user | +| `application_name` | Name of the application | +| `state` | State of the backend | + +#### Group `cnp_backends_waiting` + +Postgres-instance-level aggregate information on backends that are blocked +waiting for locks. Does not count I/O waits or other reasons backends might +wait or be blocked. + +Derived from the `pg_locks` view. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_backends_waiting_total` | GAUGE | Total number of backends that are currently waiting on other queries | + +#### Group `cnp_pg_database` + +Per-database metrics for each database in the postgres instance. +Includes per-database vacuum progress information. + +Derived from the `pg_database` catalog. + +See also `cnp_pg_stat_database`. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_database_size_bytes` | GAUGE | Disk space used by the database | +| `cnp_pg_database_xid_age` | GAUGE | Number of transactions from the frozen XID to the current one | +| `cnp_pg_database_mxid_age` | GAUGE | Number of multiple transactions (Multixact) from the frozen XID to the current one | + + +##### Labels + +The above metrics may have these labels, represented +as dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `datname` | Name of the database | + +#### Group `cnp_pg_postmaster` + +Data on the postgres instance's managing "postmaster" process. + +Derived from the `pg_postmaster_start_time()` function. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_postmaster_start_time` | GAUGE | Time at which postgres started (based on epoch) | + +#### Group `cnp_pg_replication` + +Physical replication details for a standby postgres instance +as captured from the standby itself. + +Derived from the `pg_last_xact_replay_timestamp()` function. + +Only relevant on standby servers. + +See also `cnp_pg_stat_replication`, `cnp_pg_replication_slots`. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_replication_lag` | GAUGE | Replication lag behind primary in seconds | +| `cnp_pg_replication_in_recovery` | GAUGE | Whether the instance is in recovery | + +#### Group `cnp_pg_replication_slots` + +Details about replication slots on a postgres instance. In most +configurations only the primary server will have active replication clients, +but other nodes may still have replication slots. + +Note that logical replication slots are specific to a database, whereas +physical replication slots will have an empty "database" label as they +apply to the postgres instance as a whole. + +Derived from the `pg_replication_slots` view. + +See also `cnp_pg_stat_replication`, `cnp_pg_replication`. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_replication_slots_active` | GAUGE | Flag indicating if the slot is active | +| `cnp_pg_replication_slots_pg_wal_lsn_diff` | GAUGE | Replication lag in bytes | + + +##### Labels + +The above metrics may have these labels, represented +as dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `slot_name` | Name of the replication slot | +| `database` | Name of the database | + +#### Group `cnp_pg_stat_archiver` + +Progress information about WAL archiving. Only the currently active primary +server will generally be performing WAL archiving. + +WAL archiving is important for backup and restore. If WAL archiving is +delayed or failing for too long, the point-in-time recovery backups for +a postgres cluster will not be up to date. This has disaster recovery +implications and can potentially also affect failover. + +Occasional WAL archiving failures are normal, but a growing delay in the time +since the last successful WAL archiving operation should be taken seriously. + +Metrics in this section are reset when a postgres stats reset is issued +on the db server. + +Derived from the `pg_stat_archiver` view. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_stat_archiver_archived_count` | COUNTER | Number of WAL files that have been successfully archived | +| `cnp_pg_stat_archiver_failed_count` | COUNTER | Number of failed attempts for archiving WAL files | +| `cnp_pg_stat_archiver_seconds_since_last_archival` | GAUGE | Seconds since the last successful archival operation | +| `cnp_pg_stat_archiver_seconds_since_last_failure` | GAUGE | Seconds since the last failed archival operation | +| `cnp_pg_stat_archiver_last_archived_time` | GAUGE | Epoch of the last time WAL archiving succeeded | +| `cnp_pg_stat_archiver_last_failed_time` | GAUGE | Epoch of the last time WAL archiving failed | +| `cnp_pg_stat_archiver_last_archived_wal_start_lsn` | GAUGE | Archived WAL start LSN | +| `cnp_pg_stat_archiver_last_failed_wal_start_lsn` | GAUGE | Last failed WAL LSN | +| `cnp_pg_stat_archiver_stats_reset_time` | GAUGE | Time at which these statistics were last reset | + +#### Group `cnp_pg_stat_bgwriter` + +Stats for the postgres background writer and checkpointer processes, which +are instance-wide and shared across all databases in a postgres instance. + +Very long delays between checkpoints on a busy system will increase the time +taken for it to return to read/write availability if crash recovery is +required. Excessively frequent checkpoints can increase I/O load and the size +of the WAL stream for backup and replication. + +The postgres documentation discusses checkpoints, dirty writeback, and +checkpoint tuning in detail. + +Metrics in this section are reset when a postgres stats reset is issued +on the db server. + +Derived from the `pg_stat_bgwriter` catalog. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_stat_bgwriter_checkpoints_timed` | COUNTER | Number of scheduled checkpoints that have been performed | +| `cnp_pg_stat_bgwriter_checkpoints_req` | COUNTER | Number of requested checkpoints that have been performed | +| `cnp_pg_stat_bgwriter_checkpoint_write_time` | COUNTER | Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in milliseconds | +| `cnp_pg_stat_bgwriter_checkpoint_sync_time` | COUNTER | Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk, in milliseconds | +| `cnp_pg_stat_bgwriter_buffers_checkpoint` | COUNTER | Number of buffers written during checkpoints | +| `cnp_pg_stat_bgwriter_buffers_clean` | COUNTER | Number of buffers written by the background writer | +| `cnp_pg_stat_bgwriter_maxwritten_clean` | COUNTER | Number of times the background writer stopped a cleaning scan because it had written too many buffers | +| `cnp_pg_stat_bgwriter_buffers_backend` | COUNTER | Number of buffers written directly by a backend | +| `cnp_pg_stat_bgwriter_buffers_backend_fsync` | COUNTER | Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write) | +| `cnp_pg_stat_bgwriter_buffers_alloc` | COUNTER | Number of buffers allocated | + +#### Group `cnp_pg_stat_database` + +This metrics group directly exposes the summary data postgres collects in its +own `pg_stat_database` view. It contains statistical counters maintained by +postgres itself for database activity. + +Metrics in this section are reset when a postgres stats reset is issued +on the db server. + +Derived from the `pg_stat_database` catalog. + +See also `cnp_pg_database`. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_stat_database_xact_commit` | COUNTER | Number of transactions in this database that have been committed | +| `cnp_pg_stat_database_xact_rollback` | COUNTER | Number of transactions in this database that have been rolled back | +| `cnp_pg_stat_database_blks_read` | COUNTER | Number of disk blocks read in this database | +| `cnp_pg_stat_database_blks_hit` | COUNTER | Number of times disk blocks were found already in the buffer cache, so that a read was not necessary (this only includes hits in the PostgreSQL buffer cache, not the operating system's file system cache) | +| `cnp_pg_stat_database_tup_returned` | COUNTER | Number of rows returned by queries in this database | +| `cnp_pg_stat_database_tup_fetched` | COUNTER | Number of rows fetched by queries in this database | +| `cnp_pg_stat_database_tup_inserted` | COUNTER | Number of rows inserted by queries in this database | +| `cnp_pg_stat_database_tup_updated` | COUNTER | Number of rows updated by queries in this database | +| `cnp_pg_stat_database_tup_deleted` | COUNTER | Number of rows deleted by queries in this database | +| `cnp_pg_stat_database_conflicts` | COUNTER | Number of queries canceled due to conflicts with recovery in this database | +| `cnp_pg_stat_database_temp_files` | COUNTER | Number of temporary files created by queries in this database | +| `cnp_pg_stat_database_temp_bytes` | COUNTER | Total amount of data written to temporary files by queries in this database | +| `cnp_pg_stat_database_deadlocks` | COUNTER | Number of deadlocks detected in this database | +| `cnp_pg_stat_database_blk_read_time` | COUNTER | Time spent reading data file blocks by backends in this database, in milliseconds | +| `cnp_pg_stat_database_blk_write_time` | COUNTER | Time spent writing data file blocks by backends in this database, in milliseconds | + + +##### Labels + +The above metrics may have these labels, represented +as dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `datname` | Name of this database | + +#### Group `cnp_pg_stat_database_conflicts` + +These metrics provide information on conflicts between queries on a standby +and the standby's replay of the change-stream from the primary. These are +called recovery conflicts. + +These metrics are unrelated to "INSERT ... ON CONFLICT" conflicts, or +multi-master replication row conflicts. They are only relevant on standby +servers. + +Metrics in this section are reset when a postgres stats reset is issued +on the db server. + +Only defined on standby servers. + +Derived from the `pg_stat_database_conflicts` view. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_stat_database_conflicts_confl_tablespace` | COUNTER | Number of queries in this database that have been canceled due to dropped tablespaces | +| `cnp_pg_stat_database_conflicts_confl_lock` | COUNTER | Number of queries in this database that have been canceled due to lock timeouts | +| `cnp_pg_stat_database_conflicts_confl_snapshot` | COUNTER | Number of queries in this database that have been canceled due to old snapshots | +| `cnp_pg_stat_database_conflicts_confl_bufferpin` | COUNTER | Number of queries in this database that have been canceled due to pinned buffers | +| `cnp_pg_stat_database_conflicts_confl_deadlock` | COUNTER | Number of queries in this database that have been canceled due to deadlocks | + + +##### Labels + +The above metrics may have these labels, represented +as dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `datname` | Name of the database | + +#### Group `cnp_pg_stat_user_tables` + +Access and usage statistics maintained by postgres on non-system tables. + +Metrics in this section are reset when a postgres stats reset is issued +on the db server. + +Derived from the `pg_stat_user_tables` view. + +See also `cnp_pg_statio_user_tables`. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_stat_user_tables_seq_scan` | COUNTER | Number of sequential scans initiated on this table | +| `cnp_pg_stat_user_tables_seq_tup_read` | COUNTER | Number of live rows fetched by sequential scans | +| `cnp_pg_stat_user_tables_idx_scan` | COUNTER | Number of index scans initiated on this table | +| `cnp_pg_stat_user_tables_idx_tup_fetch` | COUNTER | Number of live rows fetched by index scans | +| `cnp_pg_stat_user_tables_n_tup_ins` | COUNTER | Number of rows inserted | +| `cnp_pg_stat_user_tables_n_tup_upd` | COUNTER | Number of rows updated | +| `cnp_pg_stat_user_tables_n_tup_del` | COUNTER | Number of rows deleted | +| `cnp_pg_stat_user_tables_n_tup_hot_upd` | COUNTER | Number of rows HOT updated (i.e., with no separate index update required) | +| `cnp_pg_stat_user_tables_n_live_tup` | GAUGE | Estimated number of live rows | +| `cnp_pg_stat_user_tables_n_dead_tup` | GAUGE | Estimated number of dead rows | +| `cnp_pg_stat_user_tables_n_mod_since_analyze` | GAUGE | Estimated number of rows changed since last analyze | +| `cnp_pg_stat_user_tables_last_vacuum` | GAUGE | Last time at which this table was manually vacuumed (not counting VACUUM FULL) | +| `cnp_pg_stat_user_tables_last_autovacuum` | GAUGE | Last time at which this table was vacuumed by the autovacuum daemon | +| `cnp_pg_stat_user_tables_last_analyze` | GAUGE | Last time at which this table was manually analyzed | +| `cnp_pg_stat_user_tables_last_autoanalyze` | GAUGE | Last time at which this table was analyzed by the autovacuum daemon | +| `cnp_pg_stat_user_tables_vacuum_count` | COUNTER | Number of times this table has been manually vacuumed (not counting VACUUM FULL) | +| `cnp_pg_stat_user_tables_autovacuum_count` | COUNTER | Number of times this table has been vacuumed by the autovacuum daemon | +| `cnp_pg_stat_user_tables_analyze_count` | COUNTER | Number of times this table has been manually analyzed | +| `cnp_pg_stat_user_tables_autoanalyze_count` | COUNTER | Number of times this table has been analyzed by the autovacuum daemon | + + +##### Labels + +The above metrics may have these labels, represented +as dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `datname` | Name of current database | +| `schemaname` | Name of the schema that this table is in | +| `relname` | Name of this table | + +#### Group `cnp_pg_stat_replication` + +Realtime information about replication connections to this postgres instance, +their progress and activity. + +Metrics in this section are not reset when a postgres stats reset is issued +on the db server. The "stat" in the name is a historic artefact from postgres +development. + +Derived from the `pg_stat_replication` view. + +See also `cnp_pg_replication_slots`, `cnp_pg_replication`. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_stat_replication_backend_start` | COUNTER | Time when this process was started | +| `cnp_pg_stat_replication_backend_xmin_age` | COUNTER | The age of this standby's xmin horizon | +| `cnp_pg_stat_replication_sent_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location sent on this connection | +| `cnp_pg_stat_replication_write_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location written to disk by this standby server | +| `cnp_pg_stat_replication_flush_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location flushed to disk by this standby server | +| `cnp_pg_stat_replication_replay_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location replayed into the database on this standby server | +| `cnp_pg_stat_replication_write_lag_seconds` | GAUGE | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written it | +| `cnp_pg_stat_replication_flush_lag_seconds` | GAUGE | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written and flushed it | +| `cnp_pg_stat_replication_replay_lag_seconds` | GAUGE | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written, flushed and applied it | + + +##### Labels + +The above metrics may have these labels, represented +as dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `usename` | Name of the replication user | +| `application_name` | Name of the application | +| `client_addr` | Client IP address | + +#### Group `cnp_pg_statio_user_tables` + +I/O activity statistics maintained by postgres on non-system tables. + +Metrics in this section are reset when a postgres stats reset is issued +on the db server. + +Derived from the `pg_statio_user_tables` view. + +See also `cnp_pg_stat_user_tables`. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_statio_user_tables_heap_blks_read` | COUNTER | Number of disk blocks read from this table | +| `cnp_pg_statio_user_tables_heap_blks_hit` | COUNTER | Number of buffer hits in this table | +| `cnp_pg_statio_user_tables_idx_blks_read` | COUNTER | Number of disk blocks read from all indexes on this table | +| `cnp_pg_statio_user_tables_idx_blks_hit` | COUNTER | Number of buffer hits in all indexes on this table | +| `cnp_pg_statio_user_tables_toast_blks_read` | COUNTER | Number of disk blocks read from this table's TOAST table (if any) | +| `cnp_pg_statio_user_tables_toast_blks_hit` | COUNTER | Number of buffer hits in this table's TOAST table (if any) | +| `cnp_pg_statio_user_tables_tidx_blks_read` | COUNTER | Number of disk blocks read from this table's TOAST table indexes (if any) | +| `cnp_pg_statio_user_tables_tidx_blks_hit` | COUNTER | Number of buffer hits in this table's TOAST table indexes (if any) | + + +##### Labels + +The above metrics may have these labels, represented +as dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `datname` | Name of current database | +| `schemaname` | Name of the schema that this table is in | +| `relname` | Name of this table | + +#### Group `cnp_pg_settings` + +Expose the subset of postgres server settings that can be represented as +Prometheus compatible metrics - any integer, boolean or real number. +Text-format settings, list-valued settings and enumeration-typed settings are +not captured or reported. + +This set of metrics does not expose per-database settings assigned with +`ALTER DATABASE ... SET ...`, per-user settings assigned with `ALTER USER ... +SET ...`, or per-session values. It only shows the database-system-wide +global values. You can explore other settings interactively using postgres +system views. + +Derived from the `pg_settings` view. + +##### Metrics + +| Metric | Usage | Description | +|----------|-------|-------------| +| `cnp_pg_settings_setting` | GAUGE | Setting value | + + +##### Labels + +The above metrics may have these labels, represented +as dimensions in Azure Monitor: + +| Label | Description | +|-------|-------------| +| `name` | Name of the setting | + +[comment2]: # (End generated content) + +### Other metrics streams + +In addition to postgres metrics from the Cloud Native PostgreSQL operator that +manages databases in BigAnimal, additional metrics about Kubernetes cluster +state and other details may be streamed to the Log Workspace. Any such metrics +are generally well-known metrics from widely used tools, documented by the +upstream vendor of the component. + +Details on individual metrics from such sources will not be listed in this +document. Refer to the documentation of the tool or project that defines the +metrics. + +See also: + +* [Kubernetes cluster metrics](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/). + +Additional streams of metrics may be supplied by the cloud platform itself +directly to the customer's metrics, analytics and dashboarding endpoint. + +### Dive Deeper + +The capabilities available in the Azure portal are too broad to fully cover in this +documentation. They include the ability to: + +* Discover metrics in the Azure Monitor Metrics Explorer (Monitor -> Metrics) +* Query logs and metrics from the Azure Monitor Logs view (Monitor -> Logs) +* Create dashboards backed by metrics queries in the Portal +* Define alerting rules to trigger notifications based on queries +* Use AI-assisted analytics assistant capabilities ("Metrics Advisers") to find + patterns in metrics +* Apply complex analytic tools for time-series data in Application Insights, + including seasonally adjusted statistics to discover patterns, anomalies and + trends.