From 9198a51a4fb6df4daacd7f921d4b36b5b04e8e33 Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Tue, 2 Nov 2021 13:49:51 +0800
Subject: [PATCH] doc(UPM-2321): Document CNP metrics exposed by BigAnimal

List the CNP metrics exposed by BigAnimal.

Also provide some guidance on using those metrics and on the structure of
metrics and log entries.

Note that this documentation change contains a section that is generated by a
script. The script is intended to be hosted in the upm-substrate repo. It
doesn't seem practical to add the script here and have it regenerate the
automatically generated section on every run, so updating it is expected to be
part of the BigAnimal release process for now. A comment in the Markdown tries
to direct the reader to where the script lives.
---
 .../05_monitoring_and_logging.mdx                  | 108 +++-
 .../release/using_cluster/06_metrics.mdx           | 598 ++++++++++++++++++
 2 files changed, 678 insertions(+), 28 deletions(-)
 create mode 100644 product_docs/docs/biganimal/release/using_cluster/06_metrics.mdx

diff --git a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging.mdx b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging.mdx
index 0dadd3f074c..679b2670707 100644
--- a/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging.mdx
+++ b/product_docs/docs/biganimal/release/using_cluster/05_monitoring_and_logging.mdx
@@ -2,46 +2,96 @@
 title: "Monitoring and logging"
 ---
 
-You can monitor your Postgres clusters by viewing the metrics and logs from Azure. For existing Postgres Enterprise Manager (PEM) users who wish to monitor EDB Cloud clusters alongside self-managed Postgres clusters, you can use the remote Remote Monitoring capability of PEM. For more information on using PEM to monitor your clusters see [Remote Monitoring](../../../../../pem/latest/pem_admin/02a_pem_remote_monitoring).
+You can monitor your Postgres clusters by viewing the metrics and logs from
+Azure.
 
-The following sections describe viewing metrics and logs directly from Azure.
+For existing Postgres Enterprise Manager (PEM) users who wish to monitor
+BigAnimal clusters alongside self-managed Postgres clusters, you can use the
+Remote Monitoring capability of PEM. For more information on using PEM
+to monitor your clusters, see
+[Remote Monitoring](../../../../../pem/latest/pem_admin/02a_pem_remote_monitoring).
 
-## Viewing metrics and logs from Azure
+The following sections describe how to access logs and metrics directly in the
+Azure portal.
 
-EDB Cloud sends all metrics and logs from PostgreSQL clusters to Azure. The following describes what metrics and logs are sent and how to view them.
+As every customer's needs are different, it is anticipated that applying Azure
+Monitor features to the supplied data streams will enable customers to create
+tailored insights into their workloads in order to better meet their business
+goals.
 
-### Azure log analytics
+Predefined dashboards and metrics queries are provided in Azure Monitor as a
+starting point for exploring the available data.
 
-When BigAnimal deploys workloads on Azure, the logs from the PostgreSQL clusters are forwarded to the Azure Log Workspace.
-To query BigAnimal logs, you must use [Azure Log Analytics](https://docs.microsoft.com/en-us/azure/azure-monitor/logs/log-analytics-overview) and [Kusto Query language](https://azure-training.com/azure-data-science/the-kusto-query-language/).
+## Viewing metrics and logs from Azure
 
+BigAnimal sends all metrics and logs from Postgres clusters to Azure. The
+following describes what metrics and logs are sent and how to view them.
 
+### Azure Log Analytics
 
 
-### Querying PostgreSQL cluster logs
+When BigAnimal deploys workloads on Azure, the logs from the Postgres
+clusters are forwarded to the Azure Log Workspace.
+A predefined shared dashboard panel in the Azure portal shows recent Postgres
+logs. To query BigAnimal logs in more detail, use
+[Azure Log Analytics](https://docs.microsoft.com/en-us/azure/azure-monitor/logs/log-analytics-overview)
+and the
+[Kusto Query Language](https://azure-training.com/azure-data-science/the-kusto-query-language/).
 
-All logs from your PostgreSQL clusters are stored in the _Customer Log Analytics workspace_. To find your _Customer Log Analytics workspace_:
+### Using shared dashboards to view PostgreSQL cluster logs and metrics
 
-1. Sign in to the [Azure portal](https://portal.azure.com).
+To view logs and selected metrics summaries from your PostgreSQL clusters using
+the shared dashboard:
 
+1. Sign in to the [Azure portal](https://portal.azure.com).
 2. Select **Resource Groups**.
+3. Select the Resource Group corresponding to the region where you chose to
+   deploy your BigAnimal cluster. You will see resources included in that
+   Resource Group.
+4. Select the resource of type _Shared Dashboard_ with the suffix -customer.
+5. Select the **Go to dashboard** link located at the top of the page.
 
-2. Select the Resource Group corresponding to the region where you choose to deploy your BigAnimal cluster. You will see resources included in that Resource Group.
+The default shared dashboard provided by BigAnimal will be extended and
+enhanced over time. It includes panels for monitoring and diagnostic
+information such as:
 
-3. Select the resource of type _Log Analytics workspace_ with the suffix -customer.
+* Recent log entries for all clusters
+* Row insert, update, and delete rates per database
+* Query deadlock rates
+* Connection counts over time
+* Temporary storage use trend
+* Replication lag
+* Age of longest running transaction
+
+!!! Important
+Changes you make to the shared dashboard will be overwritten when
+BigAnimal updates are deployed. Create your own custom dashboard if you wish to
+modify or extend the provided dashboard. You can start from the BigAnimal dashboard
+by using the **Clone** button in the dashboard view.
+!!!
+
+### Querying PostgreSQL cluster logs and metrics
 
+All logs from your PostgreSQL clusters are stored in the _Customer Log
+Analytics workspace_. To find your _Customer Log Analytics workspace_:
+
+1. Sign in to the [Azure portal](https://portal.azure.com).
+2. Select **Resource Groups**.
+2. Select the Resource Group corresponding to the region where you chose to
+   deploy your BigAnimal cluster. You will see resources included in that Resource
+   Group.
+3. Select the resource of type _Log Analytics workspace_ with the suffix -customer.
 4. Select the Logs in the menu on the left in the General section.
+5. Close the dashboard with pre-built queries. This will bring you to the KQL Editor.
 
-5. Close the dashboard with prebuilt queries. This will bring you to the KQL Editor.
+#### Available Logs and Metrics
 
-The following tables are available in the _Customer Log Analytic workspace_.
+See the next section [Metrics Details](#metrics-details-list) for a listing of
+available metrics and details on the structure of log entries.
 
-| Table name | Description | Logger |
-| ---------- | ----------- | ------ |
-| PostgresLogs_CL | Logs of the Customer clusters databases (all postgres related logs) | `logger = postgres` |
-| PostgresAuditLogs_CL | Audit Logs of the Customer clusters databases | `logger = pgaudit or edb_audit` |
+#### Example Log Queries
 
-You can use the KQL Query editor to compose your queries over these tables. For example,
+For example:
 
 ```
 PostgresLogs_CL
@@ -59,18 +109,20 @@ PostgresAuditLogs_CL
 | sort by record_log_time_s desc
 ```
 
-### Using shared dashboards to view PostgreSQL cluster logs
-
-To view logs from your PostgreSQL clusters using Shared Dashboard:
-
-1. Sign in to the [Azure portal](https://portal.azure.com).
-2. Select **Resource Groups**.
+#### Example Metrics Queries
 
-2. Select the Resource Group corresponding to the region where you choose to deploy your BigAnimal cluster. You will see resources included in that Resource Group.
+To list the metrics from BigAnimal presently available in the `InsightsMetrics`
+table, use this query:
 
-3. Select the resource of type _Shared Dashboard_ with the suffix -customer.
+```
+InsightsMetrics
+| where Namespace == "prometheus"
+| distinct Name
+```
 
-4. Select the **Go to dashboard** link located at the top of the page.
+Alternatively, browse the available metrics in Metrics Explorer.
 
+### See also
 
+* [Azure Monitor Metrics Overview](https://docs.microsoft.com/en-us/azure/azure-monitor/essentials/data-platform-metrics)
 
diff --git a/product_docs/docs/biganimal/release/using_cluster/06_metrics.mdx b/product_docs/docs/biganimal/release/using_cluster/06_metrics.mdx
new file mode 100644
index 00000000000..dce8e228c51
--- /dev/null
+++ b/product_docs/docs/biganimal/release/using_cluster/06_metrics.mdx
@@ -0,0 +1,598 @@
+---
+title: "Metrics Details"
+---
+
+A variety of metrics are collected by the BigAnimal instance and made available
+to the customer's Azure subscription for dashboarding, alerting, querying and
+other analytics.
+
+See [Monitoring and Logging](#monitoring-and-logging) for an introduction to
+the available monitoring capabilities.
+
+This section explains how to find and interpret the available metrics and logs.
+It also lists and describes the individual metrics provided.
+
+## Understanding BigAnimal Logs and Metrics
+
+You can see example queries over these metrics by editing the predefined
+dashboard panels in the default shared dashboard. Some predefined queries
+and/or functions may also be available in the Log Analytics queries panel.
+The Azure Monitor Metrics Explorer provides a useful entry point for
+discovering the available metrics.
+
+In-depth advice on the details of querying these metrics is beyond the scope of
+this documentation. Refer to the Azure Log Analytics and Azure Monitor
+documentation and to the documentation on the Kusto query language used by
+Azure Monitor. A wide variety of analytics capabilities are available, including
+time-series functions, seasonally adjusted statistics, alert generation and
+more.
+
+## Available Logs and Metrics
+
+The following tables in the _Customer Log Analytics workspace_ contain entries
+specific to BigAnimal:
+
+| Table name | Description |
+| ---------- | ----------- |
+| PostgresLogs_CL | Logs of the customer clusters' databases (all Postgres-related logs) |
+| PostgresAuditLogs_CL | Audit logs of the customer clusters' databases, if enabled |
+| InsightsMetrics | Metrics streams from BigAnimal Prometheus and Azure Monitor. BigAnimal metrics have `Namespace == "prometheus"` |
+
+You can use the KQL query editor in the Log Workspace view to compose queries
+over these tables.
+
+## Logs
+
+Postgres logs are added to the `PostgresLogs_CL` table.
+
+Logs are split into structured fields matching those of the Postgres
+[csvlog format](https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG)
+with a `record_` prefix and a type suffix. For example, the `application_name`
+value is in the `record_application_name_s` log field.
+
+The `pg_cluster_id_s` field identifies the specific Postgres cluster
+that originated the log message.
+
+## Metrics Overview
+
+BigAnimal collects a wide set of metrics about Postgres instances into the
+`InsightsMetrics` log analytics table. Most of these metrics are acquired
+directly from Postgres system tables, views, and functions. The Postgres
+documentation serves as the main reference for these metrics.
+
+You can use KQL to analyze time-series metrics, report the latest samples of
+metrics, and so on by querying the `InsightsMetrics` table.
+
+Some data from Postgres monitoring system views, tables and functions is
+transformed to be easier to consume in Prometheus metrics format. For example,
+timestamp fields are generally converted to Unix epoch time and/or accompanied
+by a relative time-interval metric. Other metrics are aggregated into
+categories by label dimensions to limit the number of very specific and
+narrowly scoped individual metrics emitted. It would not be very useful to
+report the inactivity period of every single backend, for example, so backend
+statistics are aggregated by database, user, `application_name` and backend
+state.
+
+Prometheus [Labels](https://prometheus.io/docs/practices/naming/#labels)
+are mapped to Azure metrics
+[Dimensions](https://docs.microsoft.com/en-us/azure/azure-monitor/essentials/data-platform-metrics#multi-dimensional-metrics).
+Dimensions vary depending on the individual metric, and are documented
+separately for each group of related metrics.
+
+The forwarded Prometheus metrics use structured JSON fields, particularly for
+the `Tags` field. Using them effectively requires the
+[`todynamic()`](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/parsejsonfunction)
+function in queries.
+
+The available set of metrics is subject to change. Metrics may be added,
+removed or renamed. Where feasible, an effort will be made not to change the
+meaning or type of existing metrics without also changing the metric name.
+
+At the time of writing, all metrics forwarded from Prometheus are in the
+`prometheus` namespace. This may change in a future release.
+
+Effective use of the available metrics requires an understanding of Azure
+time-series data, metrics dimensions, and the tagging conventions used in
+the metrics streams.
+
+### Metrics tags
+
+All Postgres metrics share a common tagging scheme. Entries will generally
+have at least the following tags:
+
+| Name | Description |
+|--------------------------------|-------------|
+| address | IP address of the host the metric originated from |
+| postgresql | BigAnimal Postgres cluster identifier, e.g. `p-abcdef012345` |
+| role | Postgres instance role, "primary" or "replica" |
+| datname | Postgres database name (where applicable) |
+| pod_name | Kubernetes pod name |
+| hostName | AKS node host name |
+| container.azm.ms/clusterName | AKS cluster name |
+
+When querying by tags, best performance is achieved when any filters that do not
+require inspecting tags (e.g., filters by metric name) are applied before any
+tag-based filters.
+
+The `Tags` field of a metrics entry contains JSON that can be queried
+for individual values with `todynamic(Tags).keyname` in KQL. Some uses of values
+may require explicit casts to another type, e.g., `tostring(...)`.
+
+Example usage:
+
+```
+InsightsMetrics
+| where Namespace == "prometheus" and Name startswith "cnp_"
+| extend t = todynamic(Tags)
+| where tostring(t.role) == "primary"
+| project postgres_cluster_id = tostring(t.postgresql), dbname = tostring(t.datname)
+| where dbname !in ("template0", "template1")
+| distinct postgres_cluster_id, dbname
+```
+
+[comment1]: # (Generated content see upm-substrate repo config monitoring dir)
+
+#### Group `cnp_backends`
+
+Backend counts from `pg_stat_activity` aggregated by the listed label
+dimensions. Useful for identifying busy applications, excessive idle
+backends, etc.
+
+Derived from the `pg_stat_activity` view.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_backends_total` | GAUGE | Number of backends |
+| `cnp_backends_max_tx_duration_seconds` | GAUGE | Maximum duration of a transaction in seconds |
+
+
+##### Labels
+
+The above metrics may have these labels, represented
+as dimensions in Azure Monitor:
+
+| Label | Description |
+|-------|-------------|
+| `datname` | Name of the database |
+| `usename` | Name of the user |
+| `application_name` | Name of the application |
+| `state` | State of the backend |
+
+#### Group `cnp_backends_waiting`
+
+Postgres-instance-level aggregate information on backends that are blocked
+waiting for locks. Does not count I/O waits or other reasons backends might
+wait or be blocked.
+
+Derived from the `pg_locks` view.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_backends_waiting_total` | GAUGE | Total number of backends that are currently waiting on other queries |
+
+#### Group `cnp_pg_database`
+
+Per-database metrics for each database in the Postgres instance.
+Includes per-database vacuum progress information.
+
+Derived from the `pg_database` catalog.
+
+See also `cnp_pg_stat_database`.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_pg_database_size_bytes` | GAUGE | Disk space used by the database |
+| `cnp_pg_database_xid_age` | GAUGE | Number of transactions from the frozen XID to the current one |
+| `cnp_pg_database_mxid_age` | GAUGE | Number of multiple transactions (Multixact) from the frozen XID to the current one |
+
+
+##### Labels
+
+The above metrics may have these labels, represented
+as dimensions in Azure Monitor:
+
+| Label | Description |
+|-------|-------------|
+| `datname` | Name of the database |
+
+#### Group `cnp_pg_postmaster`
+
+Data on the Postgres instance's managing "postmaster" process.
+
+Derived from the `pg_postmaster_start_time()` function.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_pg_postmaster_start_time` | GAUGE | Time at which Postgres started (based on epoch) |
+
+#### Group `cnp_pg_replication`
+
+Physical replication details for a standby Postgres instance
+as captured from the standby itself.
+
+Derived from the `pg_last_xact_replay_timestamp()` function.
+
+Only relevant on standby servers.
+
+See also `cnp_pg_stat_replication`, `cnp_pg_replication_slots`.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_pg_replication_lag` | GAUGE | Replication lag behind primary in seconds |
+| `cnp_pg_replication_in_recovery` | GAUGE | Whether the instance is in recovery |
+
+#### Group `cnp_pg_replication_slots`
+
+Details about replication slots on a Postgres instance. In most
+configurations only the primary server will have active replication clients,
+but other nodes may still have replication slots.
+
+Note that logical replication slots are specific to a database, whereas
+physical replication slots will have an empty "database" label as they
+apply to the Postgres instance as a whole.
+
+Derived from the `pg_replication_slots` view.
+
+See also `cnp_pg_stat_replication`, `cnp_pg_replication`.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_pg_replication_slots_active` | GAUGE | Flag indicating if the slot is active |
+| `cnp_pg_replication_slots_pg_wal_lsn_diff` | GAUGE | Replication lag in bytes |
+
+
+##### Labels
+
+The above metrics may have these labels, represented
+as dimensions in Azure Monitor:
+
+| Label | Description |
+|-------|-------------|
+| `slot_name` | Name of the replication slot |
+| `database` | Name of the database |
+
+#### Group `cnp_pg_stat_archiver`
+
+Progress information about WAL archiving. Only the currently active primary
+server will generally be performing WAL archiving.
+
+WAL archiving is important for backup and restore. If WAL archiving is
+delayed or failing for too long, the point-in-time recovery backups for
+a Postgres cluster will not be up to date. This has disaster recovery
+implications and can potentially also affect failover.
+
+Occasional WAL archiving failures are normal, but a growing delay in the time
+since the last successful WAL archiving operation should be taken seriously.
+
+Metrics in this section are reset when a Postgres stats reset is issued
+on the db server.
+
+Derived from the `pg_stat_archiver` view.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_pg_stat_archiver_archived_count` | COUNTER | Number of WAL files that have been successfully archived |
+| `cnp_pg_stat_archiver_failed_count` | COUNTER | Number of failed attempts for archiving WAL files |
+| `cnp_pg_stat_archiver_seconds_since_last_archival` | GAUGE | Seconds since the last successful archival operation |
+| `cnp_pg_stat_archiver_seconds_since_last_failure` | GAUGE | Seconds since the last failed archival operation |
+| `cnp_pg_stat_archiver_last_archived_time` | GAUGE | Epoch of the last time WAL archiving succeeded |
+| `cnp_pg_stat_archiver_last_failed_time` | GAUGE | Epoch of the last time WAL archiving failed |
+| `cnp_pg_stat_archiver_last_archived_wal_start_lsn` | GAUGE | Archived WAL start LSN |
+| `cnp_pg_stat_archiver_last_failed_wal_start_lsn` | GAUGE | Last failed WAL LSN |
+| `cnp_pg_stat_archiver_stats_reset_time` | GAUGE | Time at which these statistics were last reset |
+
+#### Group `cnp_pg_stat_bgwriter`
+
+Stats for the Postgres background writer and checkpointer processes, which
+are instance-wide and shared across all databases in a Postgres instance.
+
+Very long delays between checkpoints on a busy system will increase the time
+taken for it to return to read/write availability if crash recovery is
+required. Excessively frequent checkpoints can increase I/O load and the size
+of the WAL stream for backup and replication.
+
+The Postgres documentation discusses checkpoints, dirty writeback, and
+checkpoint tuning in detail.
+
+Metrics in this section are reset when a Postgres stats reset is issued
+on the db server.
+
+Derived from the `pg_stat_bgwriter` view.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_pg_stat_bgwriter_checkpoints_timed` | COUNTER | Number of scheduled checkpoints that have been performed |
+| `cnp_pg_stat_bgwriter_checkpoints_req` | COUNTER | Number of requested checkpoints that have been performed |
+| `cnp_pg_stat_bgwriter_checkpoint_write_time` | COUNTER | Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in milliseconds |
+| `cnp_pg_stat_bgwriter_checkpoint_sync_time` | COUNTER | Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk, in milliseconds |
+| `cnp_pg_stat_bgwriter_buffers_checkpoint` | COUNTER | Number of buffers written during checkpoints |
+| `cnp_pg_stat_bgwriter_buffers_clean` | COUNTER | Number of buffers written by the background writer |
+| `cnp_pg_stat_bgwriter_maxwritten_clean` | COUNTER | Number of times the background writer stopped a cleaning scan because it had written too many buffers |
+| `cnp_pg_stat_bgwriter_buffers_backend` | COUNTER | Number of buffers written directly by a backend |
+| `cnp_pg_stat_bgwriter_buffers_backend_fsync` | COUNTER | Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write) |
+| `cnp_pg_stat_bgwriter_buffers_alloc` | COUNTER | Number of buffers allocated |
+
+#### Group `cnp_pg_stat_database`
+
+This metrics group directly exposes the summary data Postgres collects in its
+own `pg_stat_database` view. It contains statistical counters maintained by
+Postgres itself for database activity.
+
+Metrics in this section are reset when a Postgres stats reset is issued
+on the db server.
+
+Derived from the `pg_stat_database` view.
+
+See also `cnp_pg_database`.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_pg_stat_database_xact_commit` | COUNTER | Number of transactions in this database that have been committed |
+| `cnp_pg_stat_database_xact_rollback` | COUNTER | Number of transactions in this database that have been rolled back |
+| `cnp_pg_stat_database_blks_read` | COUNTER | Number of disk blocks read in this database |
+| `cnp_pg_stat_database_blks_hit` | COUNTER | Number of times disk blocks were found already in the buffer cache, so that a read was not necessary (this only includes hits in the PostgreSQL buffer cache, not the operating system's file system cache) |
+| `cnp_pg_stat_database_tup_returned` | COUNTER | Number of rows returned by queries in this database |
+| `cnp_pg_stat_database_tup_fetched` | COUNTER | Number of rows fetched by queries in this database |
+| `cnp_pg_stat_database_tup_inserted` | COUNTER | Number of rows inserted by queries in this database |
+| `cnp_pg_stat_database_tup_updated` | COUNTER | Number of rows updated by queries in this database |
+| `cnp_pg_stat_database_tup_deleted` | COUNTER | Number of rows deleted by queries in this database |
+| `cnp_pg_stat_database_conflicts` | COUNTER | Number of queries canceled due to conflicts with recovery in this database |
+| `cnp_pg_stat_database_temp_files` | COUNTER | Number of temporary files created by queries in this database |
+| `cnp_pg_stat_database_temp_bytes` | COUNTER | Total amount of data written to temporary files by queries in this database |
+| `cnp_pg_stat_database_deadlocks` | COUNTER | Number of deadlocks detected in this database |
+| `cnp_pg_stat_database_blk_read_time` | COUNTER | Time spent reading data file blocks by backends in this database, in milliseconds |
+| `cnp_pg_stat_database_blk_write_time` | COUNTER | Time spent writing data file blocks by backends in this database, in milliseconds |
+
+
+##### Labels
+
+The above metrics may have these labels, represented
+as dimensions in Azure Monitor:
+
+| Label | Description |
+|-------|-------------|
+| `datname` | Name of this database |
+
+#### Group `cnp_pg_stat_database_conflicts`
+
+These metrics provide information on conflicts between queries on a standby
+and the standby's replay of the change-stream from the primary. These are
+called recovery conflicts.
+
+These metrics are unrelated to "INSERT ... ON CONFLICT" conflicts, or
+multi-master replication row conflicts. They are only relevant on standby
+servers.
+
+Metrics in this section are reset when a Postgres stats reset is issued
+on the db server.
+
+Only defined on standby servers.
+
+Derived from the `pg_stat_database_conflicts` view.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_pg_stat_database_conflicts_confl_tablespace` | COUNTER | Number of queries in this database that have been canceled due to dropped tablespaces |
+| `cnp_pg_stat_database_conflicts_confl_lock` | COUNTER | Number of queries in this database that have been canceled due to lock timeouts |
+| `cnp_pg_stat_database_conflicts_confl_snapshot` | COUNTER | Number of queries in this database that have been canceled due to old snapshots |
+| `cnp_pg_stat_database_conflicts_confl_bufferpin` | COUNTER | Number of queries in this database that have been canceled due to pinned buffers |
+| `cnp_pg_stat_database_conflicts_confl_deadlock` | COUNTER | Number of queries in this database that have been canceled due to deadlocks |
+
+
+##### Labels
+
+The above metrics may have these labels, represented
+as dimensions in Azure Monitor:
+
+| Label | Description |
+|-------|-------------|
+| `datname` | Name of the database |
+
+#### Group `cnp_pg_stat_user_tables`
+
+Access and usage statistics maintained by Postgres on non-system tables.
+
+Metrics in this section are reset when a Postgres stats reset is issued
+on the db server.
+
+Derived from the `pg_stat_user_tables` view.
+
+See also `cnp_pg_statio_user_tables`.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_pg_stat_user_tables_seq_scan` | COUNTER | Number of sequential scans initiated on this table |
+| `cnp_pg_stat_user_tables_seq_tup_read` | COUNTER | Number of live rows fetched by sequential scans |
+| `cnp_pg_stat_user_tables_idx_scan` | COUNTER | Number of index scans initiated on this table |
+| `cnp_pg_stat_user_tables_idx_tup_fetch` | COUNTER | Number of live rows fetched by index scans |
+| `cnp_pg_stat_user_tables_n_tup_ins` | COUNTER | Number of rows inserted |
+| `cnp_pg_stat_user_tables_n_tup_upd` | COUNTER | Number of rows updated |
+| `cnp_pg_stat_user_tables_n_tup_del` | COUNTER | Number of rows deleted |
+| `cnp_pg_stat_user_tables_n_tup_hot_upd` | COUNTER | Number of rows HOT updated (i.e., with no separate index update required) |
+| `cnp_pg_stat_user_tables_n_live_tup` | GAUGE | Estimated number of live rows |
+| `cnp_pg_stat_user_tables_n_dead_tup` | GAUGE | Estimated number of dead rows |
+| `cnp_pg_stat_user_tables_n_mod_since_analyze` | GAUGE | Estimated number of rows changed since last analyze |
+| `cnp_pg_stat_user_tables_last_vacuum` | GAUGE | Last time at which this table was manually vacuumed (not counting VACUUM FULL) |
+| `cnp_pg_stat_user_tables_last_autovacuum` | GAUGE | Last time at which this table was vacuumed by the autovacuum daemon |
+| `cnp_pg_stat_user_tables_last_analyze` | GAUGE | Last time at which this table was manually analyzed |
+| `cnp_pg_stat_user_tables_last_autoanalyze` | GAUGE | Last time at which this table was analyzed by the autovacuum daemon |
+| `cnp_pg_stat_user_tables_vacuum_count` | COUNTER | Number of times this table has been manually vacuumed (not counting VACUUM FULL) |
+| `cnp_pg_stat_user_tables_autovacuum_count` | COUNTER | Number of times this table has been vacuumed by the autovacuum daemon |
+| `cnp_pg_stat_user_tables_analyze_count` | COUNTER | Number of times this table has been manually analyzed |
+| `cnp_pg_stat_user_tables_autoanalyze_count` | COUNTER | Number of times this table has been analyzed by the autovacuum daemon |
+
+
+##### Labels
+
+The above metrics may have these labels, represented
+as dimensions in Azure Monitor:
+
+| Label | Description |
+|-------|-------------|
+| `datname` | Name of current database |
+| `schemaname` | Name of the schema that this table is in |
+| `relname` | Name of this table |
+
+#### Group `cnp_pg_stat_replication`
+
+Real-time information about replication connections to this Postgres instance,
+their progress and activity.
+
+Metrics in this section are not reset when a Postgres stats reset is issued
+on the db server. The "stat" in the name is a historic artefact from Postgres
+development.
+
+Derived from the `pg_stat_replication` view.
+
+See also `cnp_pg_replication_slots`, `cnp_pg_replication`.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_pg_stat_replication_backend_start` | COUNTER | Time when this process was started |
+| `cnp_pg_stat_replication_backend_xmin_age` | COUNTER | The age of this standby's xmin horizon |
+| `cnp_pg_stat_replication_sent_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location sent on this connection |
+| `cnp_pg_stat_replication_write_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location written to disk by this standby server |
+| `cnp_pg_stat_replication_flush_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location flushed to disk by this standby server |
+| `cnp_pg_stat_replication_replay_diff_bytes` | GAUGE | Difference in bytes from the last write-ahead log location replayed into the database on this standby server |
+| `cnp_pg_stat_replication_write_lag_seconds` | GAUGE | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written it |
+| `cnp_pg_stat_replication_flush_lag_seconds` | GAUGE | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written and flushed it |
+| `cnp_pg_stat_replication_replay_lag_seconds` | GAUGE | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written, flushed and applied it |
+
+
+##### Labels
+
+The above metrics may have these labels, represented
+as dimensions in Azure Monitor:
+
+| Label | Description |
+|-------|-------------|
+| `usename` | Name of the replication user |
+| `application_name` | Name of the application |
+| `client_addr` | Client IP address |
+
+#### Group `cnp_pg_statio_user_tables`
+
+I/O activity statistics maintained by Postgres on non-system tables.
+
+Metrics in this section are reset when a Postgres stats reset is issued
+on the db server.
+
+Derived from the `pg_statio_user_tables` view.
+
+See also `cnp_pg_stat_user_tables`.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_pg_statio_user_tables_heap_blks_read` | COUNTER | Number of disk blocks read from this table |
+| `cnp_pg_statio_user_tables_heap_blks_hit` | COUNTER | Number of buffer hits in this table |
+| `cnp_pg_statio_user_tables_idx_blks_read` | COUNTER | Number of disk blocks read from all indexes on this table |
+| `cnp_pg_statio_user_tables_idx_blks_hit` | COUNTER | Number of buffer hits in all indexes on this table |
+| `cnp_pg_statio_user_tables_toast_blks_read` | COUNTER | Number of disk blocks read from this table's TOAST table (if any) |
+| `cnp_pg_statio_user_tables_toast_blks_hit` | COUNTER | Number of buffer hits in this table's TOAST table (if any) |
+| `cnp_pg_statio_user_tables_tidx_blks_read` | COUNTER | Number of disk blocks read from this table's TOAST table indexes (if any) |
+| `cnp_pg_statio_user_tables_tidx_blks_hit` | COUNTER | Number of buffer hits in this table's TOAST table indexes (if any) |
+
+
+##### Labels
+
+The above metrics may have these labels, represented
+as dimensions in Azure Monitor:
+
+| Label | Description |
+|-------|-------------|
+| `datname` | Name of current database |
+| `schemaname` | Name of the schema that this table is in |
+| `relname` | Name of this table |
+
+#### Group `cnp_pg_settings`
+
+Exposes the subset of Postgres server settings that can be represented as
+Prometheus-compatible metrics: any integer, boolean or real number.
+Text-format settings, list-valued settings and enumeration-typed settings are
+not captured or reported.
+
+This set of metrics does not expose per-database settings assigned with
+`ALTER DATABASE ... SET ...`, per-user settings assigned with `ALTER USER ...
+SET ...`, or per-session values. It only shows the database-system-wide
+global values. You can explore other settings interactively using Postgres
+system views.
+
+Derived from the `pg_settings` view.
+
+##### Metrics
+
+| Metric | Usage | Description |
+|----------|-------|-------------|
+| `cnp_pg_settings_setting` | GAUGE | Setting value |
+
+
+##### Labels
+
+The above metrics may have these labels, represented
+as dimensions in Azure Monitor:
+
+| Label | Description |
+|-------|-------------|
+| `name` | Name of the setting |
+
+[comment2]: # (End generated content)
+
+### Other metrics streams
+
+In addition to Postgres metrics from the Cloud Native PostgreSQL operator that
+manages databases in BigAnimal, additional metrics about Kubernetes cluster
+state and other details may be streamed to the Log Workspace. Any such metrics
+are generally well-known metrics from widely used tools, documented by the
+upstream vendor of the component.
+
+Details on individual metrics from such sources are not listed in this
+document. Refer to the documentation of the tool or project that defines the
+metrics.
+
+See also:
+
+* [Kubernetes cluster metrics](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/).
+
+Additional streams of metrics may be supplied by the cloud platform itself
+directly to the customer's metrics, analytics and dashboarding endpoint.
+
+### Dive Deeper
+
+The capabilities available in the Azure portal are too broad to cover fully in this
+documentation. They include the ability to:
+
+* Discover metrics in the Azure Monitor Metrics Explorer (Monitor -> Metrics)
+* Query logs and metrics from the Azure Monitor Logs view (Monitor -> Logs)
+* Create dashboards backed by metrics queries in the portal
+* Define alerting rules to trigger notifications based on queries
+* Use AI-assisted analytics capabilities such as Azure Metrics Advisor to find
+  patterns in metrics
+* Apply complex analytic tools for time-series data in Application Insights,
+  including seasonally adjusted statistics to discover patterns, anomalies and
+  trends.
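+
+### Example queries
+
+The following queries are illustrative sketches, not tested or supported
+queries. They assume the metric names and tags documented above. They also
+assume the `TimeGenerated` and `Val` columns of the `InsightsMetrics` table,
+which this document does not describe. Check them against your own workspace
+before using them for dashboards or alerts.
+
+This query returns the latest number of seconds since the last successful WAL
+archival on each cluster's primary. It can help you spot the growing
+archiving delay described under `cnp_pg_stat_archiver`:
+
+```
+InsightsMetrics
+| where Namespace == "prometheus" and Name == "cnp_pg_stat_archiver_seconds_since_last_archival"
+| extend t = todynamic(Tags)
+| where tostring(t.role) == "primary"
+// Keep only the most recent sample for each cluster
+| summarize arg_max(TimeGenerated, Val) by cluster = tostring(t.postgresql)
+| project cluster, TimeGenerated, seconds_since_last_archival = Val
+```
+
+This query approximates the number of deadlocks per database over the last
+hour. COUNTER metrics are cumulative, so the sketch takes the difference
+between the largest and smallest sample in the window. The result is only
+meaningful if no stats reset happened during that hour:
+
+```
+InsightsMetrics
+| where TimeGenerated > ago(1h)
+| where Namespace == "prometheus" and Name == "cnp_pg_stat_database_deadlocks"
+| extend t = todynamic(Tags)
+// Group by pod as well, so counters from different instances are not mixed
+// max - min is an approximation that breaks if the counter was reset
+| summarize deadlocks_last_hour = max(Val) - min(Val)
+    by cluster = tostring(t.postgresql), pod = tostring(t.pod_name), dbname = tostring(t.datname)
+| where deadlocks_last_hour > 0
+```
+
+You can convert epoch-valued metrics such as `cnp_pg_postmaster_start_time`
+back to a datetime for display. This sketch assumes the value is in seconds
+since the Unix epoch:
+
+```
+InsightsMetrics
+| where Namespace == "prometheus" and Name == "cnp_pg_postmaster_start_time"
+| extend t = todynamic(Tags)
+| summarize arg_max(TimeGenerated, Val) by pod = tostring(t.pod_name)
+// Assumes Val holds seconds since the Unix epoch for this metric
+| extend postmaster_started_at = unixtime_seconds_todatetime(Val)
+| project pod, postmaster_started_at
+```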