From 539f4b2eee95aadfe8c563714cc98c5f7d27fecd Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Sat, 10 Aug 2024 22:01:17 +0300 Subject: [PATCH 01/27] add nsd-control to ndsudo (#18301) --- src/collectors/plugins.d/ndsudo.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/src/collectors/plugins.d/ndsudo.c b/src/collectors/plugins.d/ndsudo.c index eda4953544ef88..dfdc3089aaa298 100644 --- a/src/collectors/plugins.d/ndsudo.c +++ b/src/collectors/plugins.d/ndsudo.c @@ -13,6 +13,14 @@ struct command { const char *params; const char *search[MAX_SEARCH]; } allowed_commands[] = { + { + .name = "nsd-control-stats", + .params = "stats_noreset", + .search = { + [0] = "nsd-control", + [1] = NULL, + }, + }, { .name = "chronyc-serverstats", .params = "serverstats", From 32ae1201bca03050f978dc2a1ac51aa4d1f197f2 Mon Sep 17 00:00:00 2001 From: netdatabot Date: Sun, 11 Aug 2024 00:19:23 +0000 Subject: [PATCH 02/27] [ci skip] Update changelog and version for nightly build: v1.46.0-274-nightly. 
--- CHANGELOG.md | 2 +- packaging/version | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 636e3e48a0b71f..2c0df0747773d6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,7 @@ **Merged pull requests:** +- add nsd-control to ndsudo [\#18301](https://github.com/netdata/netdata/pull/18301) ([ilyam8](https://github.com/ilyam8)) - Regenerate integrations.js [\#18299](https://github.com/netdata/netdata/pull/18299) ([netdatabot](https://github.com/netdatabot)) - go.d gearman fix meta [\#18298](https://github.com/netdata/netdata/pull/18298) ([ilyam8](https://github.com/ilyam8)) - add go.d/gearman [\#18294](https://github.com/netdata/netdata/pull/18294) ([ilyam8](https://github.com/ilyam8)) @@ -414,7 +415,6 @@ - Fix mongodb default config indentation [\#17715](https://github.com/netdata/netdata/pull/17715) ([louis-lau](https://github.com/louis-lau)) - Fix compilation with disable-cloud [\#17714](https://github.com/netdata/netdata/pull/17714) ([stelfrag](https://github.com/stelfrag)) - fix on link [\#17712](https://github.com/netdata/netdata/pull/17712) ([Ancairon](https://github.com/Ancairon)) -- gha labeler add collectors/windows [\#17711](https://github.com/netdata/netdata/pull/17711) ([ilyam8](https://github.com/ilyam8)) ## [v1.45.6](https://github.com/netdata/netdata/tree/v1.45.6) (2024-06-05) diff --git a/packaging/version b/packaging/version index 0ea7c3fe07b32f..590d8d56a02c56 100644 --- a/packaging/version +++ b/packaging/version @@ -1 +1 @@ -v1.46.0-272-nightly +v1.46.0-274-nightly From 25c2c599d0a95cf98a7db5691e95466aa86b141a Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Sun, 11 Aug 2024 19:11:54 +0300 Subject: [PATCH 03/27] remove python.d/nsd (#18300) --- CMakeLists.txt | 2 - src/collectors/python.d.plugin/nsd/README.md | 1 - .../nsd/integrations/name_server_daemon.md | 232 ------------------ .../python.d.plugin/nsd/metadata.yaml | 201 --------------- .../python.d.plugin/nsd/nsd.chart.py | 105 
-------- src/collectors/python.d.plugin/nsd/nsd.conf | 91 ------- 6 files changed, 632 deletions(-) delete mode 120000 src/collectors/python.d.plugin/nsd/README.md delete mode 100644 src/collectors/python.d.plugin/nsd/integrations/name_server_daemon.md delete mode 100644 src/collectors/python.d.plugin/nsd/metadata.yaml delete mode 100644 src/collectors/python.d.plugin/nsd/nsd.chart.py delete mode 100644 src/collectors/python.d.plugin/nsd/nsd.conf diff --git a/CMakeLists.txt b/CMakeLists.txt index daeb17a3da63f3..eee142c7678b81 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -2786,7 +2786,6 @@ install(FILES src/collectors/python.d.plugin/go_expvar/go_expvar.conf src/collectors/python.d.plugin/haproxy/haproxy.conf src/collectors/python.d.plugin/monit/monit.conf - src/collectors/python.d.plugin/nsd/nsd.conf src/collectors/python.d.plugin/nvidia_smi/nvidia_smi.conf src/collectors/python.d.plugin/openldap/openldap.conf src/collectors/python.d.plugin/oracledb/oracledb.conf @@ -2816,7 +2815,6 @@ install(FILES src/collectors/python.d.plugin/go_expvar/go_expvar.chart.py src/collectors/python.d.plugin/haproxy/haproxy.chart.py src/collectors/python.d.plugin/monit/monit.chart.py - src/collectors/python.d.plugin/nsd/nsd.chart.py src/collectors/python.d.plugin/nvidia_smi/nvidia_smi.chart.py src/collectors/python.d.plugin/openldap/openldap.chart.py src/collectors/python.d.plugin/oracledb/oracledb.chart.py diff --git a/src/collectors/python.d.plugin/nsd/README.md b/src/collectors/python.d.plugin/nsd/README.md deleted file mode 120000 index 59fcfe49134540..00000000000000 --- a/src/collectors/python.d.plugin/nsd/README.md +++ /dev/null @@ -1 +0,0 @@ -integrations/name_server_daemon.md \ No newline at end of file diff --git a/src/collectors/python.d.plugin/nsd/integrations/name_server_daemon.md b/src/collectors/python.d.plugin/nsd/integrations/name_server_daemon.md deleted file mode 100644 index 1844304f29e173..00000000000000 --- 
a/src/collectors/python.d.plugin/nsd/integrations/name_server_daemon.md +++ /dev/null @@ -1,232 +0,0 @@ - - -# Name Server Daemon - - - - - -Plugin: python.d.plugin -Module: nsd - - - -## Overview - -This collector monitors NSD statistics like queries, zones, protocols, query types and more. - - -It uses the `nsd-control stats_noreset` command to gather metrics. - - -This collector is supported on all platforms. - -This collector only supports collecting metrics from a single instance of this integration. - - -### Default Behavior - -#### Auto-Detection - -If permissions are satisfied, the collector will be able to run `nsd-control stats_noreset`, thus collecting metrics. - -#### Limits - -The default configuration for this integration does not impose any limits on data collection. - -#### Performance Impact - -The default configuration for this integration is not expected to impose a significant performance impact on the system. - - -## Metrics - -Metrics grouped by *scope*. - -The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. - - - -### Per Name Server Daemon instance - -These metrics refer to the entire monitored application. - -This scope has no labels. - -Metrics: - -| Metric | Dimensions | Unit | -|:------|:----------|:----| -| nsd.queries | queries | queries/s | -| nsd.zones | master, slave | zones | -| nsd.protocols | udp, udp6, tcp, tcp6 | queries/s | -| nsd.type | A, NS, CNAME, SOA, PTR, HINFO, MX, NAPTR, TXT, AAAA, SRV, ANY | queries/s | -| nsd.transfer | NOTIFY, AXFR | queries/s | -| nsd.rcode | NOERROR, FORMERR, SERVFAIL, NXDOMAIN, NOTIMP, REFUSED, YXDOMAIN | queries/s | - - - -## Alerts - -There are no alerts configured by default for this integration. - - -## Setup - -### Prerequisites - -#### NSD version - -The version of `nsd` must be 4.0+. - - -#### Provide Netdata the permissions to run the command - -Netdata must have permissions to run the `nsd-control stats_noreset` command. 
- -You can: - -- Add "netdata" user to "nsd" group: - ``` - usermod -aG nsd netdata - ``` -- Add Netdata to sudoers - 1. Edit the sudoers file: - ``` - visudo -f /etc/sudoers.d/netdata - ``` - 2. Add the entry: - ``` - Defaults:netdata !requiretty - netdata ALL=(ALL) NOPASSWD: /usr/sbin/nsd-control stats_noreset - ``` - - > Note that you will need to set the `command` option to `sudo /usr/sbin/nsd-control stats_noreset` if you use this method. - - - -### Configuration - -#### File - -The configuration file name for this integration is `python.d/nsd.conf`. - - -You can edit the configuration file using the `edit-config` script from the -Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory). - -```bash -cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata -sudo ./edit-config python.d/nsd.conf -``` -#### Options - -This particular collector does not need further configuration to work if permissions are satisfied, but you can always customize it's data collection behavior. - -There are 2 sections: - -* Global variables -* One or more JOBS that can define multiple different instances to monitor. - -The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. - -Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. - -Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. - - -
Config options - -| Name | Description | Default | Required | -|:----|:-----------|:-------|:--------:| -| update_every | Sets the default data collection frequency. | 30 | no | -| priority | Controls the order of charts at the netdata dashboard. | 60000 | no | -| autodetection_retry | Sets the job re-check interval in seconds. | 0 | no | -| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | no | -| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | no | -| command | The command to run | nsd-control stats_noreset | no | - -
- -#### Examples - -##### Basic - -A basic configuration example. - -```yaml -local: - name: 'nsd_local' - command: 'nsd-control stats_noreset' - -``` - - -## Troubleshooting - -### Debug Mode - -To troubleshoot issues with the `nsd` collector, run the `python.d.plugin` with the debug option enabled. The output -should give you clues as to why the collector isn't working. - -- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on - your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. - - ```bash - cd /usr/libexec/netdata/plugins.d/ - ``` - -- Switch to the `netdata` user. - - ```bash - sudo -u netdata -s - ``` - -- Run the `python.d.plugin` to debug the collector: - - ```bash - ./python.d.plugin nsd debug trace - ``` - -### Getting Logs - -If you're encountering problems with the `nsd` collector, follow these steps to retrieve logs and identify potential issues: - -- **Run the command** specific to your system (systemd, non-systemd, or Docker container). -- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem. - -#### System with systemd - -Use the following command to view logs generated since the last Netdata service restart: - -```bash -journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep nsd -``` - -#### System without systemd - -Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name: - -```bash -grep nsd /var/log/netdata/collector.log -``` - -**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues. 
- -#### Docker Container - -If your Netdata runs in a Docker container named "netdata" (replace if different), use this command: - -```bash -docker logs netdata 2>&1 | grep nsd -``` - - diff --git a/src/collectors/python.d.plugin/nsd/metadata.yaml b/src/collectors/python.d.plugin/nsd/metadata.yaml deleted file mode 100644 index f5e2c46b0adc33..00000000000000 --- a/src/collectors/python.d.plugin/nsd/metadata.yaml +++ /dev/null @@ -1,201 +0,0 @@ -plugin_name: python.d.plugin -modules: - - meta: - plugin_name: python.d.plugin - module_name: nsd - monitored_instance: - name: Name Server Daemon - link: https://nsd.docs.nlnetlabs.nl/en/latest/# - categories: - - data-collection.dns-and-dhcp-servers - icon_filename: "nsd.svg" - related_resources: - integrations: - list: [] - info_provided_to_referring_integrations: - description: "" - keywords: - - nsd - - name server daemon - most_popular: false - overview: - data_collection: - metrics_description: | - This collector monitors NSD statistics like queries, zones, protocols, query types and more. - method_description: | - It uses the `nsd-control stats_noreset` command to gather metrics. - supported_platforms: - include: [] - exclude: [] - multi_instance: false - additional_permissions: - description: "" - default_behavior: - auto_detection: - description: If permissions are satisfied, the collector will be able to run `nsd-control stats_noreset`, thus collecting metrics. - limits: - description: "" - performance_impact: - description: "" - setup: - prerequisites: - list: - - title: NSD version - description: | - The version of `nsd` must be 4.0+. - - title: Provide Netdata the permissions to run the command - description: | - Netdata must have permissions to run the `nsd-control stats_noreset` command. - - You can: - - - Add "netdata" user to "nsd" group: - ``` - usermod -aG nsd netdata - ``` - - Add Netdata to sudoers - 1. Edit the sudoers file: - ``` - visudo -f /etc/sudoers.d/netdata - ``` - 2. 
Add the entry: - ``` - Defaults:netdata !requiretty - netdata ALL=(ALL) NOPASSWD: /usr/sbin/nsd-control stats_noreset - ``` - - > Note that you will need to set the `command` option to `sudo /usr/sbin/nsd-control stats_noreset` if you use this method. - - configuration: - file: - name: "python.d/nsd.conf" - options: - description: | - This particular collector does not need further configuration to work if permissions are satisfied, but you can always customize it's data collection behavior. - - There are 2 sections: - - * Global variables - * One or more JOBS that can define multiple different instances to monitor. - - The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. - - Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. - - Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. - folding: - title: "Config options" - enabled: true - list: - - name: update_every - description: Sets the default data collection frequency. - default_value: 30 - required: false - - name: priority - description: Controls the order of charts at the netdata dashboard. - default_value: 60000 - required: false - - name: autodetection_retry - description: Sets the job re-check interval in seconds. - default_value: 0 - required: false - - name: penalty - description: Indicates whether to apply penalty to update_every in case of failures. - default_value: yes - required: false - - name: name - description: > - Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed - running at any time. This allows autodetection to try several alternatives and pick the one that works. 
- default_value: "" - required: false - - name: command - description: The command to run - default_value: "nsd-control stats_noreset" - required: false - examples: - folding: - enabled: true - title: "Config" - list: - - name: Basic - description: A basic configuration example. - folding: - enabled: false - config: | - local: - name: 'nsd_local' - command: 'nsd-control stats_noreset' - troubleshooting: - problems: - list: [] - alerts: [] - metrics: - folding: - title: Metrics - enabled: false - description: "" - availability: [] - scopes: - - name: global - description: "These metrics refer to the entire monitored application." - labels: [] - metrics: - - name: nsd.queries - description: queries - unit: "queries/s" - chart_type: line - dimensions: - - name: queries - - name: nsd.zones - description: zones - unit: "zones" - chart_type: stacked - dimensions: - - name: master - - name: slave - - name: nsd.protocols - description: protocol - unit: "queries/s" - chart_type: stacked - dimensions: - - name: udp - - name: udp6 - - name: tcp - - name: tcp6 - - name: nsd.type - description: query type - unit: "queries/s" - chart_type: stacked - dimensions: - - name: A - - name: NS - - name: CNAME - - name: SOA - - name: PTR - - name: HINFO - - name: MX - - name: NAPTR - - name: TXT - - name: AAAA - - name: SRV - - name: ANY - - name: nsd.transfer - description: transfer - unit: "queries/s" - chart_type: stacked - dimensions: - - name: NOTIFY - - name: AXFR - - name: nsd.rcode - description: return code - unit: "queries/s" - chart_type: stacked - dimensions: - - name: NOERROR - - name: FORMERR - - name: SERVFAIL - - name: NXDOMAIN - - name: NOTIMP - - name: REFUSED - - name: YXDOMAIN diff --git a/src/collectors/python.d.plugin/nsd/nsd.chart.py b/src/collectors/python.d.plugin/nsd/nsd.chart.py deleted file mode 100644 index 6f9b2cec8c89e3..00000000000000 --- a/src/collectors/python.d.plugin/nsd/nsd.chart.py +++ /dev/null @@ -1,105 +0,0 @@ -# -*- coding: utf-8 -*- -# 
Description: NSD `nsd-control stats_noreset` netdata python.d module -# Author: <383c57 at gmail.com> -# SPDX-License-Identifier: GPL-3.0-or-later - -import re - -from bases.FrameworkServices.ExecutableService import ExecutableService - -update_every = 30 - -NSD_CONTROL_COMMAND = 'nsd-control stats_noreset' -REGEX = re.compile(r'([A-Za-z0-9.]+)=(\d+)') - -ORDER = [ - 'queries', - 'zones', - 'protocol', - 'type', - 'transfer', - 'rcode', -] - -CHARTS = { - 'queries': { - 'options': [None, 'queries', 'queries/s', 'queries', 'nsd.queries', 'line'], - 'lines': [ - ['num_queries', 'queries', 'incremental'] - ] - }, - 'zones': { - 'options': [None, 'zones', 'zones', 'zones', 'nsd.zones', 'stacked'], - 'lines': [ - ['zone_master', 'master', 'absolute'], - ['zone_slave', 'slave', 'absolute'] - ] - }, - 'protocol': { - 'options': [None, 'protocol', 'queries/s', 'protocol', 'nsd.protocols', 'stacked'], - 'lines': [ - ['num_udp', 'udp', 'incremental'], - ['num_udp6', 'udp6', 'incremental'], - ['num_tcp', 'tcp', 'incremental'], - ['num_tcp6', 'tcp6', 'incremental'] - ] - }, - 'type': { - 'options': [None, 'query type', 'queries/s', 'query type', 'nsd.type', 'stacked'], - 'lines': [ - ['num_type_A', 'A', 'incremental'], - ['num_type_NS', 'NS', 'incremental'], - ['num_type_CNAME', 'CNAME', 'incremental'], - ['num_type_SOA', 'SOA', 'incremental'], - ['num_type_PTR', 'PTR', 'incremental'], - ['num_type_HINFO', 'HINFO', 'incremental'], - ['num_type_MX', 'MX', 'incremental'], - ['num_type_NAPTR', 'NAPTR', 'incremental'], - ['num_type_TXT', 'TXT', 'incremental'], - ['num_type_AAAA', 'AAAA', 'incremental'], - ['num_type_SRV', 'SRV', 'incremental'], - ['num_type_TYPE255', 'ANY', 'incremental'] - ] - }, - 'transfer': { - 'options': [None, 'transfer', 'queries/s', 'transfer', 'nsd.transfer', 'stacked'], - 'lines': [ - ['num_opcode_NOTIFY', 'NOTIFY', 'incremental'], - ['num_type_TYPE252', 'AXFR', 'incremental'] - ] - }, - 'rcode': { - 'options': [None, 'return code', 'queries/s', 'return 
code', 'nsd.rcode', 'stacked'], - 'lines': [ - ['num_rcode_NOERROR', 'NOERROR', 'incremental'], - ['num_rcode_FORMERR', 'FORMERR', 'incremental'], - ['num_rcode_SERVFAIL', 'SERVFAIL', 'incremental'], - ['num_rcode_NXDOMAIN', 'NXDOMAIN', 'incremental'], - ['num_rcode_NOTIMP', 'NOTIMP', 'incremental'], - ['num_rcode_REFUSED', 'REFUSED', 'incremental'], - ['num_rcode_YXDOMAIN', 'YXDOMAIN', 'incremental'] - ] - } -} - - -class Service(ExecutableService): - def __init__(self, configuration=None, name=None): - ExecutableService.__init__(self, configuration=configuration, name=name) - self.order = ORDER - self.definitions = CHARTS - self.command = NSD_CONTROL_COMMAND - - def _get_data(self): - lines = self._get_raw_data() - if not lines: - return None - - stats = dict( - (k.replace('.', '_'), int(v)) for k, v in REGEX.findall(''.join(lines)) - ) - stats.setdefault('num_opcode_NOTIFY', 0) - stats.setdefault('num_type_TYPE252', 0) - stats.setdefault('num_type_TYPE255', 0) - - return stats diff --git a/src/collectors/python.d.plugin/nsd/nsd.conf b/src/collectors/python.d.plugin/nsd/nsd.conf deleted file mode 100644 index 77a8a31775de29..00000000000000 --- a/src/collectors/python.d.plugin/nsd/nsd.conf +++ /dev/null @@ -1,91 +0,0 @@ -# netdata python.d.plugin configuration for nsd -# -# This file is in YaML format. Generally the format is: -# -# name: value -# -# There are 2 sections: -# - global variables -# - one or more JOBS -# -# JOBS allow you to collect values from multiple sources. -# Each source will have its own set of charts. -# -# JOB parameters have to be indented (using spaces only, example below). - -# ---------------------------------------------------------------------- -# Global Variables -# These variables set the defaults for all JOBs, however each JOB -# may define its own, overriding the defaults. - -# update_every sets the default data collection frequency. -# If unset, the python.d.plugin default is used. 
-# nsd-control is slow, so once every 30 seconds -# update_every: 30 - -# priority controls the order of charts at the netdata dashboard. -# Lower numbers move the charts towards the top of the page. -# If unset, the default for python.d.plugin is used. -# priority: 60000 - -# penalty indicates whether to apply penalty to update_every in case of failures. -# Penalty will increase every 5 failed updates in a row. Maximum penalty is 10 minutes. -# penalty: yes - -# autodetection_retry sets the job re-check interval in seconds. -# The job is not deleted if check fails. -# Attempts to start the job are made once every autodetection_retry. -# This feature is disabled by default. -# autodetection_retry: 0 - -# ---------------------------------------------------------------------- -# JOBS (data collection sources) -# -# The default JOBS share the same *name*. JOBS with the same name -# are mutually exclusive. Only one of them will be allowed running at -# any time. This allows autodetection to try several alternatives and -# pick the one that works. -# -# Any number of jobs is supported. -# -# All python.d.plugin JOBS (for all its modules) support a set of -# predefined parameters. These are: -# -# job_name: -# name: myname # the JOB's name as it will appear at the -# # dashboard (by default is the job_name) -# # JOBs sharing a name are mutually exclusive -# update_every: 1 # the JOB's data collection frequency -# priority: 60000 # the JOB's order on the dashboard -# penalty: yes # the JOB's penalty -# autodetection_retry: 0 # the JOB's re-check interval in seconds -# -# Additionally to the above, nsd also supports the following: -# -# command: 'nsd-control stats_noreset' # the command to run -# - -# ---------------------------------------------------------------------- -# IMPORTANT Information -# -# Netdata must have permissions to run `nsd-control stats_noreset` command -# -# - Example-1 (use "sudo") -# 1. sudoers (e.g. 
visudo -f /etc/sudoers.d/netdata) -# Defaults:netdata !requiretty -# netdata ALL=(ALL) NOPASSWD: /usr/sbin/nsd-control stats_noreset -# 2. etc/netdata/python.d/nsd.conf -# local: -# update_every: 30 -# command: 'sudo /usr/sbin/nsd-control stats_noreset' -# -# - Example-2 (add "netdata" user to "nsd" group) -# usermod -aG nsd netdata -# - -# ---------------------------------------------------------------------- -# AUTO-DETECTION JOBS - -local: - update_every: 30 - command: 'nsd-control stats_noreset' From 1cd805d515f0f9c5bb97d5abba680d28e1f6a351 Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Sun, 11 Aug 2024 19:12:59 +0300 Subject: [PATCH 04/27] add go.d/nsd (#18302) --- src/collectors/python.d.plugin/python.d.conf | 2 +- src/go/plugin/go.d/modules/init.go | 1 + src/go/plugin/go.d/modules/nsd/charts.go | 249 +++++++++++++ src/go/plugin/go.d/modules/nsd/collect.go | 81 +++++ .../go.d/modules/nsd/config_schema.json | 35 ++ src/go/plugin/go.d/modules/nsd/exec.go | 47 +++ src/go/plugin/go.d/modules/nsd/init.go | 23 ++ src/go/plugin/go.d/modules/nsd/metadata.yaml | 272 ++++++++++++++ src/go/plugin/go.d/modules/nsd/nsd.go | 97 +++++ src/go/plugin/go.d/modules/nsd/nsd_test.go | 337 ++++++++++++++++++ .../plugin/go.d/modules/nsd/stats_counters.go | 123 +++++++ .../go.d/modules/nsd/testdata/config.json | 4 + .../go.d/modules/nsd/testdata/config.yaml | 2 + .../go.d/modules/nsd/testdata/stats.txt | 95 +++++ 14 files changed, 1367 insertions(+), 1 deletion(-) create mode 100644 src/go/plugin/go.d/modules/nsd/charts.go create mode 100644 src/go/plugin/go.d/modules/nsd/collect.go create mode 100644 src/go/plugin/go.d/modules/nsd/config_schema.json create mode 100644 src/go/plugin/go.d/modules/nsd/exec.go create mode 100644 src/go/plugin/go.d/modules/nsd/init.go create mode 100644 src/go/plugin/go.d/modules/nsd/metadata.yaml create mode 100644 src/go/plugin/go.d/modules/nsd/nsd.go create mode 100644 src/go/plugin/go.d/modules/nsd/nsd_test.go create mode 100644 
src/go/plugin/go.d/modules/nsd/stats_counters.go create mode 100644 src/go/plugin/go.d/modules/nsd/testdata/config.json create mode 100644 src/go/plugin/go.d/modules/nsd/testdata/config.yaml create mode 100644 src/go/plugin/go.d/modules/nsd/testdata/stats.txt diff --git a/src/collectors/python.d.plugin/python.d.conf b/src/collectors/python.d.plugin/python.d.conf index 5bf617fcc15cfe..0d09ab3b606dba 100644 --- a/src/collectors/python.d.plugin/python.d.conf +++ b/src/collectors/python.d.plugin/python.d.conf @@ -38,7 +38,6 @@ go_expvar: no # haproxy: yes # monit: yes # nvidia_smi: yes -# nsd: yes # openldap: yes # oracledb: yes # pandas: yes @@ -73,6 +72,7 @@ memcached: no # Removed (replaced with go.d/memcached). mongodb: no # Removed (replaced with go.d/mongodb). mysql: no # Removed (replaced with go.d/mysql). nginx: no # Removed (replaced with go.d/nginx). +nsd: no # Removed (replaced with go.d/nsd). postfix: no # Removed (replaced with go.d/postfix). postgres: no # Removed (replaced with go.d/postgres). proxysql: no # Removed (replaced with go.d/proxysql). 
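The init.go hunk just below wires the new module in with a blank import (`_ ".../modules/nsd"`): go.d collectors self-register in their package `init()`, and the agent imports them for side effects only. A minimal sketch of that registration pattern (the map and function names here are illustrative, not the actual agent API):

```go
package main

import "fmt"

// registry mimics go.d's module-registration pattern: each module's
// init() adds a constructor under its name, and the agent's init.go
// pulls modules in via blank imports so those init() functions run.
var registry = map[string]func() string{}

func register(name string, ctor func() string) {
	registry[name] = ctor
}

// In the real plugin this init() lives in the nsd package and fires when
// init.go blank-imports it; here it is inlined for the sketch.
func init() {
	register("nsd", func() string { return "nsd collector" })
}

func main() {
	ctor, ok := registry["nsd"]
	fmt.Println(ok, ctor())
}
```

Adding a collector therefore touches init.go in exactly one line, which is what this hunk shows.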
diff --git a/src/go/plugin/go.d/modules/init.go b/src/go/plugin/go.d/modules/init.go index 2e1cbb5e148700..6b6cb7fbec7fb0 100644 --- a/src/go/plugin/go.d/modules/init.go +++ b/src/go/plugin/go.d/modules/init.go @@ -58,6 +58,7 @@ import ( _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/nginx" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/nginxplus" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/nginxvts" + _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/nsd" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/ntpd" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/nvidia_smi" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/nvme" diff --git a/src/go/plugin/go.d/modules/nsd/charts.go b/src/go/plugin/go.d/modules/nsd/charts.go new file mode 100644 index 00000000000000..aed4f3098d1973 --- /dev/null +++ b/src/go/plugin/go.d/modules/nsd/charts.go @@ -0,0 +1,249 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package nsd + +import ( + "github.com/netdata/netdata/go/plugins/plugin/go.d/agent/module" +) + +const ( + prioQueries = module.Priority + iota + prioQueriesByType + prioQueriesByOpcode + prioQueriesByClass + prioQueriesByProtocol + + prioAnswersByRcode + + prioErrors + + prioDrops + + prioZones + prioZoneTransfersRequests + prioZoneTransferMemory + + prioDatabaseSize + + prioUptime +) + +var charts = module.Charts{ + queriesChart.Copy(), + queriesByTypeChart.Copy(), + queriesByOpcodeChart.Copy(), + queriesByClassChart.Copy(), + queriesByProtocolChart.Copy(), + + answersByRcodeChart.Copy(), + + zonesChart.Copy(), + zoneTransfersRequestsChart.Copy(), + zoneTransferMemoryChart.Copy(), + + databaseSizeChart.Copy(), + + errorsChart.Copy(), + + dropsChart.Copy(), + + uptimeChart.Copy(), +} + +var ( + queriesChart = module.Chart{ + ID: "queries", + Title: "Queries", + Units: "queries/s", + Fam: "queries", + Ctx: "nsd.queries", + Priority: prioQueries, + Dims: module.Dims{ + {ID: 
"num.queries", Name: "queries", Algo: module.Incremental}, + }, + } + queriesByTypeChart = func() module.Chart { + chart := module.Chart{ + ID: "queries_by_type", + Title: "Queries Type", + Units: "queries/s", + Fam: "queries", + Ctx: "nsd.queries_by_type", + Priority: prioQueriesByType, + Type: module.Stacked, + } + for _, v := range queryTypes { + name := v + if s, ok := queryTypeNumberMap[v]; ok { + name = s + } + chart.Dims = append(chart.Dims, &module.Dim{ + ID: "num.type." + v, + Name: name, + Algo: module.Incremental, + }) + } + return chart + }() + queriesByOpcodeChart = func() module.Chart { + chart := module.Chart{ + ID: "queries_by_opcode", + Title: "Queries Opcode", + Units: "queries/s", + Fam: "queries", + Ctx: "nsd.queries_by_opcode", + Priority: prioQueriesByOpcode, + Type: module.Stacked, + } + for _, v := range queryOpcodes { + chart.Dims = append(chart.Dims, &module.Dim{ + ID: "num.opcode." + v, + Name: v, + Algo: module.Incremental, + }) + } + return chart + }() + queriesByClassChart = func() module.Chart { + chart := module.Chart{ + ID: "queries_by_class", + Title: "Queries Class", + Units: "queries/s", + Fam: "queries", + Ctx: "nsd.queries_by_class", + Priority: prioQueriesByClass, + Type: module.Stacked, + } + for _, v := range queryClasses { + chart.Dims = append(chart.Dims, &module.Dim{ + ID: "num.class." 
+ v, + Name: v, + Algo: module.Incremental, + }) + } + return chart + }() + queriesByProtocolChart = module.Chart{ + ID: "queries_by_protocol", + Title: "Queries Protocol", + Units: "queries/s", + Fam: "queries", + Ctx: "nsd.queries_by_protocol", + Priority: prioQueriesByProtocol, + Type: module.Stacked, + Dims: module.Dims{ + {ID: "num.udp", Name: "udp", Algo: module.Incremental}, + {ID: "num.udp6", Name: "udp6", Algo: module.Incremental}, + {ID: "num.tcp", Name: "tcp", Algo: module.Incremental}, + {ID: "num.tcp6", Name: "tcp6", Algo: module.Incremental}, + {ID: "num.tls", Name: "tls", Algo: module.Incremental}, + {ID: "num.tls6", Name: "tls6", Algo: module.Incremental}, + }, + } + + answersByRcodeChart = func() module.Chart { + chart := module.Chart{ + ID: "answers_by_rcode", + Title: "Answers Rcode", + Units: "answers/s", + Fam: "answers", + Ctx: "nsd.answers_by_rcode", + Priority: prioAnswersByRcode, + Type: module.Stacked, + } + for _, v := range answerRcodes { + chart.Dims = append(chart.Dims, &module.Dim{ + ID: "num.rcode." 
+ v, + Name: v, + Algo: module.Incremental, + }) + } + return chart + }() + + errorsChart = module.Chart{ + ID: "errors", + Title: "Errors", + Units: "errors/s", + Fam: "errors", + Ctx: "nsd.errors", + Priority: prioErrors, + Dims: module.Dims{ + {ID: "num.rxerr", Name: "query", Algo: module.Incremental}, + {ID: "num.txerr", Name: "answer", Mul: -1, Algo: module.Incremental}, + }, + } + + dropsChart = module.Chart{ + ID: "drops", + Title: "Drops", + Units: "drops/s", + Fam: "drops", + Ctx: "nsd.drops", + Priority: prioDrops, + Dims: module.Dims{ + {ID: "num.dropped", Name: "query", Algo: module.Incremental}, + }, + } + + zonesChart = module.Chart{ + ID: "zones", + Title: "Zones", + Units: "zones", + Fam: "zones", + Ctx: "nsd.zones", + Priority: prioZones, + Dims: module.Dims{ + {ID: "zone.master", Name: "master"}, + {ID: "zone.slave", Name: "slave"}, + }, + } + zoneTransfersRequestsChart = module.Chart{ + ID: "zone_transfers_requests", + Title: "Zone Transfers", + Units: "requests/s", + Fam: "zones", + Ctx: "nsd.zone_transfers_requests", + Priority: prioZoneTransfersRequests, + Dims: module.Dims{ + {ID: "num.raxfr", Name: "AXFR", Algo: module.Incremental}, + {ID: "num.rixfr", Name: "IXFR", Algo: module.Incremental}, + }, + } + zoneTransferMemoryChart = module.Chart{ + ID: "zone_transfer_memory", + Title: "Zone Transfer Memory", + Units: "bytes", + Fam: "zones", + Ctx: "nsd.zone_transfer_memory", + Priority: prioZoneTransferMemory, + Dims: module.Dims{ + {ID: "size.xfrd.mem", Name: "used"}, + }, + } + + databaseSizeChart = module.Chart{ + ID: "database_size", + Title: "Database Size", + Units: "bytes", + Fam: "database", + Ctx: "nsd.database_size", + Priority: prioDatabaseSize, + Dims: module.Dims{ + {ID: "size.db.disk", Name: "disk"}, + {ID: "size.db.mem", Name: "mem"}, + }, + } + + uptimeChart = module.Chart{ + ID: "uptime", + Title: "Uptime", + Units: "seconds", + Fam: "uptime", + Ctx: "nsd.uptime", + Priority: prioUptime, + Dims: module.Dims{ + {ID: "time.boot", 
Name: "uptime"}, + }, + } +) diff --git a/src/go/plugin/go.d/modules/nsd/collect.go b/src/go/plugin/go.d/modules/nsd/collect.go new file mode 100644 index 00000000000000..d07341df3c3378 --- /dev/null +++ b/src/go/plugin/go.d/modules/nsd/collect.go @@ -0,0 +1,81 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package nsd + +import ( + "bufio" + "bytes" + "errors" + "strconv" + "strings" +) + +func (n *Nsd) collect() (map[string]int64, error) { + stats, err := n.exec.stats() + if err != nil { + return nil, err + } + + if len(stats) == 0 { + return nil, errors.New("empty stats response") + } + + mx := make(map[string]int64) + + sc := bufio.NewScanner(bytes.NewReader(stats)) + + for sc.Scan() { + n.collectStatsLine(mx, sc.Text()) + } + + if len(mx) == 0 { + return nil, errors.New("unexpected stats response: no metrics found") + } + + addMissingMetrics(mx, "num.rcode.", answerRcodes) + addMissingMetrics(mx, "num.opcode.", queryOpcodes) + addMissingMetrics(mx, "num.class.", queryClasses) + addMissingMetrics(mx, "num.type.", queryTypes) + + return mx, nil +} + +func (n *Nsd) collectStatsLine(mx map[string]int64, line string) { + if line = strings.TrimSpace(line); line == "" { + return + } + + key, value, ok := strings.Cut(line, "=") + if !ok { + n.Debugf("invalid line in stats: '%s'", line) + return + } + + var v int64 + var f float64 + var err error + + switch key { + case "time.boot": + f, err = strconv.ParseFloat(value, 64) + v = int64(f) + default: + v, err = strconv.ParseInt(value, 10, 64) + } + + if err != nil { + n.Debugf("invalid value in stats line '%s': '%s'", line, value) + return + } + + mx[key] = v +} + +func addMissingMetrics(mx map[string]int64, prefix string, values []string) { + for _, v := range values { + k := prefix + v + if _, ok := mx[k]; !ok { + mx[k] = 0 + } + } +} diff --git a/src/go/plugin/go.d/modules/nsd/config_schema.json b/src/go/plugin/go.d/modules/nsd/config_schema.json new file mode 100644 index 00000000000000..d49107c716c379 --- 
/dev/null +++ b/src/go/plugin/go.d/modules/nsd/config_schema.json @@ -0,0 +1,35 @@ +{ + "jsonSchema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "title": "NSD collector configuration.", + "type": "object", + "properties": { + "update_every": { + "title": "Update every", + "description": "Data collection interval, measured in seconds.", + "type": "integer", + "minimum": 1, + "default": 10 + }, + "timeout": { + "title": "Timeout", + "description": "Timeout for executing the binary, specified in seconds.", + "type": "number", + "minimum": 0.5, + "default": 2 + } + }, + "additionalProperties": false, + "patternProperties": { + "^name$": {} + } + }, + "uiSchema": { + "uiOptions": { + "fullPage": true + }, + "timeout": { + "ui:help": "Accepts decimals for precise control (e.g., type 1.5 for 1.5 seconds)." + } + } +} diff --git a/src/go/plugin/go.d/modules/nsd/exec.go b/src/go/plugin/go.d/modules/nsd/exec.go new file mode 100644 index 00000000000000..b05082f3c69f4e --- /dev/null +++ b/src/go/plugin/go.d/modules/nsd/exec.go @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package nsd + +import ( + "context" + "fmt" + "os/exec" + "time" + + "github.com/netdata/netdata/go/plugins/logger" +) + +type nsdControlBinary interface { + stats() ([]byte, error) +} + +func newNsdControlExec(ndsudoPath string, timeout time.Duration, log *logger.Logger) *nsdControlExec { + return &nsdControlExec{ + Logger: log, + ndsudoPath: ndsudoPath, + timeout: timeout, + } +} + +type nsdControlExec struct { + *logger.Logger + + ndsudoPath string + timeout time.Duration +} + +func (e *nsdControlExec) stats() ([]byte, error) { + ctx, cancel := context.WithTimeout(context.Background(), e.timeout) + defer cancel() + + cmd := exec.CommandContext(ctx, e.ndsudoPath, "nsd-control-stats") + + e.Debugf("executing '%s'", cmd) + + bs, err := cmd.Output() + if err != nil { + return nil, fmt.Errorf("error on '%s': %v", cmd, err) + } + + return bs, nil +} diff --git 
a/src/go/plugin/go.d/modules/nsd/init.go b/src/go/plugin/go.d/modules/nsd/init.go new file mode 100644 index 00000000000000..63843cababf443 --- /dev/null +++ b/src/go/plugin/go.d/modules/nsd/init.go @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package nsd + +import ( + "fmt" + "os" + "path/filepath" + + "github.com/netdata/netdata/go/plugins/pkg/executable" +) + +func (n *Nsd) initNsdControlExec() (nsdControlBinary, error) { + ndsudoPath := filepath.Join(executable.Directory, "ndsudo") + if _, err := os.Stat(ndsudoPath); err != nil { + return nil, fmt.Errorf("ndsudo executable not found: %v", err) + + } + + nsdControl := newNsdControlExec(ndsudoPath, n.Timeout.Duration(), n.Logger) + + return nsdControl, nil +} diff --git a/src/go/plugin/go.d/modules/nsd/metadata.yaml b/src/go/plugin/go.d/modules/nsd/metadata.yaml new file mode 100644 index 00000000000000..a31aa38af33554 --- /dev/null +++ b/src/go/plugin/go.d/modules/nsd/metadata.yaml @@ -0,0 +1,272 @@ +plugin_name: go.d.plugin +modules: + - meta: + id: collector-go.d.plugin-nsd + plugin_name: go.d.plugin + module_name: nsd + monitored_instance: + name: NSD + link: "https://nsd.docs.nlnetlabs.nl/en/latest" + icon_filename: 'nsd.svg' + categories: + - data-collection.dns-and-dhcp-servers + keywords: + - nsd + - dns + related_resources: + integrations: + list: [] + info_provided_to_referring_integrations: + description: "" + most_popular: false + overview: + data_collection: + metrics_description: > + This collector monitors NSD statistics like queries, zones, protocols, query types and more. + It relies on the [`nsd-control`](https://nsd.docs.nlnetlabs.nl/en/latest/manpages/nsd-control.html) CLI tool but avoids directly executing the binary. + Instead, it utilizes `ndsudo`, a Netdata helper specifically designed to run privileged commands securely within the Netdata environment. 
+ This approach eliminates the need to use `sudo`, improving security and potentially simplifying permission management. + + Executed commands: + + - `nsd-control stats_noreset` + method_description: "" + supported_platforms: + include: [] + exclude: [] + multi_instance: false + additional_permissions: + description: "" + default_behavior: + auto_detection: + description: "" + limits: + description: "" + performance_impact: + description: "" + setup: + prerequisites: + list: [] + configuration: + file: + name: go.d/nsd.conf + options: + description: | + The following options can be defined globally: update_every. + folding: + title: Config options + enabled: true + list: + - name: update_every + description: Data collection frequency. + default_value: 10 + required: false + - name: timeout + description: nsd-control binary execution timeout. + default_value: 2 + required: false + examples: + folding: + title: Config + enabled: true + list: + - name: Custom update_every + description: Allows you to override the default data collection interval. + config: | + jobs: + - name: nsd + update_every: 5 # Collect NSD statistics every 5 seconds + troubleshooting: + problems: + list: [] + alerts: [] + metrics: + folding: + title: Metrics + enabled: false + description: "" + availability: [] + scopes: + - name: global + description: These metrics refer to the entire monitored application.
+ labels: [] + metrics: + - name: nsd.queries + description: Queries + unit: 'queries/s' + chart_type: line + dimensions: + - name: queries + - name: nsd.queries_by_type + description: Queries Type + unit: 'queries/s' + chart_type: stacked + dimensions: + - name: "A" + - name: "NS" + - name: "MD" + - name: "MF" + - name: "CNAME" + - name: "SOA" + - name: "MB" + - name: "MG" + - name: "MR" + - name: "NULL" + - name: "WKS" + - name: "PTR" + - name: "HINFO" + - name: "MINFO" + - name: "MX" + - name: "TXT" + - name: "RP" + - name: "AFSDB" + - name: "X25" + - name: "ISDN" + - name: "RT" + - name: "NSAP" + - name: "SIG" + - name: "KEY" + - name: "PX" + - name: "AAAA" + - name: "LOC" + - name: "NXT" + - name: "SRV" + - name: "NAPTR" + - name: "KX" + - name: "CERT" + - name: "DNAME" + - name: "OPT" + - name: "APL" + - name: "DS" + - name: "SSHFP" + - name: "IPSECKEY" + - name: "RRSIG" + - name: "NSEC" + - name: "DNSKEY" + - name: "DHCID" + - name: "NSEC3" + - name: "NSEC3PARAM" + - name: "TLSA" + - name: "SMIMEA" + - name: "CDS" + - name: "CDNSKEY" + - name: "OPENPGPKEY" + - name: "CSYNC" + - name: "ZONEMD" + - name: "SVCB" + - name: "HTTPS" + - name: "SPF" + - name: "NID" + - name: "L32" + - name: "L64" + - name: "LP" + - name: "EUI48" + - name: "EUI64" + - name: "URI" + - name: "CAA" + - name: "AVC" + - name: "DLV" + - name: "IXFR" + - name: "AXFR" + - name: "MAILB" + - name: "MAILA" + - name: "ANY" + - name: nsd.queries_by_opcode + description: Queries Opcode + unit: 'queries/s' + chart_type: stacked + dimensions: + - name: "QUERY" + - name: "IQUERY" + - name: "STATUS" + - name: "NOTIFY" + - name: "UPDATE" + - name: "OTHER" + - name: nsd.queries_by_class + description: Queries Class + unit: 'queries/s' + chart_type: stacked + dimensions: + - name: "IN" + - name: "CS" + - name: "CH" + - name: "HS" + - name: nsd.queries_by_protocol + description: Queries Protocol + unit: 'queries/s' + chart_type: stacked + dimensions: + - name: "udp" + - name: "udp6" + - name: "tcp" + - 
name: "tcp6" + - name: "tls" + - name: "tls6" + - name: nsd.answers_by_rcode + description: Answers Rcode + unit: 'answers/s' + chart_type: stacked + dimensions: + - name: "NOERROR" + - name: "FORMERR" + - name: "SERVFAIL" + - name: "NXDOMAIN" + - name: "NOTIMP" + - name: "REFUSED" + - name: "YXDOMAIN" + - name: "YXRRSET" + - name: "NXRRSET" + - name: "NOTAUTH" + - name: "NOTZONE" + - name: "RCODE11" + - name: "RCODE12" + - name: "RCODE13" + - name: "RCODE14" + - name: "RCODE15" + - name: "BADVERS" + - name: nsd.errors + description: Errors + unit: 'errors/s' + chart_type: line + dimensions: + - name: "query" + - name: "answer" + - name: nsd.drops + description: Drops + unit: 'drops/s' + chart_type: line + dimensions: + - name: "query" + - name: nsd.zones + description: Zones + unit: 'zones' + chart_type: line + dimensions: + - name: "master" + - name: "slave" + - name: nsd.zone_transfers_requests + description: Zone Transfers + unit: 'requests/s' + chart_type: line + dimensions: + - name: "AXFR" + - name: "IXFR" + - name: nsd.zone_transfer_memory + description: Zone Transfer Memory + unit: 'bytes' + chart_type: line + dimensions: + - name: "used" + - name: nsd.database_size + description: Database Size + unit: 'bytes' + chart_type: line + dimensions: + - name: "disk" + - name: "mem" + - name: nsd.uptime + description: Uptime + unit: 'seconds' + chart_type: line + dimensions: + - name: "uptime" diff --git a/src/go/plugin/go.d/modules/nsd/nsd.go b/src/go/plugin/go.d/modules/nsd/nsd.go new file mode 100644 index 00000000000000..39da660975d3db --- /dev/null +++ b/src/go/plugin/go.d/modules/nsd/nsd.go @@ -0,0 +1,97 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package nsd + +import ( + _ "embed" + "errors" + "time" + + "github.com/netdata/netdata/go/plugins/plugin/go.d/agent/module" + "github.com/netdata/netdata/go/plugins/plugin/go.d/pkg/web" +) + +//go:embed "config_schema.json" +var configSchema string + +func init() { + module.Register("nsd", module.Creator{ 
+ JobConfigSchema: configSchema, + Defaults: module.Defaults{ + UpdateEvery: 10, + }, + Create: func() module.Module { return New() }, + Config: func() any { return &Config{} }, + }) +} + +func New() *Nsd { + return &Nsd{ + Config: Config{ + Timeout: web.Duration(time.Second * 2), + }, + charts: charts.Copy(), + } +} + +type Config struct { + UpdateEvery int `yaml:"update_every,omitempty" json:"update_every"` + Timeout web.Duration `yaml:"timeout,omitempty" json:"timeout"` +} + +type Nsd struct { + module.Base + Config `yaml:",inline" json:""` + + charts *module.Charts + + exec nsdControlBinary +} + +func (n *Nsd) Configuration() any { + return n.Config +} + +func (n *Nsd) Init() error { + nsdControl, err := n.initNsdControlExec() + if err != nil { + n.Errorf("nsd-control exec initialization: %v", err) + return err + } + n.exec = nsdControl + + return nil +} + +func (n *Nsd) Check() error { + mx, err := n.collect() + if err != nil { + n.Error(err) + return err + } + + if len(mx) == 0 { + return errors.New("no metrics collected") + } + + return nil +} + +func (n *Nsd) Charts() *module.Charts { + return n.charts +} + +func (n *Nsd) Collect() map[string]int64 { + mx, err := n.collect() + if err != nil { + n.Error(err) + } + + if len(mx) == 0 { + return nil + } + + return mx +} + +func (n *Nsd) Cleanup() {} diff --git a/src/go/plugin/go.d/modules/nsd/nsd_test.go b/src/go/plugin/go.d/modules/nsd/nsd_test.go new file mode 100644 index 00000000000000..24f38b512144bf --- /dev/null +++ b/src/go/plugin/go.d/modules/nsd/nsd_test.go @@ -0,0 +1,337 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package nsd + +import ( + "errors" + "os" + "testing" + + "github.com/netdata/netdata/go/plugins/plugin/go.d/agent/module" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +var ( + dataConfigJSON, _ = os.ReadFile("testdata/config.json") + dataConfigYAML, _ = os.ReadFile("testdata/config.yaml") + + dataStats, _ = os.ReadFile("testdata/stats.txt")
+) + +func Test_testDataIsValid(t *testing.T) { + for name, data := range map[string][]byte{ + "dataConfigJSON": dataConfigJSON, + "dataConfigYAML": dataConfigYAML, + "dataStats": dataStats, + } { + require.NotNil(t, data, name) + + } +} + +func TestNsd_Configuration(t *testing.T) { + module.TestConfigurationSerialize(t, &Nsd{}, dataConfigJSON, dataConfigYAML) +} + +func TestNsd_Init(t *testing.T) { + tests := map[string]struct { + config Config + wantFail bool + }{ + "fails if failed to locate ndsudo": { + wantFail: true, + config: New().Config, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + nsd := New() + nsd.Config = test.config + + if test.wantFail { + assert.Error(t, nsd.Init()) + } else { + assert.NoError(t, nsd.Init()) + } + }) + } +} + +func TestNsd_Cleanup(t *testing.T) { + tests := map[string]struct { + prepare func() *Nsd + }{ + "not initialized exec": { + prepare: func() *Nsd { + return New() + }, + }, + "after check": { + prepare: func() *Nsd { + nsd := New() + nsd.exec = prepareMockOK() + _ = nsd.Check() + return nsd + }, + }, + "after collect": { + prepare: func() *Nsd { + nsd := New() + nsd.exec = prepareMockOK() + _ = nsd.Collect() + return nsd + }, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + nsd := test.prepare() + + assert.NotPanics(t, nsd.Cleanup) + }) + } +} + +func TestNsd_Charts(t *testing.T) { + assert.NotNil(t, New().Charts()) +} + +func TestNsd_Check(t *testing.T) { + tests := map[string]struct { + prepareMock func() *mockNsdControl + wantFail bool + }{ + "success case": { + prepareMock: prepareMockOK, + wantFail: false, + }, + "error on stats call": { + prepareMock: prepareMockErrOnStats, + wantFail: true, + }, + "empty response": { + prepareMock: prepareMockEmptyResponse, + wantFail: true, + }, + "unexpected response": { + prepareMock: prepareMockUnexpectedResponse, + wantFail: true, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + 
nsd := New() + mock := test.prepareMock() + nsd.exec = mock + + if test.wantFail { + assert.Error(t, nsd.Check()) + } else { + assert.NoError(t, nsd.Check()) + } + }) + } +} + +func TestNsd_Collect(t *testing.T) { + tests := map[string]struct { + prepareMock func() *mockNsdControl + wantMetrics map[string]int64 + }{ + "success case": { + prepareMock: prepareMockOK, + wantMetrics: map[string]int64{ + "num.answer_wo_aa": 1, + "num.class.CH": 0, + "num.class.CS": 0, + "num.class.HS": 0, + "num.class.IN": 1, + "num.dropped": 1, + "num.edns": 1, + "num.ednserr": 1, + "num.opcode.IQUERY": 0, + "num.opcode.NOTIFY": 0, + "num.opcode.OTHER": 0, + "num.opcode.QUERY": 1, + "num.opcode.STATUS": 0, + "num.opcode.UPDATE": 0, + "num.queries": 1, + "num.raxfr": 1, + "num.rcode.BADVERS": 0, + "num.rcode.FORMERR": 1, + "num.rcode.NOERROR": 1, + "num.rcode.NOTAUTH": 0, + "num.rcode.NOTIMP": 1, + "num.rcode.NOTZONE": 0, + "num.rcode.NXDOMAIN": 1, + "num.rcode.NXRRSET": 0, + "num.rcode.RCODE11": 0, + "num.rcode.RCODE12": 0, + "num.rcode.RCODE13": 0, + "num.rcode.RCODE14": 0, + "num.rcode.RCODE15": 0, + "num.rcode.REFUSED": 1, + "num.rcode.SERVFAIL": 1, + "num.rcode.YXDOMAIN": 1, + "num.rcode.YXRRSET": 0, + "num.rixfr": 1, + "num.rxerr": 1, + "num.tcp": 1, + "num.tcp6": 1, + "num.tls": 1, + "num.tls6": 1, + "num.truncated": 1, + "num.txerr": 1, + "num.type.A": 1, + "num.type.AAAA": 1, + "num.type.AFSDB": 1, + "num.type.APL": 1, + "num.type.AVC": 0, + "num.type.CAA": 0, + "num.type.CDNSKEY": 1, + "num.type.CDS": 1, + "num.type.CERT": 1, + "num.type.CNAME": 1, + "num.type.CSYNC": 1, + "num.type.DHCID": 1, + "num.type.DLV": 0, + "num.type.DNAME": 1, + "num.type.DNSKEY": 1, + "num.type.DS": 1, + "num.type.EUI48": 1, + "num.type.EUI64": 1, + "num.type.HINFO": 1, + "num.type.HTTPS": 1, + "num.type.IPSECKEY": 1, + "num.type.ISDN": 1, + "num.type.KEY": 1, + "num.type.KX": 1, + "num.type.L32": 1, + "num.type.L64": 1, + "num.type.LOC": 1, + "num.type.LP": 1, + "num.type.MB": 1, + "num.type.MD": 
1, + "num.type.MF": 1, + "num.type.MG": 1, + "num.type.MINFO": 1, + "num.type.MR": 1, + "num.type.MX": 1, + "num.type.NAPTR": 1, + "num.type.NID": 1, + "num.type.NS": 1, + "num.type.NSAP": 1, + "num.type.NSEC": 1, + "num.type.NSEC3": 1, + "num.type.NSEC3PARAM": 1, + "num.type.NULL": 1, + "num.type.NXT": 1, + "num.type.OPENPGPKEY": 1, + "num.type.OPT": 1, + "num.type.PTR": 1, + "num.type.PX": 1, + "num.type.RP": 1, + "num.type.RRSIG": 1, + "num.type.RT": 1, + "num.type.SIG": 1, + "num.type.SMIMEA": 1, + "num.type.SOA": 1, + "num.type.SPF": 1, + "num.type.SRV": 1, + "num.type.SSHFP": 1, + "num.type.SVCB": 1, + "num.type.TLSA": 1, + "num.type.TXT": 1, + "num.type.TYPE252": 0, + "num.type.TYPE255": 0, + "num.type.URI": 0, + "num.type.WKS": 1, + "num.type.X25": 1, + "num.type.ZONEMD": 1, + "num.udp": 1, + "num.udp6": 1, + "server0.queries": 1, + "size.config.disk": 1, + "size.config.mem": 1064, + "size.db.disk": 576, + "size.db.mem": 920, + "size.xfrd.mem": 1160464, + "time.boot": 556, + "zone.master": 1, + "zone.slave": 1, + }, + }, + "error on stats call": { + prepareMock: prepareMockErrOnStats, + wantMetrics: nil, + }, + "empty response": { + prepareMock: prepareMockEmptyResponse, + wantMetrics: nil, + }, + "unexpected response": { + prepareMock: prepareMockUnexpectedResponse, + wantMetrics: nil, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + nsd := New() + mock := test.prepareMock() + nsd.exec = mock + + mx := nsd.Collect() + + assert.Equal(t, test.wantMetrics, mx) + + if len(test.wantMetrics) > 0 { + assert.Len(t, *nsd.Charts(), len(charts)) + module.TestMetricsHasAllChartsDims(t, nsd.Charts(), mx) + } + }) + } +} + +func prepareMockOK() *mockNsdControl { + return &mockNsdControl{ + dataStats: dataStats, + } +} + +func prepareMockErrOnStats() *mockNsdControl { + return &mockNsdControl{ + errOnStatus: true, + } +} + +func prepareMockEmptyResponse() *mockNsdControl { + return &mockNsdControl{} +} + +func 
prepareMockUnexpectedResponse() *mockNsdControl { + return &mockNsdControl{ + dataStats: []byte(` +Lorem ipsum dolor sit amet, consectetur adipiscing elit. +Nulla malesuada erat id magna mattis, eu viverra tellus rhoncus. +Fusce et felis pulvinar, posuere sem non, porttitor eros. +`), + } +} + +type mockNsdControl struct { + errOnStatus bool + dataStats []byte +} + +func (m *mockNsdControl) stats() ([]byte, error) { + if m.errOnStatus { + return nil, errors.New("mock.stats() error") + } + return m.dataStats, nil +} diff --git a/src/go/plugin/go.d/modules/nsd/stats_counters.go b/src/go/plugin/go.d/modules/nsd/stats_counters.go new file mode 100644 index 00000000000000..8ebe706a5b1f3a --- /dev/null +++ b/src/go/plugin/go.d/modules/nsd/stats_counters.go @@ -0,0 +1,123 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package nsd + +// Docs: https://nsd.docs.nlnetlabs.nl/en/latest/manpages/nsd-control.html?highlight=elapsed#statistics-counters +// Source: https://github.com/NLnetLabs/nsd/blob/b4a5ccd2235a1f8f71f7c640390e409bf123c963/remote.c#L2735 + +// https://github.com/NLnetLabs/nsd/blob/b4a5ccd2235a1f8f71f7c640390e409bf123c963/remote.c#L2737 +var answerRcodes = []string{ + "NOERROR", + "FORMERR", + "SERVFAIL", + "NXDOMAIN", + "NOTIMP", + "REFUSED", + "YXDOMAIN", + "YXRRSET", + "NXRRSET", + "NOTAUTH", + "NOTZONE", + "RCODE11", + "RCODE12", + "RCODE13", + "RCODE14", + "RCODE15", + "BADVERS", +} + +// https://github.com/NLnetLabs/nsd/blob/b4a5ccd2235a1f8f71f7c640390e409bf123c963/remote.c#L2706 +var queryOpcodes = []string{ + "QUERY", + "IQUERY", + "STATUS", + "NOTIFY", + "UPDATE", + "OTHER", +} + +// https://github.com/NLnetLabs/nsd/blob/b4a5ccd2235a1f8f71f7c640390e409bf123c963/dns.c#L27 +var queryClasses = []string{ + "IN", + "CS", + "CH", + "HS", +} + +// https://github.com/NLnetLabs/nsd/blob/b4a5ccd2235a1f8f71f7c640390e409bf123c963/dns.c#L35 +var queryTypes = []string{ + "A", + "NS", + "MD", + "MF", + "CNAME", + "SOA", + "MB", + "MG", + "MR", + "NULL", + "WKS",
+ "PTR", + "HINFO", + "MINFO", + "MX", + "TXT", + "RP", + "AFSDB", + "X25", + "ISDN", + "RT", + "NSAP", + "SIG", + "KEY", + "PX", + "AAAA", + "LOC", + "NXT", + "SRV", + "NAPTR", + "KX", + "CERT", + "DNAME", + "OPT", + "APL", + "DS", + "SSHFP", + "IPSECKEY", + "RRSIG", + "NSEC", + "DNSKEY", + "DHCID", + "NSEC3", + "NSEC3PARAM", + "TLSA", + "SMIMEA", + "CDS", + "CDNSKEY", + "OPENPGPKEY", + "CSYNC", + "ZONEMD", + "SVCB", + "HTTPS", + "SPF", + "NID", + "L32", + "L64", + "LP", + "EUI48", + "EUI64", + "URI", + "CAA", + "AVC", + "DLV", + "TYPE252", + "TYPE255", +} + +var queryTypeNumberMap = map[string]string{ + "TYPE251": "IXFR", + "TYPE252": "AXFR", + "TYPE253": "MAILB", + "TYPE254": "MAILA", + "TYPE255": "ANY", +} diff --git a/src/go/plugin/go.d/modules/nsd/testdata/config.json b/src/go/plugin/go.d/modules/nsd/testdata/config.json new file mode 100644 index 00000000000000..291ecee3d63d06 --- /dev/null +++ b/src/go/plugin/go.d/modules/nsd/testdata/config.json @@ -0,0 +1,4 @@ +{ + "update_every": 123, + "timeout": 123.123 +} diff --git a/src/go/plugin/go.d/modules/nsd/testdata/config.yaml b/src/go/plugin/go.d/modules/nsd/testdata/config.yaml new file mode 100644 index 00000000000000..25b0b4c780de56 --- /dev/null +++ b/src/go/plugin/go.d/modules/nsd/testdata/config.yaml @@ -0,0 +1,2 @@ +update_every: 123 +timeout: 123.123 diff --git a/src/go/plugin/go.d/modules/nsd/testdata/stats.txt b/src/go/plugin/go.d/modules/nsd/testdata/stats.txt new file mode 100644 index 00000000000000..cb6d8b82914b5c --- /dev/null +++ b/src/go/plugin/go.d/modules/nsd/testdata/stats.txt @@ -0,0 +1,95 @@ +server0.queries=1 +num.queries=1 +time.boot=556.488415 +time.elapsed=556.488415 +size.db.disk=576 +size.db.mem=920 +size.xfrd.mem=1160464 +size.config.disk=1 +size.config.mem=1064 +num.type.A=1 +num.type.NS=1 +num.type.MD=1 +num.type.MF=1 +num.type.CNAME=1 +num.type.SOA=1 +num.type.MB=1 +num.type.MG=1 +num.type.MR=1 +num.type.NULL=1 +num.type.WKS=1 +num.type.PTR=1 +num.type.HINFO=1 
+num.type.MINFO=1 +num.type.MX=1 +num.type.TXT=1 +num.type.RP=1 +num.type.AFSDB=1 +num.type.X25=1 +num.type.ISDN=1 +num.type.RT=1 +num.type.NSAP=1 +num.type.SIG=1 +num.type.KEY=1 +num.type.PX=1 +num.type.AAAA=1 +num.type.LOC=1 +num.type.NXT=1 +num.type.SRV=1 +num.type.NAPTR=1 +num.type.KX=1 +num.type.CERT=1 +num.type.DNAME=1 +num.type.OPT=1 +num.type.APL=1 +num.type.DS=1 +num.type.SSHFP=1 +num.type.IPSECKEY=1 +num.type.RRSIG=1 +num.type.NSEC=1 +num.type.DNSKEY=1 +num.type.DHCID=1 +num.type.NSEC3=1 +num.type.NSEC3PARAM=1 +num.type.TLSA=1 +num.type.SMIMEA=1 +num.type.CDS=1 +num.type.CDNSKEY=1 +num.type.OPENPGPKEY=1 +num.type.CSYNC=1 +num.type.ZONEMD=1 +num.type.SVCB=1 +num.type.HTTPS=1 +num.type.SPF=1 +num.type.NID=1 +num.type.L32=1 +num.type.L64=1 +num.type.LP=1 +num.type.EUI48=1 +num.type.EUI64=1 +num.opcode.QUERY=1 +num.class.IN=1 +num.rcode.NOERROR=1 +num.rcode.FORMERR=1 +num.rcode.SERVFAIL=1 +num.rcode.NXDOMAIN=1 +num.rcode.NOTIMP=1 +num.rcode.REFUSED=1 +num.rcode.YXDOMAIN=1 +num.edns=1 +num.ednserr=1 +num.udp=1 +num.udp6=1 +num.tcp=1 +num.tcp6=1 +num.tls=1 +num.tls6=1 +num.answer_wo_aa=1 +num.rxerr=1 +num.txerr=1 +num.raxfr=1 +num.rixfr=1 +num.truncated=1 +num.dropped=1 +zone.master=1 +zone.slave=1 From bebf19b139732a10269e8d99a0875190e8b70cef Mon Sep 17 00:00:00 2001 From: Netdata bot <43409846+netdatabot@users.noreply.github.com> Date: Sun, 11 Aug 2024 12:25:15 -0400 Subject: [PATCH 05/27] Regenerate integrations.js (#18303) Co-authored-by: ilyam8 <22274335+ilyam8@users.noreply.github.com> --- integrations/integrations.js | 73 +++---- integrations/integrations.json | 73 +++---- src/collectors/COLLECTORS.md | 2 +- src/go/plugin/go.d/modules/nsd/README.md | 1 + .../go.d/modules/nsd/integrations/nsd.md | 201 ++++++++++++++++++ 5 files changed, 277 insertions(+), 73 deletions(-) create mode 120000 src/go/plugin/go.d/modules/nsd/README.md create mode 100644 src/go/plugin/go.d/modules/nsd/integrations/nsd.md diff --git a/integrations/integrations.js 
b/integrations/integrations.js index c7d4b290027e3f..3b64742b87dc4f 100644 --- a/integrations/integrations.js +++ b/integrations/integrations.js @@ -5400,6 +5400,43 @@ export const integrations = [ "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/nginxvts/metadata.yaml", "related_resources": "" }, + { + "meta": { + "id": "collector-go.d.plugin-nsd", + "plugin_name": "go.d.plugin", + "module_name": "nsd", + "monitored_instance": { + "name": "NSD", + "link": "https://nsd.docs.nlnetlabs.nl/en/latest", + "icon_filename": "nsd.svg", + "categories": [ + "data-collection.dns-and-dhcp-servers" + ] + }, + "keywords": [ + "nsd", + "dns" + ], + "related_resources": { + "integrations": { + "list": [] + } + }, + "info_provided_to_referring_integrations": { + "description": "" + }, + "most_popular": false + }, + "overview": "# NSD\n\nPlugin: go.d.plugin\nModule: nsd\n\n## Overview\n\nThis collector monitors NSD statistics like queries, zones, protocols, query types and more. It relies on the [`nsd-control`](https://nsd.docs.nlnetlabs.nl/en/latest/manpages/nsd-control.html) CLI tool but avoids directly executing the binary. Instead, it utilizes `ndsudo`, a Netdata helper specifically designed to run privileged commands securely within the Netdata environment. 
This approach eliminates the need to use `sudo`, improving security and potentially simplifying permission management.\nExecuted commands:\n- `nsd-control stats_noreset`\n\n\n\n\nThis collector is supported on all platforms.\n\nThis collector only supports collecting metrics from a single instance of this integration.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nThis integration doesn't support auto-detection.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", + "setup": "## Setup\n\n### Prerequisites\n\nNo action required.\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/nsd.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/nsd.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every.\n\n\n{% details open=true summary=\"Config options\" %}\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 10 | no |\n| timeout | nsd-control binary execution timeout. | 2 | no |\n\n{% /details %}\n#### Examples\n\n##### Custom update_every\n\nAllows you to override the default data collection interval.\n\n{% details open=true summary=\"Config\" %}\n```yaml\njobs:\n - name: nsd\n update_every: 5 # Collect NSD statistics every 5 seconds\n\n```\n{% /details %}\n", + "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `nsd` collector, run the `go.d.plugin` with the debug option enabled. 
The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `go.d.plugin` to debug the collector:\n\n ```bash\n ./go.d.plugin -d -m nsd\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `nsd` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep nsd\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep nsd /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep nsd\n```\n\n", + "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", + "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels.\n\n\n\n### Per NSD instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| nsd.queries | queries | queries/s |\n| nsd.queries_by_type | A, NS, MD, MF, CNAME, SOA, MB, MG, MR, NULL, WKS, PTR, HINFO, MINFO, MX, TXT, RP, AFSDB, X25, ISDN, RT, NSAP, SIG, KEY, PX, AAAA, LOC, NXT, SRV, NAPTR, KX, CERT, DNAME, OPT, APL, DS, SSHFP, IPSECKEY, RRSIG, NSEC, DNSKEY, DHCID, NSEC3, NSEC3PARAM, TLSA, SMIMEA, CDS, CDNSKEY, OPENPGPKEY, CSYNC, ZONEMD, SVCB, HTTPS, SPF, NID, L32, L64, LP, EUI48, EUI64, URI, CAA, AVC, DLV, IXFR, AXFR, MAILB, MAILA, ANY | queries/s |\n| nsd.queries_by_opcode | QUERY, IQUERY, STATUS, NOTIFY, UPDATE, OTHER | queries/s |\n| nsd.queries_by_class | IN, CS, CH, HS | queries/s |\n| nsd.queries_by_protocol | udp, udp6, tcp, tcp6, tls, tls6 | queries/s |\n| nsd.answers_by_rcode | NOERROR, FORMERR, SERVFAIL, NXDOMAIN, NOTIMP, REFUSED, YXDOMAIN, YXRRSET, NXRRSET, NOTAUTH, NOTZONE, RCODE11, RCODE12, RCODE13, RCODE14, RCODE15, BADVERS | answers/s |\n| nsd.errors | query, answer | errors/s |\n| nsd.drops | query | drops/s |\n| nsd.zones | master, slave | zones |\n| nsd.zone_transfers_requests | AXFR, IXFR | requests/s |\n| nsd.zone_transfer_memory | used | bytes |\n| nsd.database_size | disk, mem | bytes |\n| nsd.uptime | uptime | seconds |\n\n", + "integration_type": "collector", + "id": "go.d.plugin-nsd-NSD", + "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/nsd/metadata.yaml", + "related_resources": "" + }, { "meta": { "id": "collector-go.d.plugin-ntpd", @@ -19068,42 +19105,6 @@ export const integrations = [ "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/monit/metadata.yaml", "related_resources": "" }, - { - "meta": { - "plugin_name": "python.d.plugin", - "module_name": "nsd", - "monitored_instance": { - 
"name": "Name Server Daemon", - "link": "https://nsd.docs.nlnetlabs.nl/en/latest/#", - "categories": [ - "data-collection.dns-and-dhcp-servers" - ], - "icon_filename": "nsd.svg" - }, - "related_resources": { - "integrations": { - "list": [] - } - }, - "info_provided_to_referring_integrations": { - "description": "" - }, - "keywords": [ - "nsd", - "name server daemon" - ], - "most_popular": false - }, - "overview": "# Name Server Daemon\n\nPlugin: python.d.plugin\nModule: nsd\n\n## Overview\n\nThis collector monitors NSD statistics like queries, zones, protocols, query types and more.\n\n\nIt uses the `nsd-control stats_noreset` command to gather metrics.\n\n\nThis collector is supported on all platforms.\n\nThis collector only supports collecting metrics from a single instance of this integration.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nIf permissions are satisfied, the collector will be able to run `nsd-control stats_noreset`, thus collecting metrics.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", - "setup": "## Setup\n\n### Prerequisites\n\n#### NSD version\n\nThe version of `nsd` must be 4.0+.\n\n\n#### Provide Netdata the permissions to run the command\n\nNetdata must have permissions to run the `nsd-control stats_noreset` command.\n\nYou can:\n\n- Add \"netdata\" user to \"nsd\" group:\n ```\n usermod -aG nsd netdata\n ```\n- Add Netdata to sudoers\n 1. Edit the sudoers file:\n ```\n visudo -f /etc/sudoers.d/netdata\n ```\n 2. 
Add the entry:\n ```\n Defaults:netdata !requiretty\n netdata ALL=(ALL) NOPASSWD: /usr/sbin/nsd-control stats_noreset\n ```\n\n > Note that you will need to set the `command` option to `sudo /usr/sbin/nsd-control stats_noreset` if you use this method.\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `python.d/nsd.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config python.d/nsd.conf\n```\n#### Options\n\nThis particular collector does not need further configuration to work if permissions are satisfied, but you can always customize it's data collection behavior.\n\nThere are 2 sections:\n\n* Global variables\n* One or more JOBS that can define multiple different instances to monitor.\n\nThe following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values.\n\nAdditionally, the following collapsed table contains all the options that can be configured inside a JOB definition.\n\nEvery configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified.\n\n\n{% details open=true summary=\"Config options\" %}\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Sets the default data collection frequency. | 30 | no |\n| priority | Controls the order of charts at the netdata dashboard. | 60000 | no |\n| autodetection_retry | Sets the job re-check interval in seconds. | 0 | no |\n| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | no |\n| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. 
Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | no |\n| command | The command to run | nsd-control stats_noreset | no |\n\n{% /details %}\n#### Examples\n\n##### Basic\n\nA basic configuration example.\n\n```yaml\nlocal:\n name: 'nsd_local'\n command: 'nsd-control stats_noreset'\n\n```\n", - "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `nsd` collector, run the `python.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `python.d.plugin` to debug the collector:\n\n ```bash\n ./python.d.plugin nsd debug trace\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `nsd` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. 
These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep nsd\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep nsd /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep nsd\n```\n\n", - "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", - "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels.\n\n\n\n### Per Name Server Daemon instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| nsd.queries | queries | queries/s |\n| nsd.zones | master, slave | zones |\n| nsd.protocols | udp, udp6, tcp, tcp6 | queries/s |\n| nsd.type | A, NS, CNAME, SOA, PTR, HINFO, MX, NAPTR, TXT, AAAA, SRV, ANY | queries/s |\n| nsd.transfer | NOTIFY, AXFR | queries/s |\n| nsd.rcode | NOERROR, FORMERR, SERVFAIL, NXDOMAIN, NOTIMP, REFUSED, YXDOMAIN | queries/s |\n\n", - "integration_type": "collector", - "id": "python.d.plugin-nsd-Name_Server_Daemon", - "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/nsd/metadata.yaml", - "related_resources": "" - }, { "meta": { "plugin_name": "python.d.plugin", diff --git a/integrations/integrations.json b/integrations/integrations.json index e8783b4e5ded9d..90a68a43b68a08 100644 --- a/integrations/integrations.json +++ b/integrations/integrations.json @@ -5398,6 +5398,43 @@ "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/nginxvts/metadata.yaml", "related_resources": "" }, + { + "meta": { + "id": "collector-go.d.plugin-nsd", + "plugin_name": "go.d.plugin", + "module_name": "nsd", + "monitored_instance": { + "name": "NSD", + "link": "https://nsd.docs.nlnetlabs.nl/en/latest", + "icon_filename": "nsd.svg", + "categories": [ + "data-collection.dns-and-dhcp-servers" + ] + }, + "keywords": [ + "nsd", + "dns" + ], + "related_resources": { + "integrations": { + "list": [] + } + }, + "info_provided_to_referring_integrations": { + "description": "" + }, + "most_popular": false + }, + "overview": "# NSD\n\nPlugin: go.d.plugin\nModule: nsd\n\n## Overview\n\nThis collector monitors NSD statistics like queries, zones, protocols, query types and more. 
It relies on the [`nsd-control`](https://nsd.docs.nlnetlabs.nl/en/latest/manpages/nsd-control.html) CLI tool but avoids directly executing the binary. Instead, it utilizes `ndsudo`, a Netdata helper specifically designed to run privileged commands securely within the Netdata environment. This approach eliminates the need to use `sudo`, improving security and potentially simplifying permission management.\nExecuted commands:\n- `nsd-control stats_noreset`\n\n\n\n\nThis collector is supported on all platforms.\n\nThis collector only supports collecting metrics from a single instance of this integration.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nThis integration doesn't support auto-detection.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", + "setup": "## Setup\n\n### Prerequisites\n\nNo action required.\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/nsd.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/nsd.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every.\n\n\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 10 | no |\n| timeout | nsd-control binary execution timeout. 
| 2 | no |\n\n#### Examples\n\n##### Custom update_every\n\nAllows you to override the default data collection interval.\n\n```yaml\njobs:\n - name: nsd\n update_every: 5 # Collect NSD statistics every 5 seconds\n\n```\n", + "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `nsd` collector, run the `go.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `go.d.plugin` to debug the collector:\n\n ```bash\n ./go.d.plugin -d -m nsd\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `nsd` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep nsd\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for the collector's name:\n\n```bash\ngrep nsd /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. 
Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep nsd\n```\n\n", + "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", + "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.\n\n\n\n### Per NSD instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| nsd.queries | queries | queries/s |\n| nsd.queries_by_type | A, NS, MD, MF, CNAME, SOA, MB, MG, MR, NULL, WKS, PTR, HINFO, MINFO, MX, TXT, RP, AFSDB, X25, ISDN, RT, NSAP, SIG, KEY, PX, AAAA, LOC, NXT, SRV, NAPTR, KX, CERT, DNAME, OPT, APL, DS, SSHFP, IPSECKEY, RRSIG, NSEC, DNSKEY, DHCID, NSEC3, NSEC3PARAM, TLSA, SMIMEA, CDS, CDNSKEY, OPENPGPKEY, CSYNC, ZONEMD, SVCB, HTTPS, SPF, NID, L32, L64, LP, EUI48, EUI64, URI, CAA, AVC, DLV, IXFR, AXFR, MAILB, MAILA, ANY | queries/s |\n| nsd.queries_by_opcode | QUERY, IQUERY, STATUS, NOTIFY, UPDATE, OTHER | queries/s |\n| nsd.queries_by_class | IN, CS, CH, HS | queries/s |\n| nsd.queries_by_protocol | udp, udp6, tcp, tcp6, tls, tls6 | queries/s |\n| nsd.answers_by_rcode | NOERROR, FORMERR, SERVFAIL, NXDOMAIN, NOTIMP, REFUSED, YXDOMAIN, YXRRSET, NXRRSET, NOTAUTH, NOTZONE, RCODE11, RCODE12, RCODE13, RCODE14, RCODE15, BADVERS | answers/s |\n| nsd.errors | query, answer | errors/s |\n| nsd.drops | query | drops/s |\n| nsd.zones | master, slave | zones |\n| nsd.zone_transfers_requests | AXFR, IXFR | requests/s |\n| nsd.zone_transfer_memory | used | bytes |\n| nsd.database_size | disk, mem | bytes |\n| nsd.uptime | uptime | seconds |\n\n", + "integration_type": "collector", + "id": "go.d.plugin-nsd-NSD", + "edit_link": 
"https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/nsd/metadata.yaml", + "related_resources": "" + }, { "meta": { "id": "collector-go.d.plugin-ntpd", @@ -19066,42 +19103,6 @@ "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/monit/metadata.yaml", "related_resources": "" }, - { - "meta": { - "plugin_name": "python.d.plugin", - "module_name": "nsd", - "monitored_instance": { - "name": "Name Server Daemon", - "link": "https://nsd.docs.nlnetlabs.nl/en/latest/#", - "categories": [ - "data-collection.dns-and-dhcp-servers" - ], - "icon_filename": "nsd.svg" - }, - "related_resources": { - "integrations": { - "list": [] - } - }, - "info_provided_to_referring_integrations": { - "description": "" - }, - "keywords": [ - "nsd", - "name server daemon" - ], - "most_popular": false - }, - "overview": "# Name Server Daemon\n\nPlugin: python.d.plugin\nModule: nsd\n\n## Overview\n\nThis collector monitors NSD statistics like queries, zones, protocols, query types and more.\n\n\nIt uses the `nsd-control stats_noreset` command to gather metrics.\n\n\nThis collector is supported on all platforms.\n\nThis collector only supports collecting metrics from a single instance of this integration.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nIf permissions are satisfied, the collector will be able to run `nsd-control stats_noreset`, thus collecting metrics.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", - "setup": "## Setup\n\n### Prerequisites\n\n#### NSD version\n\nThe version of `nsd` must be 4.0+.\n\n\n#### Provide Netdata the permissions to run the command\n\nNetdata must have permissions to run the `nsd-control stats_noreset` command.\n\nYou can:\n\n- Add \"netdata\" user to \"nsd\" group:\n ```\n 
usermod -aG nsd netdata\n ```\n- Add Netdata to sudoers\n 1. Edit the sudoers file:\n ```\n visudo -f /etc/sudoers.d/netdata\n ```\n 2. Add the entry:\n ```\n Defaults:netdata !requiretty\n netdata ALL=(ALL) NOPASSWD: /usr/sbin/nsd-control stats_noreset\n ```\n\n > Note that you will need to set the `command` option to `sudo /usr/sbin/nsd-control stats_noreset` if you use this method.\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `python.d/nsd.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config python.d/nsd.conf\n```\n#### Options\n\nThis particular collector does not need further configuration to work if permissions are satisfied, but you can always customize it's data collection behavior.\n\nThere are 2 sections:\n\n* Global variables\n* One or more JOBS that can define multiple different instances to monitor.\n\nThe following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values.\n\nAdditionally, the following collapsed table contains all the options that can be configured inside a JOB definition.\n\nEvery configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified.\n\n\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Sets the default data collection frequency. | 30 | no |\n| priority | Controls the order of charts at the netdata dashboard. | 60000 | no |\n| autodetection_retry | Sets the job re-check interval in seconds. | 0 | no |\n| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | no |\n| name | Job name. 
This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | no |\n| command | The command to run | nsd-control stats_noreset | no |\n\n#### Examples\n\n##### Basic\n\nA basic configuration example.\n\n```yaml\nlocal:\n name: 'nsd_local'\n command: 'nsd-control stats_noreset'\n\n```\n", - "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `nsd` collector, run the `python.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `python.d.plugin` to debug the collector:\n\n ```bash\n ./python.d.plugin nsd debug trace\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `nsd` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. 
These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep nsd\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep nsd /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep nsd\n```\n\n", - "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", - "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels.\n\n\n\n### Per Name Server Daemon instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| nsd.queries | queries | queries/s |\n| nsd.zones | master, slave | zones |\n| nsd.protocols | udp, udp6, tcp, tcp6 | queries/s |\n| nsd.type | A, NS, CNAME, SOA, PTR, HINFO, MX, NAPTR, TXT, AAAA, SRV, ANY | queries/s |\n| nsd.transfer | NOTIFY, AXFR | queries/s |\n| nsd.rcode | NOERROR, FORMERR, SERVFAIL, NXDOMAIN, NOTIMP, REFUSED, YXDOMAIN | queries/s |\n\n", - "integration_type": "collector", - "id": "python.d.plugin-nsd-Name_Server_Daemon", - "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/nsd/metadata.yaml", - "related_resources": "" - }, { "meta": { "plugin_name": "python.d.plugin", diff --git a/src/collectors/COLLECTORS.md b/src/collectors/COLLECTORS.md index 7fdd627b087383..8e42c310ae818c 100644 --- a/src/collectors/COLLECTORS.md +++ b/src/collectors/COLLECTORS.md @@ -303,7 +303,7 @@ If you don't see the app/service you'd like to monitor in this list: - [ISC DHCP](https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/isc_dhcpd/integrations/isc_dhcp.md) -- [Name Server Daemon](https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/nsd/integrations/name_server_daemon.md) +- [NSD](https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/nsd/integrations/nsd.md) - [NextDNS](https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/prometheus/integrations/nextdns.md) diff --git a/src/go/plugin/go.d/modules/nsd/README.md b/src/go/plugin/go.d/modules/nsd/README.md new file mode 120000 index 00000000000000..a5cb8c98b3568a --- /dev/null +++ b/src/go/plugin/go.d/modules/nsd/README.md @@ -0,0 +1 @@ +integrations/nsd.md \ No newline at end of file diff --git 
a/src/go/plugin/go.d/modules/nsd/integrations/nsd.md b/src/go/plugin/go.d/modules/nsd/integrations/nsd.md new file mode 100644 index 00000000000000..a127712b74c791 --- /dev/null +++ b/src/go/plugin/go.d/modules/nsd/integrations/nsd.md @@ -0,0 +1,201 @@ + + +# NSD + + + + + +Plugin: go.d.plugin +Module: nsd + + + +## Overview + +This collector monitors NSD statistics like queries, zones, protocols, query types and more. It relies on the [`nsd-control`](https://nsd.docs.nlnetlabs.nl/en/latest/manpages/nsd-control.html) CLI tool but avoids directly executing the binary. Instead, it utilizes `ndsudo`, a Netdata helper specifically designed to run privileged commands securely within the Netdata environment. This approach eliminates the need to use `sudo`, improving security and potentially simplifying permission management. +Executed commands: +- `nsd-control stats_noreset` + + + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per NSD instance + +These metrics refer to the entire monitored application. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| nsd.queries | queries | queries/s | +| nsd.queries_by_type | A, NS, MD, MF, CNAME, SOA, MB, MG, MR, NULL, WKS, PTR, HINFO, MINFO, MX, TXT, RP, AFSDB, X25, ISDN, RT, NSAP, SIG, KEY, PX, AAAA, LOC, NXT, SRV, NAPTR, KX, CERT, DNAME, OPT, APL, DS, SSHFP, IPSECKEY, RRSIG, NSEC, DNSKEY, DHCID, NSEC3, NSEC3PARAM, TLSA, SMIMEA, CDS, CDNSKEY, OPENPGPKEY, CSYNC, ZONEMD, SVCB, HTTPS, SPF, NID, L32, L64, LP, EUI48, EUI64, URI, CAA, AVC, DLV, IXFR, AXFR, MAILB, MAILA, ANY | queries/s | +| nsd.queries_by_opcode | QUERY, IQUERY, STATUS, NOTIFY, UPDATE, OTHER | queries/s | +| nsd.queries_by_class | IN, CS, CH, HS | queries/s | +| nsd.queries_by_protocol | udp, udp6, tcp, tcp6, tls, tls6 | queries/s | +| nsd.answers_by_rcode | NOERROR, FORMERR, SERVFAIL, NXDOMAIN, NOTIMP, REFUSED, YXDOMAIN, YXRRSET, NXRRSET, NOTAUTH, NOTZONE, RCODE11, RCODE12, RCODE13, RCODE14, RCODE15, BADVERS | answers/s | +| nsd.errors | query, answer | errors/s | +| nsd.drops | query | drops/s | +| nsd.zones | master, slave | zones | +| nsd.zone_transfers_requests | AXFR, IXFR | requests/s | +| nsd.zone_transfer_memory | used | bytes | +| nsd.database_size | disk, mem | bytes | +| nsd.uptime | uptime | seconds | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `go.d/nsd.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config go.d/nsd.conf +``` +#### Options + +The following options can be defined globally: update_every. + + +
Config options + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Data collection frequency. | 10 | no | +| timeout | nsd-control binary execution timeout. | 2 | no | + +
+ +#### Examples + +##### Custom update_every + +Allows you to override the default data collection interval. + +
Config + +```yaml +jobs: + - name: nsd + update_every: 5 # Collect NSD statistics every 5 seconds + +``` +
+ + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `nsd` collector, run the `go.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `go.d.plugin` to debug the collector: + + ```bash + ./go.d.plugin -d -m nsd + ``` + +### Getting Logs + +If you're encountering problems with the `nsd` collector, follow these steps to retrieve logs and identify potential issues: + +- **Run the command** specific to your system (systemd, non-systemd, or Docker container). +- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem. + +#### System with systemd + +Use the following command to view logs generated since the last Netdata service restart: + +```bash +journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep nsd +``` + +#### System without systemd + +Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for the collector's name: + +```bash +grep nsd /var/log/netdata/collector.log +``` + +**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues. 
+ +#### Docker Container + +If your Netdata runs in a Docker container named "netdata" (replace if different), use this command: + +```bash +docker logs netdata 2>&1 | grep nsd +``` + + From 36cf648c0e0b0469883732f537c5bd95f1f398a9 Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Sun, 11 Aug 2024 20:47:16 +0300 Subject: [PATCH 06/27] add exim to ndsudo (#18304) --- src/collectors/plugins.d/ndsudo.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/src/collectors/plugins.d/ndsudo.c b/src/collectors/plugins.d/ndsudo.c index dfdc3089aaa298..d2cf4fae11d44a 100644 --- a/src/collectors/plugins.d/ndsudo.c +++ b/src/collectors/plugins.d/ndsudo.c @@ -13,6 +13,15 @@ struct command { const char *params; const char *search[MAX_SEARCH]; } allowed_commands[] = { + { + .name = "exim-bpc", + .params = "-bpc", + .search = + { + [0] = "exim", + [1] = NULL, + }, + }, { .name = "nsd-control-stats", .params = "stats_noreset", From 80b50196ab1ee94bbdb15ab73ddc20c1a9b4b0cc Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Sun, 11 Aug 2024 21:05:36 +0300 Subject: [PATCH 07/27] add go.d/exim (#18306) --- src/collectors/python.d.plugin/python.d.conf | 2 +- src/go/plugin/go.d/README.md | 2 + src/go/plugin/go.d/config/go.d.conf | 2 + src/go/plugin/go.d/config/go.d/exim.conf | 5 + src/go/plugin/go.d/config/go.d/nsd.conf | 5 + src/go/plugin/go.d/modules/exim/charts.go | 27 +++ src/go/plugin/go.d/modules/exim/collect.go | 43 ++++ .../go.d/modules/exim/config_schema.json | 35 +++ src/go/plugin/go.d/modules/exim/exec.go | 47 ++++ src/go/plugin/go.d/modules/exim/exim.go | 97 ++++++++ src/go/plugin/go.d/modules/exim/exim_test.go | 217 ++++++++++++++++++ src/go/plugin/go.d/modules/exim/init.go | 23 ++ src/go/plugin/go.d/modules/exim/metadata.yaml | 100 ++++++++ .../go.d/modules/exim/testdata/config.json | 4 + .../go.d/modules/exim/testdata/config.yaml | 2 + src/go/plugin/go.d/modules/init.go | 1 + src/go/plugin/go.d/modules/nsd/nsd.go | 2 +- 17 files changed, 612 insertions(+), 2 
deletions(-) create mode 100644 src/go/plugin/go.d/config/go.d/exim.conf create mode 100644 src/go/plugin/go.d/config/go.d/nsd.conf create mode 100644 src/go/plugin/go.d/modules/exim/charts.go create mode 100644 src/go/plugin/go.d/modules/exim/collect.go create mode 100644 src/go/plugin/go.d/modules/exim/config_schema.json create mode 100644 src/go/plugin/go.d/modules/exim/exec.go create mode 100644 src/go/plugin/go.d/modules/exim/exim.go create mode 100644 src/go/plugin/go.d/modules/exim/exim_test.go create mode 100644 src/go/plugin/go.d/modules/exim/init.go create mode 100644 src/go/plugin/go.d/modules/exim/metadata.yaml create mode 100644 src/go/plugin/go.d/modules/exim/testdata/config.json create mode 100644 src/go/plugin/go.d/modules/exim/testdata/config.yaml diff --git a/src/collectors/python.d.plugin/python.d.conf b/src/collectors/python.d.plugin/python.d.conf index 0d09ab3b606dba..ca024b4301277a 100644 --- a/src/collectors/python.d.plugin/python.d.conf +++ b/src/collectors/python.d.plugin/python.d.conf @@ -33,7 +33,6 @@ gc_interval: 300 # dovecot: yes # this is just an example example: no -# exim: yes go_expvar: no # haproxy: yes # monit: yes @@ -59,6 +58,7 @@ adaptec_raid: no # Removed (replaced with go.d/adaptercraid). apache: no # Removed (replaced with go.d/apache). beanstalk: no # Removed (replaced with go.d/beanstalk). elasticsearch: no # Removed (replaced with go.d/elasticsearch). +exim: no # Removed (replaced with go.d/exim). fail2ban: no # Removed (replaced with go.d/fail2ban). freeradius: no # Removed (replaced with go.d/freeradius). gearman: no # Removed (replaced with go.d/gearman). diff --git a/src/go/plugin/go.d/README.md b/src/go/plugin/go.d/README.md index 0f28bd90ee337c..91111034f8aa77 100644 --- a/src/go/plugin/go.d/README.md +++ b/src/go/plugin/go.d/README.md @@ -74,6 +74,7 @@ see the appropriate collector readme. 
| [elasticsearch](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/elasticsearch) | Elasticsearch/OpenSearch | | [envoy](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/envoy) | Envoy | | [example](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/example) | - | +| [exim](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/exim) | Exim | | [fail2ban](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/fail2ban) | Fail2Ban Jails | | [filecheck](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/filecheck) | Files and Directories | | [fluentd](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/fluentd) | Fluentd | @@ -103,6 +104,7 @@ see the appropriate collector readme. | [nginx](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/nginx) | NGINX | | [nginxplus](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/nginxplus) | NGINX Plus | | [nginxvts](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/nginxvts) | NGINX VTS | +| [nsd](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/nsd) | NSD (NLnet Labs) | | [ntpd](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/ntpd) | NTP daemon | | [nvme](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/nvme) | NVMe devices | | [openvpn](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/openvpn) | OpenVPN | diff --git a/src/go/plugin/go.d/config/go.d.conf b/src/go/plugin/go.d/config/go.d.conf index 76f6f3b46cffc8..439eb8b462fb95 100644 --- a/src/go/plugin/go.d/config/go.d.conf +++ b/src/go/plugin/go.d/config/go.d.conf @@ -39,6 +39,7 @@ modules: # elasticsearch: yes # envoy: yes # example: no +# exim: yes # fail2ban: yes # filecheck: yes # fluentd: yes @@ -67,6 +68,7 @@ modules: # nginx: yes # nginxplus: 
yes # nginxvts: yes +# nsd: yes # ntpd: yes # nvme: yes # nvidia_smi: no diff --git a/src/go/plugin/go.d/config/go.d/exim.conf b/src/go/plugin/go.d/config/go.d/exim.conf new file mode 100644 index 00000000000000..db881315289ec7 --- /dev/null +++ b/src/go/plugin/go.d/config/go.d/exim.conf @@ -0,0 +1,5 @@ +## All available configuration options, their descriptions and default values: +## https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/exim#readme + +jobs: + - name: exim diff --git a/src/go/plugin/go.d/config/go.d/nsd.conf b/src/go/plugin/go.d/config/go.d/nsd.conf new file mode 100644 index 00000000000000..b3c0a7868f700a --- /dev/null +++ b/src/go/plugin/go.d/config/go.d/nsd.conf @@ -0,0 +1,5 @@ +## All available configuration options, their descriptions and default values: +## https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/nsd#readme + +jobs: + - name: nsd diff --git a/src/go/plugin/go.d/modules/exim/charts.go b/src/go/plugin/go.d/modules/exim/charts.go new file mode 100644 index 00000000000000..f09faf1d0cd787 --- /dev/null +++ b/src/go/plugin/go.d/modules/exim/charts.go @@ -0,0 +1,27 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package exim + +import ( + "github.com/netdata/netdata/go/plugins/plugin/go.d/agent/module" +) + +const ( + prioQueueEmailsCount = module.Priority + iota +) + +var charts = module.Charts{ + queueEmailsCountChart.Copy(), +} + +var queueEmailsCountChart = module.Chart{ + ID: "qemails", + Title: "Exim Queue Emails", + Units: "emails", + Fam: "queue", + Ctx: "exim.qemails", + Priority: prioQueueEmailsCount, + Dims: module.Dims{ + {ID: "emails"}, + }, +} diff --git a/src/go/plugin/go.d/modules/exim/collect.go b/src/go/plugin/go.d/modules/exim/collect.go new file mode 100644 index 00000000000000..ce1a3472918e70 --- /dev/null +++ b/src/go/plugin/go.d/modules/exim/collect.go @@ -0,0 +1,43 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package exim + +import ( + "bufio" + "bytes" + "fmt" 
+ "strconv" + "strings" +) + +func (e *Exim) collect() (map[string]int64, error) { + resp, err := e.exec.countMessagesInQueue() + if err != nil { + return nil, err + } + + emails, err := parseResponse(resp) + if err != nil { + return nil, err + } + + mx := map[string]int64{ + "emails": emails, + } + + return mx, nil +} + +func parseResponse(resp []byte) (int64, error) { + sc := bufio.NewScanner(bytes.NewReader(resp)) + sc.Scan() + + line := strings.TrimSpace(sc.Text()) + + emails, err := strconv.ParseInt(line, 10, 64) + if err != nil { + return 0, fmt.Errorf("invalid response '%s': %v", line, err) + } + + return emails, nil +} diff --git a/src/go/plugin/go.d/modules/exim/config_schema.json b/src/go/plugin/go.d/modules/exim/config_schema.json new file mode 100644 index 00000000000000..6561ea34fb4be0 --- /dev/null +++ b/src/go/plugin/go.d/modules/exim/config_schema.json @@ -0,0 +1,35 @@ +{ + "jsonSchema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "title": "Exim collector configuration.", + "type": "object", + "properties": { + "update_every": { + "title": "Update every", + "description": "Data collection interval, measured in seconds.", + "type": "integer", + "minimum": 1, + "default": 10 + }, + "timeout": { + "title": "Timeout", + "description": "Timeout for executing the binary, specified in seconds.", + "type": "number", + "minimum": 0.5, + "default": 2 + } + }, + "additionalProperties": false, + "patternProperties": { + "^name$": {} + } + }, + "uiSchema": { + "uiOptions": { + "fullPage": true + }, + "timeout": { + "ui:help": "Accepts decimals for precise control (e.g., type 1.5 for 1.5 seconds)." 
+ } + } +} diff --git a/src/go/plugin/go.d/modules/exim/exec.go b/src/go/plugin/go.d/modules/exim/exec.go new file mode 100644 index 00000000000000..241c72acaa4f85 --- /dev/null +++ b/src/go/plugin/go.d/modules/exim/exec.go @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package exim + +import ( + "context" + "fmt" + "os/exec" + "time" + + "github.com/netdata/netdata/go/plugins/logger" +) + +type eximBinary interface { + countMessagesInQueue() ([]byte, error) +} + +func newEximExec(ndsudoPath string, timeout time.Duration, log *logger.Logger) *eximExec { + return &eximExec{ + Logger: log, + ndsudoPath: ndsudoPath, + timeout: timeout, + } +} + +type eximExec struct { + *logger.Logger + + ndsudoPath string + timeout time.Duration +} + +func (e *eximExec) countMessagesInQueue() ([]byte, error) { + ctx, cancel := context.WithTimeout(context.Background(), e.timeout) + defer cancel() + + cmd := exec.CommandContext(ctx, e.ndsudoPath, "exim-bpc") + + e.Debugf("executing '%s'", cmd) + + bs, err := cmd.Output() + if err != nil { + return nil, fmt.Errorf("error on '%s': %v", cmd, err) + } + + return bs, nil +} diff --git a/src/go/plugin/go.d/modules/exim/exim.go b/src/go/plugin/go.d/modules/exim/exim.go new file mode 100644 index 00000000000000..f3c3e6e7837725 --- /dev/null +++ b/src/go/plugin/go.d/modules/exim/exim.go @@ -0,0 +1,97 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package exim + +import ( + _ "embed" + "errors" + "time" + + "github.com/netdata/netdata/go/plugins/plugin/go.d/agent/module" + "github.com/netdata/netdata/go/plugins/plugin/go.d/pkg/web" +) + +//go:embed "config_schema.json" +var configSchema string + +func init() { + module.Register("exim", module.Creator{ + JobConfigSchema: configSchema, + Defaults: module.Defaults{ + UpdateEvery: 10, + }, + Create: func() module.Module { return New() }, + Config: func() any { return &Config{} }, + }) +} + +func New() *Exim { + return &Exim{ + Config: Config{ + Timeout: 
web.Duration(time.Second * 2), + }, + charts: charts.Copy(), + } +} + +type Config struct { + UpdateEvery int `yaml:"update_every,omitempty" json:"update_every"` + Timeout web.Duration `yaml:"timeout,omitempty" json:"timeout"` +} + +type Exim struct { + module.Base + Config `yaml:",inline" json:""` + + charts *module.Charts + + exec eximBinary +} + +func (e *Exim) Configuration() any { + return e.Config +} + +func (e *Exim) Init() error { + exim, err := e.initEximExec() + if err != nil { + e.Errorf("exim exec initialization: %v", err) + return err + } + e.exec = exim + + return nil +} + +func (e *Exim) Check() error { + mx, err := e.collect() + if err != nil { + e.Error(err) + return err + } + + if len(mx) == 0 { + return errors.New("no metrics collected") + } + + return nil +} + +func (e *Exim) Charts() *module.Charts { + return e.charts +} + +func (e *Exim) Collect() map[string]int64 { + mx, err := e.collect() + if err != nil { + e.Error(err) + } + + if len(mx) == 0 { + return nil + } + + return mx +} + +func (e *Exim) Cleanup() {} diff --git a/src/go/plugin/go.d/modules/exim/exim_test.go b/src/go/plugin/go.d/modules/exim/exim_test.go new file mode 100644 index 00000000000000..16eb025e184acb --- /dev/null +++ b/src/go/plugin/go.d/modules/exim/exim_test.go @@ -0,0 +1,217 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package exim + +import ( + "errors" + "os" + "testing" + + "github.com/netdata/netdata/go/plugins/plugin/go.d/agent/module" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +var ( + dataConfigJSON, _ = os.ReadFile("testdata/config.json") + dataConfigYAML, _ = os.ReadFile("testdata/config.yaml") +) + +func Test_testDataIsValid(t *testing.T) { + for name, data := range map[string][]byte{ + "dataConfigJSON": dataConfigJSON, + "dataConfigYAML": dataConfigYAML, + } { + require.NotNil(t, data, name) + + } +} + +func TestExim_Configuration(t *testing.T) { + module.TestConfigurationSerialize(t, &Exim{}, 
dataConfigJSON, dataConfigYAML) +} + +func TestExim_Init(t *testing.T) { + tests := map[string]struct { + config Config + wantFail bool + }{ + "fails if failed to locate ndsudo": { + wantFail: true, + config: New().Config, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + exim := New() + exim.Config = test.config + + if test.wantFail { + assert.Error(t, exim.Init()) + } else { + assert.NoError(t, exim.Init()) + } + }) + } +} + +func TestExim_Cleanup(t *testing.T) { + tests := map[string]struct { + prepare func() *Exim + }{ + "not initialized exec": { + prepare: func() *Exim { + return New() + }, + }, + "after check": { + prepare: func() *Exim { + exim := New() + exim.exec = prepareMockOK() + _ = exim.Check() + return exim + }, + }, + "after collect": { + prepare: func() *Exim { + exim := New() + exim.exec = prepareMockOK() + _ = exim.Collect() + return exim + }, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + exim := test.prepare() + + assert.NotPanics(t, exim.Cleanup) + }) + } +} + +func TestEximCharts(t *testing.T) { + assert.NotNil(t, New().Charts()) +} + +func TestExim_Check(t *testing.T) { + tests := map[string]struct { + prepareMock func() *mockEximExec + wantFail bool + }{ + "success case": { + prepareMock: prepareMockOK, + wantFail: false, + }, + "error on exec": { + prepareMock: prepareMockErr, + wantFail: true, + }, + "empty response": { + prepareMock: prepareMockEmptyResponse, + wantFail: true, + }, + "unexpected response": { + prepareMock: prepareMockUnexpectedResponse, + wantFail: true, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + exim := New() + mock := test.prepareMock() + exim.exec = mock + + if test.wantFail { + assert.Error(t, exim.Check()) + } else { + assert.NoError(t, exim.Check()) + } + }) + } +} + +func TestExim_Collect(t *testing.T) { + tests := map[string]struct { + prepareMock func() *mockEximExec + wantMetrics map[string]int64 + }{ + 
"success case": { + prepareMock: prepareMockOK, + wantMetrics: map[string]int64{ + "emails": 99, + }, + }, + "error on exec": { + prepareMock: prepareMockErr, + wantMetrics: nil, + }, + "empty response": { + prepareMock: prepareMockEmptyResponse, + wantMetrics: nil, + }, + "unexpected response": { + prepareMock: prepareMockUnexpectedResponse, + wantMetrics: nil, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + exim := New() + mock := test.prepareMock() + exim.exec = mock + + mx := exim.Collect() + + assert.Equal(t, test.wantMetrics, mx) + + if len(test.wantMetrics) > 0 { + assert.Len(t, *exim.Charts(), len(charts)) + module.TestMetricsHasAllChartsDims(t, exim.Charts(), mx) + } + }) + } +} + +func prepareMockOK() *mockEximExec { + return &mockEximExec{ + data: []byte("99"), + } +} + +func prepareMockErr() *mockEximExec { + return &mockEximExec{ + err: true, + } +} + +func prepareMockEmptyResponse() *mockEximExec { + return &mockEximExec{} +} + +func prepareMockUnexpectedResponse() *mockEximExec { + return &mockEximExec{ + data: []byte(` +Lorem ipsum dolor sit amet, consectetur adipiscing elit. +Nulla malesuada erat id magna mattis, eu viverra tellus rhoncus. +Fusce et felis pulvinar, posuere sem non, porttitor eros. 
+`), + } +} + +type mockEximExec struct { + err bool + data []byte +} + +func (m *mockEximExec) countMessagesInQueue() ([]byte, error) { + if m.err { + return nil, errors.New("mock.countMessagesInQueue() error") + } + return m.data, nil +} diff --git a/src/go/plugin/go.d/modules/exim/init.go b/src/go/plugin/go.d/modules/exim/init.go new file mode 100644 index 00000000000000..d1d5c079350742 --- /dev/null +++ b/src/go/plugin/go.d/modules/exim/init.go @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package exim + +import ( + "fmt" + "os" + "path/filepath" + + "github.com/netdata/netdata/go/plugins/pkg/executable" +) + +func (e *Exim) initEximExec() (eximBinary, error) { + ndsudoPath := filepath.Join(executable.Directory, "ndsudo") + if _, err := os.Stat(ndsudoPath); err != nil { + return nil, fmt.Errorf("ndsudo executable not found: %v", err) + + } + + exim := newEximExec(ndsudoPath, e.Timeout.Duration(), e.Logger) + + return exim, nil +} diff --git a/src/go/plugin/go.d/modules/exim/metadata.yaml b/src/go/plugin/go.d/modules/exim/metadata.yaml new file mode 100644 index 00000000000000..c7f4a7a98e3942 --- /dev/null +++ b/src/go/plugin/go.d/modules/exim/metadata.yaml @@ -0,0 +1,100 @@ +plugin_name: go.d.plugin +modules: + - meta: + id: collector-go.d.plugin-exim + plugin_name: go.d.plugin + module_name: exim + monitored_instance: + name: Exim + link: "https://www.exim.org/" + icon_filename: 'exim.jpg' + categories: + - data-collection.mail-servers + keywords: + - exim + - mail + - email + related_resources: + integrations: + list: [] + info_provided_to_referring_integrations: + description: "" + most_popular: false + overview: + data_collection: + metrics_description: > + This collector monitors Exim mail queue. + It relies on the [`exim`](https://www.exim.org/exim-html-3.20/doc/html/spec_5.html) CLI tool but avoids directly executing the binary. 
+ Instead, it utilizes `ndsudo`, a Netdata helper specifically designed to run privileged commands securely within the Netdata environment. + This approach eliminates the need to use `sudo`, improving security and potentially simplifying permission management. + + Executed commands: + + - `exim -bpc` + method_description: "" + supported_platforms: + include: [] + exclude: [] + multi_instance: false + additional_permissions: + description: "" + default_behavior: + auto_detection: + description: "" + limits: + description: "" + performance_impact: + description: "" + setup: + prerequisites: + list: [] + configuration: + file: + name: go.d/exim.conf + options: + description: | + The following options can be defined globally: update_every. + folding: + title: Config options + enabled: true + list: + - name: update_every + description: Data collection frequency. + default_value: 10 + required: false + - name: timeout + description: exim binary execution timeout. + default_value: 2 + required: false + examples: + folding: + title: Config + enabled: true + list: + - name: Custom update_every + description: Allows you to override the default data collection interval. + config: | + jobs: + - name: exim + update_every: 5 # Collect Exim statistics every 5 seconds + troubleshooting: + problems: + list: [] + alerts: [] + metrics: + folding: + title: Metrics + enabled: false + description: "" + availability: [] + scopes: + - name: global + description: These metrics refer to the entire monitored application. 
+ labels: [] + metrics: + - name: exim.qemails + description: Exim Queue Emails + unit: 'emails' + chart_type: line + dimensions: + - name: emails diff --git a/src/go/plugin/go.d/modules/exim/testdata/config.json b/src/go/plugin/go.d/modules/exim/testdata/config.json new file mode 100644 index 00000000000000..291ecee3d63d06 --- /dev/null +++ b/src/go/plugin/go.d/modules/exim/testdata/config.json @@ -0,0 +1,4 @@ +{ + "update_every": 123, + "timeout": 123.123 +} diff --git a/src/go/plugin/go.d/modules/exim/testdata/config.yaml b/src/go/plugin/go.d/modules/exim/testdata/config.yaml new file mode 100644 index 00000000000000..25b0b4c780de56 --- /dev/null +++ b/src/go/plugin/go.d/modules/exim/testdata/config.yaml @@ -0,0 +1,2 @@ +update_every: 123 +timeout: 123.123 diff --git a/src/go/plugin/go.d/modules/init.go b/src/go/plugin/go.d/modules/init.go index 6b6cb7fbec7fb0..5592bfb79fb351 100644 --- a/src/go/plugin/go.d/modules/init.go +++ b/src/go/plugin/go.d/modules/init.go @@ -28,6 +28,7 @@ import ( _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/elasticsearch" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/envoy" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/example" + _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/exim" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/fail2ban" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/filecheck" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/fluentd" diff --git a/src/go/plugin/go.d/modules/nsd/nsd.go b/src/go/plugin/go.d/modules/nsd/nsd.go index 39da660975d3db..fae0f67f3480e8 100644 --- a/src/go/plugin/go.d/modules/nsd/nsd.go +++ b/src/go/plugin/go.d/modules/nsd/nsd.go @@ -55,7 +55,7 @@ func (n *Nsd) Configuration() any { func (n *Nsd) Init() error { nsdControl, err := n.initNsdControlExec() if err != nil { - n.Errorf("nvm-control exec initialization: %v", err) + n.Errorf("nsd-control exec initialization: %v", err) return err } n.exec = 
nsdControl From 7317cf4537aaade461ac3247c136c826940b6a4e Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Sun, 11 Aug 2024 21:23:54 +0300 Subject: [PATCH 08/27] remove python.d/exim (#18305) --- CMakeLists.txt | 2 - src/collectors/python.d.plugin/exim/README.md | 1 - .../python.d.plugin/exim/exim.chart.py | 39 ---- src/collectors/python.d.plugin/exim/exim.conf | 91 -------- .../python.d.plugin/exim/integrations/exim.md | 214 ------------------ .../python.d.plugin/exim/metadata.yaml | 132 ----------- 6 files changed, 479 deletions(-) delete mode 120000 src/collectors/python.d.plugin/exim/README.md delete mode 100644 src/collectors/python.d.plugin/exim/exim.chart.py delete mode 100644 src/collectors/python.d.plugin/exim/exim.conf delete mode 100644 src/collectors/python.d.plugin/exim/integrations/exim.md delete mode 100644 src/collectors/python.d.plugin/exim/metadata.yaml diff --git a/CMakeLists.txt b/CMakeLists.txt index eee142c7678b81..3eb90ea75e97be 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -2782,7 +2782,6 @@ install(FILES src/collectors/python.d.plugin/changefinder/changefinder.conf src/collectors/python.d.plugin/dovecot/dovecot.conf src/collectors/python.d.plugin/example/example.conf - src/collectors/python.d.plugin/exim/exim.conf src/collectors/python.d.plugin/go_expvar/go_expvar.conf src/collectors/python.d.plugin/haproxy/haproxy.conf src/collectors/python.d.plugin/monit/monit.conf @@ -2811,7 +2810,6 @@ install(FILES src/collectors/python.d.plugin/changefinder/changefinder.chart.py src/collectors/python.d.plugin/dovecot/dovecot.chart.py src/collectors/python.d.plugin/example/example.chart.py - src/collectors/python.d.plugin/exim/exim.chart.py src/collectors/python.d.plugin/go_expvar/go_expvar.chart.py src/collectors/python.d.plugin/haproxy/haproxy.chart.py src/collectors/python.d.plugin/monit/monit.chart.py diff --git a/src/collectors/python.d.plugin/exim/README.md b/src/collectors/python.d.plugin/exim/README.md deleted file mode 120000 index 
f1f2ef9f927dd8..00000000000000 --- a/src/collectors/python.d.plugin/exim/README.md +++ /dev/null @@ -1 +0,0 @@ -integrations/exim.md \ No newline at end of file diff --git a/src/collectors/python.d.plugin/exim/exim.chart.py b/src/collectors/python.d.plugin/exim/exim.chart.py deleted file mode 100644 index 7238a1beaa5743..00000000000000 --- a/src/collectors/python.d.plugin/exim/exim.chart.py +++ /dev/null @@ -1,39 +0,0 @@ -# -*- coding: utf-8 -*- -# Description: exim netdata python.d module -# Author: Pawel Krupa (paulfantom) -# SPDX-License-Identifier: GPL-3.0-or-later - -from bases.FrameworkServices.ExecutableService import ExecutableService - -EXIM_COMMAND = 'exim -bpc' - -ORDER = [ - 'qemails', -] - -CHARTS = { - 'qemails': { - 'options': [None, 'Exim Queue Emails', 'emails', 'queue', 'exim.qemails', 'line'], - 'lines': [ - ['emails', None, 'absolute'] - ] - } -} - - -class Service(ExecutableService): - def __init__(self, configuration=None, name=None): - ExecutableService.__init__(self, configuration=configuration, name=name) - self.order = ORDER - self.definitions = CHARTS - self.command = EXIM_COMMAND - - def _get_data(self): - """ - Format data received from shell command - :return: dict - """ - try: - return {'emails': int(self._get_raw_data()[0])} - except (ValueError, AttributeError): - return None diff --git a/src/collectors/python.d.plugin/exim/exim.conf b/src/collectors/python.d.plugin/exim/exim.conf deleted file mode 100644 index 3b7e659227947e..00000000000000 --- a/src/collectors/python.d.plugin/exim/exim.conf +++ /dev/null @@ -1,91 +0,0 @@ -# netdata python.d.plugin configuration for exim -# -# This file is in YaML format. Generally the format is: -# -# name: value -# -# There are 2 sections: -# - global variables -# - one or more JOBS -# -# JOBS allow you to collect values from multiple sources. -# Each source will have its own set of charts. -# -# JOB parameters have to be indented (using spaces only, example below). 
- -# ---------------------------------------------------------------------- -# Global Variables -# These variables set the defaults for all JOBs, however each JOB -# may define its own, overriding the defaults. - -# update_every sets the default data collection frequency. -# If unset, the python.d.plugin default is used. -# exim is slow, so once every 10 seconds -update_every: 10 - -# priority controls the order of charts at the netdata dashboard. -# Lower numbers move the charts towards the top of the page. -# If unset, the default for python.d.plugin is used. -# priority: 60000 - -# penalty indicates whether to apply penalty to update_every in case of failures. -# Penalty will increase every 5 failed updates in a row. Maximum penalty is 10 minutes. -# penalty: yes - -# autodetection_retry sets the job re-check interval in seconds. -# The job is not deleted if check fails. -# Attempts to start the job are made once every autodetection_retry. -# This feature is disabled by default. -# autodetection_retry: 0 - -# ---------------------------------------------------------------------- -# JOBS (data collection sources) -# -# The default JOBS share the same *name*. JOBS with the same name -# are mutually exclusive. Only one of them will be allowed running at -# any time. This allows autodetection to try several alternatives and -# pick the one that works. -# -# Any number of jobs is supported. -# -# All python.d.plugin JOBS (for all its modules) support a set of -# predefined parameters. 
These are: -# -# job_name: -# name: myname # the JOB's name as it will appear at the -# # dashboard (by default is the job_name) -# # JOBs sharing a name are mutually exclusive -# update_every: 1 # the JOB's data collection frequency -# priority: 60000 # the JOB's order on the dashboard -# penalty: yes # the JOB's penalty -# autodetection_retry: 0 # the JOB's re-check interval in seconds -# -# Additionally to the above, exim also supports the following: -# -# command: 'exim -bpc' # the command to run -# - -# ---------------------------------------------------------------------- -# REQUIRED exim CONFIGURATION -# -# netdata will query exim as user netdata. -# By default exim will refuse to respond. -# -# To allow querying exim as non-admin user, please set the following -# to your exim configuration: -# -# queue_list_requires_admin = false -# -# Your exim configuration should be in -# -# /etc/exim/exim4.conf -# or -# /etc/exim4/conf.d/main/000_local_options -# -# Please consult your distribution information to find the exact file. - -# ---------------------------------------------------------------------- -# AUTO-DETECTION JOBS - -local: - command: 'exim -bpc' diff --git a/src/collectors/python.d.plugin/exim/integrations/exim.md b/src/collectors/python.d.plugin/exim/integrations/exim.md deleted file mode 100644 index 5fd67e8c2b0653..00000000000000 --- a/src/collectors/python.d.plugin/exim/integrations/exim.md +++ /dev/null @@ -1,214 +0,0 @@ - - -# Exim - - - - - -Plugin: python.d.plugin -Module: exim - - - -## Overview - -This collector monitors Exim mail queue. - -It uses the `exim` command line binary to get the statistics. - -This collector is supported on all platforms. - -This collector only supports collecting metrics from a single instance of this integration. - - -### Default Behavior - -#### Auto-Detection - -Assuming setup prerequisites are met, the collector will try to gather statistics using the method described above, even without any configuration. 
- -#### Limits - -The default configuration for this integration does not impose any limits on data collection. - -#### Performance Impact - -The default configuration for this integration is not expected to impose a significant performance impact on the system. - - -## Metrics - -Metrics grouped by *scope*. - -The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. - - - -### Per Exim instance - -These metrics refer to the entire monitored application. - -This scope has no labels. - -Metrics: - -| Metric | Dimensions | Unit | -|:------|:----------|:----| -| exim.qemails | emails | emails | - - - -## Alerts - -There are no alerts configured by default for this integration. - - -## Setup - -### Prerequisites - -#### Exim configuration - local installation - -The module uses the `exim` binary, which can only be executed as root by default. We need to allow other users to `exim` binary. We solve that adding `queue_list_requires_admin` statement in exim configuration and set to `false`, because it is `true` by default. On many Linux distributions, the default location of `exim` configuration is in `/etc/exim.conf`. - -1. Edit the `exim` configuration with your preferred editor and add: -`queue_list_requires_admin = false` -2. Restart `exim` and Netdata - - -#### Exim configuration - WHM (CPanel) server - -On a WHM server, you can reconfigure `exim` over the WHM interface with the following steps. - -1. Login to WHM -2. Navigate to Service Configuration --> Exim Configuration Manager --> tab Advanced Editor -3. Scroll down to the button **Add additional configuration setting** and click on it. -4. In the new dropdown which will appear above we need to find and choose: -`queue_list_requires_admin` and set to `false` -5. Scroll to the end and click the **Save** button. - - - -### Configuration - -#### File - -The configuration file name for this integration is `python.d/exim.conf`. 
- - -You can edit the configuration file using the `edit-config` script from the -Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory). - -```bash -cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata -sudo ./edit-config python.d/exim.conf -``` -#### Options - -There are 2 sections: - -* Global variables -* One or more JOBS that can define multiple different instances to monitor. - -The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. - -Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. - -Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. - - -
Config options - -| Name | Description | Default | Required | -|:----|:-----------|:-------|:--------:| -| update_every | Sets the default data collection frequency. | 5 | no | -| priority | Controls the order of charts at the netdata dashboard. | 60000 | no | -| autodetection_retry | Sets the job re-check interval in seconds. | 0 | no | -| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | no | -| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | no | -| command | Path and command to the `exim` binary | exim -bpc | no | - -
- -#### Examples - -##### Local exim install - -A basic local exim install - -```yaml -local: - command: 'exim -bpc' - -``` - - -## Troubleshooting - -### Debug Mode - -To troubleshoot issues with the `exim` collector, run the `python.d.plugin` with the debug option enabled. The output -should give you clues as to why the collector isn't working. - -- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on - your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. - - ```bash - cd /usr/libexec/netdata/plugins.d/ - ``` - -- Switch to the `netdata` user. - - ```bash - sudo -u netdata -s - ``` - -- Run the `python.d.plugin` to debug the collector: - - ```bash - ./python.d.plugin exim debug trace - ``` - -### Getting Logs - -If you're encountering problems with the `exim` collector, follow these steps to retrieve logs and identify potential issues: - -- **Run the command** specific to your system (systemd, non-systemd, or Docker container). -- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem. - -#### System with systemd - -Use the following command to view logs generated since the last Netdata service restart: - -```bash -journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep exim -``` - -#### System without systemd - -Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name: - -```bash -grep exim /var/log/netdata/collector.log -``` - -**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues. 
- -#### Docker Container - -If your Netdata runs in a Docker container named "netdata" (replace if different), use this command: - -```bash -docker logs netdata 2>&1 | grep exim -``` - - diff --git a/src/collectors/python.d.plugin/exim/metadata.yaml b/src/collectors/python.d.plugin/exim/metadata.yaml deleted file mode 100644 index a8be02d9977a03..00000000000000 --- a/src/collectors/python.d.plugin/exim/metadata.yaml +++ /dev/null @@ -1,132 +0,0 @@ -plugin_name: python.d.plugin -modules: - - meta: - plugin_name: python.d.plugin - module_name: exim - monitored_instance: - name: Exim - link: "https://www.exim.org/" - categories: - - data-collection.mail-servers - icon_filename: "exim.jpg" - related_resources: - integrations: - list: [] - info_provided_to_referring_integrations: - description: "" - keywords: - - exim - - mail - - server - most_popular: false - overview: - data_collection: - metrics_description: "This collector monitors Exim mail queue." - method_description: "It uses the `exim` command line binary to get the statistics." - supported_platforms: - include: [] - exclude: [] - multi_instance: false - additional_permissions: - description: "" - default_behavior: - auto_detection: - description: "Assuming setup prerequisites are met, the collector will try to gather statistics using the method described above, even without any configuration." - limits: - description: "" - performance_impact: - description: "" - setup: - prerequisites: - list: - - title: "Exim configuration - local installation" - description: | - The module uses the `exim` binary, which can only be executed as root by default. We need to allow other users to `exim` binary. We solve that adding `queue_list_requires_admin` statement in exim configuration and set to `false`, because it is `true` by default. On many Linux distributions, the default location of `exim` configuration is in `/etc/exim.conf`. - - 1. 
Edit the `exim` configuration with your preferred editor and add: - `queue_list_requires_admin = false` - 2. Restart `exim` and Netdata - - title: "Exim configuration - WHM (CPanel) server" - description: | - On a WHM server, you can reconfigure `exim` over the WHM interface with the following steps. - - 1. Login to WHM - 2. Navigate to Service Configuration --> Exim Configuration Manager --> tab Advanced Editor - 3. Scroll down to the button **Add additional configuration setting** and click on it. - 4. In the new dropdown which will appear above we need to find and choose: - `queue_list_requires_admin` and set to `false` - 5. Scroll to the end and click the **Save** button. - configuration: - file: - name: python.d/exim.conf - options: - description: | - There are 2 sections: - - * Global variables - * One or more JOBS that can define multiple different instances to monitor. - - The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. - - Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. - - Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. - folding: - title: "Config options" - enabled: true - list: - - name: update_every - description: Sets the default data collection frequency. - default_value: 5 - required: false - - name: priority - description: Controls the order of charts at the netdata dashboard. - default_value: 60000 - required: false - - name: autodetection_retry - description: Sets the job re-check interval in seconds. - default_value: 0 - required: false - - name: penalty - description: Indicates whether to apply penalty to update_every in case of failures. - default_value: yes - required: false - - name: name - description: Job name. This value will overwrite the `job_name` value. 
JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. - default_value: "" - required: false - - name: command - description: Path and command to the `exim` binary - default_value: "exim -bpc" - required: false - examples: - folding: - enabled: false - title: "Config" - list: - - name: Local exim install - description: A basic local exim install - config: | - local: - command: 'exim -bpc' - troubleshooting: - problems: - list: [] - alerts: [] - metrics: - folding: - title: Metrics - enabled: false - description: "" - availability: [] - scopes: - - name: global - description: "These metrics refer to the entire monitored application." - labels: [] - metrics: - - name: exim.qemails - description: Exim Queue Emails - unit: "emails" - chart_type: line - dimensions: - - name: emails From ccc33c7302356cfba73ed87ff630c849c16652f9 Mon Sep 17 00:00:00 2001 From: Netdata bot <43409846+netdatabot@users.noreply.github.com> Date: Sun, 11 Aug 2024 14:46:06 -0400 Subject: [PATCH 09/27] Regenerate integrations.js (#18308) Co-authored-by: ilyam8 <22274335+ilyam8@users.noreply.github.com> --- integrations/integrations.js | 75 +++---- integrations/integrations.json | 75 +++---- src/collectors/COLLECTORS.md | 2 +- src/go/plugin/go.d/modules/exim/README.md | 1 + .../go.d/modules/exim/integrations/exim.md | 189 ++++++++++++++++++ 5 files changed, 267 insertions(+), 75 deletions(-) create mode 120000 src/go/plugin/go.d/modules/exim/README.md create mode 100644 src/go/plugin/go.d/modules/exim/integrations/exim.md diff --git a/integrations/integrations.js b/integrations/integrations.js index 3b64742b87dc4f..89ceb53c9f8ee0 100644 --- a/integrations/integrations.js +++ b/integrations/integrations.js @@ -4098,6 +4098,44 @@ export const integrations = [ "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/envoy/metadata.yaml", 
"related_resources": "" }, + { + "meta": { + "id": "collector-go.d.plugin-exim", + "plugin_name": "go.d.plugin", + "module_name": "exim", + "monitored_instance": { + "name": "Exim", + "link": "https://www.exim.org/", + "icon_filename": "exim.jpg", + "categories": [ + "data-collection.mail-servers" + ] + }, + "keywords": [ + "exim", + "mail", + "email" + ], + "related_resources": { + "integrations": { + "list": [] + } + }, + "info_provided_to_referring_integrations": { + "description": "" + }, + "most_popular": false + }, + "overview": "# Exim\n\nPlugin: go.d.plugin\nModule: exim\n\n## Overview\n\nThis collector monitors Exim mail queue. It relies on the [`exim`](https://www.exim.org/exim-html-3.20/doc/html/spec_5.html) CLI tool but avoids directly executing the binary. Instead, it utilizes `ndsudo`, a Netdata helper specifically designed to run privileged commands securely within the Netdata environment. This approach eliminates the need to use `sudo`, improving security and potentially simplifying permission management.\nExecuted commands:\n- `exim -bpc`\n\n\n\n\nThis collector is supported on all platforms.\n\nThis collector only supports collecting metrics from a single instance of this integration.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nThis integration doesn't support auto-detection.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", + "setup": "## Setup\n\n### Prerequisites\n\nNo action required.\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/exim.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd 
/opt/netdata/etc/netdata\nsudo ./edit-config go.d/exim.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every.\n\n\n{% details open=true summary=\"Config options\" %}\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 10 | no |\n| timeout | exim binary execution timeout. | 2 | no |\n\n{% /details %}\n#### Examples\n\n##### Custom update_every\n\nAllows you to override the default data collection interval.\n\n{% details open=true summary=\"Config\" %}\n```yaml\njobs:\n - name: exim\n update_every: 5 # Collect logical volume statistics every 5 seconds\n\n```\n{% /details %}\n", + "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `exim` collector, run the `go.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `go.d.plugin` to debug the collector:\n\n ```bash\n ./go.d.plugin -d -m exim\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `exim` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. 
These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep exim\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep exim /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep exim\n```\n\n", + "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", + "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels.\n\n\n\n### Per Exim instance\n\nThese metrics refer to the the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| exim.qemails | emails | emails |\n\n", + "integration_type": "collector", + "id": "go.d.plugin-exim-Exim", + "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/exim/metadata.yaml", + "related_resources": "" + }, { "meta": { "id": "collector-go.d.plugin-fail2ban", @@ -18993,43 +19031,6 @@ export const integrations = [ "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/example/metadata.yaml", "related_resources": "" }, - { - "meta": { - "plugin_name": "python.d.plugin", - "module_name": "exim", - "monitored_instance": { - "name": "Exim", - "link": "https://www.exim.org/", - "categories": [ - "data-collection.mail-servers" - ], - "icon_filename": "exim.jpg" - }, - "related_resources": { - "integrations": { - "list": [] - } - }, - "info_provided_to_referring_integrations": { - "description": "" - }, - "keywords": [ - "exim", - "mail", - "server" - ], - "most_popular": false - }, - "overview": "# Exim\n\nPlugin: python.d.plugin\nModule: exim\n\n## Overview\n\nThis collector monitors Exim mail queue.\n\nIt uses the `exim` command line binary to get the statistics.\n\nThis collector is supported on all platforms.\n\nThis collector only supports collecting metrics from a single instance of this integration.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nAssuming setup prerequisites are met, the collector will try to gather statistics using the method described above, even without any configuration.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact 
on the system.\n", - "setup": "## Setup\n\n### Prerequisites\n\n#### Exim configuration - local installation\n\nThe module uses the `exim` binary, which can only be executed as root by default. We need to allow other users to `exim` binary. We solve that adding `queue_list_requires_admin` statement in exim configuration and set to `false`, because it is `true` by default. On many Linux distributions, the default location of `exim` configuration is in `/etc/exim.conf`.\n\n1. Edit the `exim` configuration with your preferred editor and add:\n`queue_list_requires_admin = false`\n2. Restart `exim` and Netdata\n\n\n#### Exim configuration - WHM (CPanel) server\n\nOn a WHM server, you can reconfigure `exim` over the WHM interface with the following steps.\n\n1. Login to WHM\n2. Navigate to Service Configuration --> Exim Configuration Manager --> tab Advanced Editor\n3. Scroll down to the button **Add additional configuration setting** and click on it.\n4. In the new dropdown which will appear above we need to find and choose:\n`queue_list_requires_admin` and set to `false`\n5. 
Scroll to the end and click the **Save** button.\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `python.d/exim.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config python.d/exim.conf\n```\n#### Options\n\nThere are 2 sections:\n\n* Global variables\n* One or more JOBS that can define multiple different instances to monitor.\n\nThe following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values.\n\nAdditionally, the following collapsed table contains all the options that can be configured inside a JOB definition.\n\nEvery configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified.\n\n\n{% details open=true summary=\"Config options\" %}\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Sets the default data collection frequency. | 5 | no |\n| priority | Controls the order of charts at the netdata dashboard. | 60000 | no |\n| autodetection_retry | Sets the job re-check interval in seconds. | 0 | no |\n| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | no |\n| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. 
| | no |\n| command | Path and command to the `exim` binary | exim -bpc | no |\n\n{% /details %}\n#### Examples\n\n##### Local exim install\n\nA basic local exim install\n\n```yaml\nlocal:\n command: 'exim -bpc'\n\n```\n", - "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `exim` collector, run the `python.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `python.d.plugin` to debug the collector:\n\n ```bash\n ./python.d.plugin exim debug trace\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `exim` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep exim\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep exim /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. 
Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep exim\n```\n\n", - "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", - "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.\n\n\n\n### Per Exim instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| exim.qemails | emails | emails |\n\n", - "integration_type": "collector", - "id": "python.d.plugin-exim-Exim", - "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/exim/metadata.yaml", - "related_resources": "" - }, { "meta": { "plugin_name": "python.d.plugin", diff --git a/integrations/integrations.json b/integrations/integrations.json index 90a68a43b68a08..2689389089ba1e 100644 --- a/integrations/integrations.json +++ b/integrations/integrations.json @@ -4096,6 +4096,44 @@ "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/envoy/metadata.yaml", "related_resources": "" }, + { + "meta": { + "id": "collector-go.d.plugin-exim", + "plugin_name": "go.d.plugin", + "module_name": "exim", + "monitored_instance": { + "name": "Exim", + "link": "https://www.exim.org/", + "icon_filename": "exim.jpg", + "categories": [ + "data-collection.mail-servers" + ] + }, + "keywords": [ + "exim", + "mail", + "email" + ], + "related_resources": { + "integrations": { + "list": [] + } + }, + "info_provided_to_referring_integrations": { + "description": "" + }, + "most_popular": false + }, + "overview": "# Exim\n\nPlugin: go.d.plugin\nModule: exim\n\n## Overview\n\nThis collector monitors Exim mail queue. 
It relies on the [`exim`](https://www.exim.org/exim-html-3.20/doc/html/spec_5.html) CLI tool but avoids directly executing the binary. Instead, it utilizes `ndsudo`, a Netdata helper specifically designed to run privileged commands securely within the Netdata environment. This approach eliminates the need to use `sudo`, improving security and potentially simplifying permission management.\nExecuted commands:\n- `exim -bpc`\n\n\n\n\nThis collector is supported on all platforms.\n\nThis collector only supports collecting metrics from a single instance of this integration.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nThis integration doesn't support auto-detection.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", + "setup": "## Setup\n\n### Prerequisites\n\nNo action required.\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/exim.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/exim.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every.\n\n\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 10 | no |\n| timeout | exim binary execution timeout. 
| 2 | no |\n\n#### Examples\n\n##### Custom update_every\n\nAllows you to override the default data collection interval.\n\n```yaml\njobs:\n - name: exim\n update_every: 5 # Collect logical volume statistics every 5 seconds\n\n```\n", + "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `exim` collector, run the `go.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `go.d.plugin` to debug the collector:\n\n ```bash\n ./go.d.plugin -d -m exim\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `exim` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep exim\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep exim /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. 
Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep exim\n```\n\n", + "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", + "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.\n\n\n\n### Per Exim instance\n\nThese metrics refer to the the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| exim.qemails | emails | emails |\n\n", + "integration_type": "collector", + "id": "go.d.plugin-exim-Exim", + "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/exim/metadata.yaml", + "related_resources": "" + }, { "meta": { "id": "collector-go.d.plugin-fail2ban", @@ -18991,43 +19029,6 @@ "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/example/metadata.yaml", "related_resources": "" }, - { - "meta": { - "plugin_name": "python.d.plugin", - "module_name": "exim", - "monitored_instance": { - "name": "Exim", - "link": "https://www.exim.org/", - "categories": [ - "data-collection.mail-servers" - ], - "icon_filename": "exim.jpg" - }, - "related_resources": { - "integrations": { - "list": [] - } - }, - "info_provided_to_referring_integrations": { - "description": "" - }, - "keywords": [ - "exim", - "mail", - "server" - ], - "most_popular": false - }, - "overview": "# Exim\n\nPlugin: python.d.plugin\nModule: exim\n\n## Overview\n\nThis collector monitors Exim mail queue.\n\nIt uses the `exim` command line binary to get the statistics.\n\nThis collector is supported on all platforms.\n\nThis collector only supports collecting metrics from a single instance of this 
integration.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nAssuming setup prerequisites are met, the collector will try to gather statistics using the method described above, even without any configuration.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", - "setup": "## Setup\n\n### Prerequisites\n\n#### Exim configuration - local installation\n\nThe module uses the `exim` binary, which can only be executed as root by default. We need to allow other users to `exim` binary. We solve that adding `queue_list_requires_admin` statement in exim configuration and set to `false`, because it is `true` by default. On many Linux distributions, the default location of `exim` configuration is in `/etc/exim.conf`.\n\n1. Edit the `exim` configuration with your preferred editor and add:\n`queue_list_requires_admin = false`\n2. Restart `exim` and Netdata\n\n\n#### Exim configuration - WHM (CPanel) server\n\nOn a WHM server, you can reconfigure `exim` over the WHM interface with the following steps.\n\n1. Login to WHM\n2. Navigate to Service Configuration --> Exim Configuration Manager --> tab Advanced Editor\n3. Scroll down to the button **Add additional configuration setting** and click on it.\n4. In the new dropdown which will appear above we need to find and choose:\n`queue_list_requires_admin` and set to `false`\n5. 
Scroll to the end and click the **Save** button.\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `python.d/exim.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config python.d/exim.conf\n```\n#### Options\n\nThere are 2 sections:\n\n* Global variables\n* One or more JOBS that can define multiple different instances to monitor.\n\nThe following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values.\n\nAdditionally, the following collapsed table contains all the options that can be configured inside a JOB definition.\n\nEvery configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified.\n\n\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Sets the default data collection frequency. | 5 | no |\n| priority | Controls the order of charts at the netdata dashboard. | 60000 | no |\n| autodetection_retry | Sets the job re-check interval in seconds. | 0 | no |\n| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | no |\n| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. 
| | no |\n| command | Path and command to the `exim` binary | exim -bpc | no |\n\n#### Examples\n\n##### Local exim install\n\nA basic local exim install\n\n```yaml\nlocal:\n command: 'exim -bpc'\n\n```\n", - "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `exim` collector, run the `python.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `python.d.plugin` to debug the collector:\n\n ```bash\n ./python.d.plugin exim debug trace\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `exim` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep exim\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep exim /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. 
Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep exim\n```\n\n", - "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", - "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.\n\n\n\n### Per Exim instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| exim.qemails | emails | emails |\n\n", - "integration_type": "collector", - "id": "python.d.plugin-exim-Exim", - "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/exim/metadata.yaml", - "related_resources": "" - }, { "meta": { "plugin_name": "python.d.plugin", diff --git a/src/collectors/COLLECTORS.md b/src/collectors/COLLECTORS.md index 8e42c310ae818c..5a2615d2a83215 100644 --- a/src/collectors/COLLECTORS.md +++ b/src/collectors/COLLECTORS.md @@ -747,7 +747,7 @@ If you don't see the app/service you'd like to monitor in this list: - [Dovecot](https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/dovecot/integrations/dovecot.md) -- [Exim](https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/exim/integrations/exim.md) +- [Exim](https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/exim/integrations/exim.md) - [Halon](https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/prometheus/integrations/halon.md) diff --git a/src/go/plugin/go.d/modules/exim/README.md b/src/go/plugin/go.d/modules/exim/README.md new file mode 120000 index 00000000000000..f1f2ef9f927dd8 --- /dev/null +++ b/src/go/plugin/go.d/modules/exim/README.md @@ -0,0 +1 
@@ +integrations/exim.md \ No newline at end of file diff --git a/src/go/plugin/go.d/modules/exim/integrations/exim.md b/src/go/plugin/go.d/modules/exim/integrations/exim.md new file mode 100644 index 00000000000000..b110e7a1f80927 --- /dev/null +++ b/src/go/plugin/go.d/modules/exim/integrations/exim.md @@ -0,0 +1,189 @@ + + +# Exim + + + + + +Plugin: go.d.plugin +Module: exim + + + +## Overview + +This collector monitors Exim mail queue. It relies on the [`exim`](https://www.exim.org/exim-html-3.20/doc/html/spec_5.html) CLI tool but avoids directly executing the binary. Instead, it utilizes `ndsudo`, a Netdata helper specifically designed to run privileged commands securely within the Netdata environment. This approach eliminates the need to use `sudo`, improving security and potentially simplifying permission management. +Executed commands: +- `exim -bpc` + + + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Exim instance + +These metrics refer to the the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| exim.qemails | emails | emails | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. 
+ +### Configuration + +#### File + +The configuration file name for this integration is `go.d/exim.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config go.d/exim.conf +``` +#### Options + +The following options can be defined globally: update_every. + + +
Config options + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Data collection frequency. | 10 | no | +| timeout | exim binary execution timeout. | 2 | no | + +
+ +#### Examples + +##### Custom update_every + +Allows you to override the default data collection interval. + +
Config + +```yaml +jobs: + - name: exim + update_every: 5 # Collect logical volume statistics every 5 seconds + +``` +
+ + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `exim` collector, run the `go.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `go.d.plugin` to debug the collector: + + ```bash + ./go.d.plugin -d -m exim + ``` + +### Getting Logs + +If you're encountering problems with the `exim` collector, follow these steps to retrieve logs and identify potential issues: + +- **Run the command** specific to your system (systemd, non-systemd, or Docker container). +- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem. + +#### System with systemd + +Use the following command to view logs generated since the last Netdata service restart: + +```bash +journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep exim +``` + +#### System without systemd + +Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name: + +```bash +grep exim /var/log/netdata/collector.log +``` + +**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues. 
+ +#### Docker Container + +If your Netdata runs in a Docker container named "netdata" (replace if different), use this command: + +```bash +docker logs netdata 2>&1 | grep exim +``` + + From f7a6066db5c03b775febaabea103784eb00b83ac Mon Sep 17 00:00:00 2001 From: netdatabot Date: Mon, 12 Aug 2024 00:18:18 +0000 Subject: [PATCH 10/27] [ci skip] Update changelog and version for nightly build: v1.46.0-282-nightly. --- CHANGELOG.md | 14 +++++++------- packaging/version | 2 +- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 2c0df0747773d6..9130e7e2300400 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,7 +6,14 @@ **Merged pull requests:** +- Regenerate integrations.js [\#18308](https://github.com/netdata/netdata/pull/18308) ([netdatabot](https://github.com/netdatabot)) +- add go.d/exim [\#18306](https://github.com/netdata/netdata/pull/18306) ([ilyam8](https://github.com/ilyam8)) +- remove python.d/exim [\#18305](https://github.com/netdata/netdata/pull/18305) ([ilyam8](https://github.com/ilyam8)) +- add exim to ndsudo [\#18304](https://github.com/netdata/netdata/pull/18304) ([ilyam8](https://github.com/ilyam8)) +- Regenerate integrations.js [\#18303](https://github.com/netdata/netdata/pull/18303) ([netdatabot](https://github.com/netdatabot)) +- add go.d/nsd [\#18302](https://github.com/netdata/netdata/pull/18302) ([ilyam8](https://github.com/ilyam8)) - add nsd-control to ndsudo [\#18301](https://github.com/netdata/netdata/pull/18301) ([ilyam8](https://github.com/ilyam8)) +- remove python.d/nsd [\#18300](https://github.com/netdata/netdata/pull/18300) ([ilyam8](https://github.com/ilyam8)) - Regenerate integrations.js [\#18299](https://github.com/netdata/netdata/pull/18299) ([netdatabot](https://github.com/netdatabot)) - go.d gearman fix meta [\#18298](https://github.com/netdata/netdata/pull/18298) ([ilyam8](https://github.com/ilyam8)) - add go.d/gearman [\#18294](https://github.com/netdata/netdata/pull/18294) 
([ilyam8](https://github.com/ilyam8)) @@ -408,13 +415,6 @@ - go.d systemdunits add "skip\_transient" [\#17725](https://github.com/netdata/netdata/pull/17725) ([ilyam8](https://github.com/ilyam8)) - minor fix on link [\#17722](https://github.com/netdata/netdata/pull/17722) ([Ancairon](https://github.com/Ancairon)) - Regenerate integrations.js [\#17721](https://github.com/netdata/netdata/pull/17721) ([netdatabot](https://github.com/netdatabot)) -- PR to change absolute links to relative [\#17720](https://github.com/netdata/netdata/pull/17720) ([Ancairon](https://github.com/Ancairon)) -- Change links to relative links in one doc [\#17719](https://github.com/netdata/netdata/pull/17719) ([Ancairon](https://github.com/Ancairon)) -- fix proc plugin disk\_avgsz [\#17718](https://github.com/netdata/netdata/pull/17718) ([ilyam8](https://github.com/ilyam8)) -- go.d weblog ignore reqProcTime on HTTP 101 [\#17717](https://github.com/netdata/netdata/pull/17717) ([ilyam8](https://github.com/ilyam8)) -- Fix mongodb default config indentation [\#17715](https://github.com/netdata/netdata/pull/17715) ([louis-lau](https://github.com/louis-lau)) -- Fix compilation with disable-cloud [\#17714](https://github.com/netdata/netdata/pull/17714) ([stelfrag](https://github.com/stelfrag)) -- fix on link [\#17712](https://github.com/netdata/netdata/pull/17712) ([Ancairon](https://github.com/Ancairon)) ## [v1.45.6](https://github.com/netdata/netdata/tree/v1.45.6) (2024-06-05) diff --git a/packaging/version b/packaging/version index 590d8d56a02c56..5963b73da04264 100644 --- a/packaging/version +++ b/packaging/version @@ -1 +1 @@ -v1.46.0-274-nightly +v1.46.0-282-nightly From 80c0093d11f798e885b8a83c284c7e43f84a857a Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Mon, 12 Aug 2024 11:44:55 +0300 Subject: [PATCH 11/27] go.d nvidia_smi remove "csv" mode (#18311) --- .../plugin/go.d/modules/nvidia_smi/charts.go | 47 +- .../plugin/go.d/modules/nvidia_smi/collect.go | 142 +++++- 
.../go.d/modules/nvidia_smi/collect_csv.go | 198 --------- .../go.d/modules/nvidia_smi/collect_xml.go | 265 ----------- .../modules/nvidia_smi/config_schema.json | 6 - src/go/plugin/go.d/modules/nvidia_smi/exec.go | 54 +-- .../go.d/modules/nvidia_smi/gpu_info.go | 121 +++++ src/go/plugin/go.d/modules/nvidia_smi/init.go | 4 +- .../go.d/modules/nvidia_smi/metadata.yaml | 63 +-- .../go.d/modules/nvidia_smi/nvidia_smi.go | 55 +-- .../modules/nvidia_smi/nvidia_smi_test.go | 234 +++------- .../modules/nvidia_smi/testdata/config.json | 3 +- .../modules/nvidia_smi/testdata/config.yaml | 1 - .../nvidia_smi/testdata/help-query-gpu.txt | 414 ------------------ .../nvidia_smi/testdata/tesla-p100.csv | 2 - 15 files changed, 356 insertions(+), 1253 deletions(-) delete mode 100644 src/go/plugin/go.d/modules/nvidia_smi/collect_csv.go delete mode 100644 src/go/plugin/go.d/modules/nvidia_smi/collect_xml.go create mode 100644 src/go/plugin/go.d/modules/nvidia_smi/gpu_info.go delete mode 100644 src/go/plugin/go.d/modules/nvidia_smi/testdata/help-query-gpu.txt delete mode 100644 src/go/plugin/go.d/modules/nvidia_smi/testdata/tesla-p100.csv diff --git a/src/go/plugin/go.d/modules/nvidia_smi/charts.go b/src/go/plugin/go.d/modules/nvidia_smi/charts.go index d89eb30c8a1314..746c8eed39226f 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/charts.go +++ b/src/go/plugin/go.d/modules/nvidia_smi/charts.go @@ -53,16 +53,6 @@ var ( migDeviceFrameBufferMemoryUsageChartTmpl.Copy(), migDeviceBAR1MemoryUsageChartTmpl.Copy(), } - gpuCSVCharts = module.Charts{ - gpuFanSpeedPercChartTmpl.Copy(), - gpuUtilizationChartTmpl.Copy(), - gpuMemUtilizationChartTmpl.Copy(), - gpuFrameBufferMemoryUsageChartTmpl.Copy(), - gpuTemperatureChartTmpl.Copy(), - gpuClockFreqChartTmpl.Copy(), - gpuPowerDrawChartTmpl.Copy(), - gpuPerformanceStateChartTmpl.Copy(), - } ) var ( @@ -271,7 +261,7 @@ var ( } ) -func (nv *NvidiaSMI) addGPUXMLCharts(gpu xmlGPUInfo) { +func (nv *NvidiaSmi) addGPUXMLCharts(gpu gpuInfo) { charts := 
gpuXMLCharts.Copy() if !isValidValue(gpu.Utilization.GpuUtil) { @@ -318,37 +308,6 @@ func (nv *NvidiaSMI) addGPUXMLCharts(gpu xmlGPUInfo) { } } -func (nv *NvidiaSMI) addGPUCSVCharts(gpu csvGPUInfo) { - charts := gpuCSVCharts.Copy() - - if !isValidValue(gpu.utilizationGPU) { - _ = charts.Remove(gpuUtilizationChartTmpl.ID) - } - if !isValidValue(gpu.utilizationMemory) { - _ = charts.Remove(gpuMemUtilizationChartTmpl.ID) - } - if !isValidValue(gpu.fanSpeed) { - _ = charts.Remove(gpuFanSpeedPercChartTmpl.ID) - } - if !isValidValue(gpu.powerDraw) { - _ = charts.Remove(gpuPowerDrawChartTmpl.ID) - } - - for _, c := range *charts { - c.ID = fmt.Sprintf(c.ID, strings.ToLower(gpu.uuid)) - c.Labels = []module.Label{ - {Key: "product_name", Value: gpu.name}, - } - for _, d := range c.Dims { - d.ID = fmt.Sprintf(d.ID, gpu.uuid) - } - } - - if err := nv.Charts().Add(*charts...); err != nil { - nv.Warning(err) - } -} - var ( migDeviceFrameBufferMemoryUsageChartTmpl = module.Chart{ ID: "mig_instance_%s_gpu_%s_frame_buffer_memory_usage", @@ -379,7 +338,7 @@ var ( } ) -func (nv *NvidiaSMI) addMIGDeviceXMLCharts(gpu xmlGPUInfo, mig xmlMIGDeviceInfo) { +func (nv *NvidiaSmi) addMIGDeviceCharts(gpu gpuInfo, mig gpuMIGDeviceInfo) { charts := migDeviceXMLCharts.Copy() for _, c := range *charts { @@ -399,7 +358,7 @@ func (nv *NvidiaSMI) addMIGDeviceXMLCharts(gpu xmlGPUInfo, mig xmlMIGDeviceInfo) } } -func (nv *NvidiaSMI) removeCharts(prefix string) { +func (nv *NvidiaSmi) removeCharts(prefix string) { prefix = strings.ToLower(prefix) for _, c := range *nv.Charts() { diff --git a/src/go/plugin/go.d/modules/nvidia_smi/collect.go b/src/go/plugin/go.d/modules/nvidia_smi/collect.go index 0830b54a361525..f621d191ba56e9 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/collect.go +++ b/src/go/plugin/go.d/modules/nvidia_smi/collect.go @@ -3,12 +3,14 @@ package nvidia_smi import ( + "encoding/xml" "errors" + "fmt" "strconv" "strings" ) -func (nv *NvidiaSMI) collect() (map[string]int64, error) { 
+func (nv *NvidiaSmi) collect() (map[string]int64, error) { if nv.exec == nil { return nil, errors.New("nvidia-smi exec is not initialized") } @@ -22,11 +24,141 @@ func (nv *NvidiaSMI) collect() (map[string]int64, error) { return mx, nil } -func (nv *NvidiaSMI) collectGPUInfo(mx map[string]int64) error { - if nv.UseCSVFormat { - return nv.collectGPUInfoCSV(mx) +func (nv *NvidiaSmi) collectGPUInfo(mx map[string]int64) error { + bs, err := nv.exec.queryGPUInfo() + if err != nil { + return fmt.Errorf("error on quering XML GPU info: %v", err) + } + + info := &gpusInfo{} + if err := xml.Unmarshal(bs, info); err != nil { + return fmt.Errorf("error on unmarshaling XML GPU info response: %v", err) + } + + seenGPU := make(map[string]bool) + seenMIG := make(map[string]bool) + + for _, gpu := range info.GPUs { + if !isValidValue(gpu.UUID) { + continue + } + + px := "gpu_" + gpu.UUID + "_" + + seenGPU[px] = true + + if !nv.gpus[px] { + nv.gpus[px] = true + nv.addGPUXMLCharts(gpu) + } + + addMetric(mx, px+"pcie_bandwidth_usage_rx", gpu.PCI.RxUtil, 1024) // KB => bytes + addMetric(mx, px+"pcie_bandwidth_usage_tx", gpu.PCI.TxUtil, 1024) // KB => bytes + if maxBw := calcMaxPCIEBandwidth(gpu); maxBw > 0 { + rx := parseFloat(gpu.PCI.RxUtil) * 1024 // KB => bytes + tx := parseFloat(gpu.PCI.TxUtil) * 1024 // KB => bytes + mx[px+"pcie_bandwidth_utilization_rx"] = int64((rx * 100 / maxBw) * 100) + mx[px+"pcie_bandwidth_utilization_tx"] = int64((tx * 100 / maxBw) * 100) + } + addMetric(mx, px+"fan_speed_perc", gpu.FanSpeed, 0) + addMetric(mx, px+"gpu_utilization", gpu.Utilization.GpuUtil, 0) + addMetric(mx, px+"mem_utilization", gpu.Utilization.MemoryUtil, 0) + addMetric(mx, px+"decoder_utilization", gpu.Utilization.DecoderUtil, 0) + addMetric(mx, px+"encoder_utilization", gpu.Utilization.EncoderUtil, 0) + addMetric(mx, px+"frame_buffer_memory_usage_free", gpu.FBMemoryUsage.Free, 1024*1024) // MiB => bytes + addMetric(mx, px+"frame_buffer_memory_usage_used", gpu.FBMemoryUsage.Used, 
1024*1024) // MiB => bytes + addMetric(mx, px+"frame_buffer_memory_usage_reserved", gpu.FBMemoryUsage.Reserved, 1024*1024) // MiB => bytes + addMetric(mx, px+"bar1_memory_usage_free", gpu.Bar1MemoryUsage.Free, 1024*1024) // MiB => bytes + addMetric(mx, px+"bar1_memory_usage_used", gpu.Bar1MemoryUsage.Used, 1024*1024) // MiB => bytes + addMetric(mx, px+"temperature", gpu.Temperature.GpuTemp, 0) + addMetric(mx, px+"graphics_clock", gpu.Clocks.GraphicsClock, 0) + addMetric(mx, px+"video_clock", gpu.Clocks.VideoClock, 0) + addMetric(mx, px+"sm_clock", gpu.Clocks.SmClock, 0) + addMetric(mx, px+"mem_clock", gpu.Clocks.MemClock, 0) + if gpu.PowerReadings != nil { + addMetric(mx, px+"power_draw", gpu.PowerReadings.PowerDraw, 0) + } else if gpu.GPUPowerReadings != nil { + addMetric(mx, px+"power_draw", gpu.GPUPowerReadings.PowerDraw, 0) + } + addMetric(mx, px+"voltage", gpu.Voltage.GraphicsVolt, 0) + for i := 0; i < 16; i++ { + s := "P" + strconv.Itoa(i) + mx[px+"performance_state_"+s] = boolToInt(gpu.PerformanceState == s) + } + if isValidValue(gpu.MIGMode.CurrentMIG) { + mode := strings.ToLower(gpu.MIGMode.CurrentMIG) + mx[px+"mig_current_mode_enabled"] = boolToInt(mode == "enabled") + mx[px+"mig_current_mode_disabled"] = boolToInt(mode == "disabled") + mx[px+"mig_devices_count"] = int64(len(gpu.MIGDevices.MIGDevice)) + } + + for _, mig := range gpu.MIGDevices.MIGDevice { + if !isValidValue(mig.GPUInstanceID) { + continue + } + + px := "mig_instance_" + mig.GPUInstanceID + "_" + px + + seenMIG[px] = true + + if !nv.migs[px] { + nv.migs[px] = true + nv.addMIGDeviceCharts(gpu, mig) + } + + addMetric(mx, px+"ecc_error_sram_uncorrectable", mig.ECCErrorCount.VolatileCount.SRAMUncorrectable, 0) + addMetric(mx, px+"frame_buffer_memory_usage_free", mig.FBMemoryUsage.Free, 1024*1024) // MiB => bytes + addMetric(mx, px+"frame_buffer_memory_usage_used", mig.FBMemoryUsage.Used, 1024*1024) // MiB => bytes + addMetric(mx, px+"frame_buffer_memory_usage_reserved", 
mig.FBMemoryUsage.Reserved, 1024*1024) // MiB => bytes + addMetric(mx, px+"bar1_memory_usage_free", mig.BAR1MemoryUsage.Free, 1024*1024) // MiB => bytes + addMetric(mx, px+"bar1_memory_usage_used", mig.BAR1MemoryUsage.Used, 1024*1024) // MiB => bytes + } + } + + for px := range nv.gpus { + if !seenGPU[px] { + delete(nv.gpus, px) + nv.removeCharts(px) + } + } + + for px := range nv.migs { + if !seenMIG[px] { + delete(nv.migs, px) + nv.removeCharts(px) + } } - return nv.collectGPUInfoXML(mx) + + return nil +} + +func calcMaxPCIEBandwidth(gpu gpuInfo) float64 { + gen := gpu.PCI.PCIGPULinkInfo.PCIEGen.MaxLinkGen + width := strings.TrimSuffix(gpu.PCI.PCIGPULinkInfo.LinkWidths.MaxLinkWidth, "x") + + if !isValidValue(gen) || !isValidValue(width) { + return 0 + } + + // https://enterprise-support.nvidia.com/s/article/understanding-pcie-configuration-for-maximum-performance + var speed, enc float64 + switch gen { + case "1": + speed, enc = 2.5, 1.0/5.0 + case "2": + speed, enc = 5, 1.0/5.0 + case "3": + speed, enc = 8, 2.0/130.0 + case "4": + speed, enc = 16, 2.0/130.0 + case "5": + speed, enc = 32, 2.0/130.0 + default: + return 0 + } + + // Maximum PCIe Bandwidth = SPEED * WIDTH * (1 - ENCODING) - 1Gb/s + return (speed*parseFloat(width)*(1-enc) - 1) * 1e9 / 8 // Gb/s => bytes } func addMetric(mx map[string]int64, key, value string, mul int) { diff --git a/src/go/plugin/go.d/modules/nvidia_smi/collect_csv.go b/src/go/plugin/go.d/modules/nvidia_smi/collect_csv.go deleted file mode 100644 index 2584aaffe37ae3..00000000000000 --- a/src/go/plugin/go.d/modules/nvidia_smi/collect_csv.go +++ /dev/null @@ -1,198 +0,0 @@ -// SPDX-License-Identifier: GPL-3.0-or-later - -package nvidia_smi - -import ( - "bufio" - "bytes" - "encoding/csv" - "errors" - "fmt" - "io" - "regexp" - "strconv" - "strings" -) - -// use of property aliases is not implemented ('"" or ""' in help-query-gpu) -var knownProperties = map[string]bool{ - "uuid": true, - "name": true, - "fan.speed": true, - "pstate": 
true, - "utilization.gpu": true, - "utilization.memory": true, - "memory.used": true, - "memory.free": true, - "memory.reserved": true, - "temperature.gpu": true, - "clocks.current.graphics": true, - "clocks.current.video": true, - "clocks.current.sm": true, - "clocks.current.memory": true, - "power.draw": true, -} - -var reHelpProperty = regexp.MustCompile(`"([a-zA-Z_.]+)"`) - -func (nv *NvidiaSMI) collectGPUInfoCSV(mx map[string]int64) error { - if len(nv.gpuQueryProperties) == 0 { - bs, err := nv.exec.queryHelpQueryGPU() - if err != nil { - return err - } - - sc := bufio.NewScanner(bytes.NewBuffer(bs)) - - for sc.Scan() { - if !strings.HasPrefix(sc.Text(), "\"") { - continue - } - matches := reHelpProperty.FindAllString(sc.Text(), -1) - if len(matches) == 0 { - continue - } - for _, v := range matches { - if v = strings.Trim(v, "\""); knownProperties[v] { - nv.gpuQueryProperties = append(nv.gpuQueryProperties, v) - } - } - } - nv.Debugf("found query GPU properties: %v", nv.gpuQueryProperties) - } - - bs, err := nv.exec.queryGPUInfoCSV(nv.gpuQueryProperties) - if err != nil { - return err - } - - nv.Debugf("GPU info:\n%s", bs) - - r := csv.NewReader(bytes.NewBuffer(bs)) - r.Comma = ',' - r.ReuseRecord = true - r.TrimLeadingSpace = true - - // skip headers - if _, err := r.Read(); err != nil && err != io.EOF { - return err - } - - var gpusInfo []csvGPUInfo - for { - record, err := r.Read() - if err != nil { - if errors.Is(err, io.EOF) { - break - } - return err - } - - if len(record) != len(nv.gpuQueryProperties) { - return fmt.Errorf("record values (%d) != queried properties (%d)", len(record), len(nv.gpuQueryProperties)) - } - - var gpu csvGPUInfo - for i, v := range record { - switch nv.gpuQueryProperties[i] { - case "uuid": - gpu.uuid = v - case "name": - gpu.name = v - case "fan.speed": - gpu.fanSpeed = v - case "pstate": - gpu.pstate = v - case "utilization.gpu": - gpu.utilizationGPU = v - case "utilization.memory": - gpu.utilizationMemory = v - case 
"memory.used": - gpu.memoryUsed = v - case "memory.free": - gpu.memoryFree = v - case "memory.reserved": - gpu.memoryReserved = v - case "temperature.gpu": - gpu.temperatureGPU = v - case "clocks.current.graphics": - gpu.clocksCurrentGraphics = v - case "clocks.current.video": - gpu.clocksCurrentVideo = v - case "clocks.current.sm": - gpu.clocksCurrentSM = v - case "clocks.current.memory": - gpu.clocksCurrentMemory = v - case "power.draw": - gpu.powerDraw = v - } - } - gpusInfo = append(gpusInfo, gpu) - } - - seen := make(map[string]bool) - - for _, gpu := range gpusInfo { - if !isValidValue(gpu.uuid) || !isValidValue(gpu.name) { - continue - } - - px := "gpu_" + gpu.uuid + "_" - - seen[px] = true - - if !nv.gpus[px] { - nv.gpus[px] = true - nv.addGPUCSVCharts(gpu) - } - - addMetric(mx, px+"fan_speed_perc", gpu.fanSpeed, 0) - addMetric(mx, px+"gpu_utilization", gpu.utilizationGPU, 0) - addMetric(mx, px+"mem_utilization", gpu.utilizationMemory, 0) - addMetric(mx, px+"frame_buffer_memory_usage_free", gpu.memoryFree, 1024*1024) // MiB => bytes - addMetric(mx, px+"frame_buffer_memory_usage_used", gpu.memoryUsed, 1024*1024) // MiB => bytes - addMetric(mx, px+"frame_buffer_memory_usage_reserved", gpu.memoryReserved, 1024*1024) // MiB => bytes - addMetric(mx, px+"temperature", gpu.temperatureGPU, 0) - addMetric(mx, px+"graphics_clock", gpu.clocksCurrentGraphics, 0) - addMetric(mx, px+"video_clock", gpu.clocksCurrentVideo, 0) - addMetric(mx, px+"sm_clock", gpu.clocksCurrentSM, 0) - addMetric(mx, px+"mem_clock", gpu.clocksCurrentMemory, 0) - addMetric(mx, px+"power_draw", gpu.powerDraw, 0) - for i := 0; i < 16; i++ { - if s := "P" + strconv.Itoa(i); gpu.pstate == s { - mx[px+"performance_state_"+s] = 1 - } else { - mx[px+"performance_state_"+s] = 0 - } - } - } - - for px := range nv.gpus { - if !seen[px] { - delete(nv.gpus, px) - nv.removeCharts(px) - } - } - - return nil -} - -type ( - csvGPUInfo struct { - uuid string - name string - fanSpeed string - pstate string - 
utilizationGPU string - utilizationMemory string - memoryUsed string - memoryFree string - memoryReserved string - temperatureGPU string - clocksCurrentGraphics string - clocksCurrentVideo string - clocksCurrentSM string - clocksCurrentMemory string - powerDraw string - } -) diff --git a/src/go/plugin/go.d/modules/nvidia_smi/collect_xml.go b/src/go/plugin/go.d/modules/nvidia_smi/collect_xml.go deleted file mode 100644 index 2ab3180a8102b6..00000000000000 --- a/src/go/plugin/go.d/modules/nvidia_smi/collect_xml.go +++ /dev/null @@ -1,265 +0,0 @@ -// SPDX-License-Identifier: GPL-3.0-or-later - -package nvidia_smi - -import ( - "encoding/xml" - "fmt" - "strconv" - "strings" -) - -func (nv *NvidiaSMI) collectGPUInfoXML(mx map[string]int64) error { - bs, err := nv.exec.queryGPUInfoXML() - if err != nil { - return fmt.Errorf("error on quering XML GPU info: %v", err) - } - - info := &xmlInfo{} - if err := xml.Unmarshal(bs, info); err != nil { - return fmt.Errorf("error on unmarshaling XML GPU info response: %v", err) - } - - seenGPU := make(map[string]bool) - seenMIG := make(map[string]bool) - - for _, gpu := range info.GPUs { - if !isValidValue(gpu.UUID) { - continue - } - - px := "gpu_" + gpu.UUID + "_" - - seenGPU[px] = true - - if !nv.gpus[px] { - nv.gpus[px] = true - nv.addGPUXMLCharts(gpu) - } - - addMetric(mx, px+"pcie_bandwidth_usage_rx", gpu.PCI.RxUtil, 1024) // KB => bytes - addMetric(mx, px+"pcie_bandwidth_usage_tx", gpu.PCI.TxUtil, 1024) // KB => bytes - if max := calcMaxPCIEBandwidth(gpu); max > 0 { - rx := parseFloat(gpu.PCI.RxUtil) * 1024 // KB => bytes - tx := parseFloat(gpu.PCI.TxUtil) * 1024 // KB => bytes - mx[px+"pcie_bandwidth_utilization_rx"] = int64((rx * 100 / max) * 100) - mx[px+"pcie_bandwidth_utilization_tx"] = int64((tx * 100 / max) * 100) - } - addMetric(mx, px+"fan_speed_perc", gpu.FanSpeed, 0) - addMetric(mx, px+"gpu_utilization", gpu.Utilization.GpuUtil, 0) - addMetric(mx, px+"mem_utilization", gpu.Utilization.MemoryUtil, 0) - addMetric(mx, 
px+"decoder_utilization", gpu.Utilization.DecoderUtil, 0) - addMetric(mx, px+"encoder_utilization", gpu.Utilization.EncoderUtil, 0) - addMetric(mx, px+"frame_buffer_memory_usage_free", gpu.FBMemoryUsage.Free, 1024*1024) // MiB => bytes - addMetric(mx, px+"frame_buffer_memory_usage_used", gpu.FBMemoryUsage.Used, 1024*1024) // MiB => bytes - addMetric(mx, px+"frame_buffer_memory_usage_reserved", gpu.FBMemoryUsage.Reserved, 1024*1024) // MiB => bytes - addMetric(mx, px+"bar1_memory_usage_free", gpu.Bar1MemoryUsage.Free, 1024*1024) // MiB => bytes - addMetric(mx, px+"bar1_memory_usage_used", gpu.Bar1MemoryUsage.Used, 1024*1024) // MiB => bytes - addMetric(mx, px+"temperature", gpu.Temperature.GpuTemp, 0) - addMetric(mx, px+"graphics_clock", gpu.Clocks.GraphicsClock, 0) - addMetric(mx, px+"video_clock", gpu.Clocks.VideoClock, 0) - addMetric(mx, px+"sm_clock", gpu.Clocks.SmClock, 0) - addMetric(mx, px+"mem_clock", gpu.Clocks.MemClock, 0) - if gpu.PowerReadings != nil { - addMetric(mx, px+"power_draw", gpu.PowerReadings.PowerDraw, 0) - } else if gpu.GPUPowerReadings != nil { - addMetric(mx, px+"power_draw", gpu.GPUPowerReadings.PowerDraw, 0) - } - addMetric(mx, px+"voltage", gpu.Voltage.GraphicsVolt, 0) - for i := 0; i < 16; i++ { - s := "P" + strconv.Itoa(i) - mx[px+"performance_state_"+s] = boolToInt(gpu.PerformanceState == s) - } - if isValidValue(gpu.MIGMode.CurrentMIG) { - mode := strings.ToLower(gpu.MIGMode.CurrentMIG) - mx[px+"mig_current_mode_enabled"] = boolToInt(mode == "enabled") - mx[px+"mig_current_mode_disabled"] = boolToInt(mode == "disabled") - mx[px+"mig_devices_count"] = int64(len(gpu.MIGDevices.MIGDevice)) - } - - for _, mig := range gpu.MIGDevices.MIGDevice { - if !isValidValue(mig.GPUInstanceID) { - continue - } - - px := "mig_instance_" + mig.GPUInstanceID + "_" + px - - seenMIG[px] = true - - if !nv.migs[px] { - nv.migs[px] = true - nv.addMIGDeviceXMLCharts(gpu, mig) - } - - addMetric(mx, px+"ecc_error_sram_uncorrectable", 
mig.ECCErrorCount.VolatileCount.SRAMUncorrectable, 0) - addMetric(mx, px+"frame_buffer_memory_usage_free", mig.FBMemoryUsage.Free, 1024*1024) // MiB => bytes - addMetric(mx, px+"frame_buffer_memory_usage_used", mig.FBMemoryUsage.Used, 1024*1024) // MiB => bytes - addMetric(mx, px+"frame_buffer_memory_usage_reserved", mig.FBMemoryUsage.Reserved, 1024*1024) // MiB => bytes - addMetric(mx, px+"bar1_memory_usage_free", mig.BAR1MemoryUsage.Free, 1024*1024) // MiB => bytes - addMetric(mx, px+"bar1_memory_usage_used", mig.BAR1MemoryUsage.Used, 1024*1024) // MiB => bytes - } - } - - for px := range nv.gpus { - if !seenGPU[px] { - delete(nv.gpus, px) - nv.removeCharts(px) - } - } - - for px := range nv.migs { - if !seenMIG[px] { - delete(nv.migs, px) - nv.removeCharts(px) - } - } - - return nil -} - -func calcMaxPCIEBandwidth(gpu xmlGPUInfo) float64 { - gen := gpu.PCI.PCIGPULinkInfo.PCIEGen.MaxLinkGen - width := strings.TrimSuffix(gpu.PCI.PCIGPULinkInfo.LinkWidths.MaxLinkWidth, "x") - - if !isValidValue(gen) || !isValidValue(width) { - return 0 - } - - // https://enterprise-support.nvidia.com/s/article/understanding-pcie-configuration-for-maximum-performance - var speed, enc float64 - switch gen { - case "1": - speed, enc = 2.5, 1.0/5.0 - case "2": - speed, enc = 5, 1.0/5.0 - case "3": - speed, enc = 8, 2.0/130.0 - case "4": - speed, enc = 16, 2.0/130.0 - case "5": - speed, enc = 32, 2.0/130.0 - default: - return 0 - } - - // Maximum PCIe Bandwidth = SPEED * WIDTH * (1 - ENCODING) - 1Gb/s - return (speed*parseFloat(width)*(1-enc) - 1) * 1e9 / 8 // Gb/s => bytes -} - -type ( - xmlInfo struct { - GPUs []xmlGPUInfo `xml:"gpu"` - } - xmlGPUInfo struct { - ID string `xml:"id,attr"` - ProductName string `xml:"product_name"` - ProductBrand string `xml:"product_brand"` - ProductArchitecture string `xml:"product_architecture"` - UUID string `xml:"uuid"` - FanSpeed string `xml:"fan_speed"` - PerformanceState string `xml:"performance_state"` - MIGMode struct { - CurrentMIG string 
`xml:"current_mig"` - } `xml:"mig_mode"` - MIGDevices struct { - MIGDevice []xmlMIGDeviceInfo `xml:"mig_device"` - } `xml:"mig_devices"` - PCI struct { - TxUtil string `xml:"tx_util"` - RxUtil string `xml:"rx_util"` - PCIGPULinkInfo struct { - PCIEGen struct { - MaxLinkGen string `xml:"max_link_gen"` - } `xml:"pcie_gen"` - LinkWidths struct { - MaxLinkWidth string `xml:"max_link_width"` - } `xml:"link_widths"` - } `xml:"pci_gpu_link_info"` - } `xml:"pci"` - Utilization struct { - GpuUtil string `xml:"gpu_util"` - MemoryUtil string `xml:"memory_util"` - EncoderUtil string `xml:"encoder_util"` - DecoderUtil string `xml:"decoder_util"` - } `xml:"utilization"` - FBMemoryUsage struct { - Total string `xml:"total"` - Reserved string `xml:"reserved"` - Used string `xml:"used"` - Free string `xml:"free"` - } `xml:"fb_memory_usage"` - Bar1MemoryUsage struct { - Total string `xml:"total"` - Used string `xml:"used"` - Free string `xml:"free"` - } `xml:"bar1_memory_usage"` - Temperature struct { - GpuTemp string `xml:"gpu_temp"` - GpuTempMaxThreshold string `xml:"gpu_temp_max_threshold"` - GpuTempSlowThreshold string `xml:"gpu_temp_slow_threshold"` - GpuTempMaxGpuThreshold string `xml:"gpu_temp_max_gpu_threshold"` - GpuTargetTemperature string `xml:"gpu_target_temperature"` - MemoryTemp string `xml:"memory_temp"` - GpuTempMaxMemThreshold string `xml:"gpu_temp_max_mem_threshold"` - } `xml:"temperature"` - Clocks struct { - GraphicsClock string `xml:"graphics_clock"` - SmClock string `xml:"sm_clock"` - MemClock string `xml:"mem_clock"` - VideoClock string `xml:"video_clock"` - } `xml:"clocks"` - PowerReadings *xmlPowerReadings `xml:"power_readings"` - GPUPowerReadings *xmlPowerReadings `xml:"gpu_power_readings"` - Voltage struct { - GraphicsVolt string `xml:"graphics_volt"` - } `xml:"voltage"` - Processes struct { - ProcessInfo []struct { - PID string `xml:"pid"` - ProcessName string `xml:"process_name"` - UsedMemory string `xml:"used_memory"` - } `sml:"process_info"` - } 
`xml:"processes"` - } - - xmlPowerReadings struct { - //PowerState string `xml:"power_state"` - //PowerManagement string `xml:"power_management"` - PowerDraw string `xml:"power_draw"` - //PowerLimit string `xml:"power_limit"` - //DefaultPowerLimit string `xml:"default_power_limit"` - //EnforcedPowerLimit string `xml:"enforced_power_limit"` - //MinPowerLimit string `xml:"min_power_limit"` - //MaxPowerLimit string `xml:"max_power_limit"` - } - - xmlMIGDeviceInfo struct { - Index string `xml:"index"` - GPUInstanceID string `xml:"gpu_instance_id"` - ComputeInstanceID string `xml:"compute_instance_id"` - DeviceAttributes struct { - Shared struct { - MultiprocessorCount string `xml:"multiprocessor_count"` - CopyEngineCount string `xml:"copy_engine_count"` - EncoderCount string `xml:"encoder_count"` - DecoderCount string `xml:"decoder_count"` - OFACount string `xml:"ofa_count"` - JPGCount string `xml:"jpg_count"` - } `xml:"shared"` - } `xml:"device_attributes"` - ECCErrorCount struct { - VolatileCount struct { - SRAMUncorrectable string `xml:"sram_uncorrectable"` - } `xml:"volatile_count"` - } `xml:"ecc_error_count"` - FBMemoryUsage struct { - Free string `xml:"free"` - Used string `xml:"used"` - Reserved string `xml:"reserved"` - } `xml:"fb_memory_usage"` - BAR1MemoryUsage struct { - Free string `xml:"free"` - Used string `xml:"used"` - } `xml:"bar1_memory_usage"` - } -) diff --git a/src/go/plugin/go.d/modules/nvidia_smi/config_schema.json b/src/go/plugin/go.d/modules/nvidia_smi/config_schema.json index 0f4bb5a6936124..823cd781804842 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/config_schema.json +++ b/src/go/plugin/go.d/modules/nvidia_smi/config_schema.json @@ -23,12 +23,6 @@ "type": "number", "minimum": 0.5, "default": 10 - }, - "use_csv_format": { - "title": "Use CSV format", - "description": "Determines the format used for requesting GPU information. 
If set, CSV format is used, otherwise XML.", - "type": "boolean", - "default": false } }, "required": [ diff --git a/src/go/plugin/go.d/modules/nvidia_smi/exec.go b/src/go/plugin/go.d/modules/nvidia_smi/exec.go index eb6bea4bbc8e01..4acb3f2c00c62e 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/exec.go +++ b/src/go/plugin/go.d/modules/nvidia_smi/exec.go @@ -4,30 +4,33 @@ package nvidia_smi import ( "context" - "errors" "fmt" "os/exec" - "strings" "time" "github.com/netdata/netdata/go/plugins/logger" ) -func newNvidiaSMIExec(path string, cfg Config, log *logger.Logger) (*nvidiaSMIExec, error) { - return &nvidiaSMIExec{ +type nvidiaSmiBinary interface { + queryGPUInfo() ([]byte, error) +} + +func newNvidiaSmiExec(path string, cfg Config, log *logger.Logger) (*nvidiaSmiExec, error) { + return &nvidiaSmiExec{ + Logger: log, binPath: path, timeout: cfg.Timeout.Duration(), - Logger: log, }, nil } -type nvidiaSMIExec struct { +type nvidiaSmiExec struct { + *logger.Logger + binPath string timeout time.Duration - *logger.Logger } -func (e *nvidiaSMIExec) queryGPUInfoXML() ([]byte, error) { +func (e *nvidiaSmiExec) queryGPUInfo() ([]byte, error) { ctx, cancel := context.WithTimeout(context.Background(), e.timeout) defer cancel() @@ -41,38 +44,3 @@ func (e *nvidiaSMIExec) queryGPUInfoXML() ([]byte, error) { return bs, nil } - -func (e *nvidiaSMIExec) queryGPUInfoCSV(properties []string) ([]byte, error) { - if len(properties) == 0 { - return nil, errors.New("can not query CSV GPU Info without properties") - } - - ctx, cancel := context.WithTimeout(context.Background(), e.timeout) - defer cancel() - - cmd := exec.CommandContext(ctx, e.binPath, "--query-gpu="+strings.Join(properties, ","), "--format=csv,nounits") - - e.Debugf("executing '%s'", cmd) - - bs, err := cmd.Output() - if err != nil { - return nil, fmt.Errorf("error on '%s': %v", cmd, err) - } - - return bs, nil -} - -func (e *nvidiaSMIExec) queryHelpQueryGPU() ([]byte, error) { - ctx, cancel := 
context.WithTimeout(context.Background(), e.timeout) - defer cancel() - - cmd := exec.CommandContext(ctx, e.binPath, "--help-query-gpu") - - e.Debugf("executing '%s'", cmd) - bs, err := cmd.Output() - if err != nil { - return nil, fmt.Errorf("error on '%s': %v", cmd, err) - } - - return bs, err -} diff --git a/src/go/plugin/go.d/modules/nvidia_smi/gpu_info.go b/src/go/plugin/go.d/modules/nvidia_smi/gpu_info.go new file mode 100644 index 00000000000000..506d36f6eefe75 --- /dev/null +++ b/src/go/plugin/go.d/modules/nvidia_smi/gpu_info.go @@ -0,0 +1,121 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package nvidia_smi + +type gpusInfo struct { + GPUs []gpuInfo `xml:"gpu"` +} + +type ( + gpuInfo struct { + ID string `xml:"id,attr"` + ProductName string `xml:"product_name"` + ProductBrand string `xml:"product_brand"` + ProductArchitecture string `xml:"product_architecture"` + UUID string `xml:"uuid"` + FanSpeed string `xml:"fan_speed"` + PerformanceState string `xml:"performance_state"` + MIGMode struct { + CurrentMIG string `xml:"current_mig"` + } `xml:"mig_mode"` + MIGDevices struct { + MIGDevice []gpuMIGDeviceInfo `xml:"mig_device"` + } `xml:"mig_devices"` + PCI struct { + TxUtil string `xml:"tx_util"` + RxUtil string `xml:"rx_util"` + PCIGPULinkInfo struct { + PCIEGen struct { + MaxLinkGen string `xml:"max_link_gen"` + } `xml:"pcie_gen"` + LinkWidths struct { + MaxLinkWidth string `xml:"max_link_width"` + } `xml:"link_widths"` + } `xml:"pci_gpu_link_info"` + } `xml:"pci"` + Utilization struct { + GpuUtil string `xml:"gpu_util"` + MemoryUtil string `xml:"memory_util"` + EncoderUtil string `xml:"encoder_util"` + DecoderUtil string `xml:"decoder_util"` + } `xml:"utilization"` + FBMemoryUsage struct { + Total string `xml:"total"` + Reserved string `xml:"reserved"` + Used string `xml:"used"` + Free string `xml:"free"` + } `xml:"fb_memory_usage"` + Bar1MemoryUsage struct { + Total string `xml:"total"` + Used string `xml:"used"` + Free string `xml:"free"` + } 
`xml:"bar1_memory_usage"` + Temperature struct { + GpuTemp string `xml:"gpu_temp"` + GpuTempMaxThreshold string `xml:"gpu_temp_max_threshold"` + GpuTempSlowThreshold string `xml:"gpu_temp_slow_threshold"` + GpuTempMaxGpuThreshold string `xml:"gpu_temp_max_gpu_threshold"` + GpuTargetTemperature string `xml:"gpu_target_temperature"` + MemoryTemp string `xml:"memory_temp"` + GpuTempMaxMemThreshold string `xml:"gpu_temp_max_mem_threshold"` + } `xml:"temperature"` + Clocks struct { + GraphicsClock string `xml:"graphics_clock"` + SmClock string `xml:"sm_clock"` + MemClock string `xml:"mem_clock"` + VideoClock string `xml:"video_clock"` + } `xml:"clocks"` + PowerReadings *gpuPowerReadings `xml:"power_readings"` + GPUPowerReadings *gpuPowerReadings `xml:"gpu_power_readings"` + Voltage struct { + GraphicsVolt string `xml:"graphics_volt"` + } `xml:"voltage"` + Processes struct { + ProcessInfo []struct { + PID string `xml:"pid"` + ProcessName string `xml:"process_name"` + UsedMemory string `xml:"used_memory"` + } `sml:"process_info"` + } `xml:"processes"` + } + gpuPowerReadings struct { + //PowerState string `xml:"power_state"` + //PowerManagement string `xml:"power_management"` + PowerDraw string `xml:"power_draw"` + //PowerLimit string `xml:"power_limit"` + //DefaultPowerLimit string `xml:"default_power_limit"` + //EnforcedPowerLimit string `xml:"enforced_power_limit"` + //MinPowerLimit string `xml:"min_power_limit"` + //MaxPowerLimit string `xml:"max_power_limit"` + } + + gpuMIGDeviceInfo struct { + Index string `xml:"index"` + GPUInstanceID string `xml:"gpu_instance_id"` + ComputeInstanceID string `xml:"compute_instance_id"` + DeviceAttributes struct { + Shared struct { + MultiprocessorCount string `xml:"multiprocessor_count"` + CopyEngineCount string `xml:"copy_engine_count"` + EncoderCount string `xml:"encoder_count"` + DecoderCount string `xml:"decoder_count"` + OFACount string `xml:"ofa_count"` + JPGCount string `xml:"jpg_count"` + } `xml:"shared"` + } 
`xml:"device_attributes"` + ECCErrorCount struct { + VolatileCount struct { + SRAMUncorrectable string `xml:"sram_uncorrectable"` + } `xml:"volatile_count"` + } `xml:"ecc_error_count"` + FBMemoryUsage struct { + Free string `xml:"free"` + Used string `xml:"used"` + Reserved string `xml:"reserved"` + } `xml:"fb_memory_usage"` + BAR1MemoryUsage struct { + Free string `xml:"free"` + Used string `xml:"used"` + } `xml:"bar1_memory_usage"` + } +) diff --git a/src/go/plugin/go.d/modules/nvidia_smi/init.go b/src/go/plugin/go.d/modules/nvidia_smi/init.go index d8a815bb44327e..471cfe733ada27 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/init.go +++ b/src/go/plugin/go.d/modules/nvidia_smi/init.go @@ -8,7 +8,7 @@ import ( "os/exec" ) -func (nv *NvidiaSMI) initNvidiaSMIExec() (nvidiaSMI, error) { +func (nv *NvidiaSmi) initNvidiaSmiExec() (nvidiaSmiBinary, error) { binPath := nv.BinaryPath if _, err := os.Stat(binPath); os.IsNotExist(err) { path, err := exec.LookPath(nv.binName) @@ -18,5 +18,5 @@ func (nv *NvidiaSMI) initNvidiaSMIExec() (nvidiaSMI, error) { binPath = path } - return newNvidiaSMIExec(binPath, nv.Config, nv.Logger) + return newNvidiaSmiExec(binPath, nv.Config, nv.Logger) } diff --git a/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml b/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml index 630037d72b8252..d35716284a8045 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml +++ b/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml @@ -73,26 +73,11 @@ modules: description: nvidia_smi binary execution timeout. default_value: 2 required: false - - name: use_csv_format - description: Used format when requesting GPU information. XML is used if set to 'no'. - default_value: false - required: false - details: | - This module supports data collection in CSV and XML formats. The default is XML. - - - XML provides more metrics, but requesting GPU information consumes more CPU, especially if there are multiple GPUs in the system. 
- - CSV provides fewer metrics, but is much lighter than XML in terms of CPU usage. examples: folding: title: Config enabled: true list: - - name: CSV format - description: Use CSV format when requesting GPU information. - config: | - jobs: - - name: nvidia_smi - use_csv_format: yes - name: Custom binary path description: The executable is not in the directories specified in the PATH environment variable. config: | @@ -108,9 +93,7 @@ modules: title: Metrics enabled: false description: "" - availability: - - XML - - CSV + availability: [] scopes: - name: gpu description: These metrics refer to the GPU. @@ -121,8 +104,6 @@ modules: description: GPU product name (e.g. NVIDIA A100-SXM4-40GB) metrics: - name: nvidia_smi.gpu_pcie_bandwidth_usage - availability: - - XML description: PCI Express Bandwidth Usage unit: B/s chart_type: line @@ -130,8 +111,6 @@ modules: - name: rx - name: tx - name: nvidia_smi.gpu_pcie_bandwidth_utilization - availability: - - XML description: PCI Express Bandwidth Utilization unit: '%' chart_type: line @@ -139,52 +118,36 @@ modules: - name: rx - name: tx - name: nvidia_smi.gpu_fan_speed_perc - availability: - - XML - - CSV description: Fan speed unit: '%' chart_type: line dimensions: - name: fan_speed - name: nvidia_smi.gpu_utilization - availability: - - XML - - CSV description: GPU utilization unit: '%' chart_type: line dimensions: - name: gpu - name: nvidia_smi.gpu_memory_utilization - availability: - - XML - - CSV description: Memory utilization unit: '%' chart_type: line dimensions: - name: memory - name: nvidia_smi.gpu_decoder_utilization - availability: - - XML description: Decoder utilization unit: '%' chart_type: line dimensions: - name: decoder - name: nvidia_smi.gpu_encoder_utilization - availability: - - XML description: Encoder utilization unit: '%' chart_type: line dimensions: - name: encoder - name: nvidia_smi.gpu_frame_buffer_memory_usage - availability: - - XML - - CSV description: Frame buffer memory usage unit: B 
chart_type: stacked @@ -193,8 +156,6 @@ modules: - name: used - name: reserved - name: nvidia_smi.gpu_bar1_memory_usage - availability: - - XML description: BAR1 memory usage unit: B chart_type: stacked @@ -202,26 +163,18 @@ modules: - name: free - name: used - name: nvidia_smi.gpu_temperature - availability: - - XML - - CSV description: Temperature unit: Celsius chart_type: line dimensions: - name: temperature - name: nvidia_smi.gpu_voltage - availability: - - XML description: Voltage unit: V chart_type: line dimensions: - name: voltage - name: nvidia_smi.gpu_clock_freq - availability: - - XML - - CSV description: Clock current frequency unit: MHz chart_type: line @@ -231,26 +184,18 @@ modules: - name: sm - name: mem - name: nvidia_smi.gpu_power_draw - availability: - - XML - - CSV description: Power draw unit: Watts chart_type: line dimensions: - name: power_draw - name: nvidia_smi.gpu_performance_state - availability: - - XML - - CSV description: Performance state unit: state chart_type: line dimensions: - name: P0-P15 - name: nvidia_smi.gpu_mig_mode_current_status - availability: - - XML description: MIG current mode unit: status chart_type: line @@ -258,8 +203,6 @@ modules: - name: enabled - name: disabled - name: nvidia_smi.gpu_mig_devices_count - availability: - - XML description: MIG devices unit: devices chart_type: line @@ -276,8 +219,6 @@ modules: description: GPU instance id (e.g. 
1) metrics: - name: nvidia_smi.gpu_mig_frame_buffer_memory_usage - availability: - - XML description: Frame buffer memory usage unit: B chart_type: stacked @@ -286,8 +227,6 @@ modules: - name: used - name: reserved - name: nvidia_smi.gpu_mig_bar1_memory_usage - availability: - - XML description: BAR1 memory usage unit: B chart_type: stacked diff --git a/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi.go b/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi.go index 4027f3f2183888..66872ce77784e5 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi.go +++ b/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi.go @@ -26,11 +26,10 @@ func init() { }) } -func New() *NvidiaSMI { - return &NvidiaSMI{ +func New() *NvidiaSmi { + return &NvidiaSmi{ Config: Config{ - Timeout: web.Duration(time.Second * 10), - UseCSVFormat: false, + Timeout: web.Duration(time.Second * 10), }, binName: "nvidia-smi", charts: &module.Charts{}, @@ -41,41 +40,31 @@ func New() *NvidiaSMI { } type Config struct { - UpdateEvery int `yaml:"update_every,omitempty" json:"update_every"` - Timeout web.Duration `yaml:"timeout,omitempty" json:"timeout"` - BinaryPath string `yaml:"binary_path" json:"binary_path"` - UseCSVFormat bool `yaml:"use_csv_format" json:"use_csv_format"` + UpdateEvery int `yaml:"update_every,omitempty" json:"update_every"` + Timeout web.Duration `yaml:"timeout,omitempty" json:"timeout"` + BinaryPath string `yaml:"binary_path" json:"binary_path"` } -type ( - NvidiaSMI struct { - module.Base - Config `yaml:",inline" json:""` +type NvidiaSmi struct { + module.Base + Config `yaml:",inline" json:""` - charts *module.Charts + charts *module.Charts - exec nvidiaSMI - binName string + exec nvidiaSmiBinary + binName string - gpuQueryProperties []string - - gpus map[string]bool - migs map[string]bool - } - nvidiaSMI interface { - queryGPUInfoXML() ([]byte, error) - queryGPUInfoCSV(properties []string) ([]byte, error) - queryHelpQueryGPU() ([]byte, error) - } -) + gpus map[string]bool + migs 
map[string]bool +} -func (nv *NvidiaSMI) Configuration() any { +func (nv *NvidiaSmi) Configuration() any { return nv.Config } -func (nv *NvidiaSMI) Init() error { +func (nv *NvidiaSmi) Init() error { if nv.exec == nil { - smi, err := nv.initNvidiaSMIExec() + smi, err := nv.initNvidiaSmiExec() if err != nil { nv.Error(err) return err @@ -86,7 +75,7 @@ func (nv *NvidiaSMI) Init() error { return nil } -func (nv *NvidiaSMI) Check() error { +func (nv *NvidiaSmi) Check() error { mx, err := nv.collect() if err != nil { nv.Error(err) @@ -98,11 +87,11 @@ func (nv *NvidiaSMI) Check() error { return nil } -func (nv *NvidiaSMI) Charts() *module.Charts { +func (nv *NvidiaSmi) Charts() *module.Charts { return nv.charts } -func (nv *NvidiaSMI) Collect() map[string]int64 { +func (nv *NvidiaSmi) Collect() map[string]int64 { mx, err := nv.collect() if err != nil { nv.Error(err) @@ -114,4 +103,4 @@ func (nv *NvidiaSMI) Collect() map[string]int64 { return mx } -func (nv *NvidiaSMI) Cleanup() {} +func (nv *NvidiaSmi) Cleanup() {} diff --git a/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi_test.go b/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi_test.go index 7338bf6e728a14..f93279e19cdbe8 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi_test.go +++ b/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi_test.go @@ -24,9 +24,6 @@ var ( dataXMLTeslaP100, _ = os.ReadFile("testdata/tesla-p100.xml") dataXMLA100SXM4MIG, _ = os.ReadFile("testdata/a100-sxm4-mig.xml") - - dataHelpQueryGPU, _ = os.ReadFile("testdata/help-query-gpu.txt") - dataCSVTeslaP100, _ = os.ReadFile("testdata/tesla-p100.csv") ) func Test_testDataIsValid(t *testing.T) { @@ -38,25 +35,23 @@ func Test_testDataIsValid(t *testing.T) { "dataXMLRTX3060": dataXMLRTX3060, "dataXMLTeslaP100": dataXMLTeslaP100, "dataXMLA100SXM4MIG": dataXMLA100SXM4MIG, - "dataHelpQueryGPU": dataHelpQueryGPU, - "dataCSVTeslaP100": dataCSVTeslaP100, } { require.NotNil(t, data, name) } } -func TestNvidiaSMI_ConfigurationSerialize(t 
*testing.T) { - module.TestConfigurationSerialize(t, &NvidiaSMI{}, dataConfigJSON, dataConfigYAML) +func TestNvidiaSmi_ConfigurationSerialize(t *testing.T) { + module.TestConfigurationSerialize(t, &NvidiaSmi{}, dataConfigJSON, dataConfigYAML) } -func TestNvidiaSMI_Init(t *testing.T) { +func TestNvidiaSmi_Init(t *testing.T) { tests := map[string]struct { - prepare func(nv *NvidiaSMI) + prepare func(nv *NvidiaSmi) wantFail bool }{ "fails if can't local nvidia-smi": { wantFail: true, - prepare: func(nv *NvidiaSMI) { + prepare: func(nv *NvidiaSmi) { nv.binName += "!!!" }, }, @@ -77,46 +72,34 @@ func TestNvidiaSMI_Init(t *testing.T) { } } -func TestNvidiaSMI_Charts(t *testing.T) { +func TestNvidiaSmi_Charts(t *testing.T) { assert.NotNil(t, New().Charts()) } -func TestNvidiaSMI_Check(t *testing.T) { +func TestNvidiaSmi_Check(t *testing.T) { tests := map[string]struct { - prepare func(nv *NvidiaSMI) + prepare func(nv *NvidiaSmi) wantFail bool }{ - "success A100-SXM4 MIG [XML]": { - wantFail: false, - prepare: prepareCaseMIGA100formatXML, - }, - "success RTX 3060 [XML]": { + "success A100-SXM4 MIG": { wantFail: false, - prepare: prepareCaseRTX3060formatXML, + prepare: prepareCaseMIGA100, }, - "success Tesla P100 [XML]": { + "success RTX 3060": { wantFail: false, - prepare: prepareCaseTeslaP100formatXML, + prepare: prepareCaseRTX3060, }, - "success Tesla P100 [CSV]": { + "success Tesla P100": { wantFail: false, - prepare: prepareCaseTeslaP100formatCSV, + prepare: prepareCaseTeslaP100, }, - "success RTX 2080 Win [XML]": { + "success RTX 2080 Win": { wantFail: false, - prepare: prepareCaseRTX2080WinFormatXML, - }, - "fail on queryGPUInfoXML error": { - wantFail: true, - prepare: prepareCaseErrOnQueryGPUInfoXML, - }, - "fail on queryGPUInfoCSV error": { - wantFail: true, - prepare: prepareCaseErrOnQueryGPUInfoCSV, + prepare: prepareCaseRTX2080Win, }, - "fail on queryHelpQueryGPU error": { + "fail on queryGPUInfo error": { wantFail: true, - prepare: 
prepareCaseErrOnQueryHelpQueryGPU, + prepare: prepareCaseErrOnQueryGPUInfo, }, } @@ -135,16 +118,16 @@ func TestNvidiaSMI_Check(t *testing.T) { } } -func TestNvidiaSMI_Collect(t *testing.T) { +func TestNvidiaSmi_Collect(t *testing.T) { type testCaseStep struct { - prepare func(nv *NvidiaSMI) - check func(t *testing.T, nv *NvidiaSMI) + prepare func(nv *NvidiaSmi) + check func(t *testing.T, nv *NvidiaSmi) } tests := map[string][]testCaseStep{ - "success A100-SXM4 MIG [XML]": { + "success A100-SXM4 MIG": { { - prepare: prepareCaseMIGA100formatXML, - check: func(t *testing.T, nv *NvidiaSMI) { + prepare: prepareCaseMIGA100, + check: func(t *testing.T, nv *NvidiaSmi) { mx := nv.Collect() expected := map[string]int64{ @@ -201,10 +184,10 @@ func TestNvidiaSMI_Collect(t *testing.T) { }, }, }, - "success RTX 4090 Driver 535 [XML]": { + "success RTX 4090 Driver 535": { { - prepare: prepareCaseRTX4090Driver535formatXML, - check: func(t *testing.T, nv *NvidiaSMI) { + prepare: prepareCaseRTX4090Driver535, + check: func(t *testing.T, nv *NvidiaSmi) { mx := nv.Collect() expected := map[string]int64{ @@ -251,10 +234,10 @@ func TestNvidiaSMI_Collect(t *testing.T) { }, }, }, - "success RTX 3060 [XML]": { + "success RTX 3060": { { - prepare: prepareCaseRTX3060formatXML, - check: func(t *testing.T, nv *NvidiaSMI) { + prepare: prepareCaseRTX3060, + check: func(t *testing.T, nv *NvidiaSmi) { mx := nv.Collect() expected := map[string]int64{ @@ -300,10 +283,10 @@ func TestNvidiaSMI_Collect(t *testing.T) { }, }, }, - "success Tesla P100 [XML]": { + "success Tesla P100": { { - prepare: prepareCaseTeslaP100formatXML, - check: func(t *testing.T, nv *NvidiaSMI) { + prepare: prepareCaseTeslaP100, + check: func(t *testing.T, nv *NvidiaSmi) { mx := nv.Collect() expected := map[string]int64{ @@ -348,50 +331,10 @@ func TestNvidiaSMI_Collect(t *testing.T) { }, }, }, - "success Tesla P100 [CSV]": { + "success RTX 2080 Win": { { - prepare: prepareCaseTeslaP100formatCSV, - check: func(t *testing.T, nv 
*NvidiaSMI) { - mx := nv.Collect() - - expected := map[string]int64{ - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_frame_buffer_memory_usage_free": 17070817280, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_frame_buffer_memory_usage_reserved": 108003328, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_frame_buffer_memory_usage_used": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_gpu_utilization": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_graphics_clock": 405, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_mem_clock": 715, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_mem_utilization": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P0": 1, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P1": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P10": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P11": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P12": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P13": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P14": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P15": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P2": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P3": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P4": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P5": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P6": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P7": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P8": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_performance_state_P9": 0, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_power_draw": 28, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_sm_clock": 405, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_temperature": 
37, - "gpu_GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6_video_clock": 835, - } - - assert.Equal(t, expected, mx) - }, - }, - }, - "success RTX 2080 Win [XML]": { - { - prepare: prepareCaseRTX2080WinFormatXML, - check: func(t *testing.T, nv *NvidiaSMI) { + prepare: prepareCaseRTX2080Win, + check: func(t *testing.T, nv *NvidiaSmi) { mx := nv.Collect() expected := map[string]int64{ @@ -437,30 +380,10 @@ func TestNvidiaSMI_Collect(t *testing.T) { }, }, }, - "fail on queryGPUInfoXML error [XML]": { + "fails on queryGPUInfo error": { { - prepare: prepareCaseErrOnQueryGPUInfoXML, - check: func(t *testing.T, nv *NvidiaSMI) { - mx := nv.Collect() - - assert.Equal(t, map[string]int64(nil), mx) - }, - }, - }, - "fail on queryGPUInfoCSV error [CSV]": { - { - prepare: prepareCaseErrOnQueryGPUInfoCSV, - check: func(t *testing.T, nv *NvidiaSMI) { - mx := nv.Collect() - - assert.Equal(t, map[string]int64(nil), mx) - }, - }, - }, - "fail on queryHelpQueryGPU error": { - { - prepare: prepareCaseErrOnQueryHelpQueryGPU, - check: func(t *testing.T, nv *NvidiaSMI) { + prepare: prepareCaseErrOnQueryGPUInfo, + check: func(t *testing.T, nv *NvidiaSmi) { mx := nv.Collect() assert.Equal(t, map[string]int64(nil), mx) @@ -483,79 +406,38 @@ func TestNvidiaSMI_Collect(t *testing.T) { } } -type mockNvidiaSMI struct { - gpuInfoXML []byte - errOnQueryGPUInfoXML bool - - gpuInfoCSV []byte - errOnQueryGPUInfoCSV bool - - helpQueryGPU []byte - errOnQueryHelpQueryGPU bool -} - -func (m *mockNvidiaSMI) queryGPUInfoXML() ([]byte, error) { - if m.errOnQueryGPUInfoXML { - return nil, errors.New("error on mock.queryGPUInfoXML()") - } - return m.gpuInfoXML, nil -} - -func (m *mockNvidiaSMI) queryGPUInfoCSV(_ []string) ([]byte, error) { - if m.errOnQueryGPUInfoCSV { - return nil, errors.New("error on mock.queryGPUInfoCSV()") - } - return m.gpuInfoCSV, nil +type mockNvidiaSmi struct { + gpuInfo []byte + errOnQueryGPUInfo bool } -func (m *mockNvidiaSMI) queryHelpQueryGPU() ([]byte, error) { - if 
m.errOnQueryHelpQueryGPU { - return nil, errors.New("error on mock.queryHelpQueryGPU()") +func (m *mockNvidiaSmi) queryGPUInfo() ([]byte, error) { + if m.errOnQueryGPUInfo { + return nil, errors.New("error on mock.queryGPUInfo()") } - return m.helpQueryGPU, nil -} - -func prepareCaseMIGA100formatXML(nv *NvidiaSMI) { - nv.UseCSVFormat = false - nv.exec = &mockNvidiaSMI{gpuInfoXML: dataXMLA100SXM4MIG} -} - -func prepareCaseRTX3060formatXML(nv *NvidiaSMI) { - nv.UseCSVFormat = false - nv.exec = &mockNvidiaSMI{gpuInfoXML: dataXMLRTX3060} -} - -func prepareCaseRTX4090Driver535formatXML(nv *NvidiaSMI) { - nv.UseCSVFormat = false - nv.exec = &mockNvidiaSMI{gpuInfoXML: dataXMLRTX4090Driver535} + return m.gpuInfo, nil } -func prepareCaseTeslaP100formatXML(nv *NvidiaSMI) { - nv.UseCSVFormat = false - nv.exec = &mockNvidiaSMI{gpuInfoXML: dataXMLTeslaP100} +func prepareCaseMIGA100(nv *NvidiaSmi) { + nv.exec = &mockNvidiaSmi{gpuInfo: dataXMLA100SXM4MIG} } -func prepareCaseRTX2080WinFormatXML(nv *NvidiaSMI) { - nv.UseCSVFormat = false - nv.exec = &mockNvidiaSMI{gpuInfoXML: dataXMLRTX2080Win} +func prepareCaseRTX3060(nv *NvidiaSmi) { + nv.exec = &mockNvidiaSmi{gpuInfo: dataXMLRTX3060} } -func prepareCaseErrOnQueryGPUInfoXML(nv *NvidiaSMI) { - nv.UseCSVFormat = false - nv.exec = &mockNvidiaSMI{errOnQueryGPUInfoXML: true} +func prepareCaseRTX4090Driver535(nv *NvidiaSmi) { + nv.exec = &mockNvidiaSmi{gpuInfo: dataXMLRTX4090Driver535} } -func prepareCaseTeslaP100formatCSV(nv *NvidiaSMI) { - nv.UseCSVFormat = true - nv.exec = &mockNvidiaSMI{helpQueryGPU: dataHelpQueryGPU, gpuInfoCSV: dataCSVTeslaP100} +func prepareCaseTeslaP100(nv *NvidiaSmi) { + nv.exec = &mockNvidiaSmi{gpuInfo: dataXMLTeslaP100} } -func prepareCaseErrOnQueryHelpQueryGPU(nv *NvidiaSMI) { - nv.UseCSVFormat = true - nv.exec = &mockNvidiaSMI{errOnQueryHelpQueryGPU: true} +func prepareCaseRTX2080Win(nv *NvidiaSmi) { + nv.exec = &mockNvidiaSmi{gpuInfo: dataXMLRTX2080Win} } -func prepareCaseErrOnQueryGPUInfoCSV(nv 
*NvidiaSMI) { - nv.UseCSVFormat = true - nv.exec = &mockNvidiaSMI{helpQueryGPU: dataHelpQueryGPU, errOnQueryGPUInfoCSV: true} +func prepareCaseErrOnQueryGPUInfo(nv *NvidiaSmi) { + nv.exec = &mockNvidiaSmi{errOnQueryGPUInfo: true} } diff --git a/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.json b/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.json index a251e326a684cc..09571319348b46 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.json +++ b/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.json @@ -1,6 +1,5 @@ { "update_every": 123, "timeout": 123.123, - "binary_path": "ok", - "use_csv_format": true + "binary_path": "ok" } diff --git a/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.yaml b/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.yaml index 0b580dbcbf0477..baf3bcd0b0fab0 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.yaml +++ b/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.yaml @@ -1,4 +1,3 @@ update_every: 123 timeout: 123.123 binary_path: "ok" -use_csv_format: yes diff --git a/src/go/plugin/go.d/modules/nvidia_smi/testdata/help-query-gpu.txt b/src/go/plugin/go.d/modules/nvidia_smi/testdata/help-query-gpu.txt deleted file mode 100644 index 2dd3285e1a6394..00000000000000 --- a/src/go/plugin/go.d/modules/nvidia_smi/testdata/help-query-gpu.txt +++ /dev/null @@ -1,414 +0,0 @@ -List of valid properties to query for the switch "--query-gpu=": - -"timestamp" -The timestamp of when the query was made in format "YYYY/MM/DD HH:MM:SS.msec". - -"driver_version" -The version of the installed NVIDIA display driver. This is an alphanumeric string. - -"count" -The number of NVIDIA GPUs in the system. - -"name" or "gpu_name" -The official product name of the GPU. This is an alphanumeric string. For all products. - -"serial" or "gpu_serial" -This number matches the serial number physically printed on each board. It is a globally unique immutable alphanumeric value. 
- -"uuid" or "gpu_uuid" -This value is the globally unique immutable alphanumeric identifier of the GPU. It does not correspond to any physical label on the board. - -"pci.bus_id" or "gpu_bus_id" -PCI bus id as "domain:bus:device.function", in hex. - -"pci.domain" -PCI domain number, in hex. - -"pci.bus" -PCI bus number, in hex. - -"pci.device" -PCI device number, in hex. - -"pci.device_id" -PCI vendor device id, in hex - -"pci.sub_device_id" -PCI Sub System id, in hex - -"pcie.link.gen.current" -The current PCI-E link generation. These may be reduced when the GPU is not in use. - -"pcie.link.gen.max" -The maximum PCI-E link generation possible with this GPU and system configuration. For example, if the GPU supports a higher PCIe generation than the system supports then this reports the system PCIe generation. - -"pcie.link.width.current" -The current PCI-E link width. These may be reduced when the GPU is not in use. - -"pcie.link.width.max" -The maximum PCI-E link width possible with this GPU and system configuration. For example, if the GPU supports a higher PCIe generation than the system supports then this reports the system PCIe generation. - -"index" -Zero based index of the GPU. Can change at each boot. - -"display_mode" -A flag that indicates whether a physical display (e.g. monitor) is currently connected to any of the GPU's connectors. "Enabled" indicates an attached display. "Disabled" indicates otherwise. - -"display_active" -A flag that indicates whether a display is initialized on the GPU's (e.g. memory is allocated on the device for display). Display can be active even when no monitor is physically attached. "Enabled" indicates an active display. "Disabled" indicates otherwise. - -"persistence_mode" -A flag that indicates whether persistence mode is enabled for the GPU. Value is either "Enabled" or "Disabled". When persistence mode is enabled the NVIDIA driver remains loaded even when no active clients, such as X11 or nvidia-smi, exist. 
This minimizes the driver load latency associated with running dependent apps, such as CUDA programs. Linux only. - -"accounting.mode" -A flag that indicates whether accounting mode is enabled for the GPU. Value is either "Enabled" or "Disabled". When accounting is enabled statistics are calculated for each compute process running on the GPU.Statistics can be queried during the lifetime or after termination of the process.The execution time of process is reported as 0 while the process is in running state and updated to actualexecution time after the process has terminated. See --help-query-accounted-apps for more info. - -"accounting.buffer_size" -The size of the circular buffer that holds list of processes that can be queried for accounting stats. This is the maximum number of processes that accounting information will be stored for before information about oldest processes will get overwritten by information about new processes. - -Section about driver_model properties -On Windows, the TCC and WDDM driver models are supported. The driver model can be changed with the (-dm) or (-fdm) flags. The TCC driver model is optimized for compute applications. I.E. kernel launch times will be quicker with TCC. The WDDM driver model is designed for graphics applications and is not recommended for compute applications. Linux does not support multiple driver models, and will always have the value of "N/A". Only for selected products. Please see feature matrix in NVML documentation. - -"driver_model.current" -The driver model currently in use. Always "N/A" on Linux. - -"driver_model.pending" -The driver model that will be used on the next reboot. Always "N/A" on Linux. - -"vbios_version" -The BIOS of the GPU board. - -Section about inforom properties -Version numbers for each object in the GPU board's inforom storage. The inforom is a small, persistent store of configuration and state data for the GPU. All inforom version fields are numerical. 
It can be useful to know these version numbers because some GPU features are only available with inforoms of a certain version or higher. - -"inforom.img" or "inforom.image" -Global version of the infoROM image. Image version just like VBIOS version uniquely describes the exact version of the infoROM flashed on the board in contrast to infoROM object version which is only an indicator of supported features. - -"inforom.oem" -Version for the OEM configuration data. - -"inforom.ecc" -Version for the ECC recording data. - -"inforom.pwr" or "inforom.power" -Version for the power management data. - -Section about gom properties -GOM allows to reduce power usage and optimize GPU throughput by disabling GPU features. Each GOM is designed to meet specific user needs. -In "All On" mode everything is enabled and running at full speed. -The "Compute" mode is designed for running only compute tasks. Graphics operations are not allowed. -The "Low Double Precision" mode is designed for running graphics applications that don't require high bandwidth double precision. -GOM can be changed with the (--gom) flag. - -"gom.current" or "gpu_operation_mode.current" -The GOM currently in use. - -"gom.pending" or "gpu_operation_mode.pending" -The GOM that will be used on the next reboot. - -"fan.speed" -The fan speed value is the percent of the product's maximum noise tolerance fan speed that the device's fan is currently intended to run at. This value may exceed 100% in certain cases. Note: The reported speed is the intended fan speed. If the fan is physically blocked and unable to spin, this output will not match the actual fan speed. Many parts do not report fan speeds because they rely on cooling via fans in the surrounding enclosure. - -"pstate" -The current performance state for the GPU. States range from P0 (maximum performance) to P12 (minimum performance). 
- -Section about clocks_throttle_reasons properties -Retrieves information about factors that are reducing the frequency of clocks. If all throttle reasons are returned as "Not Active" it means that clocks are running as high as possible. - -"clocks_throttle_reasons.supported" -Bitmask of supported clock throttle reasons. See nvml.h for more details. - -"clocks_throttle_reasons.active" -Bitmask of active clock throttle reasons. See nvml.h for more details. - -"clocks_throttle_reasons.gpu_idle" -Nothing is running on the GPU and the clocks are dropping to Idle state. This limiter may be removed in a later release. - -"clocks_throttle_reasons.applications_clocks_setting" -GPU clocks are limited by applications clocks setting. E.g. can be changed by nvidia-smi --applications-clocks= - -"clocks_throttle_reasons.sw_power_cap" -SW Power Scaling algorithm is reducing the clocks below requested clocks because the GPU is consuming too much power. E.g. SW power cap limit can be changed with nvidia-smi --power-limit= - -"clocks_throttle_reasons.hw_slowdown" -HW Slowdown (reducing the core clocks by a factor of 2 or more) is engaged. This is an indicator of: - HW Thermal Slowdown: temperature being too high - HW Power Brake Slowdown: External Power Brake Assertion is triggered (e.g. by the system power supply) - * Power draw is too high and Fast Trigger protection is reducing the clocks - * May be also reported during PState or clock change - * This behavior may be removed in a later release - -"clocks_throttle_reasons.hw_thermal_slowdown" -HW Thermal Slowdown (reducing the core clocks by a factor of 2 or more) is engaged. This is an indicator of temperature being too high - -"clocks_throttle_reasons.hw_power_brake_slowdown" -HW Power Brake Slowdown (reducing the core clocks by a factor of 2 or more) is engaged. This is an indicator of External Power Brake Assertion being triggered (e.g. 
by the system power supply) - -"clocks_throttle_reasons.sw_thermal_slowdown" -SW Thermal capping algorithm is reducing clocks below requested clocks because GPU temperature is higher than Max Operating Temp. - -"clocks_throttle_reasons.sync_boost" -Sync Boost This GPU has been added to a Sync boost group with nvidia-smi or DCGM in - * order to maximize performance per watt. All GPUs in the sync boost group - * will boost to the minimum possible clocks across the entire group. Look at - * the throttle reasons for other GPUs in the system to see why those GPUs are - * holding this one at lower clocks. - -Section about memory properties -On-board memory information. Reported total memory is affected by ECC state. If ECC is enabled the total available memory is decreased by several percent, due to the requisite parity bits. The driver may also reserve a small amount of memory for internal use, even without active work on the GPU. - -"memory.total" -Total installed GPU memory. - -"memory.reserved" -Total memory reserved by the NVIDIA driver and firmware. - -"memory.used" -Total memory allocated by active contexts. - -"memory.free" -Total free memory. - -"compute_mode" -The compute mode flag indicates whether individual or multiple compute applications may run on the GPU. -"0: Default" means multiple contexts are allowed per device. -"1: Exclusive_Thread", deprecated, use Exclusive_Process instead -"2: Prohibited" means no contexts are allowed per device (no compute apps). -"3: Exclusive_Process" means only one context is allowed per device, usable from multiple threads at a time. - -"compute_cap" -The CUDA Compute Capability, represented as Major DOT Minor. - -Section about utilization properties -Utilization rates report how busy each GPU is over time, and can be used to determine how much an application is using the GPUs in the system. - -"utilization.gpu" -Percent of time over the past sample period during which one or more kernels was executing on the GPU. 
-The sample period may be between 1 second and 1/6 second depending on the product. - -"utilization.memory" -Percent of time over the past sample period during which global (device) memory was being read or written. -The sample period may be between 1 second and 1/6 second depending on the product. - -Section about encoder.stats properties -Encoder stats report number of encoder sessions, average FPS and average latency in us for given GPUs in the system. - -"encoder.stats.sessionCount" -Number of encoder sessions running on the GPU. - -"encoder.stats.averageFps" -Average FPS of all sessions running on the GPU. - -"encoder.stats.averageLatency" -Average latency in microseconds of all sessions running on the GPU. - -Section about ecc.mode properties -A flag that indicates whether ECC support is enabled. May be either "Enabled" or "Disabled". Changes to ECC mode require a reboot. Requires Inforom ECC object version 1.0 or higher. - -"ecc.mode.current" -The ECC mode that the GPU is currently operating under. - -"ecc.mode.pending" -The ECC mode that the GPU will operate under after the next reboot. - -Section about ecc.errors properties -NVIDIA GPUs can provide error counts for various types of ECC errors. Some ECC errors are either single or double bit, where single bit errors are corrected and double bit errors are uncorrectable. Texture memory errors may be correctable via resend or uncorrectable if the resend fails. These errors are available across two timescales (volatile and aggregate). Single bit ECC errors are automatically corrected by the HW and do not result in data corruption. Double bit errors are detected but not corrected. Please see the ECC documents on the web for information on compute application behavior when double bit errors occur. Volatile error counters track the number of errors detected since the last driver load. Aggregate error counts persist indefinitely and thus act as a lifetime counter. 
- -"ecc.errors.corrected.volatile.device_memory" -Errors detected in global device memory. - -"ecc.errors.corrected.volatile.dram" -Errors detected in global device memory. - -"ecc.errors.corrected.volatile.register_file" -Errors detected in register file memory. - -"ecc.errors.corrected.volatile.l1_cache" -Errors detected in the L1 cache. - -"ecc.errors.corrected.volatile.l2_cache" -Errors detected in the L2 cache. - -"ecc.errors.corrected.volatile.texture_memory" -Parity errors detected in texture memory. - -"ecc.errors.corrected.volatile.cbu" -Parity errors detected in CBU. - -"ecc.errors.corrected.volatile.sram" -Errors detected in global SRAMs. - -"ecc.errors.corrected.volatile.total" -Total errors detected across entire chip. - -"ecc.errors.corrected.aggregate.device_memory" -Errors detected in global device memory. - -"ecc.errors.corrected.aggregate.dram" -Errors detected in global device memory. - -"ecc.errors.corrected.aggregate.register_file" -Errors detected in register file memory. - -"ecc.errors.corrected.aggregate.l1_cache" -Errors detected in the L1 cache. - -"ecc.errors.corrected.aggregate.l2_cache" -Errors detected in the L2 cache. - -"ecc.errors.corrected.aggregate.texture_memory" -Parity errors detected in texture memory. - -"ecc.errors.corrected.aggregate.cbu" -Parity errors detected in CBU. - -"ecc.errors.corrected.aggregate.sram" -Errors detected in global SRAMs. - -"ecc.errors.corrected.aggregate.total" -Total errors detected across entire chip. - -"ecc.errors.uncorrected.volatile.device_memory" -Errors detected in global device memory. - -"ecc.errors.uncorrected.volatile.dram" -Errors detected in global device memory. - -"ecc.errors.uncorrected.volatile.register_file" -Errors detected in register file memory. - -"ecc.errors.uncorrected.volatile.l1_cache" -Errors detected in the L1 cache. - -"ecc.errors.uncorrected.volatile.l2_cache" -Errors detected in the L2 cache. 
- -"ecc.errors.uncorrected.volatile.texture_memory" -Parity errors detected in texture memory. - -"ecc.errors.uncorrected.volatile.cbu" -Parity errors detected in CBU. - -"ecc.errors.uncorrected.volatile.sram" -Errors detected in global SRAMs. - -"ecc.errors.uncorrected.volatile.total" -Total errors detected across entire chip. - -"ecc.errors.uncorrected.aggregate.device_memory" -Errors detected in global device memory. - -"ecc.errors.uncorrected.aggregate.dram" -Errors detected in global device memory. - -"ecc.errors.uncorrected.aggregate.register_file" -Errors detected in register file memory. - -"ecc.errors.uncorrected.aggregate.l1_cache" -Errors detected in the L1 cache. - -"ecc.errors.uncorrected.aggregate.l2_cache" -Errors detected in the L2 cache. - -"ecc.errors.uncorrected.aggregate.texture_memory" -Parity errors detected in texture memory. - -"ecc.errors.uncorrected.aggregate.cbu" -Parity errors detected in CBU. - -"ecc.errors.uncorrected.aggregate.sram" -Errors detected in global SRAMs. - -"ecc.errors.uncorrected.aggregate.total" -Total errors detected across entire chip. - -Section about retired_pages properties -NVIDIA GPUs can retire pages of GPU device memory when they become unreliable. This can happen when multiple single bit ECC errors occur for the same page, or on a double bit ECC error. When a page is retired, the NVIDIA driver will hide it such that no driver, or application memory allocations can access it. - -"retired_pages.single_bit_ecc.count" or "retired_pages.sbe" -The number of GPU device memory pages that have been retired due to multiple single bit ECC errors. - -"retired_pages.double_bit.count" or "retired_pages.dbe" -The number of GPU device memory pages that have been retired due to a double bit ECC error. - -"retired_pages.pending" -Checks if any GPU device memory pages are pending retirement on the next reboot. Pages that are pending retirement can still be allocated, and may cause further reliability issues. 
-
-"temperature.gpu"
- Core GPU temperature, in degrees C.
-
-"temperature.memory"
- HBM memory temperature, in degrees C.
-
-"power.management"
-A flag that indicates whether power management is enabled. Either "Supported" or "[Not Supported]". Requires Inforom PWR object version 3.0 or higher or Kepler device.
-
-"power.draw"
-The last measured power draw for the entire board, in watts. Only available if power management is supported. This reading is accurate to within +/- 5 watts.
-
-"power.limit"
-The software power limit in watts. Set by software like nvidia-smi. On Kepler devices Power Limit can be adjusted using [-pl | --power-limit=] switches.
-
-"enforced.power.limit"
-The power management algorithm's power ceiling, in watts. Total board power draw is manipulated by the power management algorithm such that it stays under this value. This value is the minimum of various power limiters.
-
-"power.default_limit"
-The default power management algorithm's power ceiling, in watts. Power Limit will be set back to Default Power Limit after driver unload.
-
-"power.min_limit"
-The minimum value in watts that power limit can be set to.
-
-"power.max_limit"
-The maximum value in watts that power limit can be set to.
-
-"clocks.current.graphics" or "clocks.gr"
-Current frequency of graphics (shader) clock.
-
-"clocks.current.sm" or "clocks.sm"
-Current frequency of SM (Streaming Multiprocessor) clock.
-
-"clocks.current.memory" or "clocks.mem"
-Current frequency of memory clock.
-
-"clocks.current.video" or "clocks.video"
-Current frequency of video encoder/decoder clock.
-
-Section about clocks.applications properties
-User specified frequency at which applications will run. Can be changed with [-ac | --applications-clocks] switches.
-
-"clocks.applications.graphics" or "clocks.applications.gr"
-User specified frequency of graphics (shader) clock.
-
-"clocks.applications.memory" or "clocks.applications.mem"
-User specified frequency of memory clock.
-
-Section about clocks.default_applications properties
-Default frequency at which applications will run. Application clocks can be changed with [-ac | --applications-clocks] switches. Application clocks can be set to default using [-rac | --reset-applications-clocks] switches.
-
-"clocks.default_applications.graphics" or "clocks.default_applications.gr"
-Default frequency of applications graphics (shader) clock.
-
-"clocks.default_applications.memory" or "clocks.default_applications.mem"
-Default frequency of applications memory clock.
-
-Section about clocks.max properties
-Maximum frequency at which parts of the GPU are designed to run.
-
-"clocks.max.graphics" or "clocks.max.gr"
-Maximum frequency of graphics (shader) clock.
-
-"clocks.max.sm"
-Maximum frequency of SM (Streaming Multiprocessor) clock.
-
-"clocks.max.memory" or "clocks.max.mem"
-Maximum frequency of memory clock.
-
-Section about mig.mode properties
-A flag that indicates whether MIG mode is enabled. May be either "Enabled" or "Disabled". Changes to MIG mode require a GPU reset.
-
-"mig.mode.current"
-The MIG mode that the GPU is currently operating under.
-
-"mig.mode.pending"
-The MIG mode that the GPU will operate under after reset.
- diff --git a/src/go/plugin/go.d/modules/nvidia_smi/testdata/tesla-p100.csv b/src/go/plugin/go.d/modules/nvidia_smi/testdata/tesla-p100.csv deleted file mode 100644 index 9a4c1e1a9dcd7d..00000000000000 --- a/src/go/plugin/go.d/modules/nvidia_smi/testdata/tesla-p100.csv +++ /dev/null @@ -1,2 +0,0 @@ -name, uuid, fan.speed [%], pstate, memory.reserved [MiB], memory.used [MiB], memory.free [MiB], utilization.gpu [%], utilization.memory [%], temperature.gpu, power.draw [W], clocks.current.graphics [MHz], clocks.current.sm [MHz], clocks.current.memory [MHz], clocks.current.video [MHz] -Tesla P100-PCIE-16GB, GPU-ef1b2c9b-38d8-2090-2bd1-f567a3eb42a6, [N/A], P0, 103, 0, 16280, 0, 0, 37, 28.16, 405, 405, 715, 835 \ No newline at end of file From 67824a216bbe38eb7cb65108d26663abc59318c1 Mon Sep 17 00:00:00 2001 From: Netdata bot <43409846+netdatabot@users.noreply.github.com> Date: Mon, 12 Aug 2024 04:55:57 -0400 Subject: [PATCH 12/27] Regenerate integrations.js (#18312) Co-authored-by: ilyam8 <22274335+ilyam8@users.noreply.github.com> --- integrations/integrations.js | 4 +- integrations/integrations.json | 4 +- .../nvidia_smi/integrations/nvidia_gpu.md | 59 +++++++------------ 3 files changed, 26 insertions(+), 41 deletions(-) diff --git a/integrations/integrations.js b/integrations/integrations.js index 89ceb53c9f8ee0..89541e2ffe29ce 100644 --- a/integrations/integrations.js +++ b/integrations/integrations.js @@ -5542,10 +5542,10 @@ export const integrations = [ "most_popular": false }, "overview": "# Nvidia GPU\n\nPlugin: go.d.plugin\nModule: nvidia_smi\n\n## Overview\n\nThis collector monitors GPUs performance metrics using\nthe [nvidia-smi](https://developer.nvidia.com/nvidia-system-management-interface) CLI tool.\n\n> **Warning**: under development, [loop mode](https://github.com/netdata/netdata/issues/14522) not implemented yet.\n\n\n\n\nThis collector is supported on all platforms.\n\nThis collector supports collecting metrics from multiple instances of this 
integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nThis integration doesn't support auto-detection.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", - "setup": "## Setup\n\n### Prerequisites\n\n#### Enable in go.d.conf.\n\nThis collector is disabled by default. You need to explicitly enable it in the `go.d.conf` file.\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/nvidia_smi.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/nvidia_smi.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every, autodetection_retry.\n\n\n{% details open=true summary=\"Config options\" %}\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 10 | no |\n| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |\n| binary_path | Path to nvidia_smi binary. The default is \"nvidia_smi\" and the executable is looked for in the directories specified in the PATH environment variable. | nvidia_smi | no |\n| timeout | nvidia_smi binary execution timeout. | 2 | no |\n| use_csv_format | Used format when requesting GPU information. XML is used if set to 'no'. 
| no | no |\n\n{% /details %}\n#### Examples\n\n##### CSV format\n\nUse CSV format when requesting GPU information.\n\n{% details open=true summary=\"Config\" %}\n```yaml\njobs:\n - name: nvidia_smi\n use_csv_format: yes\n\n```\n{% /details %}\n##### Custom binary path\n\nThe executable is not in the directories specified in the PATH environment variable.\n\n{% details open=true summary=\"Config\" %}\n```yaml\njobs:\n - name: nvidia_smi\n binary_path: /usr/local/sbin/nvidia_smi\n\n```\n{% /details %}\n", + "setup": "## Setup\n\n### Prerequisites\n\n#### Enable in go.d.conf.\n\nThis collector is disabled by default. You need to explicitly enable it in the `go.d.conf` file.\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/nvidia_smi.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/nvidia_smi.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every, autodetection_retry.\n\n\n{% details open=true summary=\"Config options\" %}\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 10 | no |\n| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |\n| binary_path | Path to nvidia_smi binary. The default is \"nvidia_smi\" and the executable is looked for in the directories specified in the PATH environment variable. | nvidia_smi | no |\n| timeout | nvidia_smi binary execution timeout. 
| 2 | no |\n\n{% /details %}\n#### Examples\n\n##### Custom binary path\n\nThe executable is not in the directories specified in the PATH environment variable.\n\n{% details open=true summary=\"Config\" %}\n```yaml\njobs:\n - name: nvidia_smi\n binary_path: /usr/local/sbin/nvidia_smi\n\n```\n{% /details %}\n", "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `nvidia_smi` collector, run the `go.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `go.d.plugin` to debug the collector:\n\n ```bash\n ./go.d.plugin -d -m nvidia_smi\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `nvidia_smi` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep nvidia_smi\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep nvidia_smi /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. 
Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep nvidia_smi\n```\n\n", "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", - "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.\n\n\n\n### Per gpu\n\nThese metrics refer to the GPU.\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| uuid | GPU id (e.g. 00000000:00:04.0) |\n| product_name | GPU product name (e.g. NVIDIA A100-SXM4-40GB) |\n\nMetrics:\n\n| Metric | Dimensions | Unit | XML | CSV |\n|:------|:----------|:----|:---:|:---:|\n| nvidia_smi.gpu_pcie_bandwidth_usage | rx, tx | B/s | \u2022 | |\n| nvidia_smi.gpu_pcie_bandwidth_utilization | rx, tx | % | \u2022 | |\n| nvidia_smi.gpu_fan_speed_perc | fan_speed | % | \u2022 | \u2022 |\n| nvidia_smi.gpu_utilization | gpu | % | \u2022 | \u2022 |\n| nvidia_smi.gpu_memory_utilization | memory | % | \u2022 | \u2022 |\n| nvidia_smi.gpu_decoder_utilization | decoder | % | \u2022 | |\n| nvidia_smi.gpu_encoder_utilization | encoder | % | \u2022 | |\n| nvidia_smi.gpu_frame_buffer_memory_usage | free, used, reserved | B | \u2022 | \u2022 |\n| nvidia_smi.gpu_bar1_memory_usage | free, used | B | \u2022 | |\n| nvidia_smi.gpu_temperature | temperature | Celsius | \u2022 | \u2022 |\n| nvidia_smi.gpu_voltage | voltage | V | \u2022 | |\n| nvidia_smi.gpu_clock_freq | graphics, video, sm, mem | MHz | \u2022 | \u2022 |\n| nvidia_smi.gpu_power_draw | power_draw | Watts | \u2022 | \u2022 |\n| nvidia_smi.gpu_performance_state | P0-P15 | state | \u2022 | \u2022 |\n| nvidia_smi.gpu_mig_mode_current_status | enabled, disabled | status | \u2022 | |\n| nvidia_smi.gpu_mig_devices_count | mig | devices | \u2022 | |\n\n### Per 
mig\n\nThese metrics refer to the Multi-Instance GPU (MIG).\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| uuid | GPU id (e.g. 00000000:00:04.0) |\n| product_name | GPU product name (e.g. NVIDIA A100-SXM4-40GB) |\n| gpu_instance_id | GPU instance id (e.g. 1) |\n\nMetrics:\n\n| Metric | Dimensions | Unit | XML | CSV |\n|:------|:----------|:----|:---:|:---:|\n| nvidia_smi.gpu_mig_frame_buffer_memory_usage | free, used, reserved | B | \u2022 | |\n| nvidia_smi.gpu_mig_bar1_memory_usage | free, used | B | \u2022 | |\n\n", + "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.\n\n\n\n### Per gpu\n\nThese metrics refer to the GPU.\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| uuid | GPU id (e.g. 00000000:00:04.0) |\n| product_name | GPU product name (e.g. NVIDIA A100-SXM4-40GB) |\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| nvidia_smi.gpu_pcie_bandwidth_usage | rx, tx | B/s |\n| nvidia_smi.gpu_pcie_bandwidth_utilization | rx, tx | % |\n| nvidia_smi.gpu_fan_speed_perc | fan_speed | % |\n| nvidia_smi.gpu_utilization | gpu | % |\n| nvidia_smi.gpu_memory_utilization | memory | % |\n| nvidia_smi.gpu_decoder_utilization | decoder | % |\n| nvidia_smi.gpu_encoder_utilization | encoder | % |\n| nvidia_smi.gpu_frame_buffer_memory_usage | free, used, reserved | B |\n| nvidia_smi.gpu_bar1_memory_usage | free, used | B |\n| nvidia_smi.gpu_temperature | temperature | Celsius |\n| nvidia_smi.gpu_voltage | voltage | V |\n| nvidia_smi.gpu_clock_freq | graphics, video, sm, mem | MHz |\n| nvidia_smi.gpu_power_draw | power_draw | Watts |\n| nvidia_smi.gpu_performance_state | P0-P15 | state |\n| nvidia_smi.gpu_mig_mode_current_status | enabled, disabled | status |\n| nvidia_smi.gpu_mig_devices_count | mig | devices |\n\n### Per mig\n\nThese metrics refer to the Multi-Instance GPU 
(MIG).\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| uuid | GPU id (e.g. 00000000:00:04.0) |\n| product_name | GPU product name (e.g. NVIDIA A100-SXM4-40GB) |\n| gpu_instance_id | GPU instance id (e.g. 1) |\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| nvidia_smi.gpu_mig_frame_buffer_memory_usage | free, used, reserved | B |\n| nvidia_smi.gpu_mig_bar1_memory_usage | free, used | B |\n\n", "integration_type": "collector", "id": "go.d.plugin-nvidia_smi-Nvidia_GPU", "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml", diff --git a/integrations/integrations.json b/integrations/integrations.json index 2689389089ba1e..8dbfbab199d2fb 100644 --- a/integrations/integrations.json +++ b/integrations/integrations.json @@ -5540,10 +5540,10 @@ "most_popular": false }, "overview": "# Nvidia GPU\n\nPlugin: go.d.plugin\nModule: nvidia_smi\n\n## Overview\n\nThis collector monitors GPUs performance metrics using\nthe [nvidia-smi](https://developer.nvidia.com/nvidia-system-management-interface) CLI tool.\n\n> **Warning**: under development, [loop mode](https://github.com/netdata/netdata/issues/14522) not implemented yet.\n\n\n\n\nThis collector is supported on all platforms.\n\nThis collector supports collecting metrics from multiple instances of this integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nThis integration doesn't support auto-detection.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", - "setup": "## Setup\n\n### Prerequisites\n\n#### Enable in go.d.conf.\n\nThis collector is disabled by default. 
You need to explicitly enable it in the `go.d.conf` file.\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/nvidia_smi.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/nvidia_smi.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every, autodetection_retry.\n\n\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 10 | no |\n| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |\n| binary_path | Path to nvidia_smi binary. The default is \"nvidia_smi\" and the executable is looked for in the directories specified in the PATH environment variable. | nvidia_smi | no |\n| timeout | nvidia_smi binary execution timeout. | 2 | no |\n| use_csv_format | Used format when requesting GPU information. XML is used if set to 'no'. | no | no |\n\n#### Examples\n\n##### CSV format\n\nUse CSV format when requesting GPU information.\n\n```yaml\njobs:\n - name: nvidia_smi\n use_csv_format: yes\n\n```\n##### Custom binary path\n\nThe executable is not in the directories specified in the PATH environment variable.\n\n```yaml\njobs:\n - name: nvidia_smi\n binary_path: /usr/local/sbin/nvidia_smi\n\n```\n", + "setup": "## Setup\n\n### Prerequisites\n\n#### Enable in go.d.conf.\n\nThis collector is disabled by default. 
You need to explicitly enable it in the `go.d.conf` file.\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/nvidia_smi.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/nvidia_smi.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every, autodetection_retry.\n\n\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 10 | no |\n| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |\n| binary_path | Path to nvidia_smi binary. The default is \"nvidia_smi\" and the executable is looked for in the directories specified in the PATH environment variable. | nvidia_smi | no |\n| timeout | nvidia_smi binary execution timeout. | 2 | no |\n\n#### Examples\n\n##### Custom binary path\n\nThe executable is not in the directories specified in the PATH environment variable.\n\n```yaml\njobs:\n - name: nvidia_smi\n binary_path: /usr/local/sbin/nvidia_smi\n\n```\n", "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `nvidia_smi` collector, run the `go.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. 
If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `go.d.plugin` to debug the collector:\n\n ```bash\n ./go.d.plugin -d -m nvidia_smi\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `nvidia_smi` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep nvidia_smi\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep nvidia_smi /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep nvidia_smi\n```\n\n", "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", - "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.\n\n\n\n### Per gpu\n\nThese metrics refer to the GPU.\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| uuid | GPU id (e.g. 
00000000:00:04.0) |\n| product_name | GPU product name (e.g. NVIDIA A100-SXM4-40GB) |\n\nMetrics:\n\n| Metric | Dimensions | Unit | XML | CSV |\n|:------|:----------|:----|:---:|:---:|\n| nvidia_smi.gpu_pcie_bandwidth_usage | rx, tx | B/s | \u2022 | |\n| nvidia_smi.gpu_pcie_bandwidth_utilization | rx, tx | % | \u2022 | |\n| nvidia_smi.gpu_fan_speed_perc | fan_speed | % | \u2022 | \u2022 |\n| nvidia_smi.gpu_utilization | gpu | % | \u2022 | \u2022 |\n| nvidia_smi.gpu_memory_utilization | memory | % | \u2022 | \u2022 |\n| nvidia_smi.gpu_decoder_utilization | decoder | % | \u2022 | |\n| nvidia_smi.gpu_encoder_utilization | encoder | % | \u2022 | |\n| nvidia_smi.gpu_frame_buffer_memory_usage | free, used, reserved | B | \u2022 | \u2022 |\n| nvidia_smi.gpu_bar1_memory_usage | free, used | B | \u2022 | |\n| nvidia_smi.gpu_temperature | temperature | Celsius | \u2022 | \u2022 |\n| nvidia_smi.gpu_voltage | voltage | V | \u2022 | |\n| nvidia_smi.gpu_clock_freq | graphics, video, sm, mem | MHz | \u2022 | \u2022 |\n| nvidia_smi.gpu_power_draw | power_draw | Watts | \u2022 | \u2022 |\n| nvidia_smi.gpu_performance_state | P0-P15 | state | \u2022 | \u2022 |\n| nvidia_smi.gpu_mig_mode_current_status | enabled, disabled | status | \u2022 | |\n| nvidia_smi.gpu_mig_devices_count | mig | devices | \u2022 | |\n\n### Per mig\n\nThese metrics refer to the Multi-Instance GPU (MIG).\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| uuid | GPU id (e.g. 00000000:00:04.0) |\n| product_name | GPU product name (e.g. NVIDIA A100-SXM4-40GB) |\n| gpu_instance_id | GPU instance id (e.g. 1) |\n\nMetrics:\n\n| Metric | Dimensions | Unit | XML | CSV |\n|:------|:----------|:----|:---:|:---:|\n| nvidia_smi.gpu_mig_frame_buffer_memory_usage | free, used, reserved | B | \u2022 | |\n| nvidia_smi.gpu_mig_bar1_memory_usage | free, used | B | \u2022 | |\n\n", + "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels.\n\n\n\n### Per gpu\n\nThese metrics refer to the GPU.\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| uuid | GPU id (e.g. 00000000:00:04.0) |\n| product_name | GPU product name (e.g. NVIDIA A100-SXM4-40GB) |\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| nvidia_smi.gpu_pcie_bandwidth_usage | rx, tx | B/s |\n| nvidia_smi.gpu_pcie_bandwidth_utilization | rx, tx | % |\n| nvidia_smi.gpu_fan_speed_perc | fan_speed | % |\n| nvidia_smi.gpu_utilization | gpu | % |\n| nvidia_smi.gpu_memory_utilization | memory | % |\n| nvidia_smi.gpu_decoder_utilization | decoder | % |\n| nvidia_smi.gpu_encoder_utilization | encoder | % |\n| nvidia_smi.gpu_frame_buffer_memory_usage | free, used, reserved | B |\n| nvidia_smi.gpu_bar1_memory_usage | free, used | B |\n| nvidia_smi.gpu_temperature | temperature | Celsius |\n| nvidia_smi.gpu_voltage | voltage | V |\n| nvidia_smi.gpu_clock_freq | graphics, video, sm, mem | MHz |\n| nvidia_smi.gpu_power_draw | power_draw | Watts |\n| nvidia_smi.gpu_performance_state | P0-P15 | state |\n| nvidia_smi.gpu_mig_mode_current_status | enabled, disabled | status |\n| nvidia_smi.gpu_mig_devices_count | mig | devices |\n\n### Per mig\n\nThese metrics refer to the Multi-Instance GPU (MIG).\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| uuid | GPU id (e.g. 00000000:00:04.0) |\n| product_name | GPU product name (e.g. NVIDIA A100-SXM4-40GB) |\n| gpu_instance_id | GPU instance id (e.g. 
1) |\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| nvidia_smi.gpu_mig_frame_buffer_memory_usage | free, used, reserved | B |\n| nvidia_smi.gpu_mig_bar1_memory_usage | free, used | B |\n\n", "integration_type": "collector", "id": "go.d.plugin-nvidia_smi-Nvidia_GPU", "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml", diff --git a/src/go/plugin/go.d/modules/nvidia_smi/integrations/nvidia_gpu.md b/src/go/plugin/go.d/modules/nvidia_smi/integrations/nvidia_gpu.md index 4fcb9130b89aa1..2496ea89184e09 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/integrations/nvidia_gpu.md +++ b/src/go/plugin/go.d/modules/nvidia_smi/integrations/nvidia_gpu.md @@ -70,24 +70,24 @@ Labels: Metrics: -| Metric | Dimensions | Unit | XML | CSV | -|:------|:----------|:----|:---:|:---:| -| nvidia_smi.gpu_pcie_bandwidth_usage | rx, tx | B/s | • | | -| nvidia_smi.gpu_pcie_bandwidth_utilization | rx, tx | % | • | | -| nvidia_smi.gpu_fan_speed_perc | fan_speed | % | • | • | -| nvidia_smi.gpu_utilization | gpu | % | • | • | -| nvidia_smi.gpu_memory_utilization | memory | % | • | • | -| nvidia_smi.gpu_decoder_utilization | decoder | % | • | | -| nvidia_smi.gpu_encoder_utilization | encoder | % | • | | -| nvidia_smi.gpu_frame_buffer_memory_usage | free, used, reserved | B | • | • | -| nvidia_smi.gpu_bar1_memory_usage | free, used | B | • | | -| nvidia_smi.gpu_temperature | temperature | Celsius | • | • | -| nvidia_smi.gpu_voltage | voltage | V | • | | -| nvidia_smi.gpu_clock_freq | graphics, video, sm, mem | MHz | • | • | -| nvidia_smi.gpu_power_draw | power_draw | Watts | • | • | -| nvidia_smi.gpu_performance_state | P0-P15 | state | • | • | -| nvidia_smi.gpu_mig_mode_current_status | enabled, disabled | status | • | | -| nvidia_smi.gpu_mig_devices_count | mig | devices | • | | +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| nvidia_smi.gpu_pcie_bandwidth_usage | rx, tx | B/s | +| 
nvidia_smi.gpu_pcie_bandwidth_utilization | rx, tx | % | +| nvidia_smi.gpu_fan_speed_perc | fan_speed | % | +| nvidia_smi.gpu_utilization | gpu | % | +| nvidia_smi.gpu_memory_utilization | memory | % | +| nvidia_smi.gpu_decoder_utilization | decoder | % | +| nvidia_smi.gpu_encoder_utilization | encoder | % | +| nvidia_smi.gpu_frame_buffer_memory_usage | free, used, reserved | B | +| nvidia_smi.gpu_bar1_memory_usage | free, used | B | +| nvidia_smi.gpu_temperature | temperature | Celsius | +| nvidia_smi.gpu_voltage | voltage | V | +| nvidia_smi.gpu_clock_freq | graphics, video, sm, mem | MHz | +| nvidia_smi.gpu_power_draw | power_draw | Watts | +| nvidia_smi.gpu_performance_state | P0-P15 | state | +| nvidia_smi.gpu_mig_mode_current_status | enabled, disabled | status | +| nvidia_smi.gpu_mig_devices_count | mig | devices | ### Per mig @@ -103,10 +103,10 @@ Labels: Metrics: -| Metric | Dimensions | Unit | XML | CSV | -|:------|:----------|:----|:---:|:---:| -| nvidia_smi.gpu_mig_frame_buffer_memory_usage | free, used, reserved | B | • | | -| nvidia_smi.gpu_mig_bar1_memory_usage | free, used | B | • | | +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| nvidia_smi.gpu_mig_frame_buffer_memory_usage | free, used, reserved | B | +| nvidia_smi.gpu_mig_bar1_memory_usage | free, used | B | @@ -152,26 +152,11 @@ The following options can be defined globally: update_every, autodetection_retry | autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no | | binary_path | Path to nvidia_smi binary. The default is "nvidia_smi" and the executable is looked for in the directories specified in the PATH environment variable. | nvidia_smi | no | | timeout | nvidia_smi binary execution timeout. | 2 | no | -| use_csv_format | Used format when requesting GPU information. XML is used if set to 'no'. | no | no | #### Examples -##### CSV format - -Use CSV format when requesting GPU information. - -
Config - -```yaml -jobs: - - name: nvidia_smi - use_csv_format: yes - -``` -
- ##### Custom binary path The executable is not in the directories specified in the PATH environment variable. From 63631b495b3961acfcd7aba57173b54d627a9fdc Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Mon, 12 Aug 2024 13:47:43 +0300 Subject: [PATCH 13/27] go.d nvidia_smi: add loop mode (#18313) --- .../modules/nvidia_smi/config_schema.json | 9 + src/go/plugin/go.d/modules/nvidia_smi/exec.go | 179 +++++++++++++++++- src/go/plugin/go.d/modules/nvidia_smi/init.go | 2 +- .../go.d/modules/nvidia_smi/metadata.yaml | 4 + .../go.d/modules/nvidia_smi/nvidia_smi.go | 13 +- .../modules/nvidia_smi/nvidia_smi_test.go | 4 + .../modules/nvidia_smi/testdata/config.json | 3 +- .../modules/nvidia_smi/testdata/config.yaml | 1 + 8 files changed, 205 insertions(+), 10 deletions(-) diff --git a/src/go/plugin/go.d/modules/nvidia_smi/config_schema.json b/src/go/plugin/go.d/modules/nvidia_smi/config_schema.json index 823cd781804842..3f93badc247fec 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/config_schema.json +++ b/src/go/plugin/go.d/modules/nvidia_smi/config_schema.json @@ -23,6 +23,12 @@ "type": "number", "minimum": 0.5, "default": 10 + }, + "loop_mode": { + "title": "Loop Mode", + "description": "When enabled, `nvidia-smi` is executed continuously in a separate thread using the `-l` option.", + "type": "boolean", + "default": true } }, "required": [ @@ -42,6 +48,9 @@ }, "timeout": { "ui:help": "Accepts decimals for precise control (e.g., type 1.5 for 1.5 seconds)." + }, + "loop_mode": { + "ui:help": "In loop mode, `nvidia-smi` will repeatedly query GPU data at specified intervals, defined by the `-l SEC` or `--loop=SEC` parameter, rather than just running the query once. This enables ongoing performance tracking by putting the application to sleep between queries." 
} } } diff --git a/src/go/plugin/go.d/modules/nvidia_smi/exec.go b/src/go/plugin/go.d/modules/nvidia_smi/exec.go index 4acb3f2c00c62e..11a26131fd0823 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/exec.go +++ b/src/go/plugin/go.d/modules/nvidia_smi/exec.go @@ -3,9 +3,14 @@ package nvidia_smi import ( + "bufio" + "bytes" "context" + "errors" "fmt" "os/exec" + "strconv" + "sync" "time" "github.com/netdata/netdata/go/plugins/logger" @@ -13,14 +18,30 @@ import ( type nvidiaSmiBinary interface { queryGPUInfo() ([]byte, error) + stop() error } -func newNvidiaSmiExec(path string, cfg Config, log *logger.Logger) (*nvidiaSmiExec, error) { - return &nvidiaSmiExec{ - Logger: log, - binPath: path, - timeout: cfg.Timeout.Duration(), - }, nil +func newNvidiaSmiBinary(path string, cfg Config, log *logger.Logger) (nvidiaSmiBinary, error) { + if !cfg.LoopMode { + return &nvidiaSmiExec{ + Logger: log, + binPath: path, + timeout: cfg.Timeout.Duration(), + }, nil + } + + smi := &nvidiaSmiLoopExec{ + Logger: log, + binPath: path, + updateEvery: cfg.UpdateEvery, + firstSampleTimeout: time.Second * 3, + } + + if err := smi.run(); err != nil { + return nil, err + } + + return smi, nil } type nvidiaSmiExec struct { @@ -44,3 +65,149 @@ func (e *nvidiaSmiExec) queryGPUInfo() ([]byte, error) { return bs, nil } + +func (e *nvidiaSmiExec) stop() error { return nil } + +type nvidiaSmiLoopExec struct { + *logger.Logger + + binPath string + + updateEvery int + firstSampleTimeout time.Duration + + cmd *exec.Cmd + done chan struct{} + + mux sync.Mutex + lastSample string +} + +func (e *nvidiaSmiLoopExec) queryGPUInfo() ([]byte, error) { + select { + case <-e.done: + return nil, errors.New("process has already exited") + default: + } + + e.mux.Lock() + defer e.mux.Unlock() + + return []byte(e.lastSample), nil +} + +func (e *nvidiaSmiLoopExec) run() error { + secs := 5 + if e.updateEvery < secs { + secs = e.updateEvery + } + + cmd := exec.Command(e.binPath, "-q", "-x", "-l", strconv.Itoa(secs)) + 
+ e.Debugf("executing '%s'", cmd) + + r, err := cmd.StdoutPipe() + if err != nil { + return err + } + + if err := cmd.Start(); err != nil { + return err + } + + firstSample := make(chan struct{}, 1) + done := make(chan struct{}) + e.cmd = cmd + e.done = done + + go func() { + defer close(done) + + var buf bytes.Buffer + var insideLog bool + var emptyRows int64 + var outsideLogRows int64 + + const unexpectedRowsLimit = 500 + + sc := bufio.NewScanner(r) + + for sc.Scan() { + line := sc.Text() + + if !insideLog { + outsideLogRows++ + } else { + outsideLogRows = 0 + } + + if line == "" { + emptyRows++ + } else { + emptyRows = 0 + } + + if outsideLogRows >= unexpectedRowsLimit || emptyRows >= unexpectedRowsLimit { + e.Errorf("unexpected output from nvidia-smi loop: outside log rows %d, empty rows %d", outsideLogRows, emptyRows) + break + } + + switch { + case line == "<nvidia_smi_log>": + insideLog = true + buf.Reset() + + buf.WriteString(line) + buf.WriteByte('\n') + case line == "</nvidia_smi_log>": + insideLog = false + + buf.WriteString(line) + + e.mux.Lock() + e.lastSample = buf.String() + e.mux.Unlock() + + buf.Reset() + + select { + case firstSample <- struct{}{}: + default: + } + case insideLog: + buf.WriteString(line) + buf.WriteByte('\n') + default: + continue + } + } + }() + + select { + case <-e.done: + _ = e.stop() + return errors.New("process exited before the first sample was collected") + case <-time.After(e.firstSampleTimeout): + _ = e.stop() + return errors.New("timed out waiting for first sample") + case <-firstSample: + return nil + } +} + +func (e *nvidiaSmiLoopExec) stop() error { + if e.cmd == nil || e.cmd.Process == nil { + return nil + } + + _ = e.cmd.Process.Kill() + _ = e.cmd.Wait() + e.cmd = nil + + select { + case <-e.done: + return nil + case <-time.After(time.Second * 2): + return errors.New("timed out waiting for process to exit") + } +} diff --git a/src/go/plugin/go.d/modules/nvidia_smi/init.go b/src/go/plugin/go.d/modules/nvidia_smi/init.go index
471cfe733ada27..c13b2fffdf0720 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/init.go +++ b/src/go/plugin/go.d/modules/nvidia_smi/init.go @@ -18,5 +18,5 @@ func (nv *NvidiaSmi) initNvidiaSmiExec() (nvidiaSmiBinary, error) { binPath = path } - return newNvidiaSmiExec(binPath, nv.Config, nv.Logger) + return newNvidiaSmiBinary(binPath, nv.Config, nv.Logger) } diff --git a/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml b/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml index d35716284a8045..2ff35af5319c34 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml +++ b/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml @@ -73,6 +73,10 @@ modules: description: nvidia_smi binary execution timeout. default_value: 2 required: false + - name: loop_mode + description: "When enabled, `nvidia-smi` is executed continuously in a separate thread using the `-l` option." + default_value: true + required: false examples: folding: title: Config diff --git a/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi.go b/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi.go index 66872ce77784e5..0c58db62204307 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi.go +++ b/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi.go @@ -29,7 +29,8 @@ func init() { func New() *NvidiaSmi { return &NvidiaSmi{ Config: Config{ - Timeout: web.Duration(time.Second * 10), + Timeout: web.Duration(time.Second * 10), + LoopMode: true, }, binName: "nvidia-smi", charts: &module.Charts{}, @@ -43,6 +44,7 @@ type Config struct { UpdateEvery int `yaml:"update_every,omitempty" json:"update_every"` Timeout web.Duration `yaml:"timeout,omitempty" json:"timeout"` BinaryPath string `yaml:"binary_path" json:"binary_path"` + LoopMode bool `yaml:"loop_mode,omitempty" json:"loop_mode"` } type NvidiaSmi struct { @@ -103,4 +105,11 @@ func (nv *NvidiaSmi) Collect() map[string]int64 { return mx } -func (nv *NvidiaSmi) Cleanup() {} +func (nv *NvidiaSmi) Cleanup() { + if nv.exec != nil { + if err := 
nv.exec.stop(); err != nil { + nv.Errorf("cleanup: %v", err) + } + nv.exec = nil + } +} diff --git a/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi_test.go b/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi_test.go index f93279e19cdbe8..d2070b06911ede 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi_test.go +++ b/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi_test.go @@ -418,6 +418,10 @@ func (m *mockNvidiaSmi) queryGPUInfo() ([]byte, error) { return m.gpuInfo, nil } +func (m *mockNvidiaSmi) stop() error { + return nil +} + func prepareCaseMIGA100(nv *NvidiaSmi) { nv.exec = &mockNvidiaSmi{gpuInfo: dataXMLA100SXM4MIG} } diff --git a/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.json b/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.json index 09571319348b46..6ff795390516bc 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.json +++ b/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.json @@ -1,5 +1,6 @@ { "update_every": 123, "timeout": 123.123, - "binary_path": "ok" + "binary_path": "ok", + "loop_mode": true } diff --git a/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.yaml b/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.yaml index baf3bcd0b0fab0..1f2fedef5674d5 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.yaml +++ b/src/go/plugin/go.d/modules/nvidia_smi/testdata/config.yaml @@ -1,3 +1,4 @@ update_every: 123 timeout: 123.123 binary_path: "ok" +loop_mode: true From 2cb6079ca21160cd1a8e8d18211a107c6110a3cb Mon Sep 17 00:00:00 2001 From: "Austin S. Hemmelgarn" Date: Mon, 12 Aug 2024 06:48:39 -0400 Subject: [PATCH 14/27] Use system certificate configuration for Yum/DNF repos. 
(#18293) --- packaging/repoconfig/deb.changelog | 6 ++++++ packaging/repoconfig/netdata.repo.dnf | 2 -- packaging/repoconfig/rpm.changelog | 2 ++ 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/packaging/repoconfig/deb.changelog b/packaging/repoconfig/deb.changelog index d7d25054e29968..fc1932555bc08c 100644 --- a/packaging/repoconfig/deb.changelog +++ b/packaging/repoconfig/deb.changelog @@ -1,3 +1,9 @@ +@PKG_NAME@ (3-3) unstable; urgency=medium + + * Version bump to keep in sync with RPM repo packages + + -- Netdata Builder Fri, 9 Aug 2024 09:37:00 -0400 + @PKG_NAME@ (3-2) unstable; urgency=medium * Version bump to keep in sync with RPM repo packages diff --git a/packaging/repoconfig/netdata.repo.dnf b/packaging/repoconfig/netdata.repo.dnf index a8ab94a03f77ef..3a64a2a58e8ca3 100644 --- a/packaging/repoconfig/netdata.repo.dnf +++ b/packaging/repoconfig/netdata.repo.dnf @@ -6,7 +6,6 @@ gpgcheck=1 gpgkey=https://repo.netdata.cloud/netdatabot.gpg.key enabled=1 sslverify=1 -sslcacert=/etc/pki/tls/certs/ca-bundle.crt priority=50 [netdata-repoconfig] @@ -17,5 +16,4 @@ gpgcheck=1 gpgkey=https://repo.netdata.cloud/netdatabot.gpg.key enabled=1 sslverify=1 -sslcacert=/etc/pki/tls/certs/ca-bundle.crt priority=50 diff --git a/packaging/repoconfig/rpm.changelog b/packaging/repoconfig/rpm.changelog index 7cc14dcce102e0..559385297ce6d8 100644 --- a/packaging/repoconfig/rpm.changelog +++ b/packaging/repoconfig/rpm.changelog @@ -1,3 +1,5 @@ +* Fri Aug 9 2024 Austin Hemmelgarn 3-3 +- Use system certificate config for Yum/DNF repos. * Mon Jun 24 2024 Austin Hemmelgarn 3-2 - Fix package file names. 
* Fri Jun 14 2024 Austin Hemmelgarn 3-1 From e3ec4e894cd755a0998a4bf7e8d4dbbc0df382ac Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Mon, 12 Aug 2024 15:07:18 +0300 Subject: [PATCH 15/27] go.d nvidia_smi: enable by default (#18315) * go.d nvidia_smi: enable by default * update meta --- src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml | 7 +------ src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi.go | 1 - 2 files changed, 1 insertion(+), 7 deletions(-) diff --git a/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml b/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml index 2ff35af5319c34..2a79b5ac139a18 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml +++ b/src/go/plugin/go.d/modules/nvidia_smi/metadata.yaml @@ -25,8 +25,6 @@ modules: metrics_description: | This collector monitors GPUs performance metrics using the [nvidia-smi](https://developer.nvidia.com/nvidia-system-management-interface) CLI tool. - - > **Warning**: under development, [loop mode](https://github.com/netdata/netdata/issues/14522) not implemented yet. method_description: "" supported_platforms: include: [] @@ -43,10 +41,7 @@ modules: description: "" setup: prerequisites: - list: - - title: Enable in go.d.conf. - description: | - This collector is disabled by default. You need to explicitly enable it in the `go.d.conf` file. 
+ list: [] configuration: file: name: go.d/nvidia_smi.conf diff --git a/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi.go b/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi.go index 0c58db62204307..3f89df05a94cbd 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi.go +++ b/src/go/plugin/go.d/modules/nvidia_smi/nvidia_smi.go @@ -18,7 +18,6 @@ func init() { module.Register("nvidia_smi", module.Creator{ JobConfigSchema: configSchema, Defaults: module.Defaults{ - Disabled: true, UpdateEvery: 10, }, Create: func() module.Module { return New() }, From a36af8a3457849b238e5e0e833f92b6090448fd3 Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Mon, 12 Aug 2024 15:07:45 +0300 Subject: [PATCH 16/27] remove python.d/nvidia_smi (#18316) * remove python.d/nvidia_smi * update python.d.conf --- CMakeLists.txt | 2 - .../python.d.plugin/nvidia_smi/README.md | 81 --- .../python.d.plugin/nvidia_smi/metadata.yaml | 166 ----- .../nvidia_smi/nvidia_smi.chart.py | 651 ------------------ .../nvidia_smi/nvidia_smi.conf | 68 -- src/collectors/python.d.plugin/python.d.conf | 2 +- 6 files changed, 1 insertion(+), 969 deletions(-) delete mode 100644 src/collectors/python.d.plugin/nvidia_smi/README.md delete mode 100644 src/collectors/python.d.plugin/nvidia_smi/metadata.yaml delete mode 100644 src/collectors/python.d.plugin/nvidia_smi/nvidia_smi.chart.py delete mode 100644 src/collectors/python.d.plugin/nvidia_smi/nvidia_smi.conf diff --git a/CMakeLists.txt b/CMakeLists.txt index 3eb90ea75e97be..eb11b57fb57002 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -2785,7 +2785,6 @@ install(FILES src/collectors/python.d.plugin/go_expvar/go_expvar.conf src/collectors/python.d.plugin/haproxy/haproxy.conf src/collectors/python.d.plugin/monit/monit.conf - src/collectors/python.d.plugin/nvidia_smi/nvidia_smi.conf src/collectors/python.d.plugin/openldap/openldap.conf src/collectors/python.d.plugin/oracledb/oracledb.conf src/collectors/python.d.plugin/pandas/pandas.conf @@ -2813,7 +2812,6 
@@ install(FILES src/collectors/python.d.plugin/go_expvar/go_expvar.chart.py src/collectors/python.d.plugin/haproxy/haproxy.chart.py src/collectors/python.d.plugin/monit/monit.chart.py - src/collectors/python.d.plugin/nvidia_smi/nvidia_smi.chart.py src/collectors/python.d.plugin/openldap/openldap.chart.py src/collectors/python.d.plugin/oracledb/oracledb.chart.py src/collectors/python.d.plugin/pandas/pandas.chart.py diff --git a/src/collectors/python.d.plugin/nvidia_smi/README.md b/src/collectors/python.d.plugin/nvidia_smi/README.md deleted file mode 100644 index 240b65af3219df..00000000000000 --- a/src/collectors/python.d.plugin/nvidia_smi/README.md +++ /dev/null @@ -1,81 +0,0 @@ - - -# Nvidia GPU collector - -Monitors performance metrics (memory usage, fan speed, pcie bandwidth utilization, temperature, etc.) using `nvidia-smi` cli tool. - -## Requirements - -- The `nvidia-smi` tool installed and your NVIDIA GPU(s) must support the tool. Mostly the newer high end models used for AI / ML and Crypto or Pro range, read more about [nvidia_smi](https://developer.nvidia.com/nvidia-system-management-interface). -- Enable this plugin, as it's disabled by default due to minor performance issues: - ```bash - cd /etc/netdata # Replace this path with your Netdata config directory, if different - sudo ./edit-config python.d.conf - ``` - Remove the '#' before nvidia_smi so it reads: `nvidia_smi: yes`. -- On some systems when the GPU is idle the `nvidia-smi` tool unloads and there is added latency again when it is next queried. If you are running GPUs under constant workload this isn't likely to be an issue. - -If using Docker, see [Netdata Docker container with NVIDIA GPUs monitoring](https://github.com/netdata/netdata/tree/master/packaging/docker#with-nvidia-gpus-monitoring). 
- -## Charts - -It produces the following charts: - -- PCI Express Bandwidth Utilization in `KiB/s` -- Fan Speed in `percentage` -- GPU Utilization in `percentage` -- Memory Bandwidth Utilization in `percentage` -- Encoder/Decoder Utilization in `percentage` -- Memory Usage in `MiB` -- Temperature in `celsius` -- Clock Frequencies in `MHz` -- Power Utilization in `Watts` -- Memory Used by Each Process in `MiB` -- Memory Used by Each User in `MiB` -- Number of User on GPU in `num` - -## Configuration - -Edit the `python.d/nvidia_smi.conf` configuration file using `edit-config` from the Netdata [config -directory](/docs/netdata-agent/configuration/README.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/nvidia_smi.conf -``` - -Sample: - -```yaml -loop_mode : yes -poll_seconds : 1 -exclude_zero_memory_users : yes -``` - - -### Troubleshooting - -To troubleshoot issues with the `nvidia_smi` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. 
- -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `nvidia_smi` module in debug mode: - -```bash -./python.d.plugin nvidia_smi debug trace -``` diff --git a/src/collectors/python.d.plugin/nvidia_smi/metadata.yaml b/src/collectors/python.d.plugin/nvidia_smi/metadata.yaml deleted file mode 100644 index 2ffdcadaf63cd7..00000000000000 --- a/src/collectors/python.d.plugin/nvidia_smi/metadata.yaml +++ /dev/null @@ -1,166 +0,0 @@ -# This collector will not appear in documentation, as the go version is preferred, -# /src/go/plugin/go.d/modules/nvidia_smi/README.md -# -# meta: -# plugin_name: python.d.plugin -# module_name: nvidia_smi -# monitored_instance: -# name: python.d nvidia_smi -# link: '' -# categories: [] -# icon_filename: '' -# related_resources: -# integrations: -# list: [] -# info_provided_to_referring_integrations: -# description: '' -# keywords: [] -# most_popular: false -# overview: -# data_collection: -# metrics_description: '' -# method_description: '' -# supported_platforms: -# include: [] -# exclude: [] -# multi_instance: true -# additional_permissions: -# description: '' -# default_behavior: -# auto_detection: -# description: '' -# limits: -# description: '' -# performance_impact: -# description: '' -# setup: -# prerequisites: -# list: [] -# configuration: -# file: -# name: '' -# description: '' -# options: -# description: '' -# folding: -# title: '' -# enabled: true -# list: [] -# examples: -# folding: -# enabled: true -# title: '' -# list: [] -# troubleshooting: -# problems: -# list: [] -# alerts: [] -# metrics: -# folding: -# title: Metrics -# enabled: false -# description: "" -# availability: [] -# scopes: -# - name: GPU -# description: "" -# labels: [] -# metrics: -# - name: nvidia_smi.pci_bandwidth -# description: PCI Express Bandwidth Utilization -# unit: "KiB/s" -# chart_type: area -# dimensions: -# - name: rx -# - name: tx -# - name: nvidia_smi.pci_bandwidth_percent -# description: 
PCI Express Bandwidth Percent -# unit: "percentage" -# chart_type: area -# dimensions: -# - name: rx_percent -# - name: tx_percent -# - name: nvidia_smi.fan_speed -# description: Fan Speed -# unit: "percentage" -# chart_type: line -# dimensions: -# - name: speed -# - name: nvidia_smi.gpu_utilization -# description: GPU Utilization -# unit: "percentage" -# chart_type: line -# dimensions: -# - name: utilization -# - name: nvidia_smi.mem_utilization -# description: Memory Bandwidth Utilization -# unit: "percentage" -# chart_type: line -# dimensions: -# - name: utilization -# - name: nvidia_smi.encoder_utilization -# description: Encoder/Decoder Utilization -# unit: "percentage" -# chart_type: line -# dimensions: -# - name: encoder -# - name: decoder -# - name: nvidia_smi.memory_allocated -# description: Memory Usage -# unit: "MiB" -# chart_type: stacked -# dimensions: -# - name: free -# - name: used -# - name: nvidia_smi.bar1_memory_usage -# description: Bar1 Memory Usage -# unit: "MiB" -# chart_type: stacked -# dimensions: -# - name: free -# - name: used -# - name: nvidia_smi.temperature -# description: Temperature -# unit: "celsius" -# chart_type: line -# dimensions: -# - name: temp -# - name: nvidia_smi.clocks -# description: Clock Frequencies -# unit: "MHz" -# chart_type: line -# dimensions: -# - name: graphics -# - name: video -# - name: sm -# - name: mem -# - name: nvidia_smi.power -# description: Power Utilization -# unit: "Watts" -# chart_type: line -# dimensions: -# - name: power -# - name: nvidia_smi.power_state -# description: Power State -# unit: "state" -# chart_type: line -# dimensions: -# - name: a dimension per {power_state} -# - name: nvidia_smi.processes_mem -# description: Memory Used by Each Process -# unit: "MiB" -# chart_type: stacked -# dimensions: -# - name: a dimension per process -# - name: nvidia_smi.user_mem -# description: Memory Used by Each User -# unit: "MiB" -# chart_type: stacked -# dimensions: -# - name: a dimension per user -# - 
name: nvidia_smi.user_num -# description: Number of User on GPU -# unit: "num" -# chart_type: line -# dimensions: -# - name: users diff --git a/src/collectors/python.d.plugin/nvidia_smi/nvidia_smi.chart.py b/src/collectors/python.d.plugin/nvidia_smi/nvidia_smi.chart.py deleted file mode 100644 index 556a61435711c2..00000000000000 --- a/src/collectors/python.d.plugin/nvidia_smi/nvidia_smi.chart.py +++ /dev/null @@ -1,651 +0,0 @@ -# -*- coding: utf-8 -*- -# Description: nvidia-smi netdata python.d module -# Original Author: Steven Noonan (tycho) -# Author: Ilya Mashchenko (ilyam8) -# User Memory Stat Author: Guido Scatena (scatenag) - -import os -import pwd -import subprocess -import threading -import xml.etree.ElementTree as et - -from bases.FrameworkServices.SimpleService import SimpleService -from bases.collection import find_binary - -disabled_by_default = True - -NVIDIA_SMI = 'nvidia-smi' - -NOT_AVAILABLE = 'N/A' - -EMPTY_ROW = '' -EMPTY_ROW_LIMIT = 500 -POLLER_BREAK_ROW = '' - -PCI_BANDWIDTH = 'pci_bandwidth' -PCI_BANDWIDTH_PERCENT = 'pci_bandwidth_percent' -FAN_SPEED = 'fan_speed' -GPU_UTIL = 'gpu_utilization' -MEM_UTIL = 'mem_utilization' -ENCODER_UTIL = 'encoder_utilization' -MEM_USAGE = 'mem_usage' -BAR_USAGE = 'bar1_mem_usage' -TEMPERATURE = 'temperature' -CLOCKS = 'clocks' -POWER = 'power' -POWER_STATE = 'power_state' -PROCESSES_MEM = 'processes_mem' -USER_MEM = 'user_mem' -USER_NUM = 'user_num' - -ORDER = [ - PCI_BANDWIDTH, - PCI_BANDWIDTH_PERCENT, - FAN_SPEED, - GPU_UTIL, - MEM_UTIL, - ENCODER_UTIL, - MEM_USAGE, - BAR_USAGE, - TEMPERATURE, - CLOCKS, - POWER, - POWER_STATE, - PROCESSES_MEM, - USER_MEM, - USER_NUM, -] - -# https://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__gpupstate.html -POWER_STATES = ['P' + str(i) for i in range(0, 16)] - -# PCI Transfer data rate in gigabits per second (Gb/s) per generation -PCI_SPEED = { - "1": 2.5, - "2": 5, - "3": 8, - "4": 16, - "5": 32 -} -# PCI encoding per generation -PCI_ENCODING = 
{ - "1": 2 / 10, - "2": 2 / 10, - "3": 2 / 130, - "4": 2 / 130, - "5": 2 / 130 -} - - -def gpu_charts(gpu): - fam = gpu.full_name() - - charts = { - PCI_BANDWIDTH: { - 'options': [None, 'PCI Express Bandwidth Utilization', 'KiB/s', fam, 'nvidia_smi.pci_bandwidth', 'area'], - 'lines': [ - ['rx_util', 'rx', 'absolute', 1, 1], - ['tx_util', 'tx', 'absolute', 1, -1], - ] - }, - PCI_BANDWIDTH_PERCENT: { - 'options': [None, 'PCI Express Bandwidth Percent', 'percentage', fam, 'nvidia_smi.pci_bandwidth_percent', - 'area'], - 'lines': [ - ['rx_util_percent', 'rx_percent'], - ['tx_util_percent', 'tx_percent'], - ] - }, - FAN_SPEED: { - 'options': [None, 'Fan Speed', 'percentage', fam, 'nvidia_smi.fan_speed', 'line'], - 'lines': [ - ['fan_speed', 'speed'], - ] - }, - GPU_UTIL: { - 'options': [None, 'GPU Utilization', 'percentage', fam, 'nvidia_smi.gpu_utilization', 'line'], - 'lines': [ - ['gpu_util', 'utilization'], - ] - }, - MEM_UTIL: { - 'options': [None, 'Memory Bandwidth Utilization', 'percentage', fam, 'nvidia_smi.mem_utilization', 'line'], - 'lines': [ - ['memory_util', 'utilization'], - ] - }, - ENCODER_UTIL: { - 'options': [None, 'Encoder/Decoder Utilization', 'percentage', fam, 'nvidia_smi.encoder_utilization', - 'line'], - 'lines': [ - ['encoder_util', 'encoder'], - ['decoder_util', 'decoder'], - ] - }, - MEM_USAGE: { - 'options': [None, 'Memory Usage', 'MiB', fam, 'nvidia_smi.memory_allocated', 'stacked'], - 'lines': [ - ['fb_memory_free', 'free'], - ['fb_memory_used', 'used'], - ] - }, - BAR_USAGE: { - 'options': [None, 'Bar1 Memory Usage', 'MiB', fam, 'nvidia_smi.bar1_memory_usage', 'stacked'], - 'lines': [ - ['bar1_memory_free', 'free'], - ['bar1_memory_used', 'used'], - ] - }, - TEMPERATURE: { - 'options': [None, 'Temperature', 'celsius', fam, 'nvidia_smi.temperature', 'line'], - 'lines': [ - ['gpu_temp', 'temp'], - ] - }, - CLOCKS: { - 'options': [None, 'Clock Frequencies', 'MHz', fam, 'nvidia_smi.clocks', 'line'], - 'lines': [ - ['graphics_clock', 
'graphics'], - ['video_clock', 'video'], - ['sm_clock', 'sm'], - ['mem_clock', 'mem'], - ] - }, - POWER: { - 'options': [None, 'Power Utilization', 'Watts', fam, 'nvidia_smi.power', 'line'], - 'lines': [ - ['power_draw', 'power', 'absolute', 1, 100], - ] - }, - POWER_STATE: { - 'options': [None, 'Power State', 'state', fam, 'nvidia_smi.power_state', 'line'], - 'lines': [['power_state_' + v.lower(), v, 'absolute'] for v in POWER_STATES] - }, - PROCESSES_MEM: { - 'options': [None, 'Memory Used by Each Process', 'MiB', fam, 'nvidia_smi.processes_mem', 'stacked'], - 'lines': [] - }, - USER_MEM: { - 'options': [None, 'Memory Used by Each User', 'MiB', fam, 'nvidia_smi.user_mem', 'stacked'], - 'lines': [] - }, - USER_NUM: { - 'options': [None, 'Number of User on GPU', 'num', fam, 'nvidia_smi.user_num', 'line'], - 'lines': [ - ['user_num', 'users'], - ] - }, - } - - idx = gpu.num - - order = ['gpu{0}_{1}'.format(idx, v) for v in ORDER] - charts = dict(('gpu{0}_{1}'.format(idx, k), v) for k, v in charts.items()) - - for chart in charts.values(): - for line in chart['lines']: - line[0] = 'gpu{0}_{1}'.format(idx, line[0]) - - return order, charts - - -class NvidiaSMI: - def __init__(self): - self.command = find_binary(NVIDIA_SMI) - self.active_proc = None - - def run_once(self): - proc = subprocess.Popen([self.command, '-x', '-q'], stdout=subprocess.PIPE) - stdout, _ = proc.communicate() - return stdout - - def run_loop(self, interval): - if self.active_proc: - self.kill() - proc = subprocess.Popen([self.command, '-x', '-q', '-l', str(interval)], stdout=subprocess.PIPE) - self.active_proc = proc - return proc.stdout - - def kill(self): - if self.active_proc: - self.active_proc.kill() - self.active_proc = None - - -class NvidiaSMIPoller(threading.Thread): - def __init__(self, poll_interval): - threading.Thread.__init__(self) - self.daemon = True - - self.smi = NvidiaSMI() - self.interval = poll_interval - - self.lock = threading.RLock() - self.last_data = str() - self.exit = 
False - self.empty_rows = 0 - self.rows = list() - - def has_smi(self): - return bool(self.smi.command) - - def run_once(self): - return self.smi.run_once() - - def run(self): - out = self.smi.run_loop(self.interval) - - for row in out: - if self.exit or self.empty_rows > EMPTY_ROW_LIMIT: - break - self.process_row(row) - self.smi.kill() - - def process_row(self, row): - row = row.decode() - self.empty_rows += (row == EMPTY_ROW) - self.rows.append(row) - - if POLLER_BREAK_ROW in row: - self.lock.acquire() - self.last_data = '\n'.join(self.rows) - self.lock.release() - - self.rows = list() - self.empty_rows = 0 - - def is_started(self): - return self.ident is not None - - def shutdown(self): - self.exit = True - - def data(self): - self.lock.acquire() - data = self.last_data - self.lock.release() - return data - - -def handle_attr_error(method): - def on_call(*args, **kwargs): - try: - return method(*args, **kwargs) - except AttributeError: - return None - - return on_call - - -def handle_value_error(method): - def on_call(*args, **kwargs): - try: - return method(*args, **kwargs) - except ValueError: - return None - - return on_call - - -HOST_PREFIX = os.getenv('NETDATA_HOST_PREFIX') -ETC_PASSWD_PATH = '/etc/passwd' -PROC_PATH = '/proc' - -IS_INSIDE_DOCKER = False - -if HOST_PREFIX: - ETC_PASSWD_PATH = os.path.join(HOST_PREFIX, ETC_PASSWD_PATH[1:]) - PROC_PATH = os.path.join(HOST_PREFIX, PROC_PATH[1:]) - IS_INSIDE_DOCKER = True - - -def read_passwd_file(): - data = dict() - with open(ETC_PASSWD_PATH, 'r') as f: - for line in f: - line = line.strip() - if line.startswith("#"): - continue - fields = line.split(":") - # name, passwd, uid, gid, comment, home_dir, shell - if len(fields) != 7: - continue - # uid, guid - fields[2], fields[3] = int(fields[2]), int(fields[3]) - data[fields[2]] = fields - return data - - -def read_passwd_file_safe(): - try: - if IS_INSIDE_DOCKER: - return read_passwd_file() - return dict((k[2], k) for k in pwd.getpwall()) - except (OSError, 
IOError): - return dict() - - -def get_username_by_pid_safe(pid, passwd_file): - path = os.path.join(PROC_PATH, pid) - try: - uid = os.stat(path).st_uid - except (OSError, IOError): - return '' - try: - if IS_INSIDE_DOCKER: - return passwd_file[uid][0] - return pwd.getpwuid(uid)[0] - except KeyError: - return str(uid) - - -class GPU: - def __init__(self, num, root, exclude_zero_memory_users=False): - self.num = num - self.root = root - self.exclude_zero_memory_users = exclude_zero_memory_users - - def id(self): - return self.root.get('id') - - def name(self): - return self.root.find('product_name').text - - def full_name(self): - return 'gpu{0} {1}'.format(self.num, self.name()) - - @handle_attr_error - def pci_link_gen(self): - return self.root.find('pci').find('pci_gpu_link_info').find('pcie_gen').find('max_link_gen').text - - @handle_attr_error - def pci_link_width(self): - info = self.root.find('pci').find('pci_gpu_link_info') - return info.find('link_widths').find('max_link_width').text.split('x')[0] - - def pci_bw_max(self): - link_gen = self.pci_link_gen() - link_width = int(self.pci_link_width()) - if link_gen not in PCI_SPEED or link_gen not in PCI_ENCODING or not link_width: - return None - # Maximum PCIe Bandwidth = SPEED * WIDTH * (1 - ENCODING) - 1Gb/s. 
- # see details https://enterprise-support.nvidia.com/s/article/understanding-pcie-configuration-for-maximum-performance - # return max bandwidth in kilobytes per second (kB/s) - return (PCI_SPEED[link_gen] * link_width * (1 - PCI_ENCODING[link_gen]) - 1) * 1000 * 1000 / 8 - - @handle_attr_error - def rx_util(self): - return self.root.find('pci').find('rx_util').text.split()[0] - - @handle_attr_error - def tx_util(self): - return self.root.find('pci').find('tx_util').text.split()[0] - - @handle_attr_error - def fan_speed(self): - return self.root.find('fan_speed').text.split()[0] - - @handle_attr_error - def gpu_util(self): - return self.root.find('utilization').find('gpu_util').text.split()[0] - - @handle_attr_error - def memory_util(self): - return self.root.find('utilization').find('memory_util').text.split()[0] - - @handle_attr_error - def encoder_util(self): - return self.root.find('utilization').find('encoder_util').text.split()[0] - - @handle_attr_error - def decoder_util(self): - return self.root.find('utilization').find('decoder_util').text.split()[0] - - @handle_attr_error - def fb_memory_used(self): - return self.root.find('fb_memory_usage').find('used').text.split()[0] - - @handle_attr_error - def fb_memory_free(self): - return self.root.find('fb_memory_usage').find('free').text.split()[0] - - @handle_attr_error - def bar1_memory_used(self): - return self.root.find('bar1_memory_usage').find('used').text.split()[0] - - @handle_attr_error - def bar1_memory_free(self): - return self.root.find('bar1_memory_usage').find('free').text.split()[0] - - @handle_attr_error - def temperature(self): - return self.root.find('temperature').find('gpu_temp').text.split()[0] - - @handle_attr_error - def graphics_clock(self): - return self.root.find('clocks').find('graphics_clock').text.split()[0] - - @handle_attr_error - def video_clock(self): - return self.root.find('clocks').find('video_clock').text.split()[0] - - @handle_attr_error - def sm_clock(self): - return 
self.root.find('clocks').find('sm_clock').text.split()[0] - - @handle_attr_error - def mem_clock(self): - return self.root.find('clocks').find('mem_clock').text.split()[0] - - @handle_attr_error - def power_readings(self): - elem = self.root.find('power_readings') - return elem if elem else self.root.find('gpu_power_readings') - - @handle_attr_error - def power_state(self): - return str(self.power_readings().find('power_state').text.split()[0]) - - @handle_value_error - @handle_attr_error - def power_draw(self): - return float(self.power_readings().find('power_draw').text.split()[0]) * 100 - - @handle_attr_error - def processes(self): - processes_info = self.root.find('processes').findall('process_info') - if not processes_info: - return list() - - passwd_file = read_passwd_file_safe() - processes = list() - - for info in processes_info: - pid = info.find('pid').text - processes.append({ - 'pid': int(pid), - 'process_name': info.find('process_name').text, - 'used_memory': int(info.find('used_memory').text.split()[0]), - 'username': get_username_by_pid_safe(pid, passwd_file), - }) - return processes - - def data(self): - data = { - 'rx_util': self.rx_util(), - 'tx_util': self.tx_util(), - 'fan_speed': self.fan_speed(), - 'gpu_util': self.gpu_util(), - 'memory_util': self.memory_util(), - 'encoder_util': self.encoder_util(), - 'decoder_util': self.decoder_util(), - 'fb_memory_used': self.fb_memory_used(), - 'fb_memory_free': self.fb_memory_free(), - 'bar1_memory_used': self.bar1_memory_used(), - 'bar1_memory_free': self.bar1_memory_free(), - 'gpu_temp': self.temperature(), - 'graphics_clock': self.graphics_clock(), - 'video_clock': self.video_clock(), - 'sm_clock': self.sm_clock(), - 'mem_clock': self.mem_clock(), - 'power_draw': self.power_draw(), - } - - if self.rx_util() != NOT_AVAILABLE and self.tx_util() != NOT_AVAILABLE: - pci_bw_max = self.pci_bw_max() - if not pci_bw_max: - data['rx_util_percent'] = 0 - data['tx_util_percent'] = 0 - else: - 
data['rx_util_percent'] = str(int(int(self.rx_util()) * 100 / self.pci_bw_max())) - data['tx_util_percent'] = str(int(int(self.tx_util()) * 100 / self.pci_bw_max())) - - for v in POWER_STATES: - data['power_state_' + v.lower()] = 0 - p_state = self.power_state() - if p_state: - data['power_state_' + p_state.lower()] = 1 - - processes = self.processes() or [] - users = set() - for p in processes: - data['process_mem_{0}'.format(p['pid'])] = p['used_memory'] - if p['username']: - if self.exclude_zero_memory_users and p['used_memory'] == 0: - continue - users.add(p['username']) - key = 'user_mem_{0}'.format(p['username']) - if key in data: - data[key] += p['used_memory'] - else: - data[key] = p['used_memory'] - data['user_num'] = len(users) - - return dict(('gpu{0}_{1}'.format(self.num, k), v) for k, v in data.items()) - - -class Service(SimpleService): - def __init__(self, configuration=None, name=None): - super(Service, self).__init__(configuration=configuration, name=name) - self.order = list() - self.definitions = dict() - self.loop_mode = configuration.get('loop_mode', True) - poll = int(configuration.get('poll_seconds', self.get_update_every())) - self.exclude_zero_memory_users = configuration.get('exclude_zero_memory_users', False) - self.poller = NvidiaSMIPoller(poll) - - def get_data_loop_mode(self): - if not self.poller.is_started(): - self.poller.start() - - if not self.poller.is_alive(): - self.debug('poller is off') - return None - - return self.poller.data() - - def get_data_normal_mode(self): - return self.poller.run_once() - - def get_data(self): - if self.loop_mode: - last_data = self.get_data_loop_mode() - else: - last_data = self.get_data_normal_mode() - - if not last_data: - return None - - parsed = self.parse_xml(last_data) - if parsed is None: - return None - - data = dict() - for idx, root in enumerate(parsed.findall('gpu')): - gpu = GPU(idx, root, self.exclude_zero_memory_users) - gpu_data = gpu.data() - # self.debug(gpu_data) - gpu_data = 
dict((k, v) for k, v in gpu_data.items() if is_gpu_data_value_valid(v)) - data.update(gpu_data) - self.update_processes_mem_chart(gpu) - self.update_processes_user_mem_chart(gpu) - - return data or None - - def update_processes_mem_chart(self, gpu): - ps = gpu.processes() - if not ps: - return - chart = self.charts['gpu{0}_{1}'.format(gpu.num, PROCESSES_MEM)] - active_dim_ids = [] - for p in ps: - dim_id = 'gpu{0}_process_mem_{1}'.format(gpu.num, p['pid']) - active_dim_ids.append(dim_id) - if dim_id not in chart: - chart.add_dimension([dim_id, '{0} {1}'.format(p['pid'], p['process_name'])]) - for dim in chart: - if dim.id not in active_dim_ids: - chart.del_dimension(dim.id, hide=False) - - def update_processes_user_mem_chart(self, gpu): - ps = gpu.processes() - if not ps: - return - chart = self.charts['gpu{0}_{1}'.format(gpu.num, USER_MEM)] - active_dim_ids = [] - for p in ps: - if not p.get('username'): - continue - dim_id = 'gpu{0}_user_mem_{1}'.format(gpu.num, p['username']) - active_dim_ids.append(dim_id) - if dim_id not in chart: - chart.add_dimension([dim_id, '{0}'.format(p['username'])]) - - for dim in chart: - if dim.id not in active_dim_ids: - chart.del_dimension(dim.id, hide=False) - - def check(self): - if not self.poller.has_smi(): - self.error("couldn't find '{0}' binary".format(NVIDIA_SMI)) - return False - - raw_data = self.poller.run_once() - if not raw_data: - self.error("failed to invoke '{0}' binary".format(NVIDIA_SMI)) - return False - - parsed = self.parse_xml(raw_data) - if parsed is None: - return False - - gpus = parsed.findall('gpu') - if not gpus: - return False - - self.create_charts(gpus) - - return True - - def parse_xml(self, data): - try: - return et.fromstring(data) - except et.ParseError as error: - self.error('xml parse failed: "{0}", error: {1}'.format(data, error)) - - return None - - def create_charts(self, gpus): - for idx, root in enumerate(gpus): - order, charts = gpu_charts(GPU(idx, root)) - self.order.extend(order) - 
self.definitions.update(charts) - - -def is_gpu_data_value_valid(value): - try: - int(value) - except (TypeError, ValueError): - return False - return True diff --git a/src/collectors/python.d.plugin/nvidia_smi/nvidia_smi.conf b/src/collectors/python.d.plugin/nvidia_smi/nvidia_smi.conf deleted file mode 100644 index 3d2a30d4125cc0..00000000000000 --- a/src/collectors/python.d.plugin/nvidia_smi/nvidia_smi.conf +++ /dev/null @@ -1,68 +0,0 @@ -# netdata python.d.plugin configuration for nvidia_smi -# -# This file is in YaML format. Generally the format is: -# -# name: value -# -# There are 2 sections: -# - global variables -# - one or more JOBS -# -# JOBS allow you to collect values from multiple sources. -# Each source will have its own set of charts. -# -# JOB parameters have to be indented (using spaces only, example below). - -# ---------------------------------------------------------------------- -# Global Variables -# These variables set the defaults for all JOBs, however each JOB -# may define its own, overriding the defaults. - -# update_every sets the default data collection frequency. -# If unset, the python.d.plugin default is used. -# update_every: 1 - -# priority controls the order of charts at the netdata dashboard. -# Lower numbers move the charts towards the top of the page. -# If unset, the default for python.d.plugin is used. -# priority: 60000 - -# penalty indicates whether to apply penalty to update_every in case of failures. -# Penalty will increase every 5 failed updates in a row. Maximum penalty is 10 minutes. -# penalty: yes - -# autodetection_retry sets the job re-check interval in seconds. -# The job is not deleted if check fails. -# Attempts to start the job are made once every autodetection_retry. -# This feature is disabled by default. -# autodetection_retry: 0 - -# ---------------------------------------------------------------------- -# JOBS (data collection sources) -# -# The default JOBS share the same *name*. 
JOBS with the same name -# are mutually exclusive. Only one of them will be allowed running at -# any time. This allows autodetection to try several alternatives and -# pick the one that works. -# -# Any number of jobs is supported. -# -# All python.d.plugin JOBS (for all its modules) support a set of -# predefined parameters. These are: -# -# job_name: -# name: myname # the JOB's name as it will appear at the -# # dashboard (by default is the job_name) -# # JOBs sharing a name are mutually exclusive -# update_every: 1 # the JOB's data collection frequency -# priority: 60000 # the JOB's order on the dashboard -# penalty: yes # the JOB's penalty -# autodetection_retry: 0 # the JOB's re-check interval in seconds -# -# Additionally to the above, example also supports the following: -# -# loop_mode: yes/no # default is yes. If set to yes `nvidia-smi` is executed in a separate thread using `-l` option. -# poll_seconds: SECONDS # default is 1. Sets the frequency of seconds the nvidia-smi tool is polled in loop mode. -# exclude_zero_memory_users: yes/no # default is no. Whether to collect users metrics with 0Mb memory allocation. -# -# ---------------------------------------------------------------------- diff --git a/src/collectors/python.d.plugin/python.d.conf b/src/collectors/python.d.plugin/python.d.conf index ca024b4301277a..e4d2872c9bab65 100644 --- a/src/collectors/python.d.plugin/python.d.conf +++ b/src/collectors/python.d.plugin/python.d.conf @@ -36,7 +36,6 @@ example: no go_expvar: no # haproxy: yes # monit: yes -# nvidia_smi: yes # openldap: yes # oracledb: yes # pandas: yes @@ -73,6 +72,7 @@ mongodb: no # Removed (replaced with go.d/mongodb). mysql: no # Removed (replaced with go.d/mysql). nginx: no # Removed (replaced with go.d/nginx). nsd: no # Removed (replaced with go.d/nsd). +nvidia_smi: no # Removed (replaced with go.d/nvidia_smi). postfix: no # Removed (replaced with go.d/postfix). postgres: no # Removed (replaced with go.d/postgres). 
proxysql: no # Removed (replaced with go.d/proxysql). From f9a773ea7e823a20b3077398f8aa9e501dded19a Mon Sep 17 00:00:00 2001 From: Netdata bot <43409846+netdatabot@users.noreply.github.com> Date: Mon, 12 Aug 2024 08:16:32 -0400 Subject: [PATCH 17/27] Regenerate integrations.js (#18317) Co-authored-by: ilyam8 <22274335+ilyam8@users.noreply.github.com> --- integrations/integrations.js | 4 ++-- integrations/integrations.json | 4 ++-- .../go.d/modules/nvidia_smi/integrations/nvidia_gpu.md | 9 ++------- 3 files changed, 6 insertions(+), 11 deletions(-) diff --git a/integrations/integrations.js b/integrations/integrations.js index 89541e2ffe29ce..6549cd86c70c27 100644 --- a/integrations/integrations.js +++ b/integrations/integrations.js @@ -5541,8 +5541,8 @@ export const integrations = [ }, "most_popular": false }, - "overview": "# Nvidia GPU\n\nPlugin: go.d.plugin\nModule: nvidia_smi\n\n## Overview\n\nThis collector monitors GPUs performance metrics using\nthe [nvidia-smi](https://developer.nvidia.com/nvidia-system-management-interface) CLI tool.\n\n> **Warning**: under development, [loop mode](https://github.com/netdata/netdata/issues/14522) not implemented yet.\n\n\n\n\nThis collector is supported on all platforms.\n\nThis collector supports collecting metrics from multiple instances of this integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nThis integration doesn't support auto-detection.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", - "setup": "## Setup\n\n### Prerequisites\n\n#### Enable in go.d.conf.\n\nThis collector is disabled by default. 
You need to explicitly enable it in the `go.d.conf` file.\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/nvidia_smi.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/nvidia_smi.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every, autodetection_retry.\n\n\n{% details open=true summary=\"Config options\" %}\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 10 | no |\n| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |\n| binary_path | Path to nvidia_smi binary. The default is \"nvidia_smi\" and the executable is looked for in the directories specified in the PATH environment variable. | nvidia_smi | no |\n| timeout | nvidia_smi binary execution timeout. 
| 2 | no |\n\n{% /details %}\n#### Examples\n\n##### Custom binary path\n\nThe executable is not in the directories specified in the PATH environment variable.\n\n{% details open=true summary=\"Config\" %}\n```yaml\njobs:\n - name: nvidia_smi\n binary_path: /usr/local/sbin/nvidia_smi\n\n```\n{% /details %}\n", + "overview": "# Nvidia GPU\n\nPlugin: go.d.plugin\nModule: nvidia_smi\n\n## Overview\n\nThis collector monitors GPUs performance metrics using\nthe [nvidia-smi](https://developer.nvidia.com/nvidia-system-management-interface) CLI tool.\n\n\n\n\nThis collector is supported on all platforms.\n\nThis collector supports collecting metrics from multiple instances of this integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nThis integration doesn't support auto-detection.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", + "setup": "## Setup\n\n### Prerequisites\n\nNo action required.\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/nvidia_smi.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/nvidia_smi.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every, autodetection_retry.\n\n\n{% details open=true summary=\"Config options\" %}\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 10 | no |\n| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. 
| 0 | no |\n| binary_path | Path to nvidia_smi binary. The default is \"nvidia_smi\" and the executable is looked for in the directories specified in the PATH environment variable. | nvidia_smi | no |\n| timeout | nvidia_smi binary execution timeout. | 2 | no |\n| loop_mode | When enabled, `nvidia-smi` is executed continuously in a separate thread using the `-l` option. | yes | no |\n\n{% /details %}\n#### Examples\n\n##### Custom binary path\n\nThe executable is not in the directories specified in the PATH environment variable.\n\n{% details open=true summary=\"Config\" %}\n```yaml\njobs:\n - name: nvidia_smi\n binary_path: /usr/local/sbin/nvidia_smi\n\n```\n{% /details %}\n", "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `nvidia_smi` collector, run the `go.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `go.d.plugin` to debug the collector:\n\n ```bash\n ./go.d.plugin -d -m nvidia_smi\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `nvidia_smi` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. 
These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep nvidia_smi\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep nvidia_smi /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep nvidia_smi\n```\n\n", "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.\n\n\n\n### Per gpu\n\nThese metrics refer to the GPU.\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| uuid | GPU id (e.g. 00000000:00:04.0) |\n| product_name | GPU product name (e.g. 
NVIDIA A100-SXM4-40GB) |\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| nvidia_smi.gpu_pcie_bandwidth_usage | rx, tx | B/s |\n| nvidia_smi.gpu_pcie_bandwidth_utilization | rx, tx | % |\n| nvidia_smi.gpu_fan_speed_perc | fan_speed | % |\n| nvidia_smi.gpu_utilization | gpu | % |\n| nvidia_smi.gpu_memory_utilization | memory | % |\n| nvidia_smi.gpu_decoder_utilization | decoder | % |\n| nvidia_smi.gpu_encoder_utilization | encoder | % |\n| nvidia_smi.gpu_frame_buffer_memory_usage | free, used, reserved | B |\n| nvidia_smi.gpu_bar1_memory_usage | free, used | B |\n| nvidia_smi.gpu_temperature | temperature | Celsius |\n| nvidia_smi.gpu_voltage | voltage | V |\n| nvidia_smi.gpu_clock_freq | graphics, video, sm, mem | MHz |\n| nvidia_smi.gpu_power_draw | power_draw | Watts |\n| nvidia_smi.gpu_performance_state | P0-P15 | state |\n| nvidia_smi.gpu_mig_mode_current_status | enabled, disabled | status |\n| nvidia_smi.gpu_mig_devices_count | mig | devices |\n\n### Per mig\n\nThese metrics refer to the Multi-Instance GPU (MIG).\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| uuid | GPU id (e.g. 00000000:00:04.0) |\n| product_name | GPU product name (e.g. NVIDIA A100-SXM4-40GB) |\n| gpu_instance_id | GPU instance id (e.g. 
1) |\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| nvidia_smi.gpu_mig_frame_buffer_memory_usage | free, used, reserved | B |\n| nvidia_smi.gpu_mig_bar1_memory_usage | free, used | B |\n\n", diff --git a/integrations/integrations.json b/integrations/integrations.json index 8dbfbab199d2fb..5fce39fcc95276 100644 --- a/integrations/integrations.json +++ b/integrations/integrations.json @@ -5539,8 +5539,8 @@ }, "most_popular": false }, - "overview": "# Nvidia GPU\n\nPlugin: go.d.plugin\nModule: nvidia_smi\n\n## Overview\n\nThis collector monitors GPUs performance metrics using\nthe [nvidia-smi](https://developer.nvidia.com/nvidia-system-management-interface) CLI tool.\n\n> **Warning**: under development, [loop mode](https://github.com/netdata/netdata/issues/14522) not implemented yet.\n\n\n\n\nThis collector is supported on all platforms.\n\nThis collector supports collecting metrics from multiple instances of this integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nThis integration doesn't support auto-detection.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", - "setup": "## Setup\n\n### Prerequisites\n\n#### Enable in go.d.conf.\n\nThis collector is disabled by default. 
You need to explicitly enable it in the `go.d.conf` file.\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/nvidia_smi.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/nvidia_smi.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every, autodetection_retry.\n\n\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 10 | no |\n| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |\n| binary_path | Path to nvidia_smi binary. The default is \"nvidia_smi\" and the executable is looked for in the directories specified in the PATH environment variable. | nvidia_smi | no |\n| timeout | nvidia_smi binary execution timeout. 
| 2 | no |\n\n#### Examples\n\n##### Custom binary path\n\nThe executable is not in the directories specified in the PATH environment variable.\n\n```yaml\njobs:\n - name: nvidia_smi\n binary_path: /usr/local/sbin/nvidia_smi\n\n```\n", + "overview": "# Nvidia GPU\n\nPlugin: go.d.plugin\nModule: nvidia_smi\n\n## Overview\n\nThis collector monitors GPUs performance metrics using\nthe [nvidia-smi](https://developer.nvidia.com/nvidia-system-management-interface) CLI tool.\n\n\n\n\nThis collector is supported on all platforms.\n\nThis collector supports collecting metrics from multiple instances of this integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nThis integration doesn't support auto-detection.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", + "setup": "## Setup\n\n### Prerequisites\n\nNo action required.\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/nvidia_smi.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/nvidia_smi.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every, autodetection_retry.\n\n\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 10 | no |\n| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |\n| binary_path | Path to nvidia_smi binary. 
The default is \"nvidia_smi\" and the executable is looked for in the directories specified in the PATH environment variable. | nvidia_smi | no |\n| timeout | nvidia_smi binary execution timeout. | 2 | no |\n| loop_mode | When enabled, `nvidia-smi` is executed continuously in a separate thread using the `-l` option. | yes | no |\n\n#### Examples\n\n##### Custom binary path\n\nThe executable is not in the directories specified in the PATH environment variable.\n\n```yaml\njobs:\n - name: nvidia_smi\n binary_path: /usr/local/sbin/nvidia_smi\n\n```\n", "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `nvidia_smi` collector, run the `go.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `go.d.plugin` to debug the collector:\n\n ```bash\n ./go.d.plugin -d -m nvidia_smi\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `nvidia_smi` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. 
These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep nvidia_smi\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep nvidia_smi /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep nvidia_smi\n```\n\n", "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.\n\n\n\n### Per gpu\n\nThese metrics refer to the GPU.\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| uuid | GPU id (e.g. 00000000:00:04.0) |\n| product_name | GPU product name (e.g. 
NVIDIA A100-SXM4-40GB) |\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| nvidia_smi.gpu_pcie_bandwidth_usage | rx, tx | B/s |\n| nvidia_smi.gpu_pcie_bandwidth_utilization | rx, tx | % |\n| nvidia_smi.gpu_fan_speed_perc | fan_speed | % |\n| nvidia_smi.gpu_utilization | gpu | % |\n| nvidia_smi.gpu_memory_utilization | memory | % |\n| nvidia_smi.gpu_decoder_utilization | decoder | % |\n| nvidia_smi.gpu_encoder_utilization | encoder | % |\n| nvidia_smi.gpu_frame_buffer_memory_usage | free, used, reserved | B |\n| nvidia_smi.gpu_bar1_memory_usage | free, used | B |\n| nvidia_smi.gpu_temperature | temperature | Celsius |\n| nvidia_smi.gpu_voltage | voltage | V |\n| nvidia_smi.gpu_clock_freq | graphics, video, sm, mem | MHz |\n| nvidia_smi.gpu_power_draw | power_draw | Watts |\n| nvidia_smi.gpu_performance_state | P0-P15 | state |\n| nvidia_smi.gpu_mig_mode_current_status | enabled, disabled | status |\n| nvidia_smi.gpu_mig_devices_count | mig | devices |\n\n### Per mig\n\nThese metrics refer to the Multi-Instance GPU (MIG).\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| uuid | GPU id (e.g. 00000000:00:04.0) |\n| product_name | GPU product name (e.g. NVIDIA A100-SXM4-40GB) |\n| gpu_instance_id | GPU instance id (e.g. 
1) |\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| nvidia_smi.gpu_mig_frame_buffer_memory_usage | free, used, reserved | B |\n| nvidia_smi.gpu_mig_bar1_memory_usage | free, used | B |\n\n", diff --git a/src/go/plugin/go.d/modules/nvidia_smi/integrations/nvidia_gpu.md b/src/go/plugin/go.d/modules/nvidia_smi/integrations/nvidia_gpu.md index 2496ea89184e09..ab972511365305 100644 --- a/src/go/plugin/go.d/modules/nvidia_smi/integrations/nvidia_gpu.md +++ b/src/go/plugin/go.d/modules/nvidia_smi/integrations/nvidia_gpu.md @@ -24,8 +24,6 @@ Module: nvidia_smi This collector monitors GPUs performance metrics using the [nvidia-smi](https://developer.nvidia.com/nvidia-system-management-interface) CLI tool. -> **Warning**: under development, [loop mode](https://github.com/netdata/netdata/issues/14522) not implemented yet. - @@ -119,11 +117,7 @@ There are no alerts configured by default for this integration. ### Prerequisites -#### Enable in go.d.conf. - -This collector is disabled by default. You need to explicitly enable it in the `go.d.conf` file. - - +No action required. ### Configuration @@ -152,6 +146,7 @@ The following options can be defined globally: update_every, autodetection_retry | autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no | | binary_path | Path to nvidia_smi binary. The default is "nvidia_smi" and the executable is looked for in the directories specified in the PATH environment variable. | nvidia_smi | no | | timeout | nvidia_smi binary execution timeout. | 2 | no | +| loop_mode | When enabled, `nvidia-smi` is executed continuously in a separate thread using the `-l` option. | yes | no | From bfd83397dcc6bd99e7e19b9e68ec86cdc021ca4f Mon Sep 17 00:00:00 2001 From: "Austin S. Hemmelgarn" Date: Mon, 12 Aug 2024 11:55:17 -0400 Subject: [PATCH 18/27] Handle GOROOT inside build system instead of outside. (#18296) * Handle GOROOT inside build system instead of outside. 
* Fix parsing of GOROOT value. --- .github/workflows/build.yml | 4 ---- packaging/cmake/Modules/FindGo.cmake | 20 ++++++++++++++++++-- packaging/cmake/Modules/NetdataGoTools.cmake | 2 +- 3 files changed, 19 insertions(+), 7 deletions(-) diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index e5a327909d056d..4d2fcdb0e2be5b 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -1071,10 +1071,6 @@ jobs: uses: actions/setup-go@v5 with: go-version: "^1.22" - - name: Set GOROOT - id: goroot - if: needs.file-check.outputs.run == 'true' - run: Add-Content -Path "$env:GITHUB_ENV" -Value "GOROOT=$(go.exe env GOROOT)" - name: Set Up Dependencies id: deps if: needs.file-check.outputs.run == 'true' diff --git a/packaging/cmake/Modules/FindGo.cmake b/packaging/cmake/Modules/FindGo.cmake index e282a10fcb09f5..69e23fda6781b1 100644 --- a/packaging/cmake/Modules/FindGo.cmake +++ b/packaging/cmake/Modules/FindGo.cmake @@ -21,11 +21,12 @@ endif() # and fall back to looking in PATH. For the specific case of MSYS2, we prefer a Windows install over an MSYS2 install. 
if(DEFINED $ENV{GOROOT}) find_program(GO_EXECUTABLE go PATHS "$ENV{GOROOT}/bin" DOC "Go toolchain" NO_DEFAULT_PATH) + set(GO_ROOT $ENV{GOROOT}) elseif(OS_WINDOWS) if(CMAKE_SYSTEM_NAME STREQUAL "Windows") find_program(GO_EXECUTABLE go PATHS C:/go/bin "C:/Program Files/go/bin" DOC "Go toolchain" NO_DEFAULT_PATH) else() - find_program(GO_EXECUTABLE go PATHS /c/go/bin "/c/Program Files/go/bin" /mingw64/bin /ucrt64/bin /clang64/bin DOC "Go toolchain" NO_DEFAULT_PATH) + find_program(GO_EXECUTABLE go PATHS /c/go/bin "/c/Program Files/go/bin" /mingw64/lib/go/bin /ucrt64/lib/go/bin /clang64/lib/go/bin DOC "Go toolchain" NO_DEFAULT_PATH) endif() else() find_program(GO_EXECUTABLE go PATHS /usr/local/go/bin DOC "Go toolchain" NO_DEFAULT_PATH) @@ -41,12 +42,27 @@ if (GO_EXECUTABLE) if (RESULT EQUAL 0) string(REGEX MATCH "go([0-9]+\\.[0-9]+(\\.[0-9]+)?)" GO_VERSION_STRING "${GO_VERSION_STRING}") string(REGEX MATCH "([0-9]+\\.[0-9]+(\\.[0-9]+)?)" GO_VERSION_STRING "${GO_VERSION_STRING}") + else() + unset(GO_VERSION_STRING) + endif() + + if(NOT DEFINED GO_ROOT) + execute_process( + COMMAND ${GO_EXECUTABLE} env GOROOT + OUTPUT_VARIABLE GO_ROOT + RESULT_VARIABLE RESULT + ) + if(RESULT EQUAL 0) + string(REGEX REPLACE "\n$" "" GO_ROOT "${GO_ROOT}") + else() + unset(GO_ROOT) + endif() endif() endif() include(FindPackageHandleStandardArgs) find_package_handle_standard_args( Go - REQUIRED_VARS GO_EXECUTABLE + REQUIRED_VARS GO_EXECUTABLE GO_ROOT VERSION_VAR GO_VERSION_STRING ) diff --git a/packaging/cmake/Modules/NetdataGoTools.cmake b/packaging/cmake/Modules/NetdataGoTools.cmake index 3e249c7c9c4d94..c8b8b9c0139ac5 100644 --- a/packaging/cmake/Modules/NetdataGoTools.cmake +++ b/packaging/cmake/Modules/NetdataGoTools.cmake @@ -33,7 +33,7 @@ macro(add_go_target target output build_src build_dir) add_custom_command( OUTPUT ${output} - COMMAND "${CMAKE_COMMAND}" -E env CGO_ENABLED=0 GOPROXY=https://proxy.golang.org,direct "${GO_EXECUTABLE}" build -buildvcs=false -ldflags "${GO_LDFLAGS}" -o 
"${CMAKE_BINARY_DIR}/${output}" "./${build_dir}" + COMMAND "${CMAKE_COMMAND}" -E env GOROOT=${GO_ROOT} CGO_ENABLED=0 GOPROXY=https://proxy.golang.org,direct "${GO_EXECUTABLE}" build -buildvcs=false -ldflags "${GO_LDFLAGS}" -o "${CMAKE_BINARY_DIR}/${output}" "./${build_dir}" DEPENDS ${${target}_DEPS} COMMENT "Building Go component ${output}" WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}/${build_src}" From d594a238ba4b1b8d00c2fa117205f633a7a9bdaa Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Mon, 12 Aug 2024 19:41:04 +0300 Subject: [PATCH 19/27] go.d redis: fix default "address" in config_schema.json (#18320) --- src/go/plugin/go.d/modules/redis/config_schema.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/go/plugin/go.d/modules/redis/config_schema.json b/src/go/plugin/go.d/modules/redis/config_schema.json index 9ab297071d2815..c57b06ac0caa0c 100644 --- a/src/go/plugin/go.d/modules/redis/config_schema.json +++ b/src/go/plugin/go.d/modules/redis/config_schema.json @@ -15,7 +15,7 @@ "title": "URI", "description": "The URI specifying the connection details for the Redis server.", "type": "string", - "default": "redis://@localhost:9221" + "default": "redis://@localhost:6379" }, "timeout": { "title": "Timeout", From c3ec6fae56ef241d535b7d2f18f3918da2856bbb Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Mon, 12 Aug 2024 23:24:31 +0300 Subject: [PATCH 20/27] add go.d dovecot (#18321) * add go.d dovecot * rm trailing spaces --- src/collectors/python.d.plugin/python.d.conf | 2 +- src/go/plugin/go.d/README.md | 1 + src/go/plugin/go.d/config/go.d.conf | 1 + src/go/plugin/go.d/config/go.d/dovecot.conf | 6 + src/go/plugin/go.d/config/go.d/sd/docker.conf | 7 + .../go.d/config/go.d/sd/net_listeners.conf | 7 + src/go/plugin/go.d/modules/dovecot/charts.go | 185 ++++++++++++ src/go/plugin/go.d/modules/dovecot/client.go | 54 ++++ src/go/plugin/go.d/modules/dovecot/collect.go | 89 ++++++ .../go.d/modules/dovecot/config_schema.json | 47 +++ 
src/go/plugin/go.d/modules/dovecot/dovecot.go | 101 +++++++ .../go.d/modules/dovecot/dovecot_test.go | 281 ++++++++++++++++++ .../plugin/go.d/modules/dovecot/metadata.yaml | 194 ++++++++++++ .../go.d/modules/dovecot/testdata/config.json | 5 + .../go.d/modules/dovecot/testdata/config.yaml | 3 + .../dovecot/testdata/export_global.txt | 2 + src/go/plugin/go.d/modules/init.go | 1 + .../go.d/modules/openvpn/config_schema.json | 2 +- 18 files changed, 986 insertions(+), 2 deletions(-) create mode 100644 src/go/plugin/go.d/config/go.d/dovecot.conf create mode 100644 src/go/plugin/go.d/modules/dovecot/charts.go create mode 100644 src/go/plugin/go.d/modules/dovecot/client.go create mode 100644 src/go/plugin/go.d/modules/dovecot/collect.go create mode 100644 src/go/plugin/go.d/modules/dovecot/config_schema.json create mode 100644 src/go/plugin/go.d/modules/dovecot/dovecot.go create mode 100644 src/go/plugin/go.d/modules/dovecot/dovecot_test.go create mode 100644 src/go/plugin/go.d/modules/dovecot/metadata.yaml create mode 100644 src/go/plugin/go.d/modules/dovecot/testdata/config.json create mode 100644 src/go/plugin/go.d/modules/dovecot/testdata/config.yaml create mode 100644 src/go/plugin/go.d/modules/dovecot/testdata/export_global.txt diff --git a/src/collectors/python.d.plugin/python.d.conf b/src/collectors/python.d.plugin/python.d.conf index e4d2872c9bab65..252247511e9c82 100644 --- a/src/collectors/python.d.plugin/python.d.conf +++ b/src/collectors/python.d.plugin/python.d.conf @@ -30,7 +30,6 @@ gc_interval: 300 # boinc: yes # ceph: yes # changefinder: no -# dovecot: yes # this is just an example example: no go_expvar: no @@ -56,6 +55,7 @@ go_expvar: no adaptec_raid: no # Removed (replaced with go.d/adaptercraid). apache: no # Removed (replaced with go.d/apache). beanstalk: no # Removed (replaced with go.d/beanstalk). +dovecot: no # Removed (replaced with go.d/dovecot). elasticsearch: no # Removed (replaced with go.d/elasticsearch). 
exim: no # Removed (replaced with go.d/exim). fail2ban: no # Removed (replaced with go.d/fail2ban). diff --git a/src/go/plugin/go.d/README.md b/src/go/plugin/go.d/README.md index 91111034f8aa77..5edb11ef028197 100644 --- a/src/go/plugin/go.d/README.md +++ b/src/go/plugin/go.d/README.md @@ -71,6 +71,7 @@ see the appropriate collector readme. | [docker](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/docker) | Docker Engine | | [docker_engine](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/docker_engine) | Docker Engine | | [dockerhub](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/dockerhub) | Docker Hub | +| [dovecot](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/dovecot) | Dovecot | | [elasticsearch](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/elasticsearch) | Elasticsearch/OpenSearch | | [envoy](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/envoy) | Envoy | | [example](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/example) | - | diff --git a/src/go/plugin/go.d/config/go.d.conf b/src/go/plugin/go.d/config/go.d.conf index 439eb8b462fb95..c0b2ed2be90c13 100644 --- a/src/go/plugin/go.d/config/go.d.conf +++ b/src/go/plugin/go.d/config/go.d.conf @@ -36,6 +36,7 @@ modules: # docker: yes # docker_engine: yes # dockerhub: yes +# dovecot: yes # elasticsearch: yes # envoy: yes # example: no diff --git a/src/go/plugin/go.d/config/go.d/dovecot.conf b/src/go/plugin/go.d/config/go.d/dovecot.conf new file mode 100644 index 00000000000000..5dd31bd7da8ae0 --- /dev/null +++ b/src/go/plugin/go.d/config/go.d/dovecot.conf @@ -0,0 +1,6 @@ +## All available configuration options, their descriptions and default values: +## https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/dovecot#readme + +jobs: + - name: local + address: unix:///var/run/dovecot/old-stats diff --git 
a/src/go/plugin/go.d/config/go.d/sd/docker.conf b/src/go/plugin/go.d/config/go.d/sd/docker.conf index 27238592d544aa..a4d4d29eda2e63 100644 --- a/src/go/plugin/go.d/config/go.d/sd/docker.conf +++ b/src/go/plugin/go.d/config/go.d/sd/docker.conf @@ -38,6 +38,8 @@ classify: expr: '{{ or (eq .PrivatePort "8091") (match "sp" .Image "couchbase couchbase:*") }}' - tags: "couchdb" expr: '{{ or (eq .PrivatePort "5984") (match "sp" .Image "couchdb couchdb:*") }}' + - tags: "dovecot" + expr: '{{ or (eq .PrivatePort "24242") (match "sp" .Image "*/dovecot */dovecot:*") }}' - tags: "elasticsearch" expr: '{{ or (eq .PrivatePort "9200") (match "sp" .Image "elasticsearch elasticsearch:* */elasticsearch */elasticsearch:* */opensearch */opensearch:*") }}' - tags: "gearman" @@ -124,6 +126,11 @@ compose: module: couchdb name: docker_{{.Name}} url: http://{{.Address}} + - selector: "dovecot" + template: | + module: dovecot + name: docker_{{.Name}} + address: {{.Address}} - selector: "elasticsearch" template: | module: elasticsearch diff --git a/src/go/plugin/go.d/config/go.d/sd/net_listeners.conf b/src/go/plugin/go.d/config/go.d/sd/net_listeners.conf index a78b6c1affd979..e41840d97f1a6b 100644 --- a/src/go/plugin/go.d/config/go.d/sd/net_listeners.conf +++ b/src/go/plugin/go.d/config/go.d/sd/net_listeners.conf @@ -42,6 +42,8 @@ classify: expr: '{{ and (eq .Protocol "UDP") (eq .Port "53") (eq .Comm "dnsmasq") }}' - tags: "docker_engine" expr: '{{ and (eq .Port "9323") (eq .Comm "dockerd") }}' + - tags: "dovecot" + expr: '{{ and (eq .Port "24242") (eq .Comm "dovecot") }}' - tags: "elasticsearch" expr: '{{ or (eq .Port "9200") (glob .Cmdline "*elasticsearch*" "*opensearch*") }}' - tags: "envoy" @@ -216,6 +218,11 @@ compose: module: docker_engine name: local url: http://{{.Address}}/metrics + - selector: "dovecot" + template: | + module: dovecot + name: local + address: {{.Address}} - selector: "elasticsearch" template: | module: elasticsearch diff --git 
a/src/go/plugin/go.d/modules/dovecot/charts.go b/src/go/plugin/go.d/modules/dovecot/charts.go new file mode 100644 index 00000000000000..3a8bb1a8c2b219 --- /dev/null +++ b/src/go/plugin/go.d/modules/dovecot/charts.go @@ -0,0 +1,185 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package dovecot + +import ( + "github.com/netdata/netdata/go/plugins/plugin/go.d/agent/module" +) + +const ( + prioSessions = module.Priority + iota + prioLogins + prioAuthenticationAttempts + prioCommands + prioPageFaults + prioContextSwitches + prioDiskIO + prioNetTraffic + prioSysCalls + prioLookups + prioCachePerformance + prioAuthCachePerformance +) + +var charts = module.Charts{ + sessionsChart.Copy(), + loginsChart.Copy(), + authAttemptsChart.Copy(), + commandsChart.Copy(), + pageFaultsChart.Copy(), + contextSwitchesChart.Copy(), + diskIOChart.Copy(), + netTrafficChart.Copy(), + sysCallsChart.Copy(), + lookupsChart.Copy(), + cacheChart.Copy(), + authCacheChart.Copy(), +} + +var ( + sessionsChart = module.Chart{ + ID: "sessions", + Title: "Dovecot Active Sessions", + Units: "sessions", + Fam: "sessions", + Ctx: "dovecot.sessions", + Priority: prioSessions, + Dims: module.Dims{ + {ID: "num_connected_sessions", Name: "active"}, + }, + } + loginsChart = module.Chart{ + ID: "logins", + Title: "Dovecot Logins", + Units: "logins", + Fam: "logins", + Ctx: "dovecot.logins", + Priority: prioLogins, + Dims: module.Dims{ + {ID: "num_logins", Name: "logins"}, + }, + } + authAttemptsChart = module.Chart{ + ID: "auth", + Title: "Dovecot Authentications", + Units: "attempts/s", + Fam: "logins", + Ctx: "dovecot.auth", + Priority: prioAuthenticationAttempts, + Type: module.Stacked, + Dims: module.Dims{ + {ID: "auth_successes", Name: "ok", Algo: module.Incremental}, + {ID: "auth_failures", Name: "failed", Algo: module.Incremental}, + }, + } + commandsChart = module.Chart{ + ID: "commands", + Title: "Dovecot Commands", + Units: "commands", + Fam: "commands", + Ctx: "dovecot.commands", + Priority: 
prioCommands, + Dims: module.Dims{ + {ID: "num_cmds", Name: "commands"}, + }, + } + pageFaultsChart = module.Chart{ + ID: "faults", + Title: "Dovecot Page Faults", + Units: "faults/s", + Fam: "page faults", + Ctx: "dovecot.faults", + Priority: prioPageFaults, + Dims: module.Dims{ + {ID: "min_faults", Name: "minor", Algo: module.Incremental}, + {ID: "maj_faults", Name: "major", Algo: module.Incremental}, + }, + } + contextSwitchesChart = module.Chart{ + ID: "context_switches", + Title: "Dovecot Context Switches", + Units: "switches/s", + Fam: "context switches", + Ctx: "dovecot.context_switches", + Priority: prioContextSwitches, + Dims: module.Dims{ + {ID: "vol_cs", Name: "voluntary", Algo: module.Incremental}, + {ID: "invol_cs", Name: "involuntary", Algo: module.Incremental}, + }, + } + diskIOChart = module.Chart{ + ID: "io", + Title: "Dovecot Disk I/O", + Units: "KiB/s", + Fam: "disk", + Ctx: "dovecot.io", + Priority: prioDiskIO, + Type: module.Area, + Dims: module.Dims{ + {ID: "disk_input", Name: "read", Div: 1024, Algo: module.Incremental}, + {ID: "disk_output", Name: "write", Mul: -1, Div: 1024, Algo: module.Incremental}, + }, + } + netTrafficChart = module.Chart{ + ID: "net", + Title: "Dovecot Network Bandwidth", + Units: "kilobits/s", + Fam: "network", + Ctx: "dovecot.net", + Priority: prioNetTraffic, + Type: module.Area, + Dims: module.Dims{ + {ID: "read_bytes", Name: "read", Mul: 8, Div: 1000, Algo: module.Incremental}, + {ID: "write_bytes", Name: "write", Mul: -8, Div: 1000, Algo: module.Incremental}, + }, + } + sysCallsChart = module.Chart{ + ID: "syscalls", + Title: "Dovecot Number of SysCalls", + Units: "syscalls/s", + Fam: "system", + Ctx: "dovecot.syscalls", + Priority: prioSysCalls, + Dims: module.Dims{ + {ID: "read_count", Name: "read", Algo: module.Incremental}, + {ID: "write_count", Name: "write", Algo: module.Incremental}, + }, + } + lookupsChart = module.Chart{ + ID: "lookup", + Title: "Dovecot Lookups", + Units: "lookups/s", + Fam: "lookups", + 
Ctx: "dovecot.lookup", + Priority: prioLookups, + Type: module.Stacked, + Dims: module.Dims{ + {ID: "mail_lookup_path", Name: "path", Algo: module.Incremental}, + {ID: "mail_lookup_attr", Name: "attr", Algo: module.Incremental}, + }, + } + cacheChart = module.Chart{ + ID: "cache", + Title: "Dovecot Cache Hits", + Units: "hits/s", + Fam: "cache", + Ctx: "dovecot.cache", + Priority: prioCachePerformance, + Dims: module.Dims{ + {ID: "mail_cache_hits", Name: "hits", Algo: module.Incremental}, + }, + } + authCacheChart = module.Chart{ + ID: "auth_cache", + Title: "Dovecot Authentication Cache", + Units: "requests/s", + Fam: "cache", + Ctx: "dovecot.auth_cache", + Priority: prioAuthCachePerformance, + Type: module.Stacked, + Dims: module.Dims{ + {ID: "auth_cache_hits", Name: "hits", Algo: module.Incremental}, + {ID: "auth_cache_misses", Name: "misses", Algo: module.Incremental}, + }, + } +) diff --git a/src/go/plugin/go.d/modules/dovecot/client.go b/src/go/plugin/go.d/modules/dovecot/client.go new file mode 100644 index 00000000000000..245d1743fcfa7f --- /dev/null +++ b/src/go/plugin/go.d/modules/dovecot/client.go @@ -0,0 +1,54 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package dovecot + +import ( + "bytes" + + "github.com/netdata/netdata/go/plugins/plugin/go.d/pkg/socket" +) + +type dovecotConn interface { + connect() error + disconnect() + queryExportGlobal() ([]byte, error) +} + +func newDovecotConn(conf Config) dovecotConn { + return &dovecotClient{conn: socket.New(socket.Config{ + Address: conf.Address, + ConnectTimeout: conf.Timeout.Duration(), + ReadTimeout: conf.Timeout.Duration(), + WriteTimeout: conf.Timeout.Duration(), + })} +} + +type dovecotClient struct { + conn socket.Client +} + +func (c *dovecotClient) connect() error { + return c.conn.Connect() +} + +func (c *dovecotClient) disconnect() { + _ = c.conn.Disconnect() +} + +func (c *dovecotClient) queryExportGlobal() ([]byte, error) { + var b bytes.Buffer + var n int + + err := 
c.conn.Command("EXPORT\tglobal\n", func(bs []byte) bool { + b.Write(bs) + b.WriteByte('\n') + + n++ + return n < 2 + }) + if err != nil { + return nil, err + } + + return b.Bytes(), nil +} diff --git a/src/go/plugin/go.d/modules/dovecot/collect.go b/src/go/plugin/go.d/modules/dovecot/collect.go new file mode 100644 index 00000000000000..a93bfc811a193d --- /dev/null +++ b/src/go/plugin/go.d/modules/dovecot/collect.go @@ -0,0 +1,89 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package dovecot + +import ( + "bufio" + "bytes" + "errors" + "fmt" + "strconv" + "strings" +) + +// FIXME: drop using "old_stats" in favour of "stats" (https://doc.dovecot.org/configuration_manual/stats/openmetrics/). + +func (d *Dovecot) collect() (map[string]int64, error) { + if d.conn == nil { + conn, err := d.establishConn() + if err != nil { + return nil, err + } + d.conn = conn + } + + stats, err := d.conn.queryExportGlobal() + if err != nil { + d.conn.disconnect() + d.conn = nil + return nil, err + } + + mx := make(map[string]int64) + + // https://doc.dovecot.org/configuration_manual/stats/old_statistics/#statistics-gathered + if err := d.collectExportGlobal(mx, stats); err != nil { + return nil, err + } + + return mx, nil +} + +func (d *Dovecot) collectExportGlobal(mx map[string]int64, resp []byte) error { + sc := bufio.NewScanner(bytes.NewReader(resp)) + + if !sc.Scan() { + return errors.New("failed to read fields line from export global response") + } + fieldsLine := strings.TrimSpace(sc.Text()) + + if !sc.Scan() { + return errors.New("failed to read values line from export global response") + } + valuesLine := strings.TrimSpace(sc.Text()) + + if fieldsLine == "" || valuesLine == "" { + return errors.New("empty fields line or values line from export global response") + } + + fields := strings.Fields(fieldsLine) + values := strings.Fields(valuesLine) + + if len(fields) != len(values) { + return fmt.Errorf("mismatched fields and values count: fields=%d, values=%d", len(fields), 
len(values)) + } + + for i, name := range fields { + val := values[i] + + v, err := strconv.ParseInt(val, 10, 64) + if err != nil { + d.Debugf("failed to parse export value %s %s: %v", name, val, err) + continue + } + + mx[name] = v + } + + return nil +} + +func (d *Dovecot) establishConn() (dovecotConn, error) { + conn := d.newConn(d.Config) + + if err := conn.connect(); err != nil { + return nil, err + } + + return conn, nil +} diff --git a/src/go/plugin/go.d/modules/dovecot/config_schema.json b/src/go/plugin/go.d/modules/dovecot/config_schema.json new file mode 100644 index 00000000000000..cf99b69392ffcd --- /dev/null +++ b/src/go/plugin/go.d/modules/dovecot/config_schema.json @@ -0,0 +1,47 @@ +{ + "jsonSchema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "title": "Dovecot collector configuration.", + "type": "object", + "properties": { + "update_every": { + "title": "Update every", + "description": "Data collection interval, measured in seconds.", + "type": "integer", + "minimum": 1, + "default": 1 + }, + "address": { + "title": "Address", + "description": "The Unix or TCP socket address where the Dovecot [old_stats](https://doc.dovecot.org/configuration_manual/stats/old_statistics/#old-statistics) plugin listens for connections.", + "type": "string", + "default": "127.0.0.1:24242" + }, + "timeout": { + "title": "Timeout", + "description": "Timeout for establishing a connection and communication (reading and writing) in seconds.", + "type": "number", + "minimum": 0.5, + "default": 1 + } + }, + "required": [ + "address" + ], + "additionalProperties": false, + "patternProperties": { + "^name$": {} + } + }, + "uiSchema": { + "uiOptions": { + "fullPage": true + }, + "address": { + "ui:help": "Use `unix://{path_to_socket}` for Unix socket or `{ip}:{port}` for TCP socket." + }, + "timeout": { + "ui:help": "Accepts decimals for precise control (e.g., type 1.5 for 1.5 seconds)." 
+ } + } +} diff --git a/src/go/plugin/go.d/modules/dovecot/dovecot.go b/src/go/plugin/go.d/modules/dovecot/dovecot.go new file mode 100644 index 00000000000000..ee3d6239905d39 --- /dev/null +++ b/src/go/plugin/go.d/modules/dovecot/dovecot.go @@ -0,0 +1,101 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package dovecot + +import ( + _ "embed" + "errors" + "time" + + "github.com/netdata/netdata/go/plugins/plugin/go.d/agent/module" + "github.com/netdata/netdata/go/plugins/plugin/go.d/pkg/web" +) + +//go:embed "config_schema.json" +var configSchema string + +func init() { + module.Register("dovecot", module.Creator{ + JobConfigSchema: configSchema, + Create: func() module.Module { return New() }, + Config: func() any { return &Config{} }, + }) +} + +func New() *Dovecot { + return &Dovecot{ + Config: Config{ + Address: "127.0.0.1:24242", + Timeout: web.Duration(time.Second * 1), + }, + newConn: newDovecotConn, + charts: charts.Copy(), + } +} + +type Config struct { + UpdateEvery int `yaml:"update_every,omitempty" json:"update_every"` + Address string `yaml:"address" json:"address"` + Timeout web.Duration `yaml:"timeout" json:"timeout"` +} + +type Dovecot struct { + module.Base + Config `yaml:",inline" json:""` + + charts *module.Charts + + newConn func(Config) dovecotConn + conn dovecotConn +} + +func (d *Dovecot) Configuration() any { + return d.Config +} + +func (d *Dovecot) Init() error { + if d.Address == "" { + d.Error("config: 'address' not set") + return errors.New("address not set") + } + + return nil +} + +func (d *Dovecot) Check() error { + mx, err := d.collect() + if err != nil { + d.Error(err) + return err + } + + if len(mx) == 0 { + return errors.New("no metrics collected") + } + + return nil +} + +func (d *Dovecot) Charts() *module.Charts { + return d.charts +} + +func (d *Dovecot) Collect() map[string]int64 { + mx, err := d.collect() + if err != nil { + d.Error(err) + } + + if len(mx) == 0 { + return nil + } + + return mx +} + +func (d *Dovecot) 
Cleanup() { + if d.conn != nil { + d.conn.disconnect() + d.conn = nil + } +} diff --git a/src/go/plugin/go.d/modules/dovecot/dovecot_test.go b/src/go/plugin/go.d/modules/dovecot/dovecot_test.go new file mode 100644 index 00000000000000..ba60adeb6d7635 --- /dev/null +++ b/src/go/plugin/go.d/modules/dovecot/dovecot_test.go @@ -0,0 +1,281 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package dovecot + +import ( + "errors" + "os" + "testing" + + "github.com/netdata/netdata/go/plugins/plugin/go.d/agent/module" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +var ( + dataConfigJSON, _ = os.ReadFile("testdata/config.json") + dataConfigYAML, _ = os.ReadFile("testdata/config.yaml") + + dataExportGlobal, _ = os.ReadFile("testdata/export_global.txt") +) + +func Test_testDataIsValid(t *testing.T) { + for name, data := range map[string][]byte{ + "dataConfigJSON": dataConfigJSON, + "dataConfigYAML": dataConfigYAML, + "dataExportGlobal": dataExportGlobal, + } { + require.NotNil(t, data, name) + } +} + +func TestDovecot_ConfigurationSerialize(t *testing.T) { + module.TestConfigurationSerialize(t, &Dovecot{}, dataConfigJSON, dataConfigYAML) +} + +func TestDovecot_Init(t *testing.T) { + tests := map[string]struct { + config Config + wantFail bool + }{ + "success with default config": { + wantFail: false, + config: New().Config, + }, + "fails if address not set": { + wantFail: true, + config: func() Config { + conf := New().Config + conf.Address = "" + return conf + }(), + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + dovecot := New() + dovecot.Config = test.config + + if test.wantFail { + assert.Error(t, dovecot.Init()) + } else { + assert.NoError(t, dovecot.Init()) + } + }) + } +} + +func TestDovecot_Cleanup(t *testing.T) { + tests := map[string]struct { + prepare func() *Dovecot + }{ + "not initialized": { + prepare: func() *Dovecot { + return New() + }, + }, + "after check": { + prepare: func() *Dovecot 
{ + dovecot := New() + dovecot.newConn = func(config Config) dovecotConn { return prepareMockOk() } + _ = dovecot.Check() + return dovecot + }, + }, + "after collect": { + prepare: func() *Dovecot { + dovecot := New() + dovecot.newConn = func(config Config) dovecotConn { return prepareMockOk() } + _ = dovecot.Collect() + return dovecot + }, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + dovecot := test.prepare() + + assert.NotPanics(t, dovecot.Cleanup) + }) + } +} + +func TestDovecot_Charts(t *testing.T) { + assert.NotNil(t, New().Charts()) +} + +func TestDovecot_Check(t *testing.T) { + tests := map[string]struct { + prepareMock func() *mockDovecotConn + wantFail bool + }{ + "success case": { + wantFail: false, + prepareMock: prepareMockOk, + }, + "err on connect": { + wantFail: true, + prepareMock: prepareMockErrOnConnect, + }, + "unexpected response": { + wantFail: true, + prepareMock: prepareMockUnexpectedResponse, + }, + "empty response": { + wantFail: true, + prepareMock: prepareMockEmptyResponse, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + dovecot := New() + mock := test.prepareMock() + dovecot.newConn = func(config Config) dovecotConn { return mock } + + if test.wantFail { + assert.Error(t, dovecot.Check()) + } else { + assert.NoError(t, dovecot.Check()) + } + }) + } +} + +func TestDovecot_Collect(t *testing.T) { + tests := map[string]struct { + prepareMock func() *mockDovecotConn + wantMetrics map[string]int64 + disconnectBeforeCleanup bool + disconnectAfterCleanup bool + }{ + "success case": { + prepareMock: prepareMockOk, + disconnectBeforeCleanup: false, + disconnectAfterCleanup: true, + wantMetrics: map[string]int64{ + "auth_cache_hits": 1, + "auth_cache_misses": 1, + "auth_db_tempfails": 1, + "auth_failures": 1, + "auth_master_successes": 1, + "auth_successes": 1, + "disk_input": 1, + "disk_output": 1, + "invol_cs": 1, + "mail_cache_hits": 1, + "mail_lookup_attr": 1, + 
"mail_lookup_path": 1, + "mail_read_bytes": 1, + "mail_read_count": 1, + "maj_faults": 1, + "min_faults": 1, + "num_cmds": 1, + "num_connected_sessions": 1, + "num_logins": 1, + "read_bytes": 1, + "read_count": 1, + "reset_timestamp": 1723481629, + "vol_cs": 1, + "write_bytes": 1, + "write_count": 1, + }, + }, + "unexpected response": { + prepareMock: prepareMockUnexpectedResponse, + disconnectBeforeCleanup: false, + disconnectAfterCleanup: true, + }, + "empty response": { + prepareMock: prepareMockEmptyResponse, + disconnectBeforeCleanup: false, + disconnectAfterCleanup: true, + }, + "err on connect": { + prepareMock: prepareMockErrOnConnect, + disconnectBeforeCleanup: false, + disconnectAfterCleanup: false, + }, + "err on query stats": { + prepareMock: prepareMockErrOnQueryExportGlobal, + disconnectBeforeCleanup: true, + disconnectAfterCleanup: true, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + dovecot := New() + mock := test.prepareMock() + dovecot.newConn = func(config Config) dovecotConn { return mock } + + mx := dovecot.Collect() + + require.Equal(t, test.wantMetrics, mx) + + if len(test.wantMetrics) > 0 { + module.TestMetricsHasAllChartsDims(t, dovecot.Charts(), mx) + } + + assert.Equal(t, test.disconnectBeforeCleanup, mock.disconnectCalled, "disconnect before cleanup") + dovecot.Cleanup() + assert.Equal(t, test.disconnectAfterCleanup, mock.disconnectCalled, "disconnect after cleanup") + }) + } +} + +func prepareMockOk() *mockDovecotConn { + return &mockDovecotConn{ + exportGlobalResponse: dataExportGlobal, + } +} + +func prepareMockErrOnConnect() *mockDovecotConn { + return &mockDovecotConn{ + errOnConnect: true, + } +} + +func prepareMockErrOnQueryExportGlobal() *mockDovecotConn { + return &mockDovecotConn{ + errOnQueryExportGlobal: true, + } +} + +func prepareMockUnexpectedResponse() *mockDovecotConn { + return &mockDovecotConn{ + exportGlobalResponse: []byte("Lorem ipsum dolor sit amet, consectetur adipiscing elit."), 
+ } +} + +func prepareMockEmptyResponse() *mockDovecotConn { + return &mockDovecotConn{} +} + +type mockDovecotConn struct { + errOnConnect bool + errOnQueryExportGlobal bool + exportGlobalResponse []byte + disconnectCalled bool +} + +func (m *mockDovecotConn) connect() error { + if m.errOnConnect { + return errors.New("mock.connect() error") + } + return nil +} + +func (m *mockDovecotConn) disconnect() { + m.disconnectCalled = true +} + +func (m *mockDovecotConn) queryExportGlobal() ([]byte, error) { + if m.errOnQueryExportGlobal { + return nil, errors.New("mock.queryExportGlobal() error") + } + return m.exportGlobalResponse, nil +} diff --git a/src/go/plugin/go.d/modules/dovecot/metadata.yaml b/src/go/plugin/go.d/modules/dovecot/metadata.yaml new file mode 100644 index 00000000000000..948990bca7711d --- /dev/null +++ b/src/go/plugin/go.d/modules/dovecot/metadata.yaml @@ -0,0 +1,194 @@ +plugin_name: go.d.plugin +modules: + - meta: + id: collector-go.d.plugin-dovecot + plugin_name: go.d.plugin + module_name: dovecot + monitored_instance: + name: Dovecot + link: 'https://www.dovecot.org/' + categories: + - data-collection.mail-servers + icon_filename: "dovecot.svg" + related_resources: + integrations: + list: [] + info_provided_to_referring_integrations: + description: "" + keywords: + - dovecot + - imap + - mail + most_popular: false + overview: + data_collection: + metrics_description: | + This collector monitors Dovecot metrics about sessions, logins, commands, page faults and more. + method_description: | + It reads the server's response to the `EXPORT\tglobal\n` command. 
+ supported_platforms: + include: [] + exclude: [] + multi_instance: true + additional_permissions: + description: "" + default_behavior: + auto_detection: + description: | + Automatically discovers and collects Dovecot statistics from the following default locations: + + - localhost:24242 + - unix:///var/run/dovecot/old-stats + limits: + description: "" + performance_impact: + description: "" + setup: + prerequisites: + list: + - title: Enable old_stats plugin + description: | + To enable `old_stats` plugin, see [Old Statistics](https://doc.dovecot.org/configuration_manual/stats/old_statistics/#old-statistics). + configuration: + file: + name: go.d/dovecot.conf + options: + description: | + The following options can be defined globally: update_every, autodetection_retry. + folding: + title: Config options + enabled: true + list: + - name: update_every + description: Data collection frequency. + default_value: 1 + required: false + - name: autodetection_retry + description: Recheck interval in seconds. Zero means no recheck will be scheduled. + default_value: 0 + required: false + - name: address + description: "The Unix or TCP socket address where the Dovecot [old_stats](https://doc.dovecot.org/configuration_manual/stats/old_statistics/#old-statistics) plugin listens for connections." + default_value: 127.0.0.1:24242 + required: true + - name: timeout + description: Connection, read, and write timeout duration in seconds. The timeout includes name resolution. + default_value: 1 + required: false + examples: + folding: + title: Config + enabled: true + list: + - name: Basic (TCP) + description: A basic example configuration. + config: | + jobs: + - name: local + address: 127.0.0.1:24242 + - name: Basic (UNIX) + description: A basic example configuration using a UNIX socket. 
+ config: | + jobs: + - name: local + address: 127.0.0.1:24242 + + - name: remote + address: 203.0.113.0:24242 + troubleshooting: + problems: + list: [] + alerts: [] + metrics: + folding: + title: Metrics + enabled: false + description: "" + availability: [] + scopes: + - name: global + description: "These metrics refer to the entire monitored application." + labels: [] + metrics: + - name: dovecot.sessions + description: Dovecot Active Sessions + unit: "sessions" + chart_type: line + dimensions: + - name: active + - name: dovecot.logins + description: Dovecot Logins + unit: "logins" + chart_type: line + dimensions: + - name: logins + - name: dovecot.auth + description: Dovecot Authentications + unit: "attempts/s" + chart_type: stacked + dimensions: + - name: ok + - name: failed + - name: dovecot.commands + description: Dovecot Commands + unit: "commands" + chart_type: line + dimensions: + - name: commands + - name: dovecot.context_switches + description: Dovecot Context Switches + unit: "switches/s" + chart_type: line + dimensions: + - name: voluntary + - name: involuntary + - name: dovecot.io + description: Dovecot Disk I/O + unit: "KiB/s" + chart_type: area + dimensions: + - name: read + - name: write + - name: dovecot.net + description: Dovecot Network Bandwidth + unit: "kilobits/s" + chart_type: area + dimensions: + - name: read + - name: write + - name: dovecot.syscalls + description: Dovecot Number of SysCalls + unit: "syscalls/s" + chart_type: line + dimensions: + - name: read + - name: write + - name: dovecot.lookup + description: Dovecot Lookups + unit: "lookups/s" + chart_type: stacked + dimensions: + - name: path + - name: attr + - name: dovecot.cache + description: Dovecot Cache Hits + unit: "hits/s"
chart_type: line + dimensions: + - name: hits + - name: dovecot.auth_cache + description: Dovecot Authentication Cache + unit: "requests/s" + chart_type: stacked + dimensions: + - name: hits + - name: misses diff --git a/src/go/plugin/go.d/modules/dovecot/testdata/config.json b/src/go/plugin/go.d/modules/dovecot/testdata/config.json new file mode 100644 index 00000000000000..e868347203bee3 --- /dev/null +++ b/src/go/plugin/go.d/modules/dovecot/testdata/config.json @@ -0,0 +1,5 @@ +{ + "update_every": 123, + "address": "ok", + "timeout": 123.123 +} diff --git a/src/go/plugin/go.d/modules/dovecot/testdata/config.yaml b/src/go/plugin/go.d/modules/dovecot/testdata/config.yaml new file mode 100644 index 00000000000000..1b81d09eb8288b --- /dev/null +++ b/src/go/plugin/go.d/modules/dovecot/testdata/config.yaml @@ -0,0 +1,3 @@ +update_every: 123 +address: "ok" +timeout: 123.123 diff --git a/src/go/plugin/go.d/modules/dovecot/testdata/export_global.txt b/src/go/plugin/go.d/modules/dovecot/testdata/export_global.txt new file mode 100644 index 00000000000000..00d28914a6172c --- /dev/null +++ b/src/go/plugin/go.d/modules/dovecot/testdata/export_global.txt @@ -0,0 +1,2 @@ +reset_timestamp last_update num_logins num_cmds num_connected_sessions user_cpu sys_cpu clock_time min_faults maj_faults vol_cs invol_cs disk_input disk_output read_count read_bytes write_count write_bytes mail_lookup_path mail_lookup_attr mail_read_count mail_read_bytes mail_cache_hits auth_successes auth_master_successes auth_failures auth_db_tempfails auth_cache_hits auth_cache_misses +1723481629 1.111111 1 1 1 1.1 1.1 1.1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 diff --git a/src/go/plugin/go.d/modules/init.go b/src/go/plugin/go.d/modules/init.go index 5592bfb79fb351..73f98f6e6c31bc 100644 --- a/src/go/plugin/go.d/modules/init.go +++ b/src/go/plugin/go.d/modules/init.go @@ -25,6 +25,7 @@ import ( _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/docker" _ 
"github.com/netdata/netdata/go/plugins/plugin/go.d/modules/docker_engine" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/dockerhub" + _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/dovecot" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/elasticsearch" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/envoy" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/example" diff --git a/src/go/plugin/go.d/modules/openvpn/config_schema.json b/src/go/plugin/go.d/modules/openvpn/config_schema.json index 9824000b310a96..8bbda1fd46c297 100644 --- a/src/go/plugin/go.d/modules/openvpn/config_schema.json +++ b/src/go/plugin/go.d/modules/openvpn/config_schema.json @@ -15,7 +15,7 @@ "title": "Address", "description": "The IP address and port where the OpenVPN [Management Interface](https://openvpn.net/community-resources/management-interface/) listens for connections.", "type": "string", - "default": "127.0.0.1:123" + "default": "127.0.0.1:7505" }, "timeout": { "title": "Timeout", From 68255b5aa8f64821ad42812cb55992badf3c32b4 Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Mon, 12 Aug 2024 23:41:24 +0300 Subject: [PATCH 21/27] remove python.d/dovecot (#18322) --- CMakeLists.txt | 2 - .../python.d.plugin/dovecot/README.md | 1 - .../python.d.plugin/dovecot/dovecot.chart.py | 143 ----------- .../python.d.plugin/dovecot/dovecot.conf | 98 -------- .../dovecot/integrations/dovecot.md | 230 ------------------ .../python.d.plugin/dovecot/metadata.yaml | 207 ---------------- 6 files changed, 681 deletions(-) delete mode 120000 src/collectors/python.d.plugin/dovecot/README.md delete mode 100644 src/collectors/python.d.plugin/dovecot/dovecot.chart.py delete mode 100644 src/collectors/python.d.plugin/dovecot/dovecot.conf delete mode 100644 src/collectors/python.d.plugin/dovecot/integrations/dovecot.md delete mode 100644 src/collectors/python.d.plugin/dovecot/metadata.yaml diff --git a/CMakeLists.txt b/CMakeLists.txt index 
eb11b57fb57002..5bd2ab0f93051b 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -2780,7 +2780,6 @@ install(FILES src/collectors/python.d.plugin/boinc/boinc.conf src/collectors/python.d.plugin/ceph/ceph.conf src/collectors/python.d.plugin/changefinder/changefinder.conf - src/collectors/python.d.plugin/dovecot/dovecot.conf src/collectors/python.d.plugin/example/example.conf src/collectors/python.d.plugin/go_expvar/go_expvar.conf src/collectors/python.d.plugin/haproxy/haproxy.conf @@ -2807,7 +2806,6 @@ install(FILES src/collectors/python.d.plugin/boinc/boinc.chart.py src/collectors/python.d.plugin/ceph/ceph.chart.py src/collectors/python.d.plugin/changefinder/changefinder.chart.py - src/collectors/python.d.plugin/dovecot/dovecot.chart.py src/collectors/python.d.plugin/example/example.chart.py src/collectors/python.d.plugin/go_expvar/go_expvar.chart.py src/collectors/python.d.plugin/haproxy/haproxy.chart.py diff --git a/src/collectors/python.d.plugin/dovecot/README.md b/src/collectors/python.d.plugin/dovecot/README.md deleted file mode 120000 index c4749cedce0686..00000000000000 --- a/src/collectors/python.d.plugin/dovecot/README.md +++ /dev/null @@ -1 +0,0 @@ -integrations/dovecot.md \ No newline at end of file diff --git a/src/collectors/python.d.plugin/dovecot/dovecot.chart.py b/src/collectors/python.d.plugin/dovecot/dovecot.chart.py deleted file mode 100644 index dfaef28b5aaa10..00000000000000 --- a/src/collectors/python.d.plugin/dovecot/dovecot.chart.py +++ /dev/null @@ -1,143 +0,0 @@ -# -*- coding: utf-8 -*- -# Description: dovecot netdata python.d module -# Author: Pawel Krupa (paulfantom) -# SPDX-License-Identifier: GPL-3.0-or-later - -from bases.FrameworkServices.SocketService import SocketService - -UNIX_SOCKET = '/var/run/dovecot/stats' - -ORDER = [ - 'sessions', - 'logins', - 'commands', - 'faults', - 'context_switches', - 'io', - 'net', - 'syscalls', - 'lookup', - 'cache', - 'auth', - 'auth_cache' -] - -CHARTS = { - 'sessions': { - 'options': [None, 
'Dovecot Active Sessions', 'number', 'sessions', 'dovecot.sessions', 'line'], - 'lines': [ - ['num_connected_sessions', 'active sessions', 'absolute'] - ] - }, - 'logins': { - 'options': [None, 'Dovecot Logins', 'number', 'logins', 'dovecot.logins', 'line'], - 'lines': [ - ['num_logins', 'logins', 'absolute'] - ] - }, - 'commands': { - 'options': [None, 'Dovecot Commands', 'commands', 'commands', 'dovecot.commands', 'line'], - 'lines': [ - ['num_cmds', 'commands', 'absolute'] - ] - }, - 'faults': { - 'options': [None, 'Dovecot Page Faults', 'faults', 'page faults', 'dovecot.faults', 'line'], - 'lines': [ - ['min_faults', 'minor', 'absolute'], - ['maj_faults', 'major', 'absolute'] - ] - }, - 'context_switches': { - 'options': [None, 'Dovecot Context Switches', 'switches', 'context switches', 'dovecot.context_switches', - 'line'], - 'lines': [ - ['vol_cs', 'voluntary', 'absolute'], - ['invol_cs', 'involuntary', 'absolute'] - ] - }, - 'io': { - 'options': [None, 'Dovecot Disk I/O', 'KiB/s', 'disk', 'dovecot.io', 'area'], - 'lines': [ - ['disk_input', 'read', 'incremental', 1, 1024], - ['disk_output', 'write', 'incremental', -1, 1024] - ] - }, - 'net': { - 'options': [None, 'Dovecot Network Bandwidth', 'kilobits/s', 'network', 'dovecot.net', 'area'], - 'lines': [ - ['read_bytes', 'read', 'incremental', 8, 1000], - ['write_bytes', 'write', 'incremental', -8, 1000] - ] - }, - 'syscalls': { - 'options': [None, 'Dovecot Number of SysCalls', 'syscalls/s', 'system', 'dovecot.syscalls', 'line'], - 'lines': [ - ['read_count', 'read', 'incremental'], - ['write_count', 'write', 'incremental'] - ] - }, - 'lookup': { - 'options': [None, 'Dovecot Lookups', 'number/s', 'lookups', 'dovecot.lookup', 'stacked'], - 'lines': [ - ['mail_lookup_path', 'path', 'incremental'], - ['mail_lookup_attr', 'attr', 'incremental'] - ] - }, - 'cache': { - 'options': [None, 'Dovecot Cache Hits', 'hits/s', 'cache', 'dovecot.cache', 'line'], - 'lines': [ - ['mail_cache_hits', 'hits', 'incremental'] - ] - 
}, - 'auth': { - 'options': [None, 'Dovecot Authentications', 'attempts', 'logins', 'dovecot.auth', 'stacked'], - 'lines': [ - ['auth_successes', 'ok', 'absolute'], - ['auth_failures', 'failed', 'absolute'] - ] - }, - 'auth_cache': { - 'options': [None, 'Dovecot Authentication Cache', 'number', 'cache', 'dovecot.auth_cache', 'stacked'], - 'lines': [ - ['auth_cache_hits', 'hit', 'absolute'], - ['auth_cache_misses', 'miss', 'absolute'] - ] - } -} - - -class Service(SocketService): - def __init__(self, configuration=None, name=None): - SocketService.__init__(self, configuration=configuration, name=name) - self.order = ORDER - self.definitions = CHARTS - self.host = None # localhost - self.port = None # 24242 - self.unix_socket = UNIX_SOCKET - self.request = 'EXPORT\tglobal\r\n' - - def _get_data(self): - """ - Format data received from socket - :return: dict - """ - try: - raw = self._get_raw_data() - except (ValueError, AttributeError): - return None - - if raw is None: - self.debug('dovecot returned no data') - return None - - data = raw.split('\n')[:2] - desc = data[0].split('\t') - vals = data[1].split('\t') - ret = dict() - for i, _ in enumerate(desc): - try: - ret[str(desc[i])] = int(vals[i]) - except ValueError: - continue - return ret or None diff --git a/src/collectors/python.d.plugin/dovecot/dovecot.conf b/src/collectors/python.d.plugin/dovecot/dovecot.conf deleted file mode 100644 index 451dbc9acc2de0..00000000000000 --- a/src/collectors/python.d.plugin/dovecot/dovecot.conf +++ /dev/null @@ -1,98 +0,0 @@ -# netdata python.d.plugin configuration for dovecot -# -# This file is in YaML format. Generally the format is: -# -# name: value -# -# There are 2 sections: -# - global variables -# - one or more JOBS -# -# JOBS allow you to collect values from multiple sources. -# Each source will have its own set of charts. -# -# JOB parameters have to be indented (using spaces only, example below). 
- -# ---------------------------------------------------------------------- -# Global Variables -# These variables set the defaults for all JOBs, however each JOB -# may define its own, overriding the defaults. - -# update_every sets the default data collection frequency. -# If unset, the python.d.plugin default is used. -# update_every: 1 - -# priority controls the order of charts at the netdata dashboard. -# Lower numbers move the charts towards the top of the page. -# If unset, the default for python.d.plugin is used. -# priority: 60000 - -# penalty indicates whether to apply penalty to update_every in case of failures. -# Penalty will increase every 5 failed updates in a row. Maximum penalty is 10 minutes. -# penalty: yes - -# autodetection_retry sets the job re-check interval in seconds. -# The job is not deleted if check fails. -# Attempts to start the job are made once every autodetection_retry. -# This feature is disabled by default. -# autodetection_retry: 0 - -# ---------------------------------------------------------------------- -# JOBS (data collection sources) -# -# The default JOBS share the same *name*. JOBS with the same name -# are mutually exclusive. Only one of them will be allowed running at -# any time. This allows autodetection to try several alternatives and -# pick the one that works. -# -# Any number of jobs is supported. -# -# All python.d.plugin JOBS (for all its modules) support a set of -# predefined parameters. 
These are: -# -# job_name: -# name: myname # the JOB's name as it will appear at the -# # dashboard (by default is the job_name) -# # JOBs sharing a name are mutually exclusive -# update_every: 1 # the JOB's data collection frequency -# priority: 60000 # the JOB's order on the dashboard -# penalty: yes # the JOB's penalty -# autodetection_retry: 0 # the JOB's re-check interval in seconds -# -# Additionally to the above, dovecot also supports the following: -# -# socket: 'path/to/dovecot/stats' -# -# or -# host: 'IP or HOSTNAME' # the host to connect to -# port: PORT # the port to connect to -# -# - -# ---------------------------------------------------------------------- -# AUTO-DETECTION JOBS -# only one of them will run (they have the same name) - -localhost: - name : 'local' - host : 'localhost' - port : 24242 - -localipv4: - name : 'local' - host : '127.0.0.1' - port : 24242 - -localipv6: - name : 'local' - host : '::1' - port : 24242 - -localsocket: - name : 'local' - socket : '/var/run/dovecot/stats' - -localsocket_old: - name : 'local' - socket : '/var/run/dovecot/old-stats' - diff --git a/src/collectors/python.d.plugin/dovecot/integrations/dovecot.md b/src/collectors/python.d.plugin/dovecot/integrations/dovecot.md deleted file mode 100644 index a6472f9406ad56..00000000000000 --- a/src/collectors/python.d.plugin/dovecot/integrations/dovecot.md +++ /dev/null @@ -1,230 +0,0 @@ - - -# Dovecot - - - - - -Plugin: python.d.plugin -Module: dovecot - - - -## Overview - -This collector monitors Dovecot metrics about sessions, logins, commands, page faults and more. - -It uses the dovecot socket and executes the `EXPORT global` command to get the statistics. - -This collector is supported on all platforms. - -This collector supports collecting metrics from multiple instances of this integration, including remote instances. 
- - -### Default Behavior - -#### Auto-Detection - -If no configuration is given, the collector will attempt to connect to dovecot using unix socket localized in `/var/run/dovecot/stats` - -#### Limits - -The default configuration for this integration does not impose any limits on data collection. - -#### Performance Impact - -The default configuration for this integration is not expected to impose a significant performance impact on the system. - - -## Metrics - -Metrics grouped by *scope*. - -The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. - - - -### Per Dovecot instance - -These metrics refer to the entire monitored application. - -This scope has no labels. - -Metrics: - -| Metric | Dimensions | Unit | -|:------|:----------|:----| -| dovecot.sessions | active sessions | number | -| dovecot.logins | logins | number | -| dovecot.commands | commands | commands | -| dovecot.faults | minor, major | faults | -| dovecot.context_switches | voluntary, involuntary | switches | -| dovecot.io | read, write | KiB/s | -| dovecot.net | read, write | kilobits/s | -| dovecot.syscalls | read, write | syscalls/s | -| dovecot.lookup | path, attr | number/s | -| dovecot.cache | hits | hits/s | -| dovecot.auth | ok, failed | attempts | -| dovecot.auth_cache | hit, miss | number | - - - -## Alerts - -There are no alerts configured by default for this integration. - - -## Setup - -### Prerequisites - -#### Dovecot configuration - -The Dovecot UNIX socket should have R/W permissions for user netdata, or Dovecot should be configured with a TCP/IP socket. - - -### Configuration - -#### File - -The configuration file name for this integration is `python.d/dovecot.conf`. - - -You can edit the configuration file using the `edit-config` script from the -Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory). 
- -```bash -cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata -sudo ./edit-config python.d/dovecot.conf -``` -#### Options - -There are 2 sections: - -* Global variables -* One or more JOBS that can define multiple different instances to monitor. - -The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. - -Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. - -Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. - - -
Config options - -| Name | Description | Default | Required | -|:----|:-----------|:-------|:--------:| -| update_every | Sets the default data collection frequency. | 5 | no | -| priority | Controls the order of charts at the netdata dashboard. | 60000 | no | -| autodetection_retry | Sets the job re-check interval in seconds. | 0 | no | -| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | no | -| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | no | -| socket | Use this socket to communicate with Devcot | /var/run/dovecot/stats | no | -| host | Instead of using a socket, you can point the collector to an ip for devcot statistics. | | no | -| port | Used in combination with host, configures the port devcot listens to. | | no | - -
- -#### Examples - -##### Local TCP - -A basic TCP configuration. - -
Config - -```yaml -localtcpip: - name: 'local' - host: '127.0.0.1' - port: 24242 - -``` -
- -##### Local socket - -A basic local socket configuration - -
Config - -```yaml -localsocket: - name: 'local' - socket: '/var/run/dovecot/stats' - -``` -
- - - -## Troubleshooting - -### Debug Mode - -To troubleshoot issues with the `dovecot` collector, run the `python.d.plugin` with the debug option enabled. The output -should give you clues as to why the collector isn't working. - -- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on - your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. - - ```bash - cd /usr/libexec/netdata/plugins.d/ - ``` - -- Switch to the `netdata` user. - - ```bash - sudo -u netdata -s - ``` - -- Run the `python.d.plugin` to debug the collector: - - ```bash - ./python.d.plugin dovecot debug trace - ``` - -### Getting Logs - -If you're encountering problems with the `dovecot` collector, follow these steps to retrieve logs and identify potential issues: - -- **Run the command** specific to your system (systemd, non-systemd, or Docker container). -- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem. - -#### System with systemd - -Use the following command to view logs generated since the last Netdata service restart: - -```bash -journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep dovecot -``` - -#### System without systemd - -Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name: - -```bash -grep dovecot /var/log/netdata/collector.log -``` - -**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues. 
- -#### Docker Container - -If your Netdata runs in a Docker container named "netdata" (replace if different), use this command: - -```bash -docker logs netdata 2>&1 | grep dovecot -``` - - diff --git a/src/collectors/python.d.plugin/dovecot/metadata.yaml b/src/collectors/python.d.plugin/dovecot/metadata.yaml deleted file mode 100644 index b247da8460f905..00000000000000 --- a/src/collectors/python.d.plugin/dovecot/metadata.yaml +++ /dev/null @@ -1,207 +0,0 @@ -plugin_name: python.d.plugin -modules: - - meta: - plugin_name: python.d.plugin - module_name: dovecot - monitored_instance: - name: Dovecot - link: 'https://www.dovecot.org/' - categories: - - data-collection.mail-servers - icon_filename: 'dovecot.svg' - related_resources: - integrations: - list: [] - info_provided_to_referring_integrations: - description: '' - keywords: - - dovecot - - imap - - mail - most_popular: false - overview: - data_collection: - metrics_description: 'This collector monitors Dovecot metrics about sessions, logins, commands, page faults and more.' - method_description: 'It uses the dovecot socket and executes the `EXPORT global` command to get the statistics.' - supported_platforms: - include: [] - exclude: [] - multi_instance: true - additional_permissions: - description: '' - default_behavior: - auto_detection: - description: 'If no configuration is given, the collector will attempt to connect to dovecot using unix socket localized in `/var/run/dovecot/stats`' - limits: - description: '' - performance_impact: - description: '' - setup: - prerequisites: - list: - - title: 'Dovecot configuration' - description: The Dovecot UNIX socket should have R/W permissions for user netdata, or Dovecot should be configured with a TCP/IP socket. - configuration: - file: - name: python.d/dovecot.conf - options: - description: | - There are 2 sections: - - * Global variables - * One or more JOBS that can define multiple different instances to monitor. 
- - The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. - - Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. - - Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. - folding: - title: "Config options" - enabled: true - list: - - name: update_every - description: Sets the default data collection frequency. - default_value: 5 - required: false - - name: priority - description: Controls the order of charts at the netdata dashboard. - default_value: 60000 - required: false - - name: autodetection_retry - description: Sets the job re-check interval in seconds. - default_value: 0 - required: false - - name: penalty - description: Indicates whether to apply penalty to update_every in case of failures. - default_value: yes - required: false - - name: name - description: Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. - default_value: '' - required: false - - name: socket - description: Use this socket to communicate with Devcot - default_value: /var/run/dovecot/stats - required: false - - name: host - description: Instead of using a socket, you can point the collector to an ip for devcot statistics. - default_value: '' - required: false - - name: port - description: Used in combination with host, configures the port devcot listens to. - default_value: '' - required: false - examples: - folding: - enabled: true - title: "Config" - list: - - name: Local TCP - description: A basic TCP configuration. 
- config: | - localtcpip: - name: 'local' - host: '127.0.0.1' - port: 24242 - - name: Local socket - description: A basic local socket configuration - config: | - localsocket: - name: 'local' - socket: '/var/run/dovecot/stats' - troubleshooting: - problems: - list: [] - alerts: [] - metrics: - folding: - title: Metrics - enabled: false - description: "" - availability: [] - scopes: - - name: global - description: "These metrics refer to the entire monitored application." - labels: [] - metrics: - - name: dovecot.sessions - description: Dovecot Active Sessions - unit: "number" - chart_type: line - dimensions: - - name: active sessions - - name: dovecot.logins - description: Dovecot Logins - unit: "number" - chart_type: line - dimensions: - - name: logins - - name: dovecot.commands - description: Dovecot Commands - unit: "commands" - chart_type: line - dimensions: - - name: commands - - name: dovecot.faults - description: Dovecot Page Faults - unit: "faults" - chart_type: line - dimensions: - - name: minor - - name: major - - name: dovecot.context_switches - description: Dovecot Context Switches - unit: "switches" - chart_type: line - dimensions: - - name: voluntary - - name: involuntary - - name: dovecot.io - description: Dovecot Disk I/O - unit: "KiB/s" - chart_type: area - dimensions: - - name: read - - name: write - - name: dovecot.net - description: Dovecot Network Bandwidth - unit: "kilobits/s" - chart_type: area - dimensions: - - name: read - - name: write - - name: dovecot.syscalls - description: Dovecot Number of SysCalls - unit: "syscalls/s" - chart_type: line - dimensions: - - name: read - - name: write - - name: dovecot.lookup - description: Dovecot Lookups - unit: "number/s" - chart_type: stacked - dimensions: - - name: path - - name: attr - - name: dovecot.cache - description: Dovecot Cache Hits - unit: "hits/s" - chart_type: line - dimensions: - - name: hits - - name: dovecot.auth - description: Dovecot Authentications - unit: "attempts" - chart_type: 
stacked - dimensions: - - name: ok - - name: failed - - name: dovecot.auth_cache - description: Dovecot Authentication Cache - unit: "number" - chart_type: stacked - dimensions: - - name: hit - - name: miss From 26914ae6b2edc6cc25fecd29c04f16d1b8efd4f9 Mon Sep 17 00:00:00 2001 From: Netdata bot <43409846+netdatabot@users.noreply.github.com> Date: Mon, 12 Aug 2024 17:02:23 -0400 Subject: [PATCH 22/27] Regenerate integrations.js (#18324) Co-authored-by: ilyam8 <22274335+ilyam8@users.noreply.github.com> --- integrations/integrations.js | 75 +++--- integrations/integrations.json | 75 +++--- src/collectors/COLLECTORS.md | 2 +- src/go/plugin/go.d/modules/dovecot/README.md | 1 + .../modules/dovecot/integrations/dovecot.md | 242 ++++++++++++++++++ 5 files changed, 320 insertions(+), 75 deletions(-) create mode 120000 src/go/plugin/go.d/modules/dovecot/README.md create mode 100644 src/go/plugin/go.d/modules/dovecot/integrations/dovecot.md diff --git a/integrations/integrations.js b/integrations/integrations.js index 6549cd86c70c27..d14175df06b8e6 100644 --- a/integrations/integrations.js +++ b/integrations/integrations.js @@ -3960,6 +3960,44 @@ export const integrations = [ "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/dockerhub/metadata.yaml", "related_resources": "" }, + { + "meta": { + "id": "collector-go.d.plugin-dovecot", + "plugin_name": "go.d.plugin", + "module_name": "dovecot", + "monitored_instance": { + "name": "Dovecot", + "link": "https://www.dovecot.org/", + "categories": [ + "data-collection.mail-servers" + ], + "icon_filename": "dovecot.svg" + }, + "related_resources": { + "integrations": { + "list": [] + } + }, + "info_provided_to_referring_integrations": { + "description": "" + }, + "keywords": [ + "dovecot", + "imap", + "mail" + ], + "most_popular": false + }, + "overview": "# Dovecot\n\nPlugin: go.d.plugin\nModule: dovecot\n\n## Overview\n\nThis collector monitors Dovecot metrics about sessions, logins, commands, 
page faults and more.\n\n\nIt reads the server's response to the `EXPORT\\tglobal\\n` command.\n\n\nThis collector is supported on all platforms.\n\nThis collector supports collecting metrics from multiple instances of this integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nAutomatically discovers and collects Dovecot statistics from the following default locations:\n\n- localhost:24242\n- unix:///var/run/dovecot/old-stats\n\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", + "setup": "## Setup\n\n### Prerequisites\n\n#### Enable old_stats plugin\n\nTo enable `old_stats` plugin, see [Old Statistics](https://doc.dovecot.org/configuration_manual/stats/old_statistics/#old-statistics).\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/dovecot.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/dovecot.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every, autodetection_retry.\n\n\n{% details open=true summary=\"Config options\" %}\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 1 | no |\n| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |\n| address | The Unix or TCP socket address where the Dovecot [old_stats](https://doc.dovecot.org/configuration_manual/stats/old_statistics/#old-statistics) plugin listens for connections. 
| 127.0.0.1:24242 | yes |\n| timeout | Connection, read, and write timeout duration in seconds. The timeout includes name resolution. | 1 | no |\n\n{% /details %}\n#### Examples\n\n##### Basic (TCP)\n\nA basic example configuration.\n\n{% details open=true summary=\"Config\" %}\n```yaml\njobs:\n - name: local\n address: 127.0.0.1:24242\n\n```\n{% /details %}\n##### Basic (UNIX)\n\nA basic example configuration using a UNIX socket.\n\n{% details open=true summary=\"Config\" %}\n```yaml\njobs:\n - name: local\n address: unix:///var/run/dovecot/old-stats\n\n```\n{% /details %}\n##### Multi-instance\n\n> **Note**: When you define multiple jobs, their names must be unique.\n\nCollecting metrics from local and remote instances.\n\n\n{% details open=true summary=\"Config\" %}\n```yaml\njobs:\n - name: local\n address: 127.0.0.1:24242\n\n - name: remote\n address: 203.0.113.0:24242\n\n```\n{% /details %}\n", + "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `dovecot` collector, run the `go.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `go.d.plugin` to debug the collector:\n\n ```bash\n ./go.d.plugin -d -m dovecot\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `dovecot` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. 
These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep dovecot\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep dovecot /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep dovecot\n```\n\n", + "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", + "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels.\n\n\n\n### Per Dovecot instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| dovecot.session | active | sessions |\n| dovecot.logins | logins | logins |\n| dovecot.auth | ok, failed | attempts/s |\n| dovecot.commands | commands | commands |\n| dovecot.context_switches | voluntary, involuntary | switches/s |\n| dovecot.io | read, write | KiB/s |\n| dovecot.net | read, write | kilobits/s |\n| dovecot.syscalls | read, write | syscalls/s |\n| dovecot.lookup | path, attr | lookups/s |\n| dovecot.cache | hits | hits/s |\n| dovecot.auth_cache | hits, misses | requests/s |\n\n", + "integration_type": "collector", + "id": "go.d.plugin-dovecot-Dovecot", + "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/dovecot/metadata.yaml", + "related_resources": "" + }, { "meta": { "id": "collector-go.d.plugin-elasticsearch", @@ -18957,43 +18995,6 @@ export const integrations = [ "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/changefinder/metadata.yaml", "related_resources": "" }, -
collector is supported on all platforms.\n\nThis collector supports collecting metrics from multiple instances of this integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nIf no configuration is given, the collector will attempt to connect to dovecot using unix socket localized in `/var/run/dovecot/stats`\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", - "setup": "## Setup\n\n### Prerequisites\n\n#### Dovecot configuration\n\nThe Dovecot UNIX socket should have R/W permissions for user netdata, or Dovecot should be configured with a TCP/IP socket.\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `python.d/dovecot.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config python.d/dovecot.conf\n```\n#### Options\n\nThere are 2 sections:\n\n* Global variables\n* One or more JOBS that can define multiple different instances to monitor.\n\nThe following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values.\n\nAdditionally, the following collapsed table contains all the options that can be configured inside a JOB definition.\n\nEvery configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified.\n\n\n{% details open=true summary=\"Config options\" %}\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Sets the default data collection frequency. 
| 5 | no |\n| priority | Controls the order of charts at the netdata dashboard. | 60000 | no |\n| autodetection_retry | Sets the job re-check interval in seconds. | 0 | no |\n| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | no |\n| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | no |\n| socket | Use this socket to communicate with Devcot | /var/run/dovecot/stats | no |\n| host | Instead of using a socket, you can point the collector to an ip for devcot statistics. | | no |\n| port | Used in combination with host, configures the port devcot listens to. | | no |\n\n{% /details %}\n#### Examples\n\n##### Local TCP\n\nA basic TCP configuration.\n\n{% details open=true summary=\"Config\" %}\n```yaml\nlocaltcpip:\n name: 'local'\n host: '127.0.0.1'\n port: 24242\n\n```\n{% /details %}\n##### Local socket\n\nA basic local socket configuration\n\n{% details open=true summary=\"Config\" %}\n```yaml\nlocalsocket:\n name: 'local'\n socket: '/var/run/dovecot/stats'\n\n```\n{% /details %}\n", - "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `dovecot` collector, run the `python.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. 
If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `python.d.plugin` to debug the collector:\n\n ```bash\n ./python.d.plugin dovecot debug trace\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `dovecot` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep dovecot\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep dovecot /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep dovecot\n```\n\n", - "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", - "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels.\n\n\n\n### Per Dovecot instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| dovecot.sessions | active sessions | number |\n| dovecot.logins | logins | number |\n| dovecot.commands | commands | commands |\n| dovecot.faults | minor, major | faults |\n| dovecot.context_switches | voluntary, involuntary | switches |\n| dovecot.io | read, write | KiB/s |\n| dovecot.net | read, write | kilobits/s |\n| dovecot.syscalls | read, write | syscalls/s |\n| dovecot.lookup | path, attr | number/s |\n| dovecot.cache | hits | hits/s |\n| dovecot.auth | ok, failed | attempts |\n| dovecot.auth_cache | hit, miss | number |\n\n", - "integration_type": "collector", - "id": "python.d.plugin-dovecot-Dovecot", - "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/dovecot/metadata.yaml", - "related_resources": "" - }, { "meta": { "plugin_name": "python.d.plugin", diff --git a/integrations/integrations.json b/integrations/integrations.json index 5fce39fcc95276..a347b2f0700556 100644 --- a/integrations/integrations.json +++ b/integrations/integrations.json @@ -3958,6 +3958,44 @@ "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/dockerhub/metadata.yaml", "related_resources": "" }, + { + "meta": { + "id": "collector-go.d.plugin-dovecot", + "plugin_name": "go.d.plugin", + "module_name": "dovecot", + "monitored_instance": { + "name": "Dovecot", + "link": "https://www.dovecot.org/", + "categories": [ + "data-collection.mail-servers" + ], + "icon_filename": "dovecot.svg" + }, + "related_resources": { + "integrations": { + "list": [] + } + }, + "info_provided_to_referring_integrations": { + "description": "" + }, + "keywords": [ + "dovecot", + "imap", + "mail" + ], + "most_popular": false + }, + "overview": "# Dovecot\n\nPlugin: 
go.d.plugin\nModule: dovecot\n\n## Overview\n\nThis collector monitors Dovecot metrics about sessions, logins, commands, page faults and more.\n\n\nIt reads the server's response to the `EXPORT\\tglobal\\n` command.\n\n\nThis collector is supported on all platforms.\n\nThis collector supports collecting metrics from multiple instances of this integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nAutomatically discovers and collects Dovecot statistics from the following default locations:\n\n- localhost:24242\n- unix:///var/run/dovecot/old-stats\n\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", + "setup": "## Setup\n\n### Prerequisites\n\n#### Enable old_stats plugin\n\nTo enable `old_stats` plugin, see [Old Statistics](https://doc.dovecot.org/configuration_manual/stats/old_statistics/#old-statistics).\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/dovecot.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/dovecot.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every, autodetection_retry.\n\n\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 1 | no |\n| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. 
| 0 | no |\n| address | The Unix or TCP socket address where the Dovecot [old_stats](https://doc.dovecot.org/configuration_manual/stats/old_statistics/#old-statistics) plugin listens for connections. | 127.0.0.1:24242 | yes |\n| timeout | Connection, read, and write timeout duration in seconds. The timeout includes name resolution. | 1 | no |\n\n#### Examples\n\n##### Basic (TCP)\n\nA basic example configuration.\n\n```yaml\njobs:\n - name: local\n address: 127.0.0.1:24242\n\n```\n##### Basic (UNIX)\n\nA basic example configuration using a UNIX socket.\n\n```yaml\njobs:\n - name: local\n address: unix:///var/run/dovecot/old-stats\n\n```\n##### Multi-instance\n\n> **Note**: When you define multiple jobs, their names must be unique.\n\nCollecting metrics from local and remote instances.\n\n\n```yaml\njobs:\n - name: local\n address: 127.0.0.1:24242\n\n - name: remote\n address: 203.0.113.0:24242\n\n```\n", + "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `dovecot` collector, run the `go.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `go.d.plugin` to debug the collector:\n\n ```bash\n ./go.d.plugin -d -m dovecot\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `dovecot` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. 
These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep dovecot\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep dovecot /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep dovecot\n```\n\n", + "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", + "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels.\n\n\n\n### Per Dovecot instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| dovecot.session | active | sessions |\n| dovecot.logins | logins | logins |\n| dovecot.auth | ok, failed | attempts/s |\n| dovecot.commands | commands | commands |\n| dovecot.context_switches | voluntary, involuntary | switches/s |\n| dovecot.io | read, write | KiB/s |\n| dovecot.net | read, write | kilobits/s |\n| dovecot.syscalls | read, write | syscalls/s |\n| dovecot.lookup | path, attr | lookups/s |\n| dovecot.cache | hits | hits/s |\n| dovecot.auth_cache | hits, misses | requests/s |\n\n", + "integration_type": "collector", + "id": "go.d.plugin-dovecot-Dovecot", + "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/dovecot/metadata.yaml", + "related_resources": "" + }, { "meta": { "id": "collector-go.d.plugin-elasticsearch", @@ -18955,43 +18993,6 @@ "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/changefinder/metadata.yaml", "related_resources": "" }, - { - "meta": { - "plugin_name": "python.d.plugin", - "module_name": "dovecot", - "monitored_instance": { - "name": "Dovecot", - "link": "https://www.dovecot.org/", - "categories": [ - "data-collection.mail-servers" - ], - "icon_filename": "dovecot.svg" - }, - "related_resources": { - "integrations": { - "list": [] - } - }, - "info_provided_to_referring_integrations": { - "description": "" - }, - "keywords": [ - "dovecot", - "imap", - "mail" - ], - "most_popular": false - }, - "overview": "# Dovecot\n\nPlugin: python.d.plugin\nModule: dovecot\n\n## Overview\n\nThis collector monitors Dovecot metrics about sessions, logins, commands, page faults and more.\n\nIt uses the dovecot socket and executes the `EXPORT global` command to get the statistics.\n\nThis collector is supported on all
platforms.\n\nThis collector supports collecting metrics from multiple instances of this integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nIf no configuration is given, the collector will attempt to connect to dovecot using unix socket localized in `/var/run/dovecot/stats`\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", - "setup": "## Setup\n\n### Prerequisites\n\n#### Dovecot configuration\n\nThe Dovecot UNIX socket should have R/W permissions for user netdata, or Dovecot should be configured with a TCP/IP socket.\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `python.d/dovecot.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config python.d/dovecot.conf\n```\n#### Options\n\nThere are 2 sections:\n\n* Global variables\n* One or more JOBS that can define multiple different instances to monitor.\n\nThe following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values.\n\nAdditionally, the following collapsed table contains all the options that can be configured inside a JOB definition.\n\nEvery configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified.\n\n\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Sets the default data collection frequency. | 5 | no |\n| priority | Controls the order of charts at the netdata dashboard. 
| 60000 | no |\n| autodetection_retry | Sets the job re-check interval in seconds. | 0 | no |\n| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | no |\n| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | no |\n| socket | Use this socket to communicate with Devcot | /var/run/dovecot/stats | no |\n| host | Instead of using a socket, you can point the collector to an ip for devcot statistics. | | no |\n| port | Used in combination with host, configures the port devcot listens to. | | no |\n\n#### Examples\n\n##### Local TCP\n\nA basic TCP configuration.\n\n```yaml\nlocaltcpip:\n name: 'local'\n host: '127.0.0.1'\n port: 24242\n\n```\n##### Local socket\n\nA basic local socket configuration\n\n```yaml\nlocalsocket:\n name: 'local'\n socket: '/var/run/dovecot/stats'\n\n```\n", - "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `dovecot` collector, run the `python.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. 
If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `python.d.plugin` to debug the collector:\n\n ```bash\n ./python.d.plugin dovecot debug trace\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `dovecot` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep dovecot\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep dovecot /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep dovecot\n```\n\n", - "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", - "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels.\n\n\n\n### Per Dovecot instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| dovecot.sessions | active sessions | number |\n| dovecot.logins | logins | number |\n| dovecot.commands | commands | commands |\n| dovecot.faults | minor, major | faults |\n| dovecot.context_switches | voluntary, involuntary | switches |\n| dovecot.io | read, write | KiB/s |\n| dovecot.net | read, write | kilobits/s |\n| dovecot.syscalls | read, write | syscalls/s |\n| dovecot.lookup | path, attr | number/s |\n| dovecot.cache | hits | hits/s |\n| dovecot.auth | ok, failed | attempts |\n| dovecot.auth_cache | hit, miss | number |\n\n", - "integration_type": "collector", - "id": "python.d.plugin-dovecot-Dovecot", - "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/dovecot/metadata.yaml", - "related_resources": "" - }, { "meta": { "plugin_name": "python.d.plugin", diff --git a/src/collectors/COLLECTORS.md b/src/collectors/COLLECTORS.md index 5a2615d2a83215..be420d3d44b904 100644 --- a/src/collectors/COLLECTORS.md +++ b/src/collectors/COLLECTORS.md @@ -745,7 +745,7 @@ If you don't see the app/service you'd like to monitor in this list: - [DMARC](https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/prometheus/integrations/dmarc.md) -- [Dovecot](https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/dovecot/integrations/dovecot.md) +- [Dovecot](https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/dovecot/integrations/dovecot.md) - [Exim](https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/exim/integrations/exim.md) diff --git a/src/go/plugin/go.d/modules/dovecot/README.md b/src/go/plugin/go.d/modules/dovecot/README.md new file mode 120000 index 00000000000000..c4749cedce0686 --- /dev/null +++ 
b/src/go/plugin/go.d/modules/dovecot/README.md @@ -0,0 +1 @@ +integrations/dovecot.md \ No newline at end of file diff --git a/src/go/plugin/go.d/modules/dovecot/integrations/dovecot.md b/src/go/plugin/go.d/modules/dovecot/integrations/dovecot.md new file mode 100644 index 00000000000000..f95b6d9cbfb4ce --- /dev/null +++ b/src/go/plugin/go.d/modules/dovecot/integrations/dovecot.md @@ -0,0 +1,242 @@ + + +# Dovecot + + + + + +Plugin: go.d.plugin +Module: dovecot + + + +## Overview + +This collector monitors Dovecot metrics about sessions, logins, commands, page faults and more. + + +It reads the server's response to the `EXPORT\tglobal\n` command. + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +Automatically discovers and collects Dovecot statistics from the following default locations: + +- localhost:24242 +- unix:///var/run/dovecot/old-stats + + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Dovecot instance + +These metrics refer to the entire monitored application. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| dovecot.session | active | sessions | +| dovecot.logins | logins | logins | +| dovecot.auth | ok, failed | attempts/s | +| dovecot.commands | commands | commands | +| dovecot.context_switches | voluntary, involuntary | switches/s | +| dovecot.io | read, write | KiB/s | +| dovecot.net | read, write | kilobits/s | +| dovecot.syscalls | read, write | syscalls/s | +| dovecot.lookup | path, attr | lookups/s | +| dovecot.cache | hits | hits/s | +| dovecot.auth_cache | hits, misses | requests/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Enable old_stats plugin + +To enable `old_stats` plugin, see [Old Statistics](https://doc.dovecot.org/configuration_manual/stats/old_statistics/#old-statistics). + + + +### Configuration + +#### File + +The configuration file name for this integration is `go.d/dovecot.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config go.d/dovecot.conf +``` +#### Options + +The following options can be defined globally: update_every, autodetection_retry. + + +
+<details open><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Data collection frequency. | 1 | no | +| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no | +| address | The Unix or TCP socket address where the Dovecot [old_stats](https://doc.dovecot.org/configuration_manual/stats/old_statistics/#old-statistics) plugin listens for connections. | 127.0.0.1:24242 | yes | +| timeout | Connection, read, and write timeout duration in seconds. The timeout includes name resolution. | 1 | no | + +</details>
+ +#### Examples + +##### Basic (TCP) + +A basic example configuration. + +
+<details open><summary>Config</summary> + +```yaml +jobs: + - name: local + address: 127.0.0.1:24242 + +``` +</details>
+ +##### Basic (UNIX) + +A basic example configuration using a UNIX socket. + +
+<details open><summary>Config</summary> + +```yaml +jobs: + - name: local + address: unix:///var/run/dovecot/old-stats + +``` +</details>
+ +##### Multi-instance + +> **Note**: When you define multiple jobs, their names must be unique. + +Collecting metrics from local and remote instances. + + +
+<details open><summary>Config</summary> + +```yaml +jobs: + - name: local + address: 127.0.0.1:24242 + + - name: remote + address: 203.0.113.0:24242 + +``` +</details>
+ + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `dovecot` collector, run the `go.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `go.d.plugin` to debug the collector: + + ```bash + ./go.d.plugin -d -m dovecot + ``` + +### Getting Logs + +If you're encountering problems with the `dovecot` collector, follow these steps to retrieve logs and identify potential issues: + +- **Run the command** specific to your system (systemd, non-systemd, or Docker container). +- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem. + +#### System with systemd + +Use the following command to view logs generated since the last Netdata service restart: + +```bash +journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep dovecot +``` + +#### System without systemd + +Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name: + +```bash +grep dovecot /var/log/netdata/collector.log +``` + +**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues. 
+ +#### Docker Container + +If your Netdata runs in a Docker container named "netdata" (replace if different), use this command: + +```bash +docker logs netdata 2>&1 | grep dovecot +``` + + From 98c5304a88819823b9b03c2a4928072fd66cc968 Mon Sep 17 00:00:00 2001 From: netdatabot Date: Tue, 13 Aug 2024 00:19:00 +0000 Subject: [PATCH 23/27] [ci skip] Update changelog and version for nightly build: v1.46.0-295-nightly. --- CHANGELOG.md | 24 ++++++++++++------------ packaging/version | 2 +- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 9130e7e2300400..a319dca77583b0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,16 @@ **Merged pull requests:** +- Regenerate integrations.js [\#18324](https://github.com/netdata/netdata/pull/18324) ([netdatabot](https://github.com/netdatabot)) +- remove python.d/dovecot [\#18322](https://github.com/netdata/netdata/pull/18322) ([ilyam8](https://github.com/ilyam8)) +- add go.d dovecot [\#18321](https://github.com/netdata/netdata/pull/18321) ([ilyam8](https://github.com/ilyam8)) +- go.d redis: fix default "address" in config\_schema.json [\#18320](https://github.com/netdata/netdata/pull/18320) ([ilyam8](https://github.com/ilyam8)) +- Regenerate integrations.js [\#18317](https://github.com/netdata/netdata/pull/18317) ([netdatabot](https://github.com/netdatabot)) +- remove python.d/nvidia\_smi [\#18316](https://github.com/netdata/netdata/pull/18316) ([ilyam8](https://github.com/ilyam8)) +- go.d nvidia\_smi: enable by default [\#18315](https://github.com/netdata/netdata/pull/18315) ([ilyam8](https://github.com/ilyam8)) +- go.d nvidia\_smi: add loop mode [\#18313](https://github.com/netdata/netdata/pull/18313) ([ilyam8](https://github.com/ilyam8)) +- Regenerate integrations.js [\#18312](https://github.com/netdata/netdata/pull/18312) ([netdatabot](https://github.com/netdatabot)) +- go.d nvidia\_smi remove "csv" mode [\#18311](https://github.com/netdata/netdata/pull/18311) 
([ilyam8](https://github.com/ilyam8)) - Regenerate integrations.js [\#18308](https://github.com/netdata/netdata/pull/18308) ([netdatabot](https://github.com/netdatabot)) - add go.d/exim [\#18306](https://github.com/netdata/netdata/pull/18306) ([ilyam8](https://github.com/ilyam8)) - remove python.d/exim [\#18305](https://github.com/netdata/netdata/pull/18305) ([ilyam8](https://github.com/ilyam8)) @@ -16,7 +26,9 @@ - remove python.d/nsd [\#18300](https://github.com/netdata/netdata/pull/18300) ([ilyam8](https://github.com/ilyam8)) - Regenerate integrations.js [\#18299](https://github.com/netdata/netdata/pull/18299) ([netdatabot](https://github.com/netdatabot)) - go.d gearman fix meta [\#18298](https://github.com/netdata/netdata/pull/18298) ([ilyam8](https://github.com/ilyam8)) +- Handle GOROOT inside build system instead of outside. [\#18296](https://github.com/netdata/netdata/pull/18296) ([Ferroin](https://github.com/Ferroin)) - add go.d/gearman [\#18294](https://github.com/netdata/netdata/pull/18294) ([ilyam8](https://github.com/ilyam8)) +- Use system certificate configuration for Yum/DNF repos. 
[\#18293](https://github.com/netdata/netdata/pull/18293) ([Ferroin](https://github.com/Ferroin)) - Regenerate integrations.js [\#18292](https://github.com/netdata/netdata/pull/18292) ([netdatabot](https://github.com/netdatabot)) - remove python.d/gearman [\#18291](https://github.com/netdata/netdata/pull/18291) ([ilyam8](https://github.com/ilyam8)) - remove python.d/alarms [\#18290](https://github.com/netdata/netdata/pull/18290) ([ilyam8](https://github.com/ilyam8)) @@ -403,18 +415,6 @@ - add go.d clickhouse [\#17743](https://github.com/netdata/netdata/pull/17743) ([ilyam8](https://github.com/ilyam8)) - fix clickhouse in apps groups [\#17742](https://github.com/netdata/netdata/pull/17742) ([ilyam8](https://github.com/ilyam8)) - fix ebpf cgroup swap context [\#17740](https://github.com/netdata/netdata/pull/17740) ([ilyam8](https://github.com/ilyam8)) -- Update netdata-agent-security.md [\#17738](https://github.com/netdata/netdata/pull/17738) ([Ancairon](https://github.com/Ancairon)) -- Collecting metrics docs grammar pass [\#17736](https://github.com/netdata/netdata/pull/17736) ([Ancairon](https://github.com/Ancairon)) -- Grammar pass on docs [\#17735](https://github.com/netdata/netdata/pull/17735) ([Ancairon](https://github.com/Ancairon)) -- eBPF OOMKills adjust and fixes. [\#17734](https://github.com/netdata/netdata/pull/17734) ([thiagoftsm](https://github.com/thiagoftsm)) -- Ensure that the choice of compiler and target is passed to sub-projects. 
[\#17732](https://github.com/netdata/netdata/pull/17732) ([Ferroin](https://github.com/Ferroin)) -- Include the Host in the HTTP header \(mqtt\) [\#17731](https://github.com/netdata/netdata/pull/17731) ([stelfrag](https://github.com/stelfrag)) -- Add alert meta info [\#17730](https://github.com/netdata/netdata/pull/17730) ([stelfrag](https://github.com/stelfrag)) -- grammar pass on alerts and notifications dir [\#17729](https://github.com/netdata/netdata/pull/17729) ([Ancairon](https://github.com/Ancairon)) -- Regenerate integrations.js [\#17726](https://github.com/netdata/netdata/pull/17726) ([netdatabot](https://github.com/netdatabot)) -- go.d systemdunits add "skip\_transient" [\#17725](https://github.com/netdata/netdata/pull/17725) ([ilyam8](https://github.com/ilyam8)) -- minor fix on link [\#17722](https://github.com/netdata/netdata/pull/17722) ([Ancairon](https://github.com/Ancairon)) -- Regenerate integrations.js [\#17721](https://github.com/netdata/netdata/pull/17721) ([netdatabot](https://github.com/netdatabot)) ## [v1.45.6](https://github.com/netdata/netdata/tree/v1.45.6) (2024-06-05) diff --git a/packaging/version b/packaging/version index 5963b73da04264..ad4424a0396e12 100644 --- a/packaging/version +++ b/packaging/version @@ -1 +1 @@ -v1.46.0-282-nightly +v1.46.0-295-nightly From 5d0ce4c21b3e241968072b3371f6f439bc999617 Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Tue, 13 Aug 2024 19:53:18 +0300 Subject: [PATCH 24/27] remove python.d/uwsgi (#18325) --- CMakeLists.txt | 2 - .../python.d.plugin/uwsgi/README.md | 1 - .../uwsgi/integrations/uwsgi.md | 252 ------------------ .../python.d.plugin/uwsgi/metadata.yaml | 201 -------------- .../python.d.plugin/uwsgi/uwsgi.chart.py | 177 ------------ .../python.d.plugin/uwsgi/uwsgi.conf | 92 ------- 6 files changed, 725 deletions(-) delete mode 120000 src/collectors/python.d.plugin/uwsgi/README.md delete mode 100644 src/collectors/python.d.plugin/uwsgi/integrations/uwsgi.md delete mode 100644 
src/collectors/python.d.plugin/uwsgi/metadata.yaml delete mode 100644 src/collectors/python.d.plugin/uwsgi/uwsgi.chart.py delete mode 100644 src/collectors/python.d.plugin/uwsgi/uwsgi.conf diff --git a/CMakeLists.txt b/CMakeLists.txt index 5bd2ab0f93051b..ac96d89827633b 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -2793,7 +2793,6 @@ install(FILES src/collectors/python.d.plugin/spigotmc/spigotmc.conf src/collectors/python.d.plugin/tor/tor.conf src/collectors/python.d.plugin/traefik/traefik.conf - src/collectors/python.d.plugin/uwsgi/uwsgi.conf src/collectors/python.d.plugin/varnish/varnish.conf src/collectors/python.d.plugin/w1sensor/w1sensor.conf src/collectors/python.d.plugin/zscores/zscores.conf @@ -2819,7 +2818,6 @@ install(FILES src/collectors/python.d.plugin/spigotmc/spigotmc.chart.py src/collectors/python.d.plugin/tor/tor.chart.py src/collectors/python.d.plugin/traefik/traefik.chart.py - src/collectors/python.d.plugin/uwsgi/uwsgi.chart.py src/collectors/python.d.plugin/varnish/varnish.chart.py src/collectors/python.d.plugin/w1sensor/w1sensor.chart.py src/collectors/python.d.plugin/zscores/zscores.chart.py diff --git a/src/collectors/python.d.plugin/uwsgi/README.md b/src/collectors/python.d.plugin/uwsgi/README.md deleted file mode 120000 index 44b8559492a874..00000000000000 --- a/src/collectors/python.d.plugin/uwsgi/README.md +++ /dev/null @@ -1 +0,0 @@ -integrations/uwsgi.md \ No newline at end of file diff --git a/src/collectors/python.d.plugin/uwsgi/integrations/uwsgi.md b/src/collectors/python.d.plugin/uwsgi/integrations/uwsgi.md deleted file mode 100644 index f5a8903e9b9590..00000000000000 --- a/src/collectors/python.d.plugin/uwsgi/integrations/uwsgi.md +++ /dev/null @@ -1,252 +0,0 @@ - - -# uWSGI - - - - - -Plugin: python.d.plugin -Module: uwsgi - - - -## Overview - -This collector monitors uWSGI metrics about requests, workers, memory and more. 
- -It collects every metric exposed from the stats server of uWSGI, either from the `stats.socket` or from the web server's TCP/IP socket. - -This collector is supported on all platforms. - -This collector supports collecting metrics from multiple instances of this integration, including remote instances. - - -### Default Behavior - -#### Auto-Detection - -This collector will auto-detect uWSGI instances deployed on the local host, running on port 1717, or exposing stats on socket `/tmp/stats.socket`. - -#### Limits - -The default configuration for this integration does not impose any limits on data collection. - -#### Performance Impact - -The default configuration for this integration is not expected to impose a significant performance impact on the system. - - -## Metrics - -Metrics grouped by *scope*. - -The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. - - - -### Per uWSGI instance - -These metrics refer to the entire monitored application. - -This scope has no labels. - -Metrics: - -| Metric | Dimensions | Unit | -|:------|:----------|:----| -| uwsgi.requests | a dimension per worker | requests/s | -| uwsgi.tx | a dimension per worker | KiB/s | -| uwsgi.avg_rt | a dimension per worker | milliseconds | -| uwsgi.memory_rss | a dimension per worker | MiB | -| uwsgi.memory_vsz | a dimension per worker | MiB | -| uwsgi.exceptions | exceptions | exceptions | -| uwsgi.harakiris | harakiris | harakiris | -| uwsgi.respawns | respawns | respawns | - - - -## Alerts - -There are no alerts configured by default for this integration. - - -## Setup - -### Prerequisites - -#### Enable the uWSGI Stats server - -Make sure that your uWSGI exposes its metrics via a Stats server. - -Source: https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html - - - -### Configuration - -#### File - -The configuration file name for this integration is `python.d/uwsgi.conf`.
- - -You can edit the configuration file using the `edit-config` script from the -Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory). - -```bash -cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata -sudo ./edit-config python.d/uwsgi.conf -``` -#### Options - -There are 2 sections: - -* Global variables -* One or more JOBS that can define multiple different instances to monitor. - -The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. - -Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. - -Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. - - -
Config options - -| Name | Description | Default | Required | -|:----|:-----------|:-------|:--------:| -| update_every | Sets the default data collection frequency. | 5 | no | -| priority | Controls the order of charts at the netdata dashboard. | 60000 | no | -| autodetection_retry | Sets the job re-check interval in seconds. | 0 | no | -| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | no | -| name | The JOB's name as it will appear at the dashboard (by default is the job_name) | job_name | no | -| socket | The 'path/to/uwsgistats.sock' | no | no | -| host | The host to connect to | no | no | -| port | The port to connect to | no | no | - -
- -#### Examples - -##### Basic (default out-of-the-box) - -A basic example configuration, one job will run at a time. Autodetect mechanism uses it by default. As all JOBs have the same name, only one can run at a time. - -
Config - -```yaml -socket: - name : 'local' - socket : '/tmp/stats.socket' - -localhost: - name : 'local' - host : 'localhost' - port : 1717 - -localipv4: - name : 'local' - host : '127.0.0.1' - port : 1717 - -localipv6: - name : 'local' - host : '::1' - port : 1717 - -``` -
- -##### Multi-instance - -> **Note**: When you define multiple jobs, their names must be unique. - -Collecting metrics from local and remote instances. - - -
Config - -```yaml -local: - name : 'local' - host : 'localhost' - port : 1717 - -remote: - name : 'remote' - host : '192.0.2.1' - port : 1717 - -``` -
- - - -## Troubleshooting - -### Debug Mode - -To troubleshoot issues with the `uwsgi` collector, run the `python.d.plugin` with the debug option enabled. The output -should give you clues as to why the collector isn't working. - -- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on - your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. - - ```bash - cd /usr/libexec/netdata/plugins.d/ - ``` - -- Switch to the `netdata` user. - - ```bash - sudo -u netdata -s - ``` - -- Run the `python.d.plugin` to debug the collector: - - ```bash - ./python.d.plugin uwsgi debug trace - ``` - -### Getting Logs - -If you're encountering problems with the `uwsgi` collector, follow these steps to retrieve logs and identify potential issues: - -- **Run the command** specific to your system (systemd, non-systemd, or Docker container). -- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem. - -#### System with systemd - -Use the following command to view logs generated since the last Netdata service restart: - -```bash -journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep uwsgi -``` - -#### System without systemd - -Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name: - -```bash -grep uwsgi /var/log/netdata/collector.log -``` - -**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues. 
- -#### Docker Container - -If your Netdata runs in a Docker container named "netdata" (replace if different), use this command: - -```bash -docker logs netdata 2>&1 | grep uwsgi -``` - - diff --git a/src/collectors/python.d.plugin/uwsgi/metadata.yaml b/src/collectors/python.d.plugin/uwsgi/metadata.yaml deleted file mode 100644 index cdb090ac1f5b03..00000000000000 --- a/src/collectors/python.d.plugin/uwsgi/metadata.yaml +++ /dev/null @@ -1,201 +0,0 @@ -plugin_name: python.d.plugin -modules: - - meta: - plugin_name: python.d.plugin - module_name: uwsgi - monitored_instance: - name: uWSGI - link: "https://github.com/unbit/uwsgi/tree/2.0.21" - categories: - - data-collection.web-servers-and-web-proxies - icon_filename: "uwsgi.svg" - related_resources: - integrations: - list: [] - info_provided_to_referring_integrations: - description: "" - keywords: - - application server - - python - - web applications - most_popular: false - overview: - data_collection: - metrics_description: "This collector monitors uWSGI metrics about requests, workers, memory and more." - method_description: "It collects every metric exposed from the stats server of uWSGI, either from the `stats.socket` or from the web server's TCP/IP socket." - supported_platforms: - include: [] - exclude: [] - multi_instance: true - additional_permissions: - description: "" - default_behavior: - auto_detection: - description: "This collector will auto-detect uWSGI instances deployed on the local host, running on port 1717, or exposing stats on socket `/tmp/stats.socket`." - limits: - description: "" - performance_impact: - description: "" - setup: - prerequisites: - list: - - title: Enable the uWSGI Stats server - description: | - Make sure that your uWSGI exposes its metrics via a Stats server.
- - Source: https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html - configuration: - file: - name: "python.d/uwsgi.conf" - options: - description: | - There are 2 sections: - - * Global variables - * One or more JOBS that can define multiple different instances to monitor. - - The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. - - Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. - - Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. - folding: - title: "Config options" - enabled: true - list: - - name: update_every - description: Sets the default data collection frequency. - default_value: 5 - required: false - - name: priority - description: Controls the order of charts at the netdata dashboard. - default_value: 60000 - required: false - - name: autodetection_retry - description: Sets the job re-check interval in seconds. - default_value: 0 - required: false - - name: penalty - description: Indicates whether to apply penalty to update_every in case of failures. - default_value: yes - required: false - - name: name - description: The JOB's name as it will appear at the dashboard (by default is the job_name) - default_value: job_name - required: false - - name: socket - description: The 'path/to/uwsgistats.sock' - default_value: no - required: false - - name: host - description: The host to connect to - default_value: no - required: false - - name: port - description: The port to connect to - default_value: no - required: false - examples: - folding: - enabled: true - title: "Config" - list: - - name: Basic (default out-of-the-box) - description: A basic example configuration, one job will run at a time. Autodetect mechanism uses it by default. As all JOBs have the same name, only one can run at a time. 
- config: | - socket: - name : 'local' - socket : '/tmp/stats.socket' - - localhost: - name : 'local' - host : 'localhost' - port : 1717 - - localipv4: - name : 'local' - host : '127.0.0.1' - port : 1717 - - localipv6: - name : 'local' - host : '::1' - port : 1717 - - name: Multi-instance - description: | - > **Note**: When you define multiple jobs, their names must be unique. - - Collecting metrics from local and remote instances. - config: | - local: - name : 'local' - host : 'localhost' - port : 1717 - - remote: - name : 'remote' - host : '192.0.2.1' - port : 1717 - troubleshooting: - problems: - list: [] - alerts: [] - metrics: - folding: - title: Metrics - enabled: false - description: "" - availability: [] - scopes: - - name: global - description: "These metrics refer to the entire monitored application." - labels: [] - metrics: - - name: uwsgi.requests - description: Requests - unit: "requests/s" - chart_type: stacked - dimensions: - - name: a dimension per worker - - name: uwsgi.tx - description: Transmitted data - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per worker - - name: uwsgi.avg_rt - description: Average request time - unit: "milliseconds" - chart_type: line - dimensions: - - name: a dimension per worker - - name: uwsgi.memory_rss - description: RSS (Resident Set Size) - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per worker - - name: uwsgi.memory_vsz - description: VSZ (Virtual Memory Size) - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per worker - - name: uwsgi.exceptions - description: Exceptions - unit: "exceptions" - chart_type: line - dimensions: - - name: exceptions - - name: uwsgi.harakiris - description: Harakiris - unit: "harakiris" - chart_type: line - dimensions: - - name: harakiris - - name: uwsgi.respawns - description: Respawns - unit: "respawns" - chart_type: line - dimensions: - - name: respawns diff --git 
a/src/collectors/python.d.plugin/uwsgi/uwsgi.chart.py b/src/collectors/python.d.plugin/uwsgi/uwsgi.chart.py deleted file mode 100644 index e4d9000054fd0e..00000000000000 --- a/src/collectors/python.d.plugin/uwsgi/uwsgi.chart.py +++ /dev/null @@ -1,177 +0,0 @@ -# -*- coding: utf-8 -*- -# Description: uwsgi netdata python.d module -# Author: Robbert Segeren (robbert-ef) -# SPDX-License-Identifier: GPL-3.0-or-later - -import json -from copy import deepcopy - -from bases.FrameworkServices.SocketService import SocketService - -ORDER = [ - 'requests', - 'tx', - 'avg_rt', - 'memory_rss', - 'memory_vsz', - 'exceptions', - 'harakiri', - 'respawn', -] - -DYNAMIC_CHARTS = [ - 'requests', - 'tx', - 'avg_rt', - 'memory_rss', - 'memory_vsz', -] - -# NOTE: lines are created dynamically in `check()` method -CHARTS = { - 'requests': { - 'options': [None, 'Requests', 'requests/s', 'requests', 'uwsgi.requests', 'stacked'], - 'lines': [ - ['requests', 'requests', 'incremental'] - ] - }, - 'tx': { - 'options': [None, 'Transmitted data', 'KiB/s', 'requests', 'uwsgi.tx', 'stacked'], - 'lines': [ - ['tx', 'tx', 'incremental'] - ] - }, - 'avg_rt': { - 'options': [None, 'Average request time', 'milliseconds', 'requests', 'uwsgi.avg_rt', 'line'], - 'lines': [ - ['avg_rt', 'avg_rt', 'absolute'] - ] - }, - 'memory_rss': { - 'options': [None, 'RSS (Resident Set Size)', 'MiB', 'memory', 'uwsgi.memory_rss', 'stacked'], - 'lines': [ - ['memory_rss', 'memory_rss', 'absolute', 1, 1 << 20] - ] - }, - 'memory_vsz': { - 'options': [None, 'VSZ (Virtual Memory Size)', 'MiB', 'memory', 'uwsgi.memory_vsz', 'stacked'], - 'lines': [ - ['memory_vsz', 'memory_vsz', 'absolute', 1, 1 << 20] - ] - }, - 'exceptions': { - 'options': [None, 'Exceptions', 'exceptions', 'exceptions', 'uwsgi.exceptions', 'line'], - 'lines': [ - ['exceptions', 'exceptions', 'incremental'] - ] - }, - 'harakiri': { - 'options': [None, 'Harakiris', 'harakiris', 'harakiris', 'uwsgi.harakiris', 'line'], - 'lines': [ - ['harakiri_count', 
'harakiris', 'incremental'] - ] - }, - 'respawn': { - 'options': [None, 'Respawns', 'respawns', 'respawns', 'uwsgi.respawns', 'line'], - 'lines': [ - ['respawn_count', 'respawns', 'incremental'] - ] - }, -} - - -class Service(SocketService): - def __init__(self, configuration=None, name=None): - super(Service, self).__init__(configuration=configuration, name=name) - self.order = ORDER - self.definitions = deepcopy(CHARTS) - self.url = self.configuration.get('host', 'localhost') - self.port = self.configuration.get('port', 1717) - # Clear dynamic dimensions, these are added during `_get_data()` to allow adding workers at run-time - for chart in DYNAMIC_CHARTS: - self.definitions[chart]['lines'] = [] - self.last_result = {} - self.workers = [] - - def read_data(self): - """ - Read data from socket and parse as JSON. - :return: (dict) stats - """ - raw_data = self._get_raw_data() - if not raw_data: - return None - try: - return json.loads(raw_data) - except ValueError as err: - self.error(err) - return None - - def check(self): - """ - Parse configuration and check if we can read data. - :return: boolean - """ - self._parse_config() - return bool(self.read_data()) - - def add_worker_dimensions(self, key): - """ - Helper to add dimensions for a worker. - :param key: (int or str) worker identifier - :return: - """ - for chart in DYNAMIC_CHARTS: - for line in CHARTS[chart]['lines']: - dimension_id = '{}_{}'.format(line[0], key) - dimension_name = str(key) - - dimension = [dimension_id, dimension_name] + line[2:] - self.charts[chart].add_dimension(dimension) - - @staticmethod - def _check_raw_data(data): - # The server will close the connection when it's done sending - # data, so just keep looping until that happens. 
- return False - - def _get_data(self): - """ - Read data from socket - :return: dict - """ - stats = self.read_data() - if not stats: - return None - - result = { - 'exceptions': 0, - 'harakiri_count': 0, - 'respawn_count': 0, - } - - for worker in stats['workers']: - key = worker['pid'] - - # Add dimensions for new workers - if key not in self.workers: - self.add_worker_dimensions(key) - self.workers.append(key) - - result['requests_{}'.format(key)] = worker['requests'] - result['tx_{}'.format(key)] = worker['tx'] - result['avg_rt_{}'.format(key)] = worker['avg_rt'] - - # avg_rt is not reset by uwsgi, so reset here - if self.last_result.get('requests_{}'.format(key)) == worker['requests']: - result['avg_rt_{}'.format(key)] = 0 - - result['memory_rss_{}'.format(key)] = worker['rss'] - result['memory_vsz_{}'.format(key)] = worker['vsz'] - - result['exceptions'] += worker['exceptions'] - result['harakiri_count'] += worker['harakiri_count'] - result['respawn_count'] += worker['respawn_count'] - - self.last_result = result - return result diff --git a/src/collectors/python.d.plugin/uwsgi/uwsgi.conf b/src/collectors/python.d.plugin/uwsgi/uwsgi.conf deleted file mode 100644 index 7d09e7330190e9..00000000000000 --- a/src/collectors/python.d.plugin/uwsgi/uwsgi.conf +++ /dev/null @@ -1,92 +0,0 @@ -# netdata python.d.plugin configuration for uwsgi -# -# This file is in YaML format. Generally the format is: -# -# name: value -# -# There are 2 sections: -# - global variables -# - one or more JOBS -# -# JOBS allow you to collect values from multiple sources. -# Each source will have its own set of charts. -# -# JOB parameters have to be indented (using spaces only, example below). - -# ---------------------------------------------------------------------- -# Global Variables -# These variables set the defaults for all JOBs, however each JOB -# may define its own, overriding the defaults. - -# update_every sets the default data collection frequency. 
-# If unset, the python.d.plugin default is used. -# update_every: 1 - -# priority controls the order of charts at the netdata dashboard. -# Lower numbers move the charts towards the top of the page. -# If unset, the default for python.d.plugin is used. -# priority: 60000 - -# penalty indicates whether to apply penalty to update_every in case of failures. -# Penalty will increase every 5 failed updates in a row. Maximum penalty is 10 minutes. -# penalty: yes - -# autodetection_retry sets the job re-check interval in seconds. -# The job is not deleted if check fails. -# Attempts to start the job are made once every autodetection_retry. -# This feature is disabled by default. -# autodetection_retry: 0 - -# ---------------------------------------------------------------------- -# JOBS (data collection sources) -# -# The default JOBS share the same *name*. JOBS with the same name -# are mutually exclusive. Only one of them will be allowed running at -# any time. This allows autodetection to try several alternatives and -# pick the one that works. -# -# Any number of jobs is supported. -# -# All python.d.plugin JOBS (for all its modules) support a set of -# predefined parameters. 
These are: -# -# job_name: -# name: myname # the JOB's name as it will appear at the -# # dashboard (by default is the job_name) -# # JOBs sharing a name are mutually exclusive -# update_every: 1 # the JOB's data collection frequency -# priority: 60000 # the JOB's order on the dashboard -# penalty: yes # the JOB's penalty -# autodetection_retry: 0 # the JOB's re-check interval in seconds -# -# Additionally to the above, uwsgi also supports the following: -# -# socket: 'path/to/uwsgistats.sock' -# -# or -# host: 'IP or HOSTNAME' # the host to connect to -# port: PORT # the port to connect to -# -# ---------------------------------------------------------------------- -# AUTO-DETECTION JOBS -# only one of them will run (they have the same name) -# - -socket: - name : 'local' - socket : '/tmp/stats.socket' - -localhost: - name : 'local' - host : 'localhost' - port : 1717 - -localipv4: - name : 'local' - host : '127.0.0.1' - port : 1717 - -localipv6: - name : 'local' - host : '::1' - port : 1717 From 3527c0a93abe4f52b22755af798b1c847bd68c83 Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Tue, 13 Aug 2024 20:16:32 +0300 Subject: [PATCH 25/27] add go.d/uwsgi (#18326) --- src/collectors/python.d.plugin/python.d.conf | 2 +- src/go/plugin/go.d/config/go.d.conf | 1 + .../go.d/config/go.d/sd/net_listeners.conf | 7 + src/go/plugin/go.d/config/go.d/uwsgi.conf | 6 + src/go/plugin/go.d/modules/init.go | 1 + src/go/plugin/go.d/modules/uwsgi/charts.go | 275 +++++++++++++++ src/go/plugin/go.d/modules/uwsgi/client.go | 64 ++++ src/go/plugin/go.d/modules/uwsgi/collect.go | 128 +++++++ .../go.d/modules/uwsgi/config_schema.json | 44 +++ src/go/plugin/go.d/modules/uwsgi/init.go | 3 + .../plugin/go.d/modules/uwsgi/metadata.yaml | 215 ++++++++++++ .../go.d/modules/uwsgi/testdata/config.json | 5 + .../go.d/modules/uwsgi/testdata/config.yaml | 3 + .../go.d/modules/uwsgi/testdata/stats.json | 117 +++++++ .../uwsgi/testdata/stats_no_workers.json | 49 +++ 
src/go/plugin/go.d/modules/uwsgi/uwsgi.go | 98 ++++++ .../plugin/go.d/modules/uwsgi/uwsgi_test.go | 325 ++++++++++++++++++ 17 files changed, 1342 insertions(+), 1 deletion(-) create mode 100644 src/go/plugin/go.d/config/go.d/uwsgi.conf create mode 100644 src/go/plugin/go.d/modules/uwsgi/charts.go create mode 100644 src/go/plugin/go.d/modules/uwsgi/client.go create mode 100644 src/go/plugin/go.d/modules/uwsgi/collect.go create mode 100644 src/go/plugin/go.d/modules/uwsgi/config_schema.json create mode 100644 src/go/plugin/go.d/modules/uwsgi/init.go create mode 100644 src/go/plugin/go.d/modules/uwsgi/metadata.yaml create mode 100644 src/go/plugin/go.d/modules/uwsgi/testdata/config.json create mode 100644 src/go/plugin/go.d/modules/uwsgi/testdata/config.yaml create mode 100644 src/go/plugin/go.d/modules/uwsgi/testdata/stats.json create mode 100644 src/go/plugin/go.d/modules/uwsgi/testdata/stats_no_workers.json create mode 100644 src/go/plugin/go.d/modules/uwsgi/uwsgi.go create mode 100644 src/go/plugin/go.d/modules/uwsgi/uwsgi_test.go diff --git a/src/collectors/python.d.plugin/python.d.conf b/src/collectors/python.d.plugin/python.d.conf index 252247511e9c82..bcd95481851ab2 100644 --- a/src/collectors/python.d.plugin/python.d.conf +++ b/src/collectors/python.d.plugin/python.d.conf @@ -45,7 +45,6 @@ go_expvar: no # spigotmc: yes # traefik: yes # tor: yes -# uwsgi: yes # varnish: yes # w1sensor: yes # zscores: no @@ -82,3 +81,4 @@ sensors: no # Removed (replaced with go.d/sensors). squid: no # Removed (replaced with go.d/squid). tomcat: no # Removed (replaced with go.d/tomcat) puppet: no # Removed (replaced with go.d/puppet). +uwsgi: no # Removed (replaced with go.d/uwsgi). 
diff --git a/src/go/plugin/go.d/config/go.d.conf b/src/go/plugin/go.d/config/go.d.conf index c0b2ed2be90c13..895765107d04ce 100644 --- a/src/go/plugin/go.d/config/go.d.conf +++ b/src/go/plugin/go.d/config/go.d.conf @@ -107,6 +107,7 @@ modules: # traefik: yes # upsd: yes # unbound: yes +# uwsgi: yes # vernemq: yes # vcsa: yes # vsphere: yes diff --git a/src/go/plugin/go.d/config/go.d/sd/net_listeners.conf b/src/go/plugin/go.d/config/go.d/sd/net_listeners.conf index e41840d97f1a6b..9634151bf25f77 100644 --- a/src/go/plugin/go.d/config/go.d/sd/net_listeners.conf +++ b/src/go/plugin/go.d/config/go.d/sd/net_listeners.conf @@ -124,6 +124,8 @@ classify: expr: '{{ and (eq .Port "8953") (eq .Comm "unbound") }}' - tags: "upsd" expr: '{{ or (eq .Port "3493") (eq .Comm "upsd") }}' + - tags: "uwsgi" + expr: '{{ and (eq .Port "1717") (eq .Comm "uwsgi") }}' - tags: "vernemq" expr: '{{ and (eq .Port "8888") (glob .Cmdline "*vernemq*") }}' - tags: "zookeeper" @@ -469,6 +471,11 @@ compose: module: upsd name: local address: {{.Address}} + - selector: "uwsgi" + template: | + module: uwsgi + name: local + address: {{.Address}} - selector: "vernemq" template: | module: vernemq diff --git a/src/go/plugin/go.d/config/go.d/uwsgi.conf b/src/go/plugin/go.d/config/go.d/uwsgi.conf new file mode 100644 index 00000000000000..f3189180479612 --- /dev/null +++ b/src/go/plugin/go.d/config/go.d/uwsgi.conf @@ -0,0 +1,6 @@ +## All available configuration options, their descriptions and default values: +## https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/modules/uwsgi#readme + +#jobs: +# - name: local +# address: 127.0.0.1:1717 diff --git a/src/go/plugin/go.d/modules/init.go b/src/go/plugin/go.d/modules/init.go index 73f98f6e6c31bc..386411746c5691 100644 --- a/src/go/plugin/go.d/modules/init.go +++ b/src/go/plugin/go.d/modules/init.go @@ -99,6 +99,7 @@ import ( _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/traefik" _ 
"github.com/netdata/netdata/go/plugins/plugin/go.d/modules/unbound" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/upsd" + _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/uwsgi" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/vcsa" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/vernemq" _ "github.com/netdata/netdata/go/plugins/plugin/go.d/modules/vsphere" diff --git a/src/go/plugin/go.d/modules/uwsgi/charts.go b/src/go/plugin/go.d/modules/uwsgi/charts.go new file mode 100644 index 00000000000000..d79b3938b412af --- /dev/null +++ b/src/go/plugin/go.d/modules/uwsgi/charts.go @@ -0,0 +1,275 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package uwsgi + +import ( + "fmt" + "strconv" + "strings" + + "github.com/netdata/netdata/go/plugins/plugin/go.d/agent/module" +) + +const ( + prioTransmittedData = module.Priority + iota + prioRequests + prioHarakiris + prioExceptions + prioRespawns + + prioWorkerTransmittedData + prioWorkerRequests + prioWorkerDeltaRequests + prioWorkerAvgRequestTime + prioWorkerHarakiris + prioWorkerExceptions + prioWorkerStatus + prioWorkerRequestHandlingStatus + prioWorkerRespawns + prioWorkerMemoryRss + prioWorkerMemoryVsz +) + +var charts = module.Charts{ + transmittedDataChart.Copy(), + requestsChart.Copy(), + harakirisChart.Copy(), + exceptionsChart.Copy(), + respawnsChart.Copy(), +} + +var ( + transmittedDataChart = module.Chart{ + ID: "transmitted_data", + Title: "UWSGI Transmitted Data", + Units: "bytes/s", + Fam: "workers", + Ctx: "uwsgi.transmitted_data", + Priority: prioTransmittedData, + Type: module.Area, + Dims: module.Dims{ + {ID: "workers_tx", Name: "tx", Algo: module.Incremental}, + }, + } + requestsChart = module.Chart{ + ID: "requests", + Title: "UWSGI Requests", + Units: "requests/s", + Fam: "workers", + Ctx: "uwsgi.requests", + Priority: prioRequests, + Dims: module.Dims{ + {ID: "workers_requests", Name: "requests", Algo: module.Incremental}, + }, + } + harakirisChart = 
module.Chart{ + ID: "harakiris", + Title: "UWSGI Dropped Requests", + Units: "harakiris/s", + Fam: "workers", + Ctx: "uwsgi.harakiris", + Priority: prioHarakiris, + Dims: module.Dims{ + {ID: "workers_harakiris", Name: "harakiris", Algo: module.Incremental}, + }, + } + exceptionsChart = module.Chart{ + ID: "exceptions", + Title: "UWSGI Raised Exceptions", + Units: "exceptions/s", + Fam: "workers", + Ctx: "uwsgi.exceptions", + Priority: prioExceptions, + Dims: module.Dims{ + {ID: "workers_exceptions", Name: "exceptions", Algo: module.Incremental}, + }, + } + respawnsChart = module.Chart{ + ID: "respawns", + Title: "UWSGI Respawns", + Units: "respawns/s", + Fam: "workers", + Ctx: "uwsgi.respawns", + Priority: prioRespawns, + Dims: module.Dims{ + {ID: "workers_respawns", Name: "respawns", Algo: module.Incremental}, + }, + } +) + +var workerChartsTmpl = module.Charts{ + workerTransmittedDataChartTmpl.Copy(), + workerRequestsChartTmpl.Copy(), + workerDeltaRequestsChartTmpl.Copy(), + workerAvgRequestTimeChartTmpl.Copy(), + workerHarakirisChartTmpl.Copy(), + workerExceptionsChartTmpl.Copy(), + workerStatusChartTmpl.Copy(), + workerRequestHandlingStatusChartTmpl.Copy(), + workerRespawnsChartTmpl.Copy(), + workerMemoryRssChartTmpl.Copy(), + workerMemoryVszChartTmpl.Copy(), +} + +var ( + workerTransmittedDataChartTmpl = module.Chart{ + ID: "worker_%s_transmitted_data", + Title: "UWSGI Worker Transmitted Data", + Units: "bytes/s", + Fam: "wrk transmitted data", + Ctx: "uwsgi.worker_transmitted_data", + Priority: prioWorkerTransmittedData, + Type: module.Area, + Dims: module.Dims{ + {ID: "worker_%s_tx", Name: "tx", Algo: module.Incremental}, + }, + } + workerRequestsChartTmpl = module.Chart{ + ID: "worker_%s_requests", + Title: "UWSGI Worker Requests", + Units: "requests/s", + Fam: "wrk requests", + Ctx: "uwsgi.worker_requests", + Priority: prioWorkerRequests, + Dims: module.Dims{ + {ID: "worker_%s_requests", Name: "requests", Algo: module.Incremental}, + }, + } + 
workerDeltaRequestsChartTmpl = module.Chart{ + ID: "worker_%s_delta_requests", + Title: "UWSGI Worker Delta Requests", + Units: "requests/s", + Fam: "wrk requests", + Ctx: "uwsgi.worker_delta_requests", + Priority: prioWorkerDeltaRequests, + Dims: module.Dims{ + {ID: "worker_%s_delta_requests", Name: "delta_requests", Algo: module.Incremental}, + }, + } + workerAvgRequestTimeChartTmpl = module.Chart{ + ID: "worker_%s_average_request_time", + Title: "UWSGI Worker Average Request Time", + Units: "milliseconds", + Fam: "wrk request time", + Ctx: "uwsgi.worker_average_request_time", + Priority: prioWorkerAvgRequestTime, + Dims: module.Dims{ + {ID: "worker_%s_average_request_time", Name: "avg"}, + }, + } + workerHarakirisChartTmpl = module.Chart{ + ID: "worker_%s_harakiris", + Title: "UWSGI Worker Dropped Requests", + Units: "harakiris/s", + Fam: "wrk harakiris", + Ctx: "uwsgi.worker_harakiris", + Priority: prioWorkerHarakiris, + Dims: module.Dims{ + {ID: "worker_%s_harakiris", Name: "harakiris", Algo: module.Incremental}, + }, + } + workerExceptionsChartTmpl = module.Chart{ + ID: "worker_%s_exceptions", + Title: "UWSGI Worker Raised Exceptions", + Units: "exceptions/s", + Fam: "wrk exceptions", + Ctx: "uwsgi.worker_exceptions", + Priority: prioWorkerExceptions, + Dims: module.Dims{ + {ID: "worker_%s_exceptions", Name: "exceptions", Algo: module.Incremental}, + }, + } + workerStatusChartTmpl = module.Chart{ + ID: "worker_%s_status", + Title: "UWSGI Worker Status", + Units: "status", + Fam: "wrk status", + Ctx: "uwsgi.status", + Priority: prioWorkerStatus, + Dims: module.Dims{ + {ID: "worker_%s_status_idle", Name: "idle"}, + {ID: "worker_%s_status_busy", Name: "busy"}, + {ID: "worker_%s_status_cheap", Name: "cheap"}, + {ID: "worker_%s_status_pause", Name: "pause"}, + {ID: "worker_%s_status_sig", Name: "sig"}, + }, + } + workerRequestHandlingStatusChartTmpl = module.Chart{ + ID: "worker_%s_request_handling_status", + Title: "UWSGI Worker Request Handling Status", + Units: 
"status", + Fam: "wrk status", + Ctx: "uwsgi.request_handling_status", + Priority: prioWorkerRequestHandlingStatus, + Dims: module.Dims{ + {ID: "worker_%s_request_handling_status_accepting", Name: "accepting"}, + {ID: "worker_%s_request_handling_status_not_accepting", Name: "not_accepting"}, + }, + } + workerRespawnsChartTmpl = module.Chart{ + ID: "worker_%s_respawns", + Title: "UWSGI Worker Respawns", + Units: "respawns/s", + Fam: "wrk respawns", + Ctx: "uwsgi.worker_respawns", + Priority: prioWorkerRespawns, + Dims: module.Dims{ + {ID: "worker_%s_respawns", Name: "respawns", Algo: module.Incremental}, + }, + } + workerMemoryRssChartTmpl = module.Chart{ + ID: "worker_%s_memory_rss", + Title: "UWSGI Worker Memory RSS (Resident Set Size)", + Units: "bytes", + Fam: "wrk memory", + Ctx: "uwsgi.worker_memory_rss", + Priority: prioWorkerMemoryRss, + Type: module.Area, + Dims: module.Dims{ + {ID: "worker_%s_memory_rss", Name: "rss"}, + }, + } + workerMemoryVszChartTmpl = module.Chart{ + ID: "worker_%s_memory_vsz", + Title: "UWSGI Worker Memory VSZ (Virtual Memory Size)", + Units: "bytes", + Fam: "wrk memory", + Ctx: "uwsgi.worker_memory_vsz", + Priority: prioWorkerMemoryVsz, + Type: module.Area, + Dims: module.Dims{ + {ID: "worker_%s_memory_vsz", Name: "vsz"}, + }, + } +) + +func (u *Uwsgi) addWorkerCharts(workerID int) { + charts := workerChartsTmpl.Copy() + + id := strconv.Itoa(workerID) + + for _, chart := range *charts { + chart.ID = fmt.Sprintf(chart.ID, id) + chart.Labels = []module.Label{ + {Key: "worker_id", Value: id}, + } + for _, dim := range chart.Dims { + dim.ID = fmt.Sprintf(dim.ID, id) + } + } + + if err := u.Charts().Add(*charts...); err != nil { + u.Warning(err) + } +} + +func (u *Uwsgi) removeWorkerCharts(workerID int) { + px := fmt.Sprintf("worker_%d_", workerID) + + for _, chart := range *u.Charts() { + if strings.HasPrefix(chart.ID, px) { + chart.MarkRemove() + chart.MarkNotCreated() + } + } +} diff --git a/src/go/plugin/go.d/modules/uwsgi/client.go 
b/src/go/plugin/go.d/modules/uwsgi/client.go new file mode 100644 index 00000000000000..4036807434c9e5 --- /dev/null +++ b/src/go/plugin/go.d/modules/uwsgi/client.go @@ -0,0 +1,64 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package uwsgi + +import ( + "bytes" + "fmt" + + "github.com/netdata/netdata/go/plugins/plugin/go.d/pkg/socket" +) + +type uwsgiConn interface { + connect() error + disconnect() + queryStats() ([]byte, error) +} + +func newUwsgiConn(conf Config) uwsgiConn { + return &uwsgiClient{conn: socket.New(socket.Config{ + Address: conf.Address, + ConnectTimeout: conf.Timeout.Duration(), + ReadTimeout: conf.Timeout.Duration(), + WriteTimeout: conf.Timeout.Duration(), + })} +} + +type uwsgiClient struct { + conn socket.Client +} + +func (c *uwsgiClient) connect() error { + return c.conn.Connect() +} + +func (c *uwsgiClient) disconnect() { + _ = c.conn.Disconnect() +} + +func (c *uwsgiClient) queryStats() ([]byte, error) { + var b bytes.Buffer + var n int64 + var err error + const readLineLimit = 1000 * 10 + + clientErr := c.conn.Command("", func(bs []byte) bool { + b.Write(bs) + b.WriteByte('\n') + + if n++; n >= readLineLimit { + err = fmt.Errorf("read line limit exceeded %d", readLineLimit) + return false + } + // The server will close the connection when it has finished sending data. 
+ return true + }) + if clientErr != nil { + return nil, clientErr + } + if err != nil { + return nil, err + } + + return b.Bytes(), nil +} diff --git a/src/go/plugin/go.d/modules/uwsgi/collect.go b/src/go/plugin/go.d/modules/uwsgi/collect.go new file mode 100644 index 00000000000000..3f440535435e96 --- /dev/null +++ b/src/go/plugin/go.d/modules/uwsgi/collect.go @@ -0,0 +1,128 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package uwsgi + +import ( + "encoding/json" + "fmt" +) + +type statsResponse struct { + Workers []workerStats `json:"workers"` +} + +type workerStats struct { + ID int `json:"id"` + Accepting int64 `json:"accepting"` + Requests int64 `json:"requests"` + DeltaRequests int64 `json:"delta_requests"` + Exceptions int64 `json:"exceptions"` + HarakiriCount int64 `json:"harakiri_count"` + Status string `json:"status"` + RSS int64 `json:"rss"` + VSZ int64 `json:"vsz"` + RespawnCount int64 `json:"respawn_count"` + TX int64 `json:"tx"` + AvgRT int64 `json:"avg_rt"` +} + +func (u *Uwsgi) collect() (map[string]int64, error) { + conn, err := u.establishConn() + if err != nil { + return nil, fmt.Errorf("failed to connect: %v", err) + } + + defer conn.disconnect() + + stats, err := conn.queryStats() + if err != nil { + return nil, fmt.Errorf("failed to query stats: %v", err) + } + + mx := make(map[string]int64) + + if err := u.collectStats(mx, stats); err != nil { + return nil, err + } + + return mx, nil +} + +func (u *Uwsgi) collectStats(mx map[string]int64, stats []byte) error { + var resp statsResponse + if err := json.Unmarshal(stats, &resp); err != nil { + return fmt.Errorf("failed to json decode stats response: %v", err) + } + + // stats server returns an empty array if there are no workers + if resp.Workers == nil { + return fmt.Errorf("unexpected stats response: no workers found") + } + + seen := make(map[int]bool) + + mx["workers_tx"] = 0 + mx["workers_requests"] = 0 + mx["workers_harakiris"] = 0 + mx["workers_exceptions"] = 0 + 
mx["workers_respawns"] = 0 + + for _, w := range resp.Workers { + mx["workers_tx"] += w.TX + mx["workers_requests"] += w.Requests + mx["workers_harakiris"] += w.HarakiriCount + mx["workers_exceptions"] += w.Exceptions + mx["workers_respawns"] += w.RespawnCount + + seen[w.ID] = true + + if !u.seenWorkers[w.ID] { + u.seenWorkers[w.ID] = true + u.addWorkerCharts(w.ID) + } + + px := fmt.Sprintf("worker_%d_", w.ID) + + mx[px+"tx"] = w.TX + mx[px+"requests"] = w.Requests + mx[px+"delta_requests"] = w.DeltaRequests + mx[px+"average_request_time"] = w.AvgRT + mx[px+"harakiris"] = w.HarakiriCount + mx[px+"exceptions"] = w.Exceptions + mx[px+"respawns"] = w.RespawnCount + mx[px+"memory_rss"] = w.RSS + mx[px+"memory_vsz"] = w.VSZ + + for _, v := range []string{"idle", "busy", "cheap", "pause", "sig"} { + mx[px+"status_"+v] = boolToInt(w.Status == v) + } + mx[px+"request_handling_status_accepting"] = boolToInt(w.Accepting == 1) + mx[px+"request_handling_status_not_accepting"] = boolToInt(w.Accepting == 0) + } + + for id := range u.seenWorkers { + if !seen[id] { + delete(u.seenWorkers, id) + u.removeWorkerCharts(id) + } + } + + return nil +} + +func (u *Uwsgi) establishConn() (uwsgiConn, error) { + conn := u.newConn(u.Config) + + if err := conn.connect(); err != nil { + return nil, err + } + + return conn, nil +} + +func boolToInt(b bool) int64 { + if b { + return 1 + } + return 0 +} diff --git a/src/go/plugin/go.d/modules/uwsgi/config_schema.json b/src/go/plugin/go.d/modules/uwsgi/config_schema.json new file mode 100644 index 00000000000000..14c75043248020 --- /dev/null +++ b/src/go/plugin/go.d/modules/uwsgi/config_schema.json @@ -0,0 +1,44 @@ +{ + "jsonSchema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "title": "UWSGI collector configuration.", + "type": "object", + "properties": { + "update_every": { + "title": "Update every", + "description": "Data collection interval, measured in seconds.", + "type": "integer", + "minimum": 1, + "default": 1 + }, + 
"address": { + "title": "Address", + "description": "The IP address and port where the UWSGI [Stats Server](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html) listens for connections.", + "type": "string", + "default": "127.0.0.1:1717" + }, + "timeout": { + "title": "Timeout", + "description": "Timeout for establishing a connection and communication (reading and writing) in seconds.", + "type": "number", + "minimum": 0.5, + "default": 1 + } + }, + "required": [ + "address" + ], + "additionalProperties": false, + "patternProperties": { + "^name$": {} + } + }, + "uiSchema": { + "uiOptions": { + "fullPage": true + }, + "timeout": { + "ui:help": "Accepts decimals for precise control (e.g., type 1.5 for 1.5 seconds)." + } + } +} diff --git a/src/go/plugin/go.d/modules/uwsgi/init.go b/src/go/plugin/go.d/modules/uwsgi/init.go new file mode 100644 index 00000000000000..ab5999708b94db --- /dev/null +++ b/src/go/plugin/go.d/modules/uwsgi/init.go @@ -0,0 +1,3 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package uwsgi diff --git a/src/go/plugin/go.d/modules/uwsgi/metadata.yaml b/src/go/plugin/go.d/modules/uwsgi/metadata.yaml new file mode 100644 index 00000000000000..698d6abbfe0a10 --- /dev/null +++ b/src/go/plugin/go.d/modules/uwsgi/metadata.yaml @@ -0,0 +1,215 @@ +plugin_name: go.d.plugin +modules: + - meta: + id: collector-go.d.plugin-uwsgi + plugin_name: go.d.plugin + module_name: uwsgi + monitored_instance: + name: uWSGI + link: https://uwsgi-docs.readthedocs.io/en/latest/ + categories: + - data-collection.web-servers-and-web-proxies + icon_filename: "uwsgi.svg" + related_resources: + integrations: + list: [] + info_provided_to_referring_integrations: + description: "" + keywords: + - application server + - python + - web applications + most_popular: false + overview: + data_collection: + metrics_description: | + Monitors UWSGI worker health and performance by collecting metrics like requests, transmitted data, exceptions, and harakiris. 
+ method_description: | + It fetches [Stats Server](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html) statistics over TCP. + supported_platforms: + include: [] + exclude: [] + multi_instance: true + additional_permissions: + description: "" + default_behavior: + auto_detection: + description: | + Automatically discovers and collects UWSGI statistics from the following default locations: + + - localhost:1717 + limits: + description: "" + performance_impact: + description: "" + setup: + prerequisites: + list: + - title: Enable the uWSGI Stats Server + description: | + See [Stats Server](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html) for details. + configuration: + file: + name: go.d/uwsgi.conf + options: + description: | + The following options can be defined globally: update_every, autodetection_retry. + folding: + title: Config options + enabled: true + list: + - name: update_every + description: Data collection frequency. + default_value: 1 + required: false + - name: autodetection_retry + description: Recheck interval in seconds. Zero means no recheck will be scheduled. + default_value: 0 + required: false + - name: address + description: "The IP address and port where the UWSGI [Stats Server](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html) listens for connections." + default_value: 127.0.0.1:1717 + required: true + - name: timeout + description: Connection, read, and write timeout duration in seconds. The timeout includes name resolution. + default_value: 1 + required: false + examples: + folding: + title: Config + enabled: true + list: + - name: Basic + description: A basic example configuration. + config: | + jobs: + - name: local + address: 127.0.0.1:1717 + - name: Multi-instance + description: | + > **Note**: When you define multiple jobs, their names must be unique. + + Collecting metrics from local and remote instances. 
+ config: | + jobs: + - name: local + address: 127.0.0.1:1717 + + - name: remote + address: 203.0.113.0:1717 + troubleshooting: + problems: + list: [] + alerts: [] + metrics: + folding: + title: Metrics + enabled: false + description: "" + availability: [] + scopes: + - name: global + description: "These metrics refer to the entire monitored application." + labels: [] + metrics: + - name: uwsgi.transmitted_data + description: UWSGI Transmitted Data + unit: "bytes/s" + chart_type: area + dimensions: + - name: tx + - name: uwsgi.requests + description: UWSGI Requests + unit: "requests/s" + chart_type: line + dimensions: + - name: requests + - name: uwsgi.harakiris + description: UWSGI Dropped Requests + unit: "harakiris/s" + chart_type: line + dimensions: + - name: harakiris + - name: uwsgi.respawns + description: UWSGI Respawns + unit: "respawns/s" + chart_type: line + dimensions: + - name: respawns + - name: worker + description: "These metrics refer to the Worker process." + labels: + - name: "worker_id" + description: Worker ID. 
+ metrics: + - name: uwsgi.worker_transmitted_data + description: UWSGI Worker Transmitted Data + unit: "bytes/s" + chart_type: area + dimensions: + - name: tx + - name: uwsgi.worker_requests + description: UWSGI Worker Requests + unit: "requests/s" + chart_type: line + dimensions: + - name: requests + - name: uwsgi.worker_delta_requests + description: UWSGI Worker Delta Requests + unit: "requests/s" + chart_type: line + dimensions: + - name: delta_requests + - name: uwsgi.worker_average_request_time + description: UWSGI Worker Average Request Time + unit: "milliseconds" + chart_type: line + dimensions: + - name: avg + - name: uwsgi.worker_harakiris + description: UWSGI Worker Dropped Requests + unit: "harakiris/s" + chart_type: line + dimensions: + - name: harakiris + - name: uwsgi.worker_exceptions + description: UWSGI Worker Raised Exceptions + unit: "exceptions/s" + chart_type: line + dimensions: + - name: exceptions + - name: uwsgi.worker_status + description: UWSGI Worker Status + unit: "status" + chart_type: line + dimensions: + - name: idle + - name: busy + - name: cheap + - name: pause + - name: sig + - name: uwsgi.worker_request_handling_status + description: UWSGI Worker Request Handling Status + unit: "status" + chart_type: line + dimensions: + - name: accepting + - name: not_accepting + - name: uwsgi.worker_respawns + description: UWSGI Worker Respawns + unit: "respawns/s" + chart_type: line + dimensions: + - name: respawns + - name: uwsgi.worker_memory_rss + description: UWSGI Worker Memory RSS (Resident Set Size) + unit: "bytes" + chart_type: area + dimensions: + - name: rss + - name: uwsgi.worker_memory_vsz + description: UWSGI Worker Memory VSZ (Virtual Memory Size) + unit: "bytes" + chart_type: area + dimensions: + - name: vsz diff --git a/src/go/plugin/go.d/modules/uwsgi/testdata/config.json b/src/go/plugin/go.d/modules/uwsgi/testdata/config.json new file mode 100644 index 00000000000000..e868347203bee3 --- /dev/null +++ 
b/src/go/plugin/go.d/modules/uwsgi/testdata/config.json @@ -0,0 +1,5 @@ +{ + "update_every": 123, + "address": "ok", + "timeout": 123.123 +} diff --git a/src/go/plugin/go.d/modules/uwsgi/testdata/config.yaml b/src/go/plugin/go.d/modules/uwsgi/testdata/config.yaml new file mode 100644 index 00000000000000..1b81d09eb8288b --- /dev/null +++ b/src/go/plugin/go.d/modules/uwsgi/testdata/config.yaml @@ -0,0 +1,3 @@ +update_every: 123 +address: "ok" +timeout: 123.123 diff --git a/src/go/plugin/go.d/modules/uwsgi/testdata/stats.json b/src/go/plugin/go.d/modules/uwsgi/testdata/stats.json new file mode 100644 index 00000000000000..d00a340ba87aa5 --- /dev/null +++ b/src/go/plugin/go.d/modules/uwsgi/testdata/stats.json @@ -0,0 +1,117 @@ +{ + "version": "2.1.21-debian", + "listen_queue": 1, + "listen_queue_errors": 1, + "signal_queue": 1, + "load": 1, + "pid": 859919, + "uid": 1111, + "gid": 1111, + "cwd": "/home/ilyam", + "locks": [ + { + "user 1": 1 + }, + { + "signal": 1 + }, + { + "filemon": 1 + }, + { + "timer": 1 + }, + { + "rbtimer": 1 + }, + { + "cron": 1 + }, + { + "rpc": 1 + }, + { + "snmp": 1 + } + ], + "sockets": [ + { + "name": ":3131", + "proto": "uwsgi", + "queue": 1, + "max_queue": 111, + "shared": 1, + "can_offload": 1 + } + ], + "workers": [ + { + "id": 1, + "pid": 859911, + "accepting": 1, + "requests": 1, + "delta_requests": 1, + "exceptions": 1, + "harakiri_count": 1, + "signals": 1, + "signal_queue": 1, + "status": "idle", + "rss": 1, + "vsz": 1, + "running_time": 1, + "last_spawn": 1723542786, + "respawn_count": 1, + "tx": 1, + "avg_rt": 1, + "apps": [], + "cores": [ + { + "id": 1, + "requests": 1, + "static_requests": 1, + "routed_requests": 1, + "offloaded_requests": 1, + "write_errors": 1, + "read_errors": 1, + "in_request": 1, + "vars": [], + "req_info": {} + } + ] + }, + { + "id": 2, + "pid": 859911, + "accepting": 1, + "requests": 1, + "delta_requests": 1, + "exceptions": 1, + "harakiri_count": 1, + "signals": 1, + "signal_queue": 1, + "status": 
"idle", + "rss": 1, + "vsz": 1, + "running_time": 1, + "last_spawn": 1723542786, + "respawn_count": 1, + "tx": 1, + "avg_rt": 1, + "apps": [], + "cores": [ + { + "id": 1, + "requests": 1, + "static_requests": 1, + "routed_requests": 1, + "offloaded_requests": 1, + "write_errors": 1, + "read_errors": 1, + "in_request": 1, + "vars": [], + "req_info": {} + } + ] + } + ] +} diff --git a/src/go/plugin/go.d/modules/uwsgi/testdata/stats_no_workers.json b/src/go/plugin/go.d/modules/uwsgi/testdata/stats_no_workers.json new file mode 100644 index 00000000000000..8b8c782fd537eb --- /dev/null +++ b/src/go/plugin/go.d/modules/uwsgi/testdata/stats_no_workers.json @@ -0,0 +1,49 @@ +{ + "version": "2.0.21-debian", + "listen_queue": 0, + "listen_queue_errors": 0, + "signal_queue": 0, + "load": 0, + "pid": 1267323, + "uid": 1001, + "gid": 1001, + "cwd": "/home/ilyam", + "locks": [ + { + "user 0": 0 + }, + { + "signal": 0 + }, + { + "filemon": 0 + }, + { + "timer": 0 + }, + { + "rbtimer": 0 + }, + { + "cron": 0 + }, + { + "rpc": 0 + }, + { + "snmp": 0 + } + ], + "sockets": [ + { + "name": ":3031", + "proto": "uwsgi", + "queue": 0, + "max_queue": 100, + "shared": 0, + "can_offload": 0 + } + ], + "workers": [ + ] +} diff --git a/src/go/plugin/go.d/modules/uwsgi/uwsgi.go b/src/go/plugin/go.d/modules/uwsgi/uwsgi.go new file mode 100644 index 00000000000000..7fe98503e6517e --- /dev/null +++ b/src/go/plugin/go.d/modules/uwsgi/uwsgi.go @@ -0,0 +1,98 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package uwsgi + +import ( + _ "embed" + "errors" + "time" + + "github.com/netdata/netdata/go/plugins/plugin/go.d/agent/module" + "github.com/netdata/netdata/go/plugins/plugin/go.d/pkg/web" +) + +//go:embed "config_schema.json" +var configSchema string + +func init() { + module.Register("uwsgi", module.Creator{ + JobConfigSchema: configSchema, + Create: func() module.Module { return New() }, + Config: func() any { return &Config{} }, + }) +} + +func New() *Uwsgi { + return &Uwsgi{ + Config: 
Config{ + Address: "127.0.0.1:1717", + Timeout: web.Duration(time.Second * 1), + }, + newConn: newUwsgiConn, + charts: charts.Copy(), + seenWorkers: make(map[int]bool), + } +} + +type Config struct { + UpdateEvery int `yaml:"update_every,omitempty" json:"update_every"` + Address string `yaml:"address" json:"address"` + Timeout web.Duration `yaml:"timeout" json:"timeout"` +} + +type Uwsgi struct { + module.Base + Config `yaml:",inline" json:""` + + charts *module.Charts + + newConn func(Config) uwsgiConn + + seenWorkers map[int]bool +} + +func (u *Uwsgi) Configuration() any { + return u.Config +} + +func (u *Uwsgi) Init() error { + if u.Address == "" { + u.Error("config: 'address' not set") + return errors.New("address not set") + } + + return nil +} + +func (u *Uwsgi) Check() error { + mx, err := u.collect() + if err != nil { + u.Error(err) + return err + } + + if len(mx) == 0 { + return errors.New("no metrics collected") + } + + return nil +} + +func (u *Uwsgi) Charts() *module.Charts { + return u.charts +} + +func (u *Uwsgi) Collect() map[string]int64 { + mx, err := u.collect() + if err != nil { + u.Error(err) + } + + if len(mx) == 0 { + return nil + } + + return mx +} + +func (u *Uwsgi) Cleanup() {} diff --git a/src/go/plugin/go.d/modules/uwsgi/uwsgi_test.go b/src/go/plugin/go.d/modules/uwsgi/uwsgi_test.go new file mode 100644 index 00000000000000..900c48538d4f2d --- /dev/null +++ b/src/go/plugin/go.d/modules/uwsgi/uwsgi_test.go @@ -0,0 +1,325 @@ +// SPDX-License-Identifier: GPL-3.0-or-later + +package uwsgi + +import ( + "errors" + "os" + "testing" + + "github.com/netdata/netdata/go/plugins/plugin/go.d/agent/module" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +var ( + dataConfigJSON, _ = os.ReadFile("testdata/config.json") + dataConfigYAML, _ = os.ReadFile("testdata/config.yaml") + + dataStats, _ = os.ReadFile("testdata/stats.json") + dataStatsNoWorkers, _ = os.ReadFile("testdata/stats_no_workers.json") +) + +func 
Test_testDataIsValid(t *testing.T) { + for name, data := range map[string][]byte{ + "dataConfigJSON": dataConfigJSON, + "dataConfigYAML": dataConfigYAML, + "dataStats": dataStats, + "dataStatsNoWorkers": dataStatsNoWorkers, + } { + require.NotNil(t, data, name) + } +} + +func TestUwsgi_ConfigurationSerialize(t *testing.T) { + module.TestConfigurationSerialize(t, &Uwsgi{}, dataConfigJSON, dataConfigYAML) +} + +func TestUwsgi_Init(t *testing.T) { + tests := map[string]struct { + config Config + wantFail bool + }{ + "success with default config": { + wantFail: false, + config: New().Config, + }, + "fails if address not set": { + wantFail: true, + config: func() Config { + conf := New().Config + conf.Address = "" + return conf + }(), + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + uw := New() + uw.Config = test.config + + if test.wantFail { + assert.Error(t, uw.Init()) + } else { + assert.NoError(t, uw.Init()) + } + }) + } +} + +func TestUwsgi_Cleanup(t *testing.T) { + tests := map[string]struct { + prepare func() *Uwsgi + }{ + "not initialized": { + prepare: func() *Uwsgi { + return New() + }, + }, + "after check": { + prepare: func() *Uwsgi { + uw := New() + uw.newConn = func(config Config) uwsgiConn { return prepareMockOk() } + _ = uw.Check() + return uw + }, + }, + "after collect": { + prepare: func() *Uwsgi { + uw := New() + uw.newConn = func(config Config) uwsgiConn { return prepareMockOk() } + _ = uw.Collect() + return uw + }, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + uw := test.prepare() + + assert.NotPanics(t, uw.Cleanup) + }) + } +} + +func TestUwsgi_Charts(t *testing.T) { + assert.NotNil(t, New().Charts()) +} + +func TestUwsgi_Check(t *testing.T) { + tests := map[string]struct { + prepareMock func() *mockUwsgiConn + wantFail bool + }{ + "success case": { + wantFail: false, + prepareMock: prepareMockOk, + }, + "success case no workers": { + wantFail: false, + prepareMock: 
prepareMockOkNoWorkers, + }, + "err on connect": { + wantFail: true, + prepareMock: prepareMockErrOnConnect, + }, + "unexpected response": { + wantFail: true, + prepareMock: prepareMockUnexpectedResponse, + }, + "empty response": { + wantFail: true, + prepareMock: prepareMockEmptyResponse, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + uw := New() + mock := test.prepareMock() + uw.newConn = func(config Config) uwsgiConn { return mock } + + if test.wantFail { + assert.Error(t, uw.Check()) + } else { + assert.NoError(t, uw.Check()) + } + }) + } +} + +func TestUwsgi_Collect(t *testing.T) { + tests := map[string]struct { + prepareMock func() *mockUwsgiConn + wantMetrics map[string]int64 + wantCharts int + disconnectBeforeCleanup bool + disconnectAfterCleanup bool + }{ + "success case": { + prepareMock: prepareMockOk, + wantCharts: len(charts) + len(workerChartsTmpl)*2, + disconnectBeforeCleanup: true, + disconnectAfterCleanup: true, + wantMetrics: map[string]int64{ + "worker_1_average_request_time": 1, + "worker_1_delta_requests": 1, + "worker_1_exceptions": 1, + "worker_1_harakiris": 1, + "worker_1_memory_rss": 1, + "worker_1_memory_vsz": 1, + "worker_1_request_handling_status_accepting": 1, + "worker_1_request_handling_status_not_accepting": 0, + "worker_1_requests": 1, + "worker_1_respawns": 1, + "worker_1_status_busy": 0, + "worker_1_status_cheap": 0, + "worker_1_status_idle": 1, + "worker_1_status_pause": 0, + "worker_1_status_sig": 0, + "worker_1_tx": 1, + "worker_2_average_request_time": 1, + "worker_2_delta_requests": 1, + "worker_2_exceptions": 1, + "worker_2_harakiris": 1, + "worker_2_memory_rss": 1, + "worker_2_memory_vsz": 1, + "worker_2_request_handling_status_accepting": 1, + "worker_2_request_handling_status_not_accepting": 0, + "worker_2_requests": 1, + "worker_2_respawns": 1, + "worker_2_status_busy": 0, + "worker_2_status_cheap": 0, + "worker_2_status_idle": 1, + "worker_2_status_pause": 0, + "worker_2_status_sig": 
0, + "worker_2_tx": 1, + "workers_exceptions": 2, + "workers_harakiris": 2, + "workers_requests": 2, + "workers_respawns": 2, + "workers_tx": 2, + }, + }, + "success case no workers": { + prepareMock: prepareMockOkNoWorkers, + wantCharts: len(charts), + wantMetrics: map[string]int64{ + "workers_exceptions": 0, + "workers_harakiris": 0, + "workers_requests": 0, + "workers_respawns": 0, + "workers_tx": 0, + }, + disconnectBeforeCleanup: true, + disconnectAfterCleanup: true, + }, + "unexpected response": { + prepareMock: prepareMockUnexpectedResponse, + wantCharts: len(charts), + disconnectBeforeCleanup: true, + disconnectAfterCleanup: true, + }, + "empty response": { + prepareMock: prepareMockEmptyResponse, + wantCharts: len(charts), + disconnectBeforeCleanup: true, + disconnectAfterCleanup: true, + }, + "err on connect": { + prepareMock: prepareMockErrOnConnect, + wantCharts: len(charts), + disconnectBeforeCleanup: false, + disconnectAfterCleanup: false, + }, + "err on query stats": { + prepareMock: prepareMockErrOnQueryStats, + wantCharts: len(charts), + disconnectBeforeCleanup: true, + disconnectAfterCleanup: true, + }, + } + + for name, test := range tests { + t.Run(name, func(t *testing.T) { + uw := New() + mock := test.prepareMock() + uw.newConn = func(config Config) uwsgiConn { return mock } + + mx := uw.Collect() + + require.Equal(t, test.wantMetrics, mx) + + if len(test.wantMetrics) > 0 { + module.TestMetricsHasAllChartsDims(t, uw.Charts(), mx) + } + assert.Equal(t, test.wantCharts, len(*uw.Charts()), "want charts") + + assert.Equal(t, test.disconnectBeforeCleanup, mock.disconnectCalled, "disconnect before cleanup") + uw.Cleanup() + assert.Equal(t, test.disconnectAfterCleanup, mock.disconnectCalled, "disconnect after cleanup") + }) + } +} + +func prepareMockOk() *mockUwsgiConn { + return &mockUwsgiConn{ + statsResponse: dataStats, + } +} + +func prepareMockOkNoWorkers() *mockUwsgiConn { + return &mockUwsgiConn{ + statsResponse: dataStatsNoWorkers, + } +} + 
+func prepareMockErrOnConnect() *mockUwsgiConn { + return &mockUwsgiConn{ + errOnConnect: true, + } +} + +func prepareMockErrOnQueryStats() *mockUwsgiConn { + return &mockUwsgiConn{ + errOnQueryStats: true, + } +} + +func prepareMockUnexpectedResponse() *mockUwsgiConn { + return &mockUwsgiConn{ + statsResponse: []byte("Lorem ipsum dolor sit amet, consectetur adipiscing elit."), + } +} + +func prepareMockEmptyResponse() *mockUwsgiConn { + return &mockUwsgiConn{} +} + +type mockUwsgiConn struct { + errOnConnect bool + errOnQueryStats bool + statsResponse []byte + disconnectCalled bool +} + +func (m *mockUwsgiConn) connect() error { + if m.errOnConnect { + return errors.New("mock.connect() error") + } + return nil +} + +func (m *mockUwsgiConn) disconnect() { + m.disconnectCalled = true +} + +func (m *mockUwsgiConn) queryStats() ([]byte, error) { + if m.errOnQueryStats { + return nil, errors.New("mock.queryStats() error") + } + return m.statsResponse, nil +} From a781b36a1f617d6d882c4f1891224bb8bc065c73 Mon Sep 17 00:00:00 2001 From: Netdata bot <43409846+netdatabot@users.noreply.github.com> Date: Tue, 13 Aug 2024 13:22:50 -0400 Subject: [PATCH 26/27] Regenerate integrations.js (#18328) Co-authored-by: ilyam8 <22274335+ilyam8@users.noreply.github.com> --- integrations/integrations.js | 75 +++--- integrations/integrations.json | 75 +++--- src/collectors/COLLECTORS.md | 2 +- src/go/plugin/go.d/modules/uwsgi/README.md | 1 + .../go.d/modules/uwsgi/integrations/uwsgi.md | 246 ++++++++++++++++++ 5 files changed, 324 insertions(+), 75 deletions(-) create mode 120000 src/go/plugin/go.d/modules/uwsgi/README.md create mode 100644 src/go/plugin/go.d/modules/uwsgi/integrations/uwsgi.md diff --git a/integrations/integrations.js b/integrations/integrations.js index d14175df06b8e6..2e18cbeb2de0af 100644 --- a/integrations/integrations.js +++ b/integrations/integrations.js @@ -16823,6 +16823,44 @@ export const integrations = [ "edit_link": 
"https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/upsd/metadata.yaml", "related_resources": "" }, + { + "meta": { + "id": "collector-go.d.plugin-uwsgi", + "plugin_name": "go.d.plugin", + "module_name": "uwsgi", + "monitored_instance": { + "name": "uWSGI", + "link": "https://uwsgi-docs.readthedocs.io/en/latest/", + "categories": [ + "data-collection.web-servers-and-web-proxies" + ], + "icon_filename": "uwsgi.svg" + }, + "related_resources": { + "integrations": { + "list": [] + } + }, + "info_provided_to_referring_integrations": { + "description": "" + }, + "keywords": [ + "application server", + "python", + "web applications" + ], + "most_popular": false + }, + "overview": "# uWSGI\n\nPlugin: go.d.plugin\nModule: uwsgi\n\n## Overview\n\nMonitors UWSGI worker health and performance by collecting metrics like requests, transmitted data, exceptions, and harakiris.\n\n\nIt fetches [Stats Server](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html) statistics over TCP.\n\n\nThis collector is supported on all platforms.\n\nThis collector supports collecting metrics from multiple instances of this integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nAutomatically discovers and collects UWSGI statistics from the following default locations:\n\n- localhost:1717\n\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", + "setup": "## Setup\n\n### Prerequisites\n\n#### Enable the uWSGI Stats Server\n\nSee [Stats Server](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html) for details.\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/uwsgi.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata 
[config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/uwsgi.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every, autodetection_retry.\n\n\n{% details open=true summary=\"Config options\" %}\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 1 | no |\n| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |\n| address | The IP address and port where the UWSGI [Stats Server](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html) listens for connections. | 127.0.0.1:1717 | yes |\n| timeout | Connection, read, and write timeout duration in seconds. The timeout includes name resolution. | 1 | no |\n\n{% /details %}\n#### Examples\n\n##### Basic\n\nA basic example configuration.\n\n{% details open=true summary=\"Config\" %}\n```yaml\njobs:\n - name: local\n address: 127.0.0.1:1717\n\n```\n{% /details %}\n##### Multi-instance\n\n> **Note**: When you define multiple jobs, their names must be unique.\n\nCollecting metrics from local and remote instances.\n\n\n{% details open=true summary=\"Config\" %}\n```yaml\njobs:\n - name: local\n address: 127.0.0.1:1717\n\n - name: remote\n address: 203.0.113.0:1717\n\n```\n{% /details %}\n", + "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `uwsgi` collector, run the `go.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. 
If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `go.d.plugin` to debug the collector:\n\n ```bash\n ./go.d.plugin -d -m uwsgi\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `uwsgi` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep uwsgi\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep uwsgi /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep uwsgi\n```\n\n", + "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", + "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels.\n\n\n\n### Per uWSGI instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| uwsgi.transmitted_data | tx | bytes/s |\n| uwsgi.requests | requests | requests/s |\n| uwsgi.harakiris | harakiris | harakiris/s |\n| uwsgi.respawns | respawns | respawns/s |\n\n### Per worker\n\nThese metrics refer to the Worker process.\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| worker_id | Worker ID. |\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| uwsgi.worker_transmitted_data | tx | bytes/s |\n| uwsgi.worker_requests | requests | requests/s |\n| uwsgi.worker_delta_requests | delta_requests | requests/s |\n| uwsgi.worker_average_request_time | avg | milliseconds |\n| uwsgi.worker_harakiris | harakiris | harakiris/s |\n| uwsgi.worker_exceptions | exceptions | exceptions/s |\n| uwsgi.worker_status | idle, busy, cheap, pause, sig | status |\n| uwsgi.worker_request_handling_status | accepting, not_accepting | status |\n| uwsgi.worker_respawns | respawns | respawns/s |\n| uwsgi.worker_memory_rss | rss | bytes |\n| uwsgi.worker_memory_vsz | vsz | bytes |\n\n", + "integration_type": "collector", + "id": "go.d.plugin-uwsgi-uWSGI", + "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/uwsgi/metadata.yaml", + "related_resources": "" + }, { "meta": { "id": "collector-go.d.plugin-vcsa", @@ -19401,43 +19439,6 @@ export const integrations = [ "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/tor/metadata.yaml", "related_resources": "" }, - { - "meta": { - "plugin_name": "python.d.plugin", - "module_name": "uwsgi", - "monitored_instance": { - "name": "uWSGI", - "link": "https://github.com/unbit/uwsgi/tree/2.0.21", - "categories": [ - "data-collection.web-servers-and-web-proxies" - ], - 
"icon_filename": "uwsgi.svg" - }, - "related_resources": { - "integrations": { - "list": [] - } - }, - "info_provided_to_referring_integrations": { - "description": "" - }, - "keywords": [ - "application server", - "python", - "web applications" - ], - "most_popular": false - }, - "overview": "# uWSGI\n\nPlugin: python.d.plugin\nModule: uwsgi\n\n## Overview\n\nThis collector monitors uWSGI metrics about requests, workers, memory and more.\n\nIt collects every metric exposed from the stats server of uWSGI, either from the `stats.socket` or from the web server's TCP/IP socket.\n\nThis collector is supported on all platforms.\n\nThis collector supports collecting metrics from multiple instances of this integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nThis collector will auto-detect uWSGI instances deployed on the local host, running on port 1717, or exposing stats on socket `tmp/stats.socket`.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", - "setup": "## Setup\n\n### Prerequisites\n\n#### Enable the uWSGI Stats server\n\nMake sure that you uWSGI exposes it's metrics via a Stats server.\n\nSource: https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `python.d/uwsgi.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config python.d/uwsgi.conf\n```\n#### Options\n\nThere are 2 sections:\n\n* Global variables\n* One or more JOBS that can define multiple different instances to monitor.\n\nThe 
following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values.\n\nAdditionally, the following collapsed table contains all the options that can be configured inside a JOB definition.\n\nEvery configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified.\n\n\n{% details open=true summary=\"Config options\" %}\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Sets the default data collection frequency. | 5 | no |\n| priority | Controls the order of charts at the netdata dashboard. | 60000 | no |\n| autodetection_retry | Sets the job re-check interval in seconds. | 0 | no |\n| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | no |\n| name | The JOB's name as it will appear at the dashboard (by default is the job_name) | job_name | no |\n| socket | The 'path/to/uwsgistats.sock' | no | no |\n| host | The host to connect to | no | no |\n| port | The port to connect to | no | no |\n\n{% /details %}\n#### Examples\n\n##### Basic (default out-of-the-box)\n\nA basic example configuration, one job will run at a time. Autodetect mechanism uses it by default. 
As all JOBs have the same name, only one can run at a time.\n\n{% details open=true summary=\"Config\" %}\n```yaml\nsocket:\n name : 'local'\n socket : '/tmp/stats.socket'\n\nlocalhost:\n name : 'local'\n host : 'localhost'\n port : 1717\n\nlocalipv4:\n name : 'local'\n host : '127.0.0.1'\n port : 1717\n\nlocalipv6:\n name : 'local'\n host : '::1'\n port : 1717\n\n```\n{% /details %}\n##### Multi-instance\n\n> **Note**: When you define multiple jobs, their names must be unique.\n\nCollecting metrics from local and remote instances.\n\n\n{% details open=true summary=\"Config\" %}\n```yaml\nlocal:\n name : 'local'\n host : 'localhost'\n port : 1717\n\nremote:\n name : 'remote'\n host : '192.0.2.1'\n port : 1717\n\n```\n{% /details %}\n", - "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `uwsgi` collector, run the `python.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `python.d.plugin` to debug the collector:\n\n ```bash\n ./python.d.plugin uwsgi debug trace\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `uwsgi` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. 
These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep uwsgi\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep uwsgi /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep uwsgi\n```\n\n", - "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", - "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels.\n\n\n\n### Per uWSGI instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| uwsgi.requests | a dimension per worker | requests/s |\n| uwsgi.tx | a dimension per worker | KiB/s |\n| uwsgi.avg_rt | a dimension per worker | milliseconds |\n| uwsgi.memory_rss | a dimension per worker | MiB |\n| uwsgi.memory_vsz | a dimension per worker | MiB |\n| uwsgi.exceptions | exceptions | exceptions |\n| uwsgi.harakiris | harakiris | harakiris |\n| uwsgi.respawns | respawns | respawns |\n\n", - "integration_type": "collector", - "id": "python.d.plugin-uwsgi-uWSGI", - "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/uwsgi/metadata.yaml", - "related_resources": "" - }, { "meta": { "plugin_name": "python.d.plugin", diff --git a/integrations/integrations.json b/integrations/integrations.json index a347b2f0700556..3613e497987ac6 100644 --- a/integrations/integrations.json +++ b/integrations/integrations.json @@ -16821,6 +16821,44 @@ "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/upsd/metadata.yaml", "related_resources": "" }, + { + "meta": { + "id": "collector-go.d.plugin-uwsgi", + "plugin_name": "go.d.plugin", + "module_name": "uwsgi", + "monitored_instance": { + "name": "uWSGI", + "link": "https://uwsgi-docs.readthedocs.io/en/latest/", + "categories": [ + "data-collection.web-servers-and-web-proxies" + ], + "icon_filename": "uwsgi.svg" + }, + "related_resources": { + "integrations": { + "list": [] + } + }, + "info_provided_to_referring_integrations": { + "description": "" + }, + "keywords": [ + "application server", + "python", + "web applications" + ], + "most_popular": false + }, + "overview": "# uWSGI\n\nPlugin: go.d.plugin\nModule: uwsgi\n\n## Overview\n\nMonitors UWSGI worker health and performance by collecting metrics 
like requests, transmitted data, exceptions, and harakiris.\n\n\nIt fetches [Stats Server](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html) statistics over TCP.\n\n\nThis collector is supported on all platforms.\n\nThis collector supports collecting metrics from multiple instances of this integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nAutomatically discovers and collects UWSGI statistics from the following default locations:\n\n- localhost:1717\n\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", + "setup": "## Setup\n\n### Prerequisites\n\n#### Enable the uWSGI Stats Server\n\nSee [Stats Server](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html) for details.\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `go.d/uwsgi.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config go.d/uwsgi.conf\n```\n#### Options\n\nThe following options can be defined globally: update_every, autodetection_retry.\n\n\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Data collection frequency. | 1 | no |\n| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |\n| address | The IP address and port where the UWSGI [Stats Server](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html) listens for connections. | 127.0.0.1:1717 | yes |\n| timeout | Connection, read, and write timeout duration in seconds. 
The timeout includes name resolution. | 1 | no |\n\n#### Examples\n\n##### Basic\n\nA basic example configuration.\n\n```yaml\njobs:\n - name: local\n address: 127.0.0.1:1717\n\n```\n##### Multi-instance\n\n> **Note**: When you define multiple jobs, their names must be unique.\n\nCollecting metrics from local and remote instances.\n\n\n```yaml\njobs:\n - name: local\n address: 127.0.0.1:1717\n\n - name: remote\n address: 203.0.113.0:1717\n\n```\n", + "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `uwsgi` collector, run the `go.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `go.d.plugin` to debug the collector:\n\n ```bash\n ./go.d.plugin -d -m uwsgi\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `uwsgi` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. 
These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep uwsgi\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep uwsgi /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep uwsgi\n```\n\n", + "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", + "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.\n\n\n\n### Per uWSGI instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| uwsgi.transmitted_data | tx | bytes/s |\n| uwsgi.requests | requests | requests/s |\n| uwsgi.harakiris | harakiris | harakiris/s |\n| uwsgi.respawns | respawns | respawns/s |\n\n### Per worker\n\nThese metrics refer to the Worker process.\n\nLabels:\n\n| Label | Description |\n|:-----------|:----------------|\n| worker_id | Worker ID. 
|\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| uwsgi.worker_transmitted_data | tx | bytes/s |\n| uwsgi.worker_requests | requests | requests/s |\n| uwsgi.worker_delta_requests | delta_requests | requests/s |\n| uwsgi.worker_average_request_time | avg | milliseconds |\n| uwsgi.worker_harakiris | harakiris | harakiris/s |\n| uwsgi.worker_exceptions | exceptions | exceptions/s |\n| uwsgi.worker_status | idle, busy, cheap, pause, sig | status |\n| uwsgi.worker_request_handling_status | accepting, not_accepting | status |\n| uwsgi.worker_respawns | respawns | respawns/s |\n| uwsgi.worker_memory_rss | rss | bytes |\n| uwsgi.worker_memory_vsz | vsz | bytes |\n\n", + "integration_type": "collector", + "id": "go.d.plugin-uwsgi-uWSGI", + "edit_link": "https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/uwsgi/metadata.yaml", + "related_resources": "" + }, { "meta": { "id": "collector-go.d.plugin-vcsa", @@ -19399,43 +19437,6 @@ "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/tor/metadata.yaml", "related_resources": "" }, - { - "meta": { - "plugin_name": "python.d.plugin", - "module_name": "uwsgi", - "monitored_instance": { - "name": "uWSGI", - "link": "https://github.com/unbit/uwsgi/tree/2.0.21", - "categories": [ - "data-collection.web-servers-and-web-proxies" - ], - "icon_filename": "uwsgi.svg" - }, - "related_resources": { - "integrations": { - "list": [] - } - }, - "info_provided_to_referring_integrations": { - "description": "" - }, - "keywords": [ - "application server", - "python", - "web applications" - ], - "most_popular": false - }, - "overview": "# uWSGI\n\nPlugin: python.d.plugin\nModule: uwsgi\n\n## Overview\n\nThis collector monitors uWSGI metrics about requests, workers, memory and more.\n\nIt collects every metric exposed from the stats server of uWSGI, either from the `stats.socket` or from the web server's TCP/IP socket.\n\nThis collector is supported on all 
platforms.\n\nThis collector supports collecting metrics from multiple instances of this integration, including remote instances.\n\n\n### Default Behavior\n\n#### Auto-Detection\n\nThis collector will auto-detect uWSGI instances deployed on the local host, running on port 1717, or exposing stats on socket `tmp/stats.socket`.\n\n#### Limits\n\nThe default configuration for this integration does not impose any limits on data collection.\n\n#### Performance Impact\n\nThe default configuration for this integration is not expected to impose a significant performance impact on the system.\n", - "setup": "## Setup\n\n### Prerequisites\n\n#### Enable the uWSGI Stats server\n\nMake sure that you uWSGI exposes it's metrics via a Stats server.\n\nSource: https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html\n\n\n\n### Configuration\n\n#### File\n\nThe configuration file name for this integration is `python.d/uwsgi.conf`.\n\n\nYou can edit the configuration file using the `edit-config` script from the\nNetdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).\n\n```bash\ncd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata\nsudo ./edit-config python.d/uwsgi.conf\n```\n#### Options\n\nThere are 2 sections:\n\n* Global variables\n* One or more JOBS that can define multiple different instances to monitor.\n\nThe following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values.\n\nAdditionally, the following collapsed table contains all the options that can be configured inside a JOB definition.\n\nEvery configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified.\n\n\n| Name | Description | Default | Required |\n|:----|:-----------|:-------|:--------:|\n| update_every | Sets the default data collection frequency. 
| 5 | no |\n| priority | Controls the order of charts at the netdata dashboard. | 60000 | no |\n| autodetection_retry | Sets the job re-check interval in seconds. | 0 | no |\n| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | no |\n| name | The JOB's name as it will appear at the dashboard (by default is the job_name) | job_name | no |\n| socket | The 'path/to/uwsgistats.sock' | no | no |\n| host | The host to connect to | no | no |\n| port | The port to connect to | no | no |\n\n#### Examples\n\n##### Basic (default out-of-the-box)\n\nA basic example configuration, one job will run at a time. Autodetect mechanism uses it by default. As all JOBs have the same name, only one can run at a time.\n\n```yaml\nsocket:\n name : 'local'\n socket : '/tmp/stats.socket'\n\nlocalhost:\n name : 'local'\n host : 'localhost'\n port : 1717\n\nlocalipv4:\n name : 'local'\n host : '127.0.0.1'\n port : 1717\n\nlocalipv6:\n name : 'local'\n host : '::1'\n port : 1717\n\n```\n##### Multi-instance\n\n> **Note**: When you define multiple jobs, their names must be unique.\n\nCollecting metrics from local and remote instances.\n\n\n```yaml\nlocal:\n name : 'local'\n host : 'localhost'\n port : 1717\n\nremote:\n name : 'remote'\n host : '192.0.2.1'\n port : 1717\n\n```\n", - "troubleshooting": "## Troubleshooting\n\n### Debug Mode\n\nTo troubleshoot issues with the `uwsgi` collector, run the `python.d.plugin` with the debug option enabled. The output\nshould give you clues as to why the collector isn't working.\n\n- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. 
If that's not the case on\n your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.\n\n ```bash\n cd /usr/libexec/netdata/plugins.d/\n ```\n\n- Switch to the `netdata` user.\n\n ```bash\n sudo -u netdata -s\n ```\n\n- Run the `python.d.plugin` to debug the collector:\n\n ```bash\n ./python.d.plugin uwsgi debug trace\n ```\n\n### Getting Logs\n\nIf you're encountering problems with the `uwsgi` collector, follow these steps to retrieve logs and identify potential issues:\n\n- **Run the command** specific to your system (systemd, non-systemd, or Docker container).\n- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.\n\n#### System with systemd\n\nUse the following command to view logs generated since the last Netdata service restart:\n\n```bash\njournalctl _SYSTEMD_INVOCATION_ID=\"$(systemctl show --value --property=InvocationID netdata)\" --namespace=netdata --grep uwsgi\n```\n\n#### System without systemd\n\nLocate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:\n\n```bash\ngrep uwsgi /var/log/netdata/collector.log\n```\n\n**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.\n\n#### Docker Container\n\nIf your Netdata runs in a Docker container named \"netdata\" (replace if different), use this command:\n\n```bash\ndocker logs netdata 2>&1 | grep uwsgi\n```\n\n", - "alerts": "## Alerts\n\nThere are no alerts configured by default for this integration.\n", - "metrics": "## Metrics\n\nMetrics grouped by *scope*.\n\nThe scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels.\n\n\n\n### Per uWSGI instance\n\nThese metrics refer to the entire monitored application.\n\nThis scope has no labels.\n\nMetrics:\n\n| Metric | Dimensions | Unit |\n|:------|:----------|:----|\n| uwsgi.requests | a dimension per worker | requests/s |\n| uwsgi.tx | a dimension per worker | KiB/s |\n| uwsgi.avg_rt | a dimension per worker | milliseconds |\n| uwsgi.memory_rss | a dimension per worker | MiB |\n| uwsgi.memory_vsz | a dimension per worker | MiB |\n| uwsgi.exceptions | exceptions | exceptions |\n| uwsgi.harakiris | harakiris | harakiris |\n| uwsgi.respawns | respawns | respawns |\n\n", - "integration_type": "collector", - "id": "python.d.plugin-uwsgi-uWSGI", - "edit_link": "https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/uwsgi/metadata.yaml", - "related_resources": "" - }, { "meta": { "plugin_name": "python.d.plugin", diff --git a/src/collectors/COLLECTORS.md b/src/collectors/COLLECTORS.md index be420d3d44b904..0df4fbff81ef29 100644 --- a/src/collectors/COLLECTORS.md +++ b/src/collectors/COLLECTORS.md @@ -1161,7 +1161,7 @@ If you don't see the app/service you'd like to monitor in this list: - [Web server log files](https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/weblog/integrations/web_server_log_files.md) -- [uWSGI](https://github.com/netdata/netdata/blob/master/src/collectors/python.d.plugin/uwsgi/integrations/uwsgi.md) +- [uWSGI](https://github.com/netdata/netdata/blob/master/src/go/plugin/go.d/modules/uwsgi/integrations/uwsgi.md) ### Windows Systems diff --git a/src/go/plugin/go.d/modules/uwsgi/README.md b/src/go/plugin/go.d/modules/uwsgi/README.md new file mode 120000 index 00000000000000..44b8559492a874 --- /dev/null +++ b/src/go/plugin/go.d/modules/uwsgi/README.md @@ -0,0 +1 @@ +integrations/uwsgi.md \ No newline at end of file diff --git a/src/go/plugin/go.d/modules/uwsgi/integrations/uwsgi.md 
b/src/go/plugin/go.d/modules/uwsgi/integrations/uwsgi.md new file mode 100644 index 00000000000000..e7b98f7e42d8c7 --- /dev/null +++ b/src/go/plugin/go.d/modules/uwsgi/integrations/uwsgi.md @@ -0,0 +1,246 @@ + + +# uWSGI + + + + + +Plugin: go.d.plugin +Module: uwsgi + + + +## Overview + +Monitors UWSGI worker health and performance by collecting metrics like requests, transmitted data, exceptions, and harakiris. + + +It fetches [Stats Server](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html) statistics over TCP. + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +Automatically discovers and collects UWSGI statistics from the following default locations: + +- localhost:1717 + + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per uWSGI instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| uwsgi.transmitted_data | tx | bytes/s | +| uwsgi.requests | requests | requests/s | +| uwsgi.harakiris | harakiris | harakiris/s | +| uwsgi.respawns | respawns | respawns/s | + +### Per worker + +These metrics refer to the Worker process. + +Labels: + +| Label | Description | +|:-----------|:----------------| +| worker_id | Worker ID. 
| + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| uwsgi.worker_transmitted_data | tx | bytes/s | +| uwsgi.worker_requests | requests | requests/s | +| uwsgi.worker_delta_requests | delta_requests | requests/s | +| uwsgi.worker_average_request_time | avg | milliseconds | +| uwsgi.worker_harakiris | harakiris | harakiris/s | +| uwsgi.worker_exceptions | exceptions | exceptions/s | +| uwsgi.worker_status | idle, busy, cheap, pause, sig | status | +| uwsgi.worker_request_handling_status | accepting, not_accepting | status | +| uwsgi.worker_respawns | respawns | respawns/s | +| uwsgi.worker_memory_rss | rss | bytes | +| uwsgi.worker_memory_vsz | vsz | bytes | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Enable the uWSGI Stats Server + +See [Stats Server](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html) for details. + + + +### Configuration + +#### File + +The configuration file name for this integration is `go.d/uwsgi.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config go.d/uwsgi.conf +``` +#### Options + +The following options can be defined globally: update_every, autodetection_retry. + + +
+Config options
+
+| Name | Description | Default | Required |
+|:----|:-----------|:-------|:--------:|
+| update_every | Data collection frequency. | 1 | no |
+| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
+| address | The IP address and port where the UWSGI [Stats Server](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html) listens for connections. | 127.0.0.1:1717 | yes |
+| timeout | Connection, read, and write timeout duration in seconds. The timeout includes name resolution. | 1 | no |
+
+#### Examples
+
+##### Basic
+
+A basic example configuration.
+
+Config
+
+```yaml
+jobs:
+  - name: local
+    address: 127.0.0.1:1717
+
+```
+
+##### Multi-instance
+
+> **Note**: When you define multiple jobs, their names must be unique.
+
+Collecting metrics from local and remote instances.
+
+Config
+
+```yaml
+jobs:
+  - name: local
+    address: 127.0.0.1:1717
+
+  - name: remote
+    address: 203.0.113.0:1717
+
+```
+
+## Troubleshooting
+
+### Debug Mode
+
+To troubleshoot issues with the `uwsgi` collector, run the `go.d.plugin` with the debug option enabled. The output
+should give you clues as to why the collector isn't working.
+
+- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
+  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.
+
+  ```bash
+  cd /usr/libexec/netdata/plugins.d/
+  ```
+
+- Switch to the `netdata` user.
+
+  ```bash
+  sudo -u netdata -s
+  ```
+
+- Run the `go.d.plugin` to debug the collector:
+
+  ```bash
+  ./go.d.plugin -d -m uwsgi
+  ```
+
+### Getting Logs
+
+If you're encountering problems with the `uwsgi` collector, follow these steps to retrieve logs and identify potential issues:
+
+- **Run the command** specific to your system (systemd, non-systemd, or Docker container).
+- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.
+
+#### System with systemd
+
+Use the following command to view logs generated since the last Netdata service restart:
+
+```bash
+journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep uwsgi
+```
+
+#### System without systemd
+
+Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for the collector's name:
+
+```bash
+grep uwsgi /var/log/netdata/collector.log
+```
+
+**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.
+ +#### Docker Container + +If your Netdata runs in a Docker container named "netdata" (replace if different), use this command: + +```bash +docker logs netdata 2>&1 | grep uwsgi +``` + + From f509879e9b3c99712d4e021d38526f9f71e18104 Mon Sep 17 00:00:00 2001 From: netdatabot Date: Wed, 14 Aug 2024 00:18:28 +0000 Subject: [PATCH 27/27] [ci skip] Update changelog and version for nightly build: v1.46.0-299-nightly. --- CHANGELOG.md | 6 +++--- packaging/version | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a319dca77583b0..83e14b26724b87 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,9 @@ **Merged pull requests:** +- Regenerate integrations.js [\#18328](https://github.com/netdata/netdata/pull/18328) ([netdatabot](https://github.com/netdatabot)) +- add go.d/uwsgi [\#18326](https://github.com/netdata/netdata/pull/18326) ([ilyam8](https://github.com/ilyam8)) +- remove python.d/uwsgi [\#18325](https://github.com/netdata/netdata/pull/18325) ([ilyam8](https://github.com/ilyam8)) - Regenerate integrations.js [\#18324](https://github.com/netdata/netdata/pull/18324) ([netdatabot](https://github.com/netdatabot)) - remove python.d/dovecot [\#18322](https://github.com/netdata/netdata/pull/18322) ([ilyam8](https://github.com/ilyam8)) - add go.d dovecot [\#18321](https://github.com/netdata/netdata/pull/18321) ([ilyam8](https://github.com/ilyam8)) @@ -412,9 +415,6 @@ - remove unused go.d/prometheus meta file [\#17749](https://github.com/netdata/netdata/pull/17749) ([ilyam8](https://github.com/ilyam8)) - Regenerate integrations.js [\#17748](https://github.com/netdata/netdata/pull/17748) ([netdatabot](https://github.com/netdatabot)) - Use semver releases with sentry. 
[\#17746](https://github.com/netdata/netdata/pull/17746) ([vkalintiris](https://github.com/vkalintiris)) -- add go.d clickhouse [\#17743](https://github.com/netdata/netdata/pull/17743) ([ilyam8](https://github.com/ilyam8)) -- fix clickhouse in apps groups [\#17742](https://github.com/netdata/netdata/pull/17742) ([ilyam8](https://github.com/ilyam8)) -- fix ebpf cgroup swap context [\#17740](https://github.com/netdata/netdata/pull/17740) ([ilyam8](https://github.com/ilyam8)) ## [v1.45.6](https://github.com/netdata/netdata/tree/v1.45.6) (2024-06-05) diff --git a/packaging/version b/packaging/version index ad4424a0396e12..bf6d12bd206b32 100644 --- a/packaging/version +++ b/packaging/version @@ -1 +1 @@ -v1.46.0-295-nightly +v1.46.0-299-nightly