From d9f715948687f27794783c5941817448436426f9 Mon Sep 17 00:00:00 2001 From: Ilya Mashchenko Date: Sat, 30 Nov 2024 21:14:36 +0200 Subject: [PATCH] docs: format, typos, and some simplifications in `docs/` (#19112) --- ...th-netdata-alerts-configuration-manager.md | 6 +- .../notifications/README.md | 4 +- ...nage-alert-notification-silencing-rules.md | 46 +++++----- .../manage-notification-methods.md | 44 +++++----- .../working-with-logs.md | 4 +- docs/dashboards-and-charts/README.md | 4 +- docs/dashboards-and-charts/alerts-tab.md | 40 ++++----- .../anomaly-advisor-tab.md | 4 +- docs/dashboards-and-charts/dashboards-tab.md | 28 +++---- docs/dashboards-and-charts/events-feed.md | 6 +- docs/dashboards-and-charts/home-tab.md | 24 +++--- .../import-export-print-snapshot.md | 10 +-- docs/dashboards-and-charts/kubernetes-tab.md | 2 +- .../metrics-tab-and-single-node-tabs.md | 8 +- docs/dashboards-and-charts/netdata-charts.md | 74 ++++++++--------- docs/dashboards-and-charts/node-filter.md | 8 +- docs/dashboards-and-charts/nodes-tab.md | 4 +- docs/dashboards-and-charts/top-tab.md | 4 +- .../visualization-date-and-time-controls.md | 16 ++-- .../deployment-with-centralization-points.md | 6 +- .../README.md | 2 +- .../collect-apache-nginx-web-logs.md | 22 ++--- .../customize.md | 2 +- .../monitor-debug-applications-ebpf.md | 10 +-- .../monitor-hadoop-cluster.md | 6 +- .../style-guide.md | 26 +++--- docs/exporting-metrics/README.md | 83 ++++++++----------- docs/glossary.md | 37 ++++----- docs/guidelines.md | 24 +++--- docs/metric-correlations.md | 35 ++++---- docs/netdata-agent/README.md | 4 +- .../anonymous-telemetry-events.md | 4 +- .../README.md | 9 +- .../Running-behind-apache.md | 34 ++++---- .../Running-behind-h2o.md | 12 +-- .../Running-behind-lighttpd.md | 4 +- .../Running-behind-nginx.md | 22 ++--- .../sizing-netdata-agents/README.md | 2 +- docs/netdata-assistant.md | 4 +- docs/netdata-cloud/README.md | 39 +++++---- .../enterprise-sso-authentication.md | 17 ++-- 
.../role-based-access-model.md | 6 +- docs/netdata-cloud/view-plan-and-billing.md | 28 +++---- .../README.md | 5 +- .../best-practices.md | 15 ++-- .../README.md | 4 +- ...ctive-journal-source-without-encryption.md | 4 +- ...cryption-using-self-signed-certificates.md | 16 ++-- ...urnal-centralization-without-encryption.md | 4 +- .../metrics-centralization-points/README.md | 11 ++- ...nd-high-availability-of-netdata-parents.md | 12 +-- .../configuration.md | 12 +-- .../metrics-centralization-points/faq.md | 14 ++-- .../replication-of-past-samples.md | 6 +- docs/security-and-privacy-design/README.md | 83 +++++++++---------- .../netdata-agent-security.md | 2 +- .../netdata-cloud-security.md | 29 +++---- docs/top-monitoring-netdata-functions.md | 22 ++--- 58 files changed, 487 insertions(+), 526 deletions(-) diff --git a/docs/alerts-and-notifications/creating-alerts-with-netdata-alerts-configuration-manager.md b/docs/alerts-and-notifications/creating-alerts-with-netdata-alerts-configuration-manager.md index f9a443c9d6787a..27270eb8fc4ffb 100644 --- a/docs/alerts-and-notifications/creating-alerts-with-netdata-alerts-configuration-manager.md +++ b/docs/alerts-and-notifications/creating-alerts-with-netdata-alerts-configuration-manager.md @@ -32,11 +32,11 @@ You can read more about the different options in the [Alerts reference documenta ### Alerting Conditions - **Thresholds**: Set thresholds for warning and critical Alert states, specifying whether the Alert should trigger above or below these thresholds. Advanced settings allow for custom formulas. - - **Recovery Thresholds**: Set thresholds for downgrading the Alert from critical to warning or from warning to clear. +- **Recovery Thresholds**: Set thresholds for downgrading the Alert from critical to warning or from warning to clear. - **Check Interval**: Define how frequently the health check should run. - **Delay Notifications**: Manage notification delays for Alert escalations or de-escalations. 
-- **Agent Specific Options**: Options exclusive to the Netdata Agent, like repeat notification frequencies and notification recipients. - - **Custom Exec Script**: Define custom scripts to execute when an Alert triggers. +- **Agent-Specific Options**: Options exclusive to the Netdata Agent, like repeat notification frequencies and notification recipients. +- **Custom Exec Script**: Define custom scripts to execute when an Alert triggers. ### Alert Name, Description, and Summary Section diff --git a/docs/alerts-and-notifications/notifications/README.md b/docs/alerts-and-notifications/notifications/README.md index 2efcdbe48b041c..dd81043b7792e6 100644 --- a/docs/alerts-and-notifications/notifications/README.md +++ b/docs/alerts-and-notifications/notifications/README.md @@ -4,6 +4,6 @@ This section includes the documentation of the integrations for both of Netdata' -- Netdata Cloud provides centralized alert notifications, utilizing the health status data already sent to Netdata Cloud from connected nodes to send alerts to configured integrations. [Supported integrations](/docs/alerts-&-notifications/notifications/centralized-cloud-notifications) include Amazon SNS, Discord, Slack, Splunk, and others. +- Netdata Cloud provides centralized alert notifications, using the health status data already sent to Netdata Cloud from connected nodes to send alerts to configured integrations. [Supported integrations](/docs/alerts-and-notifications/notifications/centralized-cloud-notifications) include Amazon SNS, Discord, Slack, Splunk, and others. -- The Netdata Agent offers a [wider range of notification options](/docs/alerts-&-notifications/notifications/agent-dispatched-notifications) directly from the Agent itself. You can choose from over a dozen services, including email, Slack, PagerDuty, Twilio, and others, for more granular control over notifications on each node. 
+- The Netdata Agent offers a [wider range of notification options](/docs/alerts-and-notifications/notifications/agent-dispatched-notifications) directly from the Agent itself. You can choose from over a dozen services, including email, Slack, PagerDuty, Twilio, and others, for more granular control over notifications on each node. diff --git a/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/manage-alert-notification-silencing-rules.md b/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/manage-alert-notification-silencing-rules.md index d537ef7ea0fae7..c4de1ad633d1ed 100644 --- a/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/manage-alert-notification-silencing-rules.md +++ b/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/manage-alert-notification-silencing-rules.md @@ -7,12 +7,12 @@ From the Cloud interface, you can manage your space's Alert notification silenci To manage **space's Alert notification silencing rule settings**, you will need the following: - A Netdata Cloud account -- Access to the space as an **administrator** or **manager** (**troubleshooters** can only view space rules) +- Access to Space as an **administrator** or **manager** (**troubleshooters** can only view space rules) To manage your **personal Alert notification silencing rule settings**, you will need the following: - A Netdata Cloud account -- Access to the space with any role except **billing** +- Access to Space with any role except **billing** ### Steps @@ -21,33 +21,33 @@ To manage your **personal Alert notification silencing rule settings**, you will 3. Click on the **Notification Silencing Rules** tab. 4. You will be presented with a table of the configured Alert notification silencing rules for: - - The space (if you aren't an **observer**) - - Yourself + - The space (if you aren't an **observer**) + - Yourself You will be able to: - 1. 
**Add a new** Alert notification silencing rule configuration. - - Choose if it applies to **All users** or **Myself** (All users is only available for **administrators** and **managers**). - - You need to provide a name for the configuration so you can easily refer to it. - - Define criteria for Nodes, to which Rooms will the rule apply, on what Nodes and whether or not it applies to host labels key-value pairs. - - Define criteria for Alerts, such as Alert name is being targeted and on what Alert context. You can also specify if it will apply to a specific Alert role. - - Define when it will be applied: - - Immediately, from now until it is turned off or until a specific duration (start and end date automatically set). - - Scheduled, you can specify the start and end time for when the rule becomes active and then inactive (time is set according to your browser's local timezone). - Note: You are only able to add a rule if your space is on a [paid plan](/docs/netdata-cloud/view-plan-and-billing.md). - 2. **Edit an existing** Alert notification silencing rule configuration. You will be able to change: - - The name provided for it - - Who it applies to - - Selection criteria for Nodes and Alerts - - When it will be applied - 3. **Enable/Disable** a given Alert notification silencing rule configuration. - - Use the toggle to enable or disable - 4. **Delete an existing** Alert notification silencing rule. - - Use the trash icon to delete your configuration + 1. **Add a new** Alert notification silencing rule configuration. + - Choose if it applies to **All users** or **Myself** (All users is only available for **administrators** and **managers**). + - You need to provide a name for the configuration so you can refer to it. + - Define criteria for Nodes: which Rooms the rule applies to, which Nodes, and whether it applies to host label key-value pairs. + - Define criteria for Alerts, such as which Alert name is targeted and in what Alert context. 
You can also specify if it applies to a specific Alert role. + - Define when it is applied: + - Immediately, from now until it is turned off or until a specific duration (start and end date automatically set). + - Scheduled, you can specify the start and end time for when the rule becomes active and then inactive (time is set according to your browser's local timezone). + Note: You are only able to add a rule if your space is on a [paid plan](/docs/netdata-cloud/view-plan-and-billing.md). + 2. **Edit an existing** Alert notification silencing rule configuration. You will be able to change: + - The name provided for it + - Who it applies to + - Selection criteria for Nodes and Alerts + - When it will be applied + 3. **Enable/Disable** a given Alert notification silencing rule configuration. + - Use the toggle to enable or disable + 4. **Delete an existing** Alert notification silencing rule. + - Use the trash icon to delete your configuration ## Silencing Rules Examples -| Rule name | Rooms | Nodes | Host Label | Alert name | Alert context | Alert instance | Alert role | Description | +| Rule name | Rooms | Nodes | Host Label | Alert name | Alert context | Alert instance | Alert role | Description | |:---------------------------------|:-------------------|:---------|:-------------------------|:-------------------------------------------------|:--------------|:-------------------------|:------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Space silencing | All Rooms | * | * | * | * | * | * | This rule silences the entire space, targets all nodes, and for all users. E.g. infrastructure-wide maintenance window. 
| | DB Servers Rooms | PostgreSQL Servers | * | * | * | * | * | * | This rule silences the nodes in the Room named PostgreSQL Servers, for example, it doesn't silence the `All Nodes` Room. E.g. My team with membership to this Room doesn't want to receive notifications for these nodes. | diff --git a/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/manage-notification-methods.md b/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/manage-notification-methods.md index 463b1010135a29..f34baf7ddd351f 100644 --- a/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/manage-notification-methods.md +++ b/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/manage-notification-methods.md @@ -9,7 +9,7 @@ From the Cloud interface, you can manage your Space's notification settings as w To manage Space notification settings, you will need the following: - A Netdata Cloud account -- Access to the Space as an **administrator** +- Access to Space as an **administrator** ### Available Actions per Notification Method Based on Service Level @@ -31,20 +31,20 @@ To manage Space notification settings, you will need the following: 2. Click on the **Alerts & Notifications** tab on the left-hand side. 3. Click on the **Notification Methods** tab. 4. You will be presented with a table of the configured notification methods for the Space. You will be able to: - 1. **Add a new** notification method configuration. - - Choose the service from the list of available ones. The available options will depend on your subscription plan. - - You can optionally provide a name for the configuration so you can easily refer to it. - - You can define the filtering criteria, regarding which Rooms the method will apply, and what notifications you want to receive (All Alerts and unreachable, All Alerts, Critical only). - - Depending on the service, different inputs will be present. 
Please note that there are mandatory and optional inputs. - - If you have doubts on how to configure the service, you can find a link at the top of the modal that takes you to the specific documentation page to help you. - 2. **Edit an existing** notification method configuration. Personal level ones can't be edited here, see [Manage User Notification Settings](#manage-user-notification-settings). You will be able to change: - - The name provided for it - - Filtering criteria - - Service-specific inputs - 3. **Enable/Disable** a given notification method configuration. - - Use the toggle to enable or disable the notification method configuration. - 4. **Delete an existing** notification method configuration. Netdata provided ones can't be deleted, e.g., Email. - - Use the trash icon to delete your configuration. + 1. **Add a new** notification method configuration. + - Choose the service from the list of available ones. The available options will depend on your subscription plan. + - You can optionally provide a name for the configuration so you can refer to it. + - You can define the filtering criteria, regarding which Rooms the method will apply, and what notifications you want to receive (All Alerts and unreachable, All Alerts, Critical only). + - Depending on the service, different inputs will be present. Please note that there are mandatory and optional inputs. + - If you have doubts on how to configure the service, you can find a link at the top of the modal that takes you to the specific documentation page to help you. + 2. **Edit an existing** notification method configuration. Personal level ones can't be edited here, see [Manage User Notification Settings](#manage-user-notification-settings). You will be able to change: + - The name provided for it + - Filtering criteria + - Service-specific inputs + 3. **Enable/Disable** a given notification method configuration. + - Use the toggle to enable or disable the notification method configuration. + 4. 
**Delete an existing** notification method configuration. Netdata provided ones can't be deleted, e.g., Email. + - Use the trash icon to delete your configuration. ## Manage User Notification Settings @@ -61,11 +61,11 @@ Note: If an administrator has disabled a Personal [service level](/docs/alerts-a 1. Click on the **User notification settings** shortcut on top of the help button. 2. You are presented with: - - The Personal [service level](/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/centralized-cloud-notifications-reference.md#service-level) notification methods you can manage. - - The list of Spaces and Rooms inside those where you have access to. + - The Personal [service level](/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/centralized-cloud-notifications-reference.md#service-level) notification methods you can manage. + - The list of Spaces and Rooms inside those where you have access to. 3. On this modal you will be able to: - 1. **Enable/Disable** the notification method for you; this applies across all Spaces and Rooms. - - Use the toggle to enable or disable the notification method. - 2. **Define what notifications you want** per Space/Room: All Alerts and unreachable, All Alerts, Critical only, or No notifications. - 3. **Activate notifications** for a Room you aren't a member of. - - From the **All Rooms** tab, click on the Join button for the Room(s) you want. + 1. **Enable/Disable** the notification method for you; this applies across all Spaces and Rooms. + - Use the toggle to enable or disable the notification method. + 2. **Define what notifications you want** per Space/Room: All Alerts and unreachable, All Alerts, Critical only, or No notifications. + 3. **Activate notifications** for a Room you aren't a member of. + - From the **All Rooms** tab, click on the Join button for the Room(s) you want. 
diff --git a/docs/category-overview-pages/working-with-logs.md b/docs/category-overview-pages/working-with-logs.md index d28074d2e87906..752fbb3f1260b8 100644 --- a/docs/category-overview-pages/working-with-logs.md +++ b/docs/category-overview-pages/working-with-logs.md @@ -1,8 +1,8 @@ # Working with Logs -This section talks about ways Netdata collects and visualizes logs. +This section talks about the ways Netdata collects and visualizes logs. -The [systemd journal plugin](/src/collectors/systemd-journal.plugin/) is the core Netdata component for reading systemd journal logs. +The [systemd journal plugin](/src/collectors/systemd-journal.plugin) is the core Netdata component for reading systemd journal logs. For structured logs, Netdata provides tools like [log2journal](/src/collectors/log2journal/README.md) and [systemd-cat-native](/src/libnetdata/log/systemd-cat-native.md) to convert them into compatible systemd journal entries. diff --git a/docs/dashboards-and-charts/README.md b/docs/dashboards-and-charts/README.md index 86d0d5240ea3ee..b32d97ccc2a1d1 100644 --- a/docs/dashboards-and-charts/README.md +++ b/docs/dashboards-and-charts/README.md @@ -7,7 +7,7 @@ When you access the Netdata dashboard through the Cloud, you'll always have the By default, the Agent dashboard shows the latest version (matching Netdata Cloud). However, there are a few exceptions: - Without internet access, the Agent can't download the newest dashboards. In this case, it will automatically use the bundled version. -- Users have defined, e.g. through URL bookmark, that they want to see the previous version of the dashboard (accessible `http://NODE:19999/v1`, replacing `NODE` with the IP address or hostname of your Agent). +- Users have defined, e.g., through URL bookmark, that they want to see the previous version of the dashboard (accessible `http://NODE:19999/v1`, replacing `NODE` with the IP address or hostname of your Agent). 
## Main sections @@ -37,4 +37,4 @@ You can access the dashboard at and [sign-in with a To view your Netdata dashboard, open a web browser and enter the address `http://NODE:19999` - replace `NODE` with your Agent's IP address or hostname. If the Agent is on the same machine, use `http://localhost:19999`. -Documentation for previous Agent dashboard can still be found [here](/src/web/gui/README.md). +Documentation for the previous Agent dashboard can still be found [here](/src/web/gui/README.md). diff --git a/docs/dashboards-and-charts/alerts-tab.md b/docs/dashboards-and-charts/alerts-tab.md index 66c019ec0fb9fb..712befa9fde0ac 100644 --- a/docs/dashboards-and-charts/alerts-tab.md +++ b/docs/dashboards-and-charts/alerts-tab.md @@ -4,34 +4,26 @@ Netdata comes with hundreds of pre-configured health alerts designed to notify y ## Active tab -From the Active tab you can see all the active alerts in your Room. You will be presented with a table having information about each alert that is in warning or critical state. +From the Active tab, you can see all the active alerts in your Room. You will be presented with a table having information about each alert that is in warning or critical state. You can always sort the table by a certain column by clicking on the name of that column, and using the gear icon on the top right to control which columns are visible at any given time. ### Filter alerts -From this tab, you can also filter alerts with the right hand bar. More specifically you can filter: - -- Alert status - - Filter based on the status of the alerts (e.g. Warning, Critical) -- Alert class - - Filter based on the class of the alert (e.g. Latency, Utilization, Workload etc.) -- Alert type & component - - Filter based on the alert's type (e.g. System, Web Server) and component (e.g. CPU, Disk, Load) -- Alert role - - Filter by the role that the alert is set to notify (e.g. Sysadmin, Webmaster etc.) 
-- Host labels - - Filter based on the host labels that are configured for the nodes across the Room (e.g. `_cloud_instance_region` to match `us-east-1`) -- Node status - - Filter by node availability status (e.g. Live or Offline) -- Netdata version - - Filter by Netdata version (e.g. `v1.45.3`) -- Nodes - - Filter the alerts based on the nodes of your Room. +From this tab, you can also filter alerts with the right-hand bar. More specifically, you can filter: + +- **Alert status**: Filter based on the status of the alerts (e.g., Warning, Critical) +- **Alert class**: Filter based on the class of the alert (e.g., Latency, Utilization, Workload, etc.) +- **Alert type & component**: Filter based on the alert's type (e.g., System, Web Server) and component (e.g., CPU, Disk, Load) +- **Alert role**: Filter by the role that the alert is set to notify (e.g., Sysadmin, Webmaster, etc.) +- **Host labels**: Filter based on the host labels that are configured for the nodes across the Room (e.g., `_cloud_instance_region` to match `us-east-1`) +- **Node status**: Filter by node availability status (e.g., Live or Offline) +- **Netdata version**: Filter by Netdata version (e.g., `v1.45.3`) +- **Nodes**: Filter the alerts based on the nodes of your Room. ### View alert details -By clicking on the name of an entry of the table you can access that alert's details page, providing you with: +By clicking on the name of an entry of the table, you can access that alert's details page, providing you with: - Latest and Triggered time values - The alert's description @@ -49,9 +41,9 @@ From this tab, the "Silencing" column shows if there is any rule present for eac ## Alert Configurations tab -From this tab you can view all the configurations for all running alerts in your Room. Each row concerns one alert, and it provides information about it in the rest of the table columns. +From this tab, you can view all the configurations for all running alerts in your Room. 
Each row concerns one alert, and it provides information about it in the rest of the table columns. -By running alerts we mean alerts that are related to some metric that is or was collected. Netdata may have more alerts pre-configured that aren't applicable to your monitoring use-cases. +By running alerts, we mean alerts that are related to some metric that is or was collected. Netdata may have more alerts pre-configured that aren't applicable to your monitoring use-cases. You can control which columns are visible by using the gear icon on the right-hand side. @@ -61,6 +53,6 @@ Similarly to the previous tab, you can see the silencing status of an alert, whi From the actions column you can explore the alert's configuration, split by the different nodes that have this alert configured. -From there you can click on any of the rows to get to the individual alert configurations for that node. +From there, you can click on any of the rows to get to the individual alert configurations for that node. -Click on an alert row to see the alert's page, with all the information about when it was last triggered and what it's configuration is. +Click on an alert row to see the alert's page, with all the information about when it was last triggered, and what its configuration is. diff --git a/docs/dashboards-and-charts/anomaly-advisor-tab.md b/docs/dashboards-and-charts/anomaly-advisor-tab.md index bf3243ef17011e..92e849c74a2eff 100644 --- a/docs/dashboards-and-charts/anomaly-advisor-tab.md +++ b/docs/dashboards-and-charts/anomaly-advisor-tab.md @@ -22,5 +22,5 @@ The right side of the page displays an anomaly index for the highlighted timelin ## Usage Tips -- If you are interested in a subset of specific nodes then filtering to just those nodes before highlighting is recommended to get better results. 
When you highlight a timeframe, Netdata will ask the Agents for a ranking across all metrics, so if there is a subset of nodes there will be less "averaging" going on and you'll get a less noisy ranking. -- Ideally try and highlight close to a spike or window of interest so that the resulting ranking can narrow-in more easily on the timeline you are interested in. +- If you are interested in a subset of specific nodes, then filtering to just those nodes before highlighting is recommended to get better results. When you highlight a timeframe, Netdata will ask the Agents for a ranking across all metrics, so if there is a subset of nodes, there will be less "averaging" going on, and you'll get a less noisy ranking. +- Ideally, try and highlight close to a spike or window of interest so that the resulting ranking can narrow-in more easily on the timeline you are interested in. diff --git a/docs/dashboards-and-charts/dashboards-tab.md b/docs/dashboards-and-charts/dashboards-tab.md index d5363b7c72063e..fa85a7e3696fd3 100644 --- a/docs/dashboards-and-charts/dashboards-tab.md +++ b/docs/dashboards-and-charts/dashboards-tab.md @@ -10,18 +10,18 @@ From the Dashboards tab, click on the **+** button. In the modal, give your custom dashboard a name, and click **+ Add**. -- The **Add Chart** button on the top right of the interface adds your first chart card. From the dropdown, select either **All Nodes** or a specific node. - +- The **Add Chart** button on the top right of the interface adds your first chart card. From the dropdown, select either **All Nodes** or a specific node. + Next, select the context. You'll see a preview of the chart before you finish adding it. 
In this modal you can also [interact with the chart](/docs/dashboards-and-charts/netdata-charts.md), meaning you can configure all the aspects of the [NIDL framework](/docs/dashboards-and-charts/netdata-charts.md#nidl-framework) of the chart and more in detail, you can: - - define which `group by` method to use - - select the aggregation function over the data source - - select nodes - - select instances - - select dimensions - - select labels - - select the aggregation function over time - - After you are done configuring the chart, you can also change the type of the chart from the right hand side of the [Title bar](/docs/dashboards-and-charts/netdata-charts.md#title-bar), and select which of the final dimensions you want to be visible and in what order, from the [Dimensions bar](/docs/dashboards-and-charts/netdata-charts.md#dimensions-bar). + - define which `group by` method to use + - select the aggregation function over the data source + - select nodes + - select instances + - select dimensions + - select labels + - select the aggregation function over time + + After you are done configuring the chart, you can also change the type of the chart from the right-hand side of the [Title bar](/docs/dashboards-and-charts/netdata-charts.md#title-bar), and select which of the final dimensions you want to be visible and in what order, from the [Dimensions bar](/docs/dashboards-and-charts/netdata-charts.md#dimensions-bar). - The **Add Text** button on the top right of the interface creates a new card with user-defined text, which you can use to describe or document a particular dashboard's meaning and purpose. @@ -52,11 +52,11 @@ new location. Once you release your mouse, other elements re-sort to the grid sy ### Resize elements -To resize any element on a dashboard, click on the bottom-right corner and drag it to its new size. Other elements re-sort to the grid system automatically. 
+To resize any element on a dashboard, click in the bottom-right corner and drag it to its new size. Other elements re-sort to the grid system automatically. ### Go to chart -Quickly jump to the location of the chart in either the [Metrics tab](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md) or if the chart refers to a single node, its single node dashboard by clicking the 3-dot icon in the corner of any chart to open a menu. Hit the **Go to Chart** item. +Quickly jump to the location of the chart in either the [Metrics tab](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md) or if the chart refers to a single node, its single node dashboard by clicking the 3-dot icon in the corner of any chart to open a menu. Hit the **Go to Chart** item. You'll land directly on that chart of interest, but you can now scroll up and down to correlate your findings with other charts. Of course, you can continue to zoom, highlight, and pan through time just as you're used to with Netdata Charts. @@ -93,4 +93,4 @@ Because of the visual complexity of individual charts, dashboards require a mini ## What's next? -Once you've designed a dashboard or two, make sure to [invite your team](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#invite-your-team) if you haven't already. You can add these new users to the same Room to let them see the same dashboards without any effort. +Once you've designed a dashboard or two, make sure to [invite your team](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#team-collaboration) if you haven't already. You can add these new users to the same Room to let them see the same dashboards without any effort. 
diff --git a/docs/dashboards-and-charts/events-feed.md b/docs/dashboards-and-charts/events-feed.md index 8e31ebb5f453f0..314e6c6f2847c7 100644 --- a/docs/dashboards-and-charts/events-feed.md +++ b/docs/dashboards-and-charts/events-feed.md @@ -61,14 +61,14 @@ At a high-level view, these are the domains from which the Events feed will prov ## Who can access the events? -All users will be able to see events from the Topology and Alerts domain but Auditing events, once these are added, will only be accessible to administrators. For more details check the [Netdata Role-Based Access model](/docs/netdata-cloud/authentication-and-authorization/role-based-access-model.md). +All users will be able to see events from the Topology and Alerts domain, but Auditing events, once these are added, will only be accessible to administrators. For more details, check the [Netdata Role-Based Access model](/docs/netdata-cloud/authentication-and-authorization/role-based-access-model.md). ## How to use the events feed 1. Click on the **Events** tab (located near the top of your screen) 2. You will be presented with a table listing the events that occurred from the timeframe defined on the [date time picker](/docs/dashboards-and-charts/visualization-date-and-time-controls.md#date-and-time-selector) -3. You can use the filtering capabilities available on right-hand bar to slice through the results provided +3. You can use the filtering capabilities available on the right-hand bar to slice through the results provided > **Note** > -> When you try to query a longer period than what your space allows you will see an error message highlighting that you are querying data outside your plan. +> When you try to query a longer period than what your space allows, you will see an error message highlighting that you are querying data outside your plan. 
diff --git a/docs/dashboards-and-charts/home-tab.md b/docs/dashboards-and-charts/home-tab.md index 23764815ff73bd..15c75a22028ac2 100644 --- a/docs/dashboards-and-charts/home-tab.md +++ b/docs/dashboards-and-charts/home-tab.md @@ -16,20 +16,20 @@ A map consisting of node entries allows for quick hoverable information about ea The map classification can be altered, allowing the categorization of nodes by: -- Status (e.g. Live) -- OS (e.g. Ubuntu) -- Technology (e.g. Container) -- Agent version (e.g. v1.45.2) -- Replication factor (e.g. Single, Multi) -- Cloud provider (e.g AWS) -- Cloud region (e.g. us-east-1) -- Instance type (e.g. c6a.xlarge) +- Status (e.g., Live) +- OS (e.g., Ubuntu) +- Technology (e.g., Container) +- Agent version (e.g., v1.45.2) +- Replication factor (e.g., Single, Multi) +- Cloud provider (e.g., AWS) +- Cloud region (e.g., us-east-1) +- Instance type (e.g., c6a.xlarge) Color-coding can also be configured between: -- Status (e.g. Live, Offline) -- Connection stability (e.g. Stable, Unstable) -- Replication factor (e.g. None, Single) +- Status (e.g., Live, Offline) +- Connection stability (e.g., Stable, Unstable) +- Replication factor (e.g., None, Single) ## Data replication @@ -49,7 +49,7 @@ The second table contains the top alerts in the last 24 hours, along with their ## Netdata Assistant shortcut -In the Home tab there is a shortcut button in order to start an instant conversation with the [Netdata Assistant](https://github.com/netdata/netdata/edit/master/docs/netdata-assistant.md). +In the Home tab, there is a shortcut button to start an instant conversation with the [Netdata Assistant](https://github.com/netdata/netdata/edit/master/docs/netdata-assistant.md).
## Space metrics diff --git a/docs/dashboards-and-charts/import-export-print-snapshot.md b/docs/dashboards-and-charts/import-export-print-snapshot.md index f2df15dab80924..9fd8bd5470f594 100644 --- a/docs/dashboards-and-charts/import-export-print-snapshot.md +++ b/docs/dashboards-and-charts/import-export-print-snapshot.md @@ -1,7 +1,7 @@ # Import, export, and print a snapshot ->❗This feature is only available on v1 dashboards, it hasn't been port-forwarded to v2. -> For more information on accessing dashboards check [this documentation](/docs/dashboards-and-charts/README.md). +> ❗This feature is only available on v1 dashboards; it hasn't been ported to v2. +> For more information on accessing dashboards, check [this documentation](/docs/dashboards-and-charts/README.md). Netdata can export snapshots of the contents of your dashboard at a given time, which you can then import into any other node running Netdata. Or, you can create a print-ready version of your dashboard to save to PDF or actually print to @@ -12,9 +12,7 @@ timeframe](/docs/dashboards-and-charts/visualization-date-and-time-controls.md) colleague for further analysis. -Or, send the Netdata team a snapshot of your dashboard when [filing a bug -report](https://github.com/netdata/netdata/issues/new?assignees=&labels=bug%2Cneeds+triage&template=BUG_REPORT.yml) on -GitHub. +Or, send the Netdata team a snapshot of your dashboard when [filing a bug report](https://github.com/netdata/netdata/issues/new?assignees=&labels=bug%2Cneeds+triage&template=BUG_REPORT.yml) on GitHub. ![The export, import, and print buttons](https://user-images.githubusercontent.com/1153921/114218399-360fb600-991e-11eb-8dea-fabd2bffc5b3.gif) @@ -31,7 +29,7 @@ snapshot and the system from which it was taken. Click **Import** to begin to pr Netdata takes the data embedded inside the snapshot and re-creates a static replica on your dashboard. When the import finishes, you're free to move around and examine the charts.
-Some caveats and tips to keep in mind: +Some warnings and tips to keep in mind: - Only metrics in the export timeframe are available to you. If you zoom out or pan through time, you'll see the beginning and end of the snapshot. diff --git a/docs/dashboards-and-charts/kubernetes-tab.md b/docs/dashboards-and-charts/kubernetes-tab.md index 3289615f0f87eb..388126f850890a 100644 --- a/docs/dashboards-and-charts/kubernetes-tab.md +++ b/docs/dashboards-and-charts/kubernetes-tab.md @@ -32,7 +32,7 @@ their associated pods. ## Kubernetes Containers overview -At the top of the Kubernetes containers section there is a map, that with a given context colorizes the containers in terms of their utilization. +At the top of the Kubernetes containers section, there is a map that, for a given context, colorizes the containers according to their utilization. The filtering of this map is controlled by using the [NIDL framework](/docs/dashboards-and-charts/netdata-charts.md#nidl-framework) from the definition bar of the chart. diff --git a/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md b/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md index bf31b8a714b818..6a6f0cde20b2c7 100644 --- a/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md +++ b/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md @@ -16,10 +16,10 @@ On the right-hand side, there is a bar that: - Allows for quick navigation through the sections of the dashboard - Provides a filtering mechanism that can filter charts by: - - Host labels - - Node status - - Netdata version - - Individual nodes + - Host labels + - Node status + - Netdata version + - Individual nodes - Presents the active alerts for the Room From this bar you can also view the maximum chart anomaly rate on each menu section by clicking the `AR%` button.
diff --git a/docs/dashboards-and-charts/netdata-charts.md b/docs/dashboards-and-charts/netdata-charts.md index 50b7c15a2b5bfb..e24f9145c49cec 100644 --- a/docs/dashboards-and-charts/netdata-charts.md +++ b/docs/dashboards-and-charts/netdata-charts.md @@ -9,7 +9,7 @@ These charts provide a lot of useful information, so that you can: - Enjoy the high-resolution, granular metrics collected by Netdata - Examine all the metrics by hovering over them with your cursor -- Filter the metrics in any way you want using the [Definition bar](#definition-bar) +- Filter the metrics in any way you want, using the [Definition bar](#definition-bar) - View the combined anomaly rate of all underlying data with the [Anomaly Rate ribbon](#anomaly-rate-ribbon) - Explore even more details about a chart's metrics through [hovering over certain elements of it](#hover-over-the-chart) - Use intuitive tooling and shortcuts to pan, zoom or highlight areas of interest in your charts @@ -28,20 +28,20 @@ A Netdata chart looks like this: A Netdata Chart -With a quick glance you have immediate information available at your disposal: +With a quick glance, you have immediate information available at your disposal: - [Chart title and units](#title-bar) - [Anomaly Rate ribbon](#anomaly-rate-ribbon) - [Definition bar](#definition-bar) -- [Tool bar](#tool-bar) +- [Toolbar](#toolbar) - [Chart area](#hover-over-the-chart) - [Legend with dimensions](#dimensions-bar) ## Fundamental elements -While Netdata's charts require no configuration and are easy to interact with, they have a lot of underlying complexity. To meaningfully organize charts out of the box based on what's happening in your nodes, Netdata uses the concepts of [dimensions](#dimensions), [contexts](#contexts), and [families](#families). +While Netdata's charts require no configuration and are easy to interact with, they have a lot of underlying complexity.
To meaningfully organize charts out of the box based on what's happening in your nodes, Netdata uses the concepts of [dimensions](#dimensions), [contexts](#contexts), and [families](#families). -Understanding how these work will help you more easily navigate the dashboard, +Understanding how these work will help you more easily navigate the dashboard, [write new alerts](/src/health/REFERENCE.md), or play around with the [API](/src/web/api/README.md). @@ -52,7 +52,7 @@ average (the default), minimum, or maximum. These values can then be given any t utilization is represented as a percentage, disk I/O as `MiB/s`, and available RAM as an absolute value in `MiB` or `GiB`. -Beneath every chart (or on the right-side if you configure the dashboard) is a legend of dimensions. When there are +Beneath every chart (or on the right side if you configure the dashboard) is a legend of dimensions. When there are multiple dimensions, you'll see a different entry in the legend for each dimension. The **Apps CPU Time** chart (with the [context](#contexts) `apps.cpu`), which visualizes CPU utilization of @@ -108,7 +108,7 @@ Title bar elements: - **Chart title**: on the chart title you can see the title together with the metric being displayed, as well as the unit of measurement. - **Chart status icon**: possible values are: Loading, Timeout, Error or No data, otherwise this icon is not shown. -Along with viewing chart type, context and units, on this bar you have access to immediate actions over the chart: +Along with viewing chart type, context, and units, on this bar you have access to immediate actions over the chart: Netdata Chart Title bar immediate actions @@ -140,7 +140,7 @@ Each composite chart has a definition bar to provide information and options abo To help users instantly understand and validate the data they see on charts, we developed the NIDL (Nodes, Instances, Dimensions, Labels) framework. This information is visualized on all charts.
-> You can explore the in-depth infographic, by clicking on this image and opening it in a new tab, +> You can explore the in-depth infographic by clicking on this image and opening it in a new tab, > allowing you to zoom in to the different parts of it. > > @@ -153,9 +153,9 @@ At the Definition bar of each chart, there are a few dropdown menus: Netdata Chart NIDL Dropdown menus -These dropdown menus have 2 functions: +These dropdown menus have two functions: -1. Provide additional information about the visualized chart, to help with understanding the data that is presented. +1. Provide additional information about the visualized chart to help with understanding the data that is presented. 2. Provide filtering and grouping capabilities, altering the query on the fly, to help get different views of the dataset. The NIDL framework attaches metadata to every metric that is collected to provide for each of them the following consolidated data for the visible time frame: @@ -164,18 +164,18 @@ The NIDL framework attaches metadata to every metric that is collected to provid 2. The anomaly rate of each of them for the time-frame of the query. This is used to quickly spot which of the nodes, instances, dimensions or labels have anomalies in the requested time-frame. 3. The minimum, average and maximum values of all the points used for the query. This is used to quickly spot which of the nodes, instances, dimensions or labels are responsible for a spike or a dive in the chart. -All of these dropdown menus can be used for instantly filtering the information shown, by including or excluding specific nodes, instances, dimensions or labels. Directly from the dropdown menu, without the need to edit a query string and without any additional knowledge of the underlying data. +All of these dropdown menus can be used for instantly filtering the information shown by including or excluding specific nodes, instances, dimensions or labels. 
Directly from the dropdown menu, without the need to edit a query string and without any additional knowledge of the underlying data. ### Group by dropdown -The "Group by" dropdown menu allows selecting 1 or more groupings to be applied at once on the same dataset. +The "Group by" dropdown menu allows selecting one or more groupings to be applied at once on the same dataset. Netdata Chart Group by dropdown It supports: -1. **Group by Node**, to summarize the data of each node, and provide one dimension on the chart for each of the nodes involved. Filtering nodes is supported at the same time, using the nodes dropdown menu. -2. **Group by Instance**, to summarize the data of each instance and provide one dimension on the chart for each of the instances involved. Filtering instances is supported at the same time, using the instances dropdown menu. +1. **Group by Node**, to summarize the data of each node, and provide one dimension on the chart for each of the nodes involved. Filtering nodes is supported at the same time, using the node dropdown menu. +2. **Group by Instance**, to summarize the data of each instance and provide one dimension on the chart for each of the instances involved. Filtering instances is supported at the same time, using the instance dropdown menu. 3. **Group by Dimension**, so that each metric in the visualization is the aggregation of a single dimension. This provides a per dimension view of the data from all the nodes in the Room, taking into account filtering criteria if defined. 4. **Group by Label**, to summarize the data for each label value. Multiple label keys can be selected at the same time. @@ -184,7 +184,7 @@ Using this menu, you can slice and dice the data in any possible way, to quickly > ### Tip > > A very pertinent example is composite charts over contexts related to cgroups (VMs and containers). 
-> You have the means to change the default group by or apply filtering to get a better view into what data your are trying to analyze. +> You have the means to change the default group by or apply filtering to get a better view into what data you’re trying to analyze. > For example, if you change the group by to _instance_ you get a view with the data of all the instances (cgroups) that contribute to that chart. > Then you can use further filtering tools to focus the data that is important to you and even save the result to your own dashboards. > @@ -202,7 +202,7 @@ For example, the `system.cpu` chart shows the average for each dimension from ev The following aggregate functions are available for each selected dimension: -- **Average**: Displays the average value from contributing nodes. If a composite chart has 5 nodes with the following +- **Average**: Displays the average value from contributing nodes. If a composite chart has five nodes with the following values for the `out` dimension—`-2.1`, `-5.5`, `-10.2`, `-15`, `-0.1`—the composite chart displays a value of `−6.58`. - **Sum**: Displays the sum of contributed values. Using the same nodes, dimension, and values as above, the composite @@ -215,50 +215,50 @@ The following aggregate functions are available for each selected dimension: ### Nodes dropdown In this dropdown, you can view or filter the nodes contributing time-series metrics to the chart. -This menu also provides the contribution of each node to the volume of the chart, and a break down of the anomaly rate of the queried data per node. +This menu also provides the contribution of each node to the volume of the chart, and a breakdown of the anomaly rate of the queried data per node. Netdata Chart Nodes dropdown If one or more nodes can't contribute to a given chart, the definition bar shows a warning symbol plus the number of affected nodes, then lists them in the dropdown along with the associated error.
Nodes might return errors because of -networking issues, a stopped `netdata` service, or because that node does not have any metrics for that context. +networking issues, a stopped `netdata` service, or because that node doesn’t have any metrics for that context. ### Instances dropdown In this dropdown, you can view or filter the instances contributing time-series metrics to the chart. -This menu also provides the contribution of each instance to the volume of the chart, and a break down of the anomaly rate of the queried data per instance. +This menu also provides the contribution of each instance to the volume of the chart, and a breakdown of the anomaly rate of the queried data per instance. Netdata Chart Instances dropdown ### Dimensions dropdown In this dropdown, you can view or filter the original dimensions contributing time-series metrics to the chart. -This menu also presents the contribution of each original dimensions on the chart, and a break down of the anomaly rate of the data per dimension. +This menu also presents the contribution of each original dimension on the chart, and a breakdown of the anomaly rate of the data per dimension. Netdata Chart Dimensions Dropdown ### Labels dropdown In this dropdown, you can view or filter the contributing time-series labels of the chart. -This menu also presents the contribution of each label on the chart,and a break down of the anomaly rate of the data per label. +This menu also presents the contribution of each label on the chart, and a breakdown of the anomaly rate of the data per label. Netdata Chart Labels Dropdown ### Aggregate functions over time -When the granularity of the data collected is higher than the plotted points on the chart an aggregation function over +When the granularity of the data collected is higher than the plotted points on the chart, an aggregation function over time is applied.
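The idea behind these over-time aggregations can be illustrated with a small sketch (illustrative only — not Netdata's actual query engine; the function and variable names are made up):

```python
# Illustrative sketch: when more samples were collected than points can be
# plotted, each plotted point reduces one "bucket" of raw samples with the
# selected aggregation function (average, min, max, ...).

def aggregate_over_time(samples, points, func):
    """Reduce `samples` down to `points` plotted values, one `func` per bucket."""
    size = len(samples) // points          # raw samples per plotted point
    return [func(samples[i * size:(i + 1) * size]) for i in range(points)]

avg = lambda xs: sum(xs) / len(xs)

samples = [1, 3, 2, 8, 4, 6, 5, 7]           # 8 collected values, 2 plotted points
print(aggregate_over_time(samples, 2, avg))  # → [3.5, 5.5]
print(aggregate_over_time(samples, 2, max))  # → [8, 7]
```

Switching the aggregation function changes what a spike in the chart means: with `max` a single outlier sample dominates its bucket, while with `avg` it is smoothed out.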
Netdata Chart Aggregate functions over time -By default the aggregation applied is _average_ but the user can choose different options from the following: +By default, the aggregation applied is _average_, but the user can choose different options from the following: - Min, Max, Average or Sum - Percentile - you can specify the percentile you want to focus on: 25th, 50th, 75th, 80th, 90th, 95th, 97th, 98th and 99th. Netdata Chart Aggregate functions over time Percentile selection - Trimmed Mean or Trimmed Median - - you can choose the percentage of data tha you want to focus on: 1%, 2%, 3%, 5%, 10%, 15%, 20% and 25%. + - you can choose the percentage of data that you want to focus on: 1%, 2%, 3%, 5%, 10%, 15%, 20% and 25%. Netdata Chart Aggregate functions over time Trimmed Mean or Median selection - Median - Standard deviation @@ -281,13 +281,13 @@ If the value collected is an outlier, it is marked as anomalous. Netdata Chart Anomaly Rate Ribbon -This unmatched capability of real-time predictions as data is collected allows you to **detect anomalies for potentially millions of metrics across your entire infrastructure within a second of occurrence**. +This unmatched capability of real-time predictions, as data is collected, allows you to **detect anomalies for potentially millions of metrics across your entire infrastructure within a second of occurrence**. The Anomaly Rate ribbon on top of each chart visualizes the combined anomaly rate of all the underlying data, highlighting areas of interest that may not be easily visible to the naked eye. Hovering over the Anomaly Rate ribbon provides a histogram of the anomaly rates per presented dimension, for the specific point in time. -Anomaly Rate visualization does not make Netdata slower. +Anomaly Rate visualization doesn’t make Netdata slower.
Anomaly rate is saved in the Netdata database, together with metric values, and due to the smart design of Netdata, it doesn’t even incur a disk footprint penalty. ## Hover over the chart @@ -305,7 +305,7 @@ When hovering the anomaly ribbon, the overlay sorts all dimensions by anomaly ra Additionally, when hovering over the chart, the overlay may display an indication in the "Info" column. Currently, this column is used to inform users of any data collection issues that might affect the chart. -Below each chart, there is an information ribbon. This ribbon currently shows 3 states related to the points presented in the chart: +Below each chart, there is an information ribbon. This ribbon currently shows three states related to the points presented in the chart: 1. **Partial Data** At least one of the dimensions in the chart has partial data, meaning that not all instances available contributed data to this point. This can happen when a container is stopped, or when a node is restarted. This indicator helps to gain confidence of the dataset, in situations when unusual spikes or dives appear due to infrastructure maintenance, or due to failures to part of the infrastructure. @@ -323,14 +323,14 @@ All these indicators are also visualized per dimension, in the pop-over that app ## Play, Pause and Reset Your charts are controlled using the available [Time controls](/docs/dashboards-and-charts/visualization-date-and-time-controls.md#time-controls). -Besides these, when interacting with the chart you can also activate these controls by: +Besides these, when interacting with the chart, you can also activate these controls by: - Hovering over any chart to temporarily pause it - this momentarily switches time control to Pause, so that you can - hover over a specific timeframe. When moving out of the chart time control will go back to Play (if it was it's - previous state) + hover over a specific timeframe. 
When moving out of the chart, time control will go back to Play (if that was + its previous state) - Clicking on the chart to lock it - this enables the Pause option on the time controls, to the current timeframe. This is if you want to jump to a different chart to look for possible correlations. -- Double clicking to release a previously locked chart - move the time control back to Play +- Double-clicking to release a previously locked chart - move the time control back to Play | Interaction | Keyboard/mouse | Touchpad/touchscreen | Time control | |:------------------|:---------------|:---------------------|:----------------------| @@ -338,11 +338,11 @@ Besides these, when interacting with the chart you can also activate these contr | **Stop** a chart | `click` | `tap` | **Pause** | | **Reset** a chart | `double click` | `n/a` | **Play** | -Note: These interactions are available when the default "Pan" action is used from the [Tool Bar](#tool-bar). +Note: These interactions are available when the default "Pan" action is used from the [Toolbar](#toolbar). -## Tool bar +## Toolbar -While exploring the chart, a tool bar will appear. This tool bar is there to support you on this task. +While exploring the chart, a toolbar will appear. This toolbar is there to support you in this task. The available manipulation tools you can select are: Netdata Chart Tool bar @@ -375,11 +375,11 @@ Selecting timeframes is useful when you see an interesting spike or change in a > **Note** > -> To clear a highlighted timeframe, simply click on the chart area. +> To clear a highlighted timeframe, click on the chart area. ### Select and zoom -You can zoom to a specific timeframe, either horizontally of vertically, by selecting a timeframe. +You can zoom to a specific timeframe, either horizontally or vertically, by selecting a timeframe.
| Interaction | Keyboard/mouse | Touchpad/touchscreen | |:-------------------------------------------|:-------------------------------------|:---------------------| @@ -421,4 +421,4 @@ behaving strangely. ## Resize a chart -To resize the chart, click-and-drag the icon on the bottom-right corner of any chart. To restore the chart to its original height, double-click the same icon. +To resize the chart, click-and-drag the icon in the bottom-right corner of any chart. To restore the chart to its original height, double-click the same icon. diff --git a/docs/dashboards-and-charts/node-filter.md b/docs/dashboards-and-charts/node-filter.md index 9f5371fff78ab0..28e6699e1a6893 100644 --- a/docs/dashboards-and-charts/node-filter.md +++ b/docs/dashboards-and-charts/node-filter.md @@ -4,11 +4,11 @@ The node filter allows you to quickly filter the nodes visualized in a Room's vi Inside the filter, the nodes get categorized into three groups: -| Group | Description | -|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Group | Description | +|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Live | Nodes that are currently online, collecting and streaming metrics to Cloud. 
Live nodes display raised [Alert](/docs/dashboards-and-charts/alerts-tab.md) counters, [Machine Learning](/src/ml/README.md) availability, and [Functions](/docs/top-monitoring-netdata-functions.md) availability | -| Stale | Nodes that are offline and not streaming metrics to Cloud. Only historical data can be presented from a parent node. For these nodes you can only see their ML status, as they are not online to provide more information | -| Offline | Nodes that are offline, not streaming metrics to Cloud and not available in any parent node. Offline nodes are automatically deleted after 30 days and can also be deleted manually. | +| Stale | Nodes that are offline and not streaming metrics to Cloud. Only historical data can be presented from a parent node. For these nodes you can only see their ML status, as they are not online to provide more information | +| Offline | Nodes that are offline, not streaming metrics to Cloud and not available in any parent node. Offline nodes are automatically deleted after 30 days and can also be deleted manually. | By using the search bar, you can narrow down to specific nodes based on their name. diff --git a/docs/dashboards-and-charts/nodes-tab.md b/docs/dashboards-and-charts/nodes-tab.md index e73d556fb9c1f0..704762c247d527 100644 --- a/docs/dashboards-and-charts/nodes-tab.md +++ b/docs/dashboards-and-charts/nodes-tab.md @@ -1,6 +1,6 @@ # Nodes tab -The nodes tab provides a summarized view of your [Room](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#rooms), allowing you to view quick information per node. +The Nodes tab provides a summarized view of your [Room](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#rooms), allowing you to view quick information per node. 
> **Tip** > @@ -22,7 +22,7 @@ In the top right-hand corner, you can: Each node row allows you to: - View the node's status -- Go to a single node dashboard, by clicking the node name +- Go to a single node dashboard by clicking the node name - View information about the node, along with a button to display more in the right-hand sidebar - View active alerts for the node - View Machine Learning status diff --git a/docs/dashboards-and-charts/top-tab.md b/docs/dashboards-and-charts/top-tab.md index 6b96010a756b13..7a148be227b9ae 100644 --- a/docs/dashboards-and-charts/top-tab.md +++ b/docs/dashboards-and-charts/top-tab.md @@ -9,7 +9,7 @@ They can be used to retrieve additional information to help you troubleshoot or > > **Note** > -> If you get an error saying that your node can't execute Functions please check the [prerequisites](/docs/top-monitoring-netdata-functions.md#prerequisites). +> If you get an error saying that your node can't execute Functions, check the [prerequisites](/docs/top-monitoring-netdata-functions.md). The main view of this tab provides you with (depending on the Function) two elements: a visualization on the top and a table on the bottom. @@ -24,4 +24,4 @@ On the top right-hand corner you can: The bar on the right-hand side allows you to select which Function to run, on which node, and then depending on the Function, there might be more fine-grained filtering available. -For example the `Block-devices` Function allows you to filter per Device, Type, ID, Model and Serial number or the Block devices on your node. +For example, the `Block-devices` Function allows you to filter per Device, Type, ID, Model and Serial number of the Block devices on your node.
diff --git a/docs/dashboards-and-charts/visualization-date-and-time-controls.md b/docs/dashboards-and-charts/visualization-date-and-time-controls.md index 434821e15e689a..1e82c96ccd7f05 100644 --- a/docs/dashboards-and-charts/visualization-date-and-time-controls.md +++ b/docs/dashboards-and-charts/visualization-date-and-time-controls.md @@ -2,7 +2,7 @@ Netdata's dashboard features powerful date visualization controls that include a time control, a timezone selector and a rich date and timeframe selector. -The controls come with useful defaults and rich customization, to help you narrow your focus when troubleshooting issues or anomalies. +The controls come with useful defaults and rich customization to help you narrow your focus when troubleshooting issues or anomalies. ## Time controls @@ -12,12 +12,12 @@ The time control provides you the following options: **Play**, **Pause** and **F - **Pause** - the content of the page isn't refreshed due to a manual request to pause it or, for example, when your investigating data on a chart (cursor is on top of a chart) - **Force Play** - the content of the page will be automatically refreshed even if this is in the background -With this, we aim to bring more clarity and allow you to distinguish if the content you are looking at is live or historical and also allow you to always refresh the content of the page when the tabs are in the background. +With this, we aim to bring more clarity and allow you to distinguish if the content you’re looking at is live or historical and also allow you to always refresh the content of the page when the tabs are in the background. 
Main use cases for **Force Play**: -- You use a terminal or deployment tools to do changes in your infra and want to see the effect immediately, Netdata is in the background, displaying the impact of these changes -- You want to have Netdata on the background, example displayed on a TV, to constantly see metrics through dashboards or to watch the alert status +- You use a terminal or deployment tools to do changes in your infra and want to see the effect immediately; Netdata is in the background, displaying the impact of these changes +- You want to have Netdata in the background, for example displayed on a TV, to constantly see metrics through dashboards or to watch the alert status ![The time control with Play, Pause and Force Play](https://user-images.githubusercontent.com/70198089/225850250-1fe12477-23f8-4b4d-b497-79b416963e10.png) @@ -27,10 +27,10 @@ The date and time selector allows you to change the visible timeframe and change ### Pick timeframes to visualize -While [panning through time and zooming in/out](/docs/dashboards-and-charts/netdata-charts.md) from charts it is helpful when you're looking a recent history, or want to do granular troubleshooting, what if you want to see metrics from 6 hours ago? Or 6 days? +While [panning through time and zooming in/out](/docs/dashboards-and-charts/netdata-charts.md) from charts is helpful when you're looking at recent history or doing granular troubleshooting, what if you want to see metrics from 6 hours ago? Or 6 days? Netdata's dashboard features a **timeframe selector** to help you visualize specific timeframes in a few helpful ways. -By default, it shows a certain number of minutes of historical metrics based on the your browser's viewport to ensure it's always showing per-second granularity. +By default, it shows a certain number of minutes of historical metrics based on your browser's viewport to ensure it's always showing per-second granularity.
#### Open the timeframe selector @@ -75,8 +75,8 @@ timeframe begins at noon on the beginning and end dates. Click **Apply** to see across a larger period of time. **You can only see metrics as far back in history as your metrics retention policy allows**. Netdata uses an internal -time-series database (TSDB) to store as many metrics as it can within a specific amount of disk space. The default -storage is 256 MiB, which should be enough for 1-3 days of historical metrics. If you navigate back to a timeframe +time-series database (TSDB) to store as many metrics as it can within a specific amount of disk space. The default +storage is 256 MiB, which should be enough for 1–3 days of historical metrics. If you navigate back to a timeframe beyond stored historical metrics, you'll see this message: ![image](https://user-images.githubusercontent.com/70198089/225851033-43b95164-a651-48f2-8915-6aac9739ed93.png) diff --git a/docs/deployment-guides/deployment-with-centralization-points.md b/docs/deployment-guides/deployment-with-centralization-points.md index 87fd4a61a87c6b..5e3c0296c087bf 100644 --- a/docs/deployment-guides/deployment-with-centralization-points.md +++ b/docs/deployment-guides/deployment-with-centralization-points.md @@ -1,10 +1,10 @@ # Deployment with Centralization Points -An observability centralization point can centralize both metrics and logs. The sending systems are called Children, while the receiving systems are called a Parents. +An observability centralization point can centralize both metrics and logs. The sending systems are called Children, while the receiving systems are called Parents. When metrics and logs are centralized, the Children are never queried for metrics and logs. The Netdata Parents have all the data needed to satisfy queries. -- **Metrics** are centralized by Netdata, with a feature we call **Streaming**. The Parents listen for incoming connections and permit access only to Children that connect to it with the right API key.
Children are configured to push their metrics to the Parents and they initiate the connections to do so.
+- **Metrics** are centralized by Netdata, with a feature we call **Streaming**. The Parents listen for incoming connections and permit access only to Children that connect to them with the right API key. Children are configured to push their metrics to the Parents, and they initiate the connections to do so.

- **Logs** are centralized with methodologies provided by `systemd-journald`. This involves installing `systemd-journal-remote` on both the Parent and the Children, and configuring the keys required for this communication.

@@ -16,7 +16,7 @@ When metrics and logs are centralized, the Children are never queried for metric

| Centrally dispatched alert notifications | Yes, at Netdata Cloud |
| Data are exclusively on-prem | Yes, Netdata Cloud queries Netdata Agents to satisfy dashboard queries. |

-A configuration with 2 observability centralization points, looks like this:
+A configuration with 2 observability centralization points looks like this:

```mermaid
flowchart LR
diff --git a/docs/developer-and-contributor-corner/README.md b/docs/developer-and-contributor-corner/README.md
index 817938126200a4..7399fcec583f71 100644
--- a/docs/developer-and-contributor-corner/README.md
+++ b/docs/developer-and-contributor-corner/README.md
@@ -1,3 +1,3 @@
# Developer and Contributor Corner

-In this section of our Documentation you will find more advanced information, suited for developers and contributors alike.
+In this section of our Documentation, you will find more advanced information suited for developers and contributors alike.
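The Parent/Child streaming relationship described in the deployment guide above can be sketched with a minimal `stream.conf` on each side. This is a sketch following Netdata's streaming configuration format; the hostname and API key below are placeholders you would replace with your own (for example, generating the key with `uuidgen`):

```
# stream.conf on the Parent: accept connections from Children presenting this key
[11111111-2222-3333-4444-555555555555]
    enabled = yes

# stream.conf on each Child: push metrics to the Parent
[stream]
    enabled = yes
    destination = parent.example.com:19999
    api key = 11111111-2222-3333-4444-555555555555
```

With this in place, each Child initiates the connection to the Parent on port 19999, and the Parent admits only Children whose `api key` matches a configured section.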
diff --git a/docs/developer-and-contributor-corner/collect-apache-nginx-web-logs.md b/docs/developer-and-contributor-corner/collect-apache-nginx-web-logs.md index 80e7a8b84a47b6..ee778233a741ab 100644 --- a/docs/developer-and-contributor-corner/collect-apache-nginx-web-logs.md +++ b/docs/developer-and-contributor-corner/collect-apache-nginx-web-logs.md @@ -2,7 +2,7 @@ Parsing web server log files with Netdata, revealing the volume of redirects, requests and other metrics, can give you a better overview of your infrastructure. -Too many bad requests? Maybe a recent deploy missed a few small SVG icons. Too many requests? Time to batten down the hatches—it's a DDoS. +Too many bad requests? Maybe a recent deployment missed a few small SVG icons. Too many requests? Time to batten down the hatches—it's a DDoS. You can use the [LTSV log format](http://ltsv.org/), track TLS and cipher usage, and the whole parser is faster than ever. In one test on a system with SSD storage, the collector consistently parsed the logs for 200,000 requests in @@ -30,33 +30,33 @@ and their default locations for log files: ```yaml # [ JOBS ] jobs: -# NGINX -# debian, arch + # NGINX + # debian, arch - name: nginx path: /var/log/nginx/access.log -# gentoo + # gentoo - name: nginx path: /var/log/nginx/localhost.access_log -# APACHE -# debian + # APACHE + # debian - name: apache path: /var/log/apache2/access.log -# gentoo + # gentoo - name: apache path: /var/log/apache2/access_log -# arch + # arch - name: apache path: /var/log/httpd/access_log -# debian + # debian - name: apache_vhosts path: /var/log/apache2/other_vhosts_access.log -# GUNICORN + # GUNICORN - name: gunicorn path: /var/log/gunicorn/access.log @@ -64,7 +64,7 @@ jobs: path: /var/log/gunicorn/gunicorn-access.log ``` -However, if your log files were not auto-detected, it might be because they are in a different location. 
Try the default +However, if your log files weren’t auto-detected, it might be because they’re in a different location. Try the default `web_log.conf` file. ```bash diff --git a/docs/developer-and-contributor-corner/customize.md b/docs/developer-and-contributor-corner/customize.md index 7d9895dc004440..af3215974c1cfe 100644 --- a/docs/developer-and-contributor-corner/customize.md +++ b/docs/developer-and-contributor-corner/customize.md @@ -9,7 +9,7 @@ thousands of metrics, you may want to alter your experience based on a particula ## Dashboard settings -To change dashboard settings, click the on the **settings** icon +To change dashboard settings, click on the **settings** icon ![Import icon](https://raw.githubusercontent.com/netdata/netdata-ui/98e31799c1ec0983f433537ff16d2ac2b0d994aa/src/components/icon/assets/gear.svg) in the top panel. diff --git a/docs/developer-and-contributor-corner/monitor-debug-applications-ebpf.md b/docs/developer-and-contributor-corner/monitor-debug-applications-ebpf.md index 56f0276bb279f8..bc86bbc6b95571 100644 --- a/docs/developer-and-contributor-corner/monitor-debug-applications-ebpf.md +++ b/docs/developer-and-contributor-corner/monitor-debug-applications-ebpf.md @@ -16,7 +16,7 @@ application so that you can monitor, troubleshoot, and debug to your heart's con ## Configure `apps.plugin` to recognize your custom application To start troubleshooting an application with eBPF metrics, you need to ensure your Netdata dashboard collects and -displays those metrics independent from any other process. +displays those metrics independent of any other process. You can use the `apps_groups.conf` file to configure which applications appear in charts generated by [`apps.plugin`](/src/collectors/apps.plugin/README.md). Once you edit this file and create a new group for the application @@ -107,7 +107,7 @@ to monitor this application. Scroll down to the **Applications** section. These with metrics specific to that process. 
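As a sketch of the `apps_groups.conf` step mentioned above: adding a custom group is a one-line entry mapping a group name to one or more process names. The group and binary names here are hypothetical examples:

```
# apps_groups.conf — hypothetical entry grouping a custom app's processes
# format: <group name>: <process name> [<process name> ...]
dev: custom-app custom-app-worker
```

After restarting the Agent, processes matching those names are aggregated under the `dev` group in the **Applications** charts.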
Pay particular attention to the charts in the **ebpf file**, **ebpf syscall**, **ebpf process**, and **ebpf net**
-sub-sections. These charts are populated by low-level Linux kernel metrics thanks to eBPF, and showcase the volume of
+subsections. These charts are populated by low-level Linux kernel metrics thanks to eBPF, and showcase the volume of
calls to open/close files, call functions like `do_fork`, IO activity on the VFS, and much more.

See the [eBPF collector documentation](/src/collectors/ebpf.plugin/README.md#integration-with-appsplugin) for the full list

@@ -126,7 +126,7 @@ charts](https://user-images.githubusercontent.com/1153921/85311677-a8380c80-b46a

In these charts, you can see first a spike in syscalls to open and close files from the configure/build process,
followed by a similar spike from the Apache benchmark.

-> 👋 Don't forget that you can view chart data directly via Netdata's API!
+> 👋 Remember that you can view chart data directly via Netdata's API!
>
> For example, open your browser and navigate to `http://NODE:19999/api/v1/data?chart=apps.file_open`, replacing `NODE`
> with the IP address or hostname of your Agent. The API returns JSON of that chart's dimensions and metrics, which you

@@ -215,7 +215,7 @@ process](https://user-images.githubusercontent.com/1153921/84831957-27e45800-afe

Starting at 14:51:49, Netdata sees the `zombie` group creating one new process every second, but no closed tasks. This
continues for roughly 30 seconds, at which point the factory program was killed with `SIGINT`, which results in the 31
-closed tasks in the subsequent second.
+closed tasks in the following second.

Zombie processes may not be catastrophic, but if you're developing an application on Linux, you should eliminate them.
If a service in your stack creates them, you should consider filing a bug report.

@@ -236,4 +236,4 @@ from any number of distributed nodes to see how your application interacts with
systems.
Now that you can see eBPF metrics in Netdata Cloud, you can [invite your -team](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#invite-your-team) and share your findings with others. +team](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#team-collaboration) and share your findings with others. diff --git a/docs/developer-and-contributor-corner/monitor-hadoop-cluster.md b/docs/developer-and-contributor-corner/monitor-hadoop-cluster.md index 2832c387c1c2e8..ec704292ca7728 100644 --- a/docs/developer-and-contributor-corner/monitor-hadoop-cluster.md +++ b/docs/developer-and-contributor-corner/monitor-hadoop-cluster.md @@ -3,7 +3,7 @@ Hadoop is an [Apache project](https://hadoop.apache.org/) is a framework for processing large sets of data across a distributed cluster of systems. -And while Hadoop is designed to be a highly-available and fault-tolerant service, those who operate a Hadoop cluster +And while Hadoop is designed to be a highly available and fault-tolerant service, those who operate a Hadoop cluster will want to monitor the health and performance of their [Hadoop Distributed File System (HDFS)](https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html) and [Zookeeper](https://zookeeper.apache.org/) implementations. @@ -90,7 +90,7 @@ The JSON result for a DataNode's `/jmx` endpoint is slightly different: If Netdata can't access the `/jmx` endpoint for either a NameNode or DataNode, it will not be able to auto-detect and collect metrics from your HDFS implementation. -Zookeeper auto-detection relies on an accessible client port and a allow-listed `mntr` command. For more details on +Zookeeper auto-detection relies on an accessible client port and an allow-listed `mntr` command. 
For more details on `mntr`, see Zookeeper's documentation on [cluster options](https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_clusterOptions) and [Zookeeper commands](https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_zkCommands). @@ -118,7 +118,7 @@ At the bottom of the file, you will see two example jobs, both of which are comm ``` Uncomment these lines and edit the `url` value(s) according to your setup. Now's the time to add any other configuration -details, which you can find inside of the `hdfs.conf` file itself. Most production implementations will require TLS +details, which you can find inside the `hdfs.conf` file itself. Most production implementations will require TLS certificates. The result for a simple HDFS setup, running entirely on `localhost` and without certificate authentication, might look diff --git a/docs/developer-and-contributor-corner/style-guide.md b/docs/developer-and-contributor-corner/style-guide.md index 16e07f54d06fb2..6d63be378230a1 100644 --- a/docs/developer-and-contributor-corner/style-guide.md +++ b/docs/developer-and-contributor-corner/style-guide.md @@ -24,7 +24,7 @@ One way we write empowering, educational content is by using a consistent voice _Voice_ is like your personality, which doesn't really change day to day. -_Tone_ is how you express your personality. Your expression changes based on your attitude or mood, or based on who +_Tone_ is how you express your personality. Your expression changes based on your attitude or mood, or based on whom you're around. In writing, you reflect tone in your word choice, punctuation, sentence structure, or emoji. The same idea about voice and tone applies to organizations, too. Our voice shouldn't change much between two pieces of @@ -35,7 +35,7 @@ content, no matter who wrote each, but the tone might be quite different based o Netdata's voice is authentic, passionate, playful, and respectful. - **Authentic** writing is honest and fact-driven. 
Focus on Netdata's strength while accurately communicating what - Netdata can and cannot do, and emphasize technical accuracy over hard sells and marketing jargon. + Netdata can and can’t do, and emphasize technical accuracy over hard sells and marketing jargon. - **Passionate** writing is strong and direct. Be a champion for the product or feature you're writing about, and let your unique personality and writing style shine. - **Playful** writing is friendly, thoughtful, and engaging. Don't take yourself too seriously, as long as it's not at @@ -45,7 +45,7 @@ Netdata's voice is authentic, passionate, playful, and respectful. ### Tone -Netdata's tone is fun and playful, but clarity and conciseness comes first. We also tend to be informal, and aren't +Netdata's tone is fun and playful, but clarity and conciseness come first. We also tend to be informal, and aren't afraid of a playful joke or two. While we have general standards for voice and tone, we do want every individual's unique writing style to reflect in @@ -61,14 +61,14 @@ of these are expanded into individual sections in the [language, grammar, and mechanics](#language-grammar-and-mechanics) section below. - Would this language make sense to someone who doesn't work here? -- Could someone quickly scan this document and understand the material? +- Could anyone quickly scan this document and understand the material? - Create an information hierarchy with key information presented first and clearly called out to improve clarity and readability. - Avoid directional language like "sidebar on the right of the page" or "header at the top of the page" since presentation elements may adapt for devices. - Use descriptive links rather than "click here" or "learn more". - Include alt text for images and image links. - Ensure any information contained within a graphic element is also available as plain text. -- Avoid idioms that may not be familiar to the user or that may not make sense when translated. 
+- Avoid idioms that may not be familiar to the user, or that may not make sense when translated. - Avoid local, cultural, or historical references that may be unfamiliar to users. - Prioritize active, direct language. - Avoid referring to someone's age unless it is directly relevant; likewise, avoid referring to people with age-related @@ -85,7 +85,7 @@ the [language, grammar, and mechanics](#language-grammar-and-mechanics) section ## Language, grammar, and mechanics -To ensure Netdata's writing is clear, concise, and universal, we have established standards for language, grammar, and +To ensure Netdata's writing is clear, concise, and universal, we’ve established standards for language, grammar, and certain writing mechanics. However, if you're writing about Netdata for an external publication, such as a guest blog post, follow that publication's style guide or standards, while keeping the [preferred spelling of Netdata terms](#netdata-specific-terms) in mind. @@ -134,7 +134,7 @@ to say, "Netdata's one-line installer requires fewer steps than manually install A particular word, phrase, or metaphor you're familiar with might not translate well to the other cultures featured among Netdata's global community. We recommended you avoid slang or colloquialisms in your writing. -In addition, don't use abbreviations that have not yet been defined in the content. See our section on +In addition, don't use abbreviations that haven’t yet been defined in the content. See our section on [abbreviations](#abbreviations-acronyms-and-initialisms) for additional guidance. If you must use industry jargon, such as "mean time to resolution," define the term as clearly and concisely as you can. @@ -162,7 +162,7 @@ capitalization. In summary: Whenever you refer to the company Netdata, Inc., or the open-source monitoring Agent the company develops, capitalize both words. 
-However, if you are referring to a process, user, or group on a Linux system, use lowercase and fence the word in an +However, if you’re referring to a process, user, or group on a Linux system, use lowercase and fence the word in an inline code block: `` `netdata` ``. | | | @@ -222,7 +222,7 @@ before "and" or "or." ### Future releases or features -Do not mention future releases or upcoming features in writing unless they have been previously communicated via a +Do not mention future releases or upcoming features in writing unless they’ve been previously communicated via a public roadmap. In particular, documentation must describe, as accurately as possible, the Netdata Agent _as of the [latest @@ -239,7 +239,7 @@ reader. | Not recommended | To install Netdata, click [here](/packaging/installer/README.md). | | **Recommended** | To install Netdata, read the [installation instructions](/packaging/installer/README.md). | -Use links as often as required to provide necessary context. Blog posts and guides require fewer hyperlinks than +Use links as often as required to provide the necessary context. Blog posts and guides require fewer hyperlinks than documentation. ### Contractions @@ -310,7 +310,7 @@ our writing more universal, and users on `sudo`-less systems are generally alrea differently. For example, most users need to use `sudo` with the `edit-config` script, because the Netdata config directory is owned -by the `netdata` user. Same goes for restarting the Netdata Agent with `systemctl`. +by the `netdata` user. The same goes for restarting the Netdata Agent with `systemctl`. 
| | | |-----------------|----------------------------------------------------------------------------------------------------------------------------------------------| @@ -447,7 +447,7 @@ The following tables describe the standard spelling, capitalization, and usage o | Term | Definition | |-----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| **Connected Node** | A node that you've proved ownership of by completing the [connecting to Cloud process](/src/claim/README.md). The claimed node will then appear in your Space and any Rooms you added it to. | +| **Connected Node** | A node that you've proved ownership of by completing the [connecting to Cloud process](/src/claim/README.md). The claimed node will then appear in your Space and any Rooms you added it to. | | **Netdata** | The company behind the open-source Netdata Agent and the Netdata Cloud web application. Never use _netdata_ or _NetData_.

In general, focus on the user's goals, actions, and solutions rather than what the company provides. For example, write _Learn more about enabling alert notifications on your preferred platforms_ instead of _Netdata sends alert notifications to your preferred platforms_. | | **Netdata Agent** | The free and open source [monitoring agent](https://github.com/netdata/netdata) that you can install on all of your distributed systems, whether they're physical, virtual, containerized, ephemeral, and more. The Agent monitors systems running Linux, Docker, Kubernetes, macOS, FreeBSD, and more, and collects metrics from hundreds of popular services and applications. | | **Netdata Cloud** | The web application hosted at [https://app.netdata.cloud](https://app.netdata.cloud) that helps you monitor an entire infrastructure of distributed systems in real time.

Never use _Cloud_ without the preceding _Netdata_ to avoid ambiguity. | @@ -463,5 +463,5 @@ The following tables describe the standard spelling, capitalization, and usage o | Term | Definition | |-----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | **filesystem** | Use instead of _file system_. | -| **pre-configured** | The concept that many of Netdata's features come with sane defaults that users don't need to configure to find immediate value. | +| **pre-configured** | The concept that many of Netdata's features come with sane defaults that users don't need to configure to find immediate value. | | **real time**/**real-time** | Use _real time_ as a noun phrase, most often with _in_: _Netdata collects metrics in real time_. Use _real-time_ as an adjective: _Netdata collects real-time metrics from hundreds of supported applications and services. | diff --git a/docs/exporting-metrics/README.md b/docs/exporting-metrics/README.md index 24e33ad4687141..33f26795ccdeec 100644 --- a/docs/exporting-metrics/README.md +++ b/docs/exporting-metrics/README.md @@ -1,9 +1,6 @@ # Export metrics to external time-series databases -Netdata allows you to export metrics to external time-series databases with the [exporting -engine](/src/exporting/README.md). This system uses a number of **connectors** to initiate connections to [more than -thirty](#supported-databases) supported databases, including InfluxDB, Prometheus, Graphite, ElasticSearch, and much -more. +Netdata allows you to export metrics to external time-series databases with the [exporting engine](/src/exporting/README.md). This system uses a number of **connectors** to initiate connections to [more than thirty](#supported-databases) supported databases, including InfluxDB, Prometheus, Graphite, ElasticSearch, and much more. 
The exporting engine resamples Netdata's thousands of per-second metrics at a user-configurable interval, and can export metrics to multiple time-series databases simultaneously. @@ -22,46 +19,38 @@ Netdata supports exporting metrics to the following databases through several [connectors](/src/exporting/README.md#features). Once you find the connector that works for your database, open its documentation and the [enabling a connector](/docs/exporting-metrics/enable-an-exporting-connector.md) doc for details on enabling it. -- **AppOptics**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **AWS Kinesis**: [AWS Kinesis Data Streams](/src/exporting/aws_kinesis/README.md) -- **Azure Data Explorer**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **Azure Event Hubs**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **Blueflood**: [Graphite](/src/exporting/graphite/README.md) -- **Chronix**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **Cortex**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **CrateDB**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **ElasticSearch**: [Graphite](/src/exporting/graphite/README.md), [Prometheus remote - write](/src/exporting/prometheus/remote_write/README.md) -- **Gnocchi**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **Google BigQuery**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **Google Cloud Pub/Sub**: [Google Cloud Pub/Sub Service](/src/exporting/pubsub/README.md) -- **Graphite**: [Graphite](/src/exporting/graphite/README.md), [Prometheus remote - write](/src/exporting/prometheus/remote_write/README.md) -- **InfluxDB**: [Graphite](/src/exporting/graphite/README.md), [Prometheus remote - write](/src/exporting/prometheus/remote_write/README.md) -- **IRONdb**: [Prometheus 
remote write](/src/exporting/prometheus/remote_write/README.md) -- **JSON**: [JSON document databases](/src/exporting/json/README.md) -- **Kafka**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **KairosDB**: [Graphite](/src/exporting/graphite/README.md), [OpenTSDB](/src/exporting/opentsdb/README.md) -- **M3DB**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **MetricFire**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **MongoDB**: [MongoDB](/src/exporting/mongodb/README.md) -- **New Relic**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **OpenTSDB**: [OpenTSDB](/src/exporting/opentsdb/README.md), [Prometheus remote - write](/src/exporting/prometheus/remote_write/README.md) -- **PostgreSQL**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) - via [PostgreSQL Prometheus Adapter](https://github.com/CrunchyData/postgresql-prometheus-adapter) -- **Prometheus**: [Prometheus scraper](/src/exporting/prometheus/README.md) -- **TimescaleDB**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md), - [netdata-timescale-relay](/src/exporting/TIMESCALE.md) -- **QuasarDB**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **SignalFx**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **Splunk**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **TiKV**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **Thanos**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **VictoriaMetrics**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) -- **Wavefront**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) - -Can't find your preferred external time-series database? 
Ask our [community](https://community.netdata.cloud/) for -solutions, or file an [issue on -GitHub](https://github.com/netdata/netdata/issues/new?assignees=&labels=bug%2Cneeds+triage&template=BUG_REPORT.yml). +- **AppOptics**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **AWS Kinesis**: [AWS Kinesis Data Streams](/src/exporting/aws_kinesis/README.md). +- **Azure Data Explorer**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **Azure Event Hubs**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **Blueflood**: [Graphite](/src/exporting/graphite/README.md). +- **Chronix**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **Cortex**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **CrateDB**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **ElasticSearch**: [Graphite](/src/exporting/graphite/README.md), [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **Gnocchi**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **Google BigQuery**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **Google Cloud Pub/Sub**: [Google Cloud Pub/Sub Service](/src/exporting/pubsub/README.md). +- **Graphite**: [Graphite](/src/exporting/graphite/README.md), [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **InfluxDB**: [Graphite](/src/exporting/graphite/README.md), [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **IRONdb**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **JSON**: [JSON document databases](/src/exporting/json/README.md). +- **Kafka**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). 
+- **KairosDB**: [Graphite](/src/exporting/graphite/README.md), [OpenTSDB](/src/exporting/opentsdb/README.md). +- **M3DB**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **MetricFire**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **MongoDB**: [MongoDB](/src/exporting/mongodb/README.md). +- **New Relic**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **OpenTSDB**: [OpenTSDB](/src/exporting/opentsdb/README.md), [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **PostgreSQL**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md) via [PostgreSQL Prometheus Adapter](https://github.com/CrunchyData/postgresql-prometheus-adapter). +- **Prometheus**: [Prometheus scraper](/src/exporting/prometheus/README.md). +- **TimescaleDB**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md), [netdata-timescale-relay](/src/exporting/TIMESCALE.md). +- **QuasarDB**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **SignalFx**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **Splunk**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **TiKV**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **Thanos**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **VictoriaMetrics**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). +- **Wavefront**: [Prometheus remote write](/src/exporting/prometheus/remote_write/README.md). + +Can't find your preferred external time-series database? Ask our [community](https://community.netdata.cloud/) for solutions, or file an [issue on GitHub](https://github.com/netdata/netdata/issues/new?assignees=&labels=bug%2Cneeds+triage&template=BUG_REPORT.yml). 
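Once you've picked a connector from the list above, enabling it is a short `exporting.conf` edit. Here is a minimal sketch for the Graphite connector — the instance name and destination are placeholders, and the section/key names follow Netdata's exporting configuration format:

```
[exporting:global]
    enabled = yes

# one section per exporting instance: [<connector>:<instance name>]
[graphite:my_graphite_instance]
    enabled = yes
    destination = localhost:2003
    update every = 10
```

Restart the Agent after editing, and Netdata begins resampling and pushing metrics to the configured destination at the chosen interval.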
diff --git a/docs/glossary.md b/docs/glossary.md index 24afc4e5764461..ce9953d34fbddc 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -1,12 +1,10 @@ # Glossary -The Netdata community welcomes engineers, SREs, admins, etc. of all levels of expertise with engineering and the Netdata tool. And just as a journey of a thousand miles starts with one step, sometimes, the journey to mastery begins with understanding a single term. +Welcome to Netdata's terminology guide! Whether you're a seasoned engineer or just starting your journey in system monitoring, this glossary helps demystify the terms used throughout our documentation and community. -As such, we want to provide a little Glossary as a reference starting point for new users who might be confused about the Netdata vernacular that more familiar users might take for granted. +This glossary serves as a living document, continuously updated to help you understand Netdata concepts, features, and terminology. Each term links to detailed documentation for deeper exploration. -If you're here looking for the definition of a term you heard elsewhere in our community or products, or if you just want to learn Netdata from the ground up, you've come to the right page. - -Use the alphabetized list below to find the answer to your single-term questions, and click the bolded list items to explore more on the topics! We'll be sure to keep constantly updating this list, so if you hear a word that you would like for us to cover, just let us know or submit a request! +Missing a term? Let us know or submit a request to expand our glossary. Together, we can make Netdata more accessible to everyone. 
[A](#a) | [B](#b) | [C](#c) | [D](#d)| [E](#e) | [F](#f) | [G](#g) | [H](#h) | [I](#i) | [J](#j) | [K](#k) | [L](#l) | [M](#m) | [N](#n) | [O](#o) | [P](#p) | [Q](#q) | [R](#r) | [S](#s) | [T](#t) | [U](#u) | [V](#v) | [W](#w) | [X](#x) | [Y](#y) | [Z](#z)

@@ -27,7 +25,7 @@ Use the alphabetized list below to find the answer to your single-term questions

## C

-- [**Child**](/docs/observability-centralization-points/metrics-centralization-points/README.md): A node, running Netdata, that streams metric data to one or more parent.
+- [**Child**](/docs/observability-centralization-points/metrics-centralization-points/README.md): A node, running Netdata, that streams metric data to one or more Parents.

- [**Cloud** or **Netdata Cloud**](/docs/netdata-cloud/README.md): Netdata Cloud is a web application that gives you real-time visibility for your entire infrastructure. With Netdata Cloud, you can view key metrics, insightful charts, and active alerts from all your nodes in a single web interface.

@@ -37,7 +35,7 @@ Use the alphabetized list below to find the answer to your single-term questions

- [**Context**](/docs/dashboards-and-charts/netdata-charts.md#contexts): A way of grouping charts by the types of metrics collected and dimensions displayed. It's kind of like a machine-readable naming and organization scheme.

-- [**Custom dashboards**](/src/web/gui/custom/README.md) A dashboard that you can create using simple HTML (no javascript is required for basic dashboards).
+- [**Custom dashboards**](/src/web/gui/custom/README.md): A dashboard that you can create using simple HTML (no JavaScript is required for basic dashboards).

## D

@@ -55,9 +53,9 @@ Use the alphabetized list below to find the answer to your single-term questions

- [**Family**](/docs/dashboards-and-charts/netdata-charts.md#families): 1. What we consider our Netdata community of users and engineers. 2.
A single instance of a hardware or software resource that needs to be displayed separately from similar instances. -- [**Flood Protection**](/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/centralized-cloud-notifications-reference.md#flood-protection): If a node has too many state changes like firing too many alerts or going from reachable to unreachable, Netdata Cloud enables flood protection. As long as a node is in flood protection mode, Netdata Cloud does not send notifications about this node +- [**Flood Protection**](/docs/alerts-and-notifications/notifications/centralized-cloud-notifications/centralized-cloud-notifications-reference.md#flood-protection): If a node has too many state changes, like firing too many alerts or going from reachable to unreachable, Netdata Cloud enables flood protection. As long as a node is in flood protection mode, Netdata Cloud doesn’t send notifications about this node. -- [**Functions** or **Netdata Functions**](/docs/top-monitoring-netdata-functions.md): Routines exposed by a collector on the Netdata Agent that can bring additional information to support troubleshooting or trigger some action to happen on the node itself. +- [**Functions** or **Netdata Functions**](/docs/top-monitoring-netdata-functions.md): Routines exposed by a collector on the Netdata Agent that can bring additional information to support troubleshooting or trigger some action to happen on the node itself. ## G @@ -81,7 +79,7 @@ Use the alphabetized list below to find the answer to your single-term questions ## M -- [**Metrics Collection**](/src/collectors/README.md): With zero configuration, Netdata auto-detects thousands of data sources upon starting and immediately collects per-second metrics. Netdata can immediately collect metrics from these endpoints thanks to 300+ collectors, which all come pre-installed when you install Netdata.
+- [**Metrics Collection**](/src/collectors/README.md): With zero configuration, Netdata auto-detects thousands of data sources upon starting and immediately collects per-second metrics. Netdata can immediately collect metrics from these endpoints thanks to 300+ collectors, which all come pre-installed when you install Netdata. - [**Metric Correlations**](/docs/metric-correlations.md): A Netdata feature that lets you quickly find metrics and charts related to a particular window of interest that you want to explore further. @@ -91,17 +89,16 @@ Use the alphabetized list below to find the answer to your single-term questions - [**Metrics Streaming Replication**](/docs/observability-centralization-points/README.md): Each node running Netdata can stream the metrics it collects, in real time, to another node. Metric streaming allows you to replicate metrics data across multiple nodes, or centralize all your metrics data into a single time-series database (TSDB). - ## N - [**Netdata**](https://www.netdata.cloud/): Netdata is a monitoring tool designed by system administrators, DevOps engineers, and developers to collect everything, help you visualize -metrics, troubleshoot complex performance problems, and make data interoperable with the rest of your monitoring stack. + metrics, troubleshoot complex performance problems, and make data interoperable with the rest of your monitoring stack. - [**Netdata Agent** or **Agent**](/packaging/installer/README.md): Netdata's distributed monitoring Agent collects thousands of metrics from systems, hardware, and applications with zero configuration. It runs permanently on all your physical/virtual servers, containers, cloud deployments, and edge/IoT devices. - [**Netdata Cloud** or **Cloud**](/docs/netdata-cloud/README.md): Netdata Cloud is a web application that gives you real-time visibility for your entire infrastructure. 
With Netdata Cloud, you can view key metrics, insightful charts, and active alerts from all your nodes in a single web interface. -- [**Netdata Functions** or **Functions**](/docs/top-monitoring-netdata-functions.md): Routines exposed by a collector on the Netdata Agent that can bring additional information to support troubleshooting or trigger some action to happen on the node itself. +- [**Netdata Functions** or **Functions**](/docs/top-monitoring-netdata-functions.md): Routines exposed by a collector on the Netdata Agent that can bring additional information to support troubleshooting or trigger some action to happen on the node itself. ## O @@ -123,25 +120,25 @@ metrics, troubleshoot complex performance problems, and make data interoperable ## S -- [**Single Node Dashboard**](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md): A dashboard pre-configured with every installation of the Netdata Agent, with thousand of metrics and hundreds of interactive charts that requires no set up. +- [**Single Node Dashboard**](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md): A dashboard pre-configured with every installation of the Netdata Agent, with thousands of metrics and hundreds of interactive charts that requires no setup. -- [**Space**](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#spaces): A high-level container and virtual collaboration area where you can organize team members, access levels,and the nodes you want to monitor. +- [**Space**](/docs/netdata-cloud/organize-your-infrastructure-invite-your-team.md#spaces): A high-level container and virtual collaboration area where you can organize team members, access levels, and the nodes you want to monitor. ## T - [**Template Entity Type**](/src/health/REFERENCE.md#entity-types): Entity type that defines rules that apply to all charts of a specific context, and use the template label. 
Templates help you apply one entity to all disks, all network interfaces, all MySQL databases, and so on. -- [**Tiers**](/src/database/engine/README.md#tiers): Tiering is a mechanism of providing multiple tiers of data with different granularity of metrics (the frequency they are collected and stored, i.e. their resolution). +- [**Tiers**](/src/database/engine/README.md#tiers): Tiering is a mechanism of providing multiple tiers of data with different granularity of metrics (the frequency they are collected and stored, i.e., their resolution). ## U -- [**Unlimited Scalability**](https://www.netdata.cloud/#:~:text=love%20community%20contributions!-,Infinite%20Scalability,-By%20storing%20data): With Netdata's distributed architecture, you can seamless observe a couple, hundreds or -even thousands of nodes. There are no actual bottlenecks especially if you retain metrics locally in the Agents. +- [**Unlimited Scalability**](https://www.netdata.cloud/#:~:text=love%20community%20contributions!-,Infinite%20Scalability,-By%20storing%20data): With Netdata's distributed architecture, you can seamlessly observe a couple, hundreds, or + even thousands of nodes. There are no actual bottlenecks, especially if you retain metrics locally in the Agents. ## V -- [**Visualizations**](/docs/dashboards-and-charts/README.md): Netdata uses dimensions, contexts, and families to sort your metric data into graphs, charts, and alerts that maximize your understand of your infrastructure and your ability to troubleshoot it, along or on a team. +- [**Visualizations**](/docs/dashboards-and-charts/README.md): Netdata uses dimensions, contexts, and families to sort your metric data into graphs, charts, and alerts that maximize your understanding of your infrastructure and your ability to troubleshoot it, alone or on a team. ## Z -- **Zero Configuration**: Netdata is pre-configured and capable to autodetect and monitor any well-known application that runs on your system.
You just deploy and connect Netdata Agents in your Netdata space, and monitor them in seconds. +- **Zero Configuration**: Netdata is pre-configured and capable of automatically detecting and monitoring any well-known application that runs on your system. You deploy and connect Netdata Agents in your Netdata space, and monitor them in seconds. diff --git a/docs/guidelines.md b/docs/guidelines.md index 02e7a386fda3e2..9059b9433aed34 100644 --- a/docs/guidelines.md +++ b/docs/guidelines.md @@ -1,16 +1,16 @@ # Contribute to the documentation -Welcome to our docs developer guidelines! +Welcome to our documentation developer guidelines! This document will guide you through the process of contributing to our docs (**learn.netdata.cloud**) ## Documentation architecture -Our documentation in is generated by markdown documents in the public Github repositories of the "netdata" organization. +Our documentation is generated from Markdown documents in the public GitHub repositories of the "netdata" organization. -The structure of the documentation is handled by a [map](https://github.com/netdata/learn/blob/master/map.tsv) file, that contains metadata for every markdown files in the repos we ingest from. +The structure of the documentation is handled by a [map](https://github.com/netdata/learn/blob/master/map.tsv) file that contains metadata for every Markdown file in the repos we ingest from. -Then the ingest script parses that map and organizes the markdown files accordingly. +Then the ingest script parses that map and organizes the Markdown files accordingly. ### Improve existing documentation @@ -20,16 +20,16 @@ Each published document on [Netdata Learn](https://learn.netdata.cloud) includes Clicking on that link is the recommended way to improve our documentation, as it leads you directly to GitHub's code editor. Make your suggested changes, and use the ***Preview changes*** button to ensure your Markdown syntax works as expected.
-Under the **Commit changes** header, write descriptive title for your requested change. +Under the **Commit changes** header, write a descriptive title for your requested change. Jump down to our instructions on [PRs](#making-a-pull-request) for your next steps. ### Create a new document -You can create a pull request to add a completely new markdown document in any of our public repositories. -After the Github pull request is merged, our documentation team will decide where in the documentation hierarchy to publish that document. +You can create a pull request to add a completely new Markdown document in any of our public repositories. +After the GitHub pull request is merged, our documentation team will decide where in the documentation hierarchy to publish that document. -If you wish to contribute documentation that is tailored to your specific infrastructure monitoring/troubleshooting experience, please consider submitting a blog post about your experience. +If you wish to contribute documentation tailored to your specific infrastructure monitoring/troubleshooting experience, please consider submitting a blog post about your experience. Check out our [blog](https://github.com/netdata/blog#readme) repo! Any blog submissions that have widespread or universal application will be integrated into our permanent documentation. @@ -41,11 +41,11 @@ Netdata's documentation uses Markdown syntax. If you're not familiar with Markdo #### Edit locally -Editing documentation locally is the preferred method for completely new documents, or complex changes that span multiple documents. Clone the repository where you wish to make your changes, work on a new branch and create a pull request with that branch.
Clone the repository where you wish to make your changes, work on a new branch, and create a pull request with that branch. ### Links to other documents -Please ensure that any links to a different documentation resource are fully expanded URLs to the relevant markdown document, not links to "learn.netdata.cloud". +Please ensure that any links to a different documentation resource are fully expanded URLs to the relevant Markdown document, not links to "learn.netdata.cloud". e.g. @@ -58,14 +58,14 @@ vs This permalink ensures that the link will not be broken by any future restructuring in learn.netdata.cloud. You can see the URL to the source of any published documentation page in the **Edit this page** link at the bottom. -If you just replace `edit` with `blob` in that URL, you have the permalink to the original markdown document. +If you just replace `edit` with `blob` in that URL, you have the permalink to the original Markdown document. ### Making a pull request Pull requests (PRs) should be concise and informative. See our [PR guidelines](https://github.com/netdata/.github/blob/main/CONTRIBUTING.md#pr-guidelines) for specifics. -The Netdata team will review your PR and assesses it for correctness, conciseness, and overall quality. +The Netdata team will review your PR and assess it for correctness, conciseness, and overall quality. We may point to specific sections and ask for additional information or other fixes. After merging your PR, the Netdata team rebuilds the [documentation site](https://learn.netdata.cloud) to publish the changed documentation. 
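The edit-to-blob swap described above is a plain string substitution. As a quick illustration (the URL below is only a hypothetical example, not a guaranteed live page), it can be done in one line:

```python
# Hypothetical example: turn a Learn "Edit this page" URL into a permalink
# to the source Markdown by swapping the "edit" path segment for "blob".
edit_url = "https://github.com/netdata/netdata/edit/master/docs/guidelines.md"
permalink = edit_url.replace("/edit/", "/blob/", 1)  # replace first match only
print(permalink)
# https://github.com/netdata/netdata/blob/master/docs/guidelines.md
```

The `1` argument limits the replacement to the first occurrence, so a file that happens to contain `edit` elsewhere in its path is left intact.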
diff --git a/docs/metric-correlations.md b/docs/metric-correlations.md index 0467b8dd099f31..700785783f9a99 100644 --- a/docs/metric-correlations.md +++ b/docs/metric-correlations.md @@ -4,7 +4,7 @@ The Metric Correlations feature lets you quickly find metrics and charts related By displaying the standard Netdata dashboard, filtered to show only charts that are relevant to the window of interest, you can get to the root cause sooner. -Because Metric Correlations uses every available metric from your infrastructure, with as high as 1-second granularity, you get the most accurate insights using every possible metric. +Because Metric Correlations use every available metric from your infrastructure, with as high as 1-second granularity, you get the most accurate insights using every possible metric. ## Using Metric Correlations @@ -12,9 +12,9 @@ When viewing the [Metrics tab or a single-node dashboard](/docs/dashboards-and-c To start correlating metrics, click the **Metric Correlations** button, then [highlight a selection of metrics](/docs/dashboards-and-charts/netdata-charts.md#highlight) on a single chart. The selected timeframe needs at least 15 seconds for Metric Correlation to work. -The menu then displays information about the selected area and reference baseline. Metric Correlations uses the reference baseline to discover which additional metrics are most closely connected to the selected metrics. The reference baseline is based upon the period immediately preceding the highlighted window and is the length of 4 times the highlighted window. This is to ensure that the reference baseline is always immediately before the highlighted window of interest and a bit longer so as to ensure it's a more representative short term baseline. +The menu then displays information about the selected area and reference baseline. Metric Correlations uses the reference baseline to discover which additional metrics are most closely connected to the selected metrics.
The reference baseline is based upon the period immediately preceding the highlighted window and is the length of four times the highlighted window. This is to ensure that the reference baseline is always immediately before the highlighted window of interest and a bit longer to ensure it's a more representative short-term baseline. -Click the **Find Correlations** button to begin the correlation process. This button is only active if a valid timeframe is selected. Once clicked, the process will evaluate all available metrics on your nodes and return a filtered version of the Netdata dashboard. You will now only see the metrics that changed the most between the base window and the highlighted window you selected.. +Click the **Find Correlations** button to begin the correlation process. This button is only active if a valid timeframe is selected. Once clicked, the process will evaluate all available metrics on your nodes and return a filtered version of the Netdata dashboard. You will now only see the metrics that changed the most between the base window and the highlighted window you selected. These charts are fully interactive, and whenever possible, will only show the **dimensions** related to the timeline you selected. @@ -24,18 +24,18 @@ If you find something else interesting in the results, you can select another wi MC enables a few input parameters that users can define to iteratively explore their data in different ways. As is usually the case in Machine Learning (ML), there is no "one size fits all" algorithm, what approach works best will typically depend on the type of data (which can be very different from one metric to the next) and even the nature of the event or incident you might be exploring in Netdata. -So when you first run MC it will use the most sensible and general defaults. But you can also then vary any of the below options to explore further. +So when you first run MC, it will use the most sensible and general defaults. 
But you can also then vary any of the below options to explore further. ### Method -There are two algorithms available that aim to score metrics based on how much they have changed between the baseline and highlight windows. +There are two algorithms available that aim to score metrics based on how much they’ve changed between the baseline and highlight windows. - `KS2` - A statistical test ([Two-sample Kolmogorov Smirnov](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov%E2%80%93Smirnov_test)) comparing the distribution of the highlighted window to the baseline to try and quantify which metrics have most evidence of a significant change. You can explore our implementation [here](https://github.com/netdata/netdata/blob/d917f9831c0a1638ef4a56580f321eb6c9a88037/database/metric_correlations.c#L212). -- `Volume` - A heuristic measure based on the percentage change in averages between highlighted window and baseline, with various edge cases sensibly controlled for. You can explore our implementation [here](https://github.com/netdata/netdata/blob/d917f9831c0a1638ef4a56580f321eb6c9a88037/database/metric_correlations.c#L516). +- `Volume` - A heuristic measure based on the percentage change in averages between a highlighted window and baseline, with various edge cases sensibly controlled for. You can explore our implementation [here](https://github.com/netdata/netdata/blob/d917f9831c0a1638ef4a56580f321eb6c9a88037/database/metric_correlations.c#L516). ### Aggregation -Behind the scenes, Netdata will aggregate the raw data as needed such that arbitrary window lengths can be selected for MC. By default, Netdata will just `Average` raw data when needed as part of pre-processing. However other aggregations like `Median`, `Min`, `Max`, `Stddev` are also possible. +Behind the scenes, Netdata will aggregate the raw data as needed such that arbitrary window lengths can be selected for MC. 
By default, Netdata will just `Average` raw data when needed as part of pre-processing. However, other aggregations like `Median`, `Min`, `Max`, and `Stddev` are also possible. ### Data @@ -48,16 +48,19 @@ Unlike other observability agents that only collect raw metrics, Netdata also as ## Metric Correlations on the Agent -When a Metric Correlations request is made to Netdata Cloud, if any node instances have MC enabled then the request will be routed to the node instance with the highest hops (e.g. a parent node if one is found or the node itself if not). If no node instances have MC enabled then the request will be routed to the original Netdata Cloud based service which will request input data from the nodes and run the computation within the Netdata Cloud backend. +Metric Correlations (MC) requests to Netdata Cloud are handled in two ways: + +1. If MC is enabled on any node, the request routes to the highest-level node in the hierarchy (either a Parent node or the node itself). 2. If MC is not enabled on any node, Netdata Cloud processes the request by collecting data from nodes and computing correlations in its backend. ## Usage tips - When running Metric Correlations from the [Metrics tab](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md) across multiple nodes, you might find better results if you iterate on the initial results by grouping by node to then filter to nodes of interest and rerun the Metric Correlations. So a typical workflow in this case would be to: - - If unsure which nodes you are interested in then run MC on all nodes. - - Within the initial results returned group the most interesting chart by node to see if the changes are across all nodes or a subset of nodes. - - If you see a subset of nodes clearly jump out when you group by node, then filter for just those nodes of interest and run the MC again.
This will result in less aggregation needing to be done by Netdata and so should help give clearer results as you interact with the slider. -- Use the `Volume` algorithm for metrics with a lot of gaps (e.g. request latency when there are few requests), otherwise stick with `KS2` - - By default, Netdata uses the `KS2` algorithm which is a tried and tested method for change detection in a lot of domains. The [Wikipedia](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test) article gives a good overview of how this works. Basically, it is comparing, for each metric, its cumulative distribution in the highlight window with its cumulative distribution in the baseline window. The statistical test then seeks to quantify the extent to which we can say these two distributions look similar enough to be considered the same or not. The `Volume` algorithm is a bit more simple than `KS2` in that it basically compares (with some edge cases sensibly handled) the average value of the metric across baseline and highlight and looks at the percentage change. Often both `KS2` and `Volume` will have significant agreement and return similar metrics. - - `Volume` might favour picking up more sparse metrics that were relatively flat and then came to life with some spikes (or vice versa). This is because for such metrics that just don't have that many different values in them, it is impossible to construct a cumulative distribution that can then be compared. So `Volume` might be useful in spotting examples of metrics turning on or off. ![example where volume captured network traffic turning on](https://user-images.githubusercontent.com/2178292/182336924-d02fd3d3-7f09-41da-9cfc-809d01396d9d.png) - - `KS2` since it relies on the full distribution might be better at highlighting more complex changes that `Volume` is unable to capture. 
For example a change in the variation of a metric might be picked up easily by `KS2` but missed (or just much lower scored) by `Volume` since the averages might remain not all that different between baseline and highlight even if their variance has changed a lot. ![example where KS2 captured a change in entropy distribution that volume alone might not have picked up](https://user-images.githubusercontent.com/2178292/182338289-59b61e6b-089d-431c-bc8e-bd19ba6ad5a5.png) -Use `Volume` and `Anomaly Rate` together to ask what metrics have turned most anomalous from baseline to highlighted window. You can expand the embedded anomaly rate chart once you have results to see this more clearly. ![example where Volume and Anomaly Rate together help show what dimensions where most anomalous](https://user-images.githubusercontent.com/2178292/182338666-6d19fa92-89d3-4d61-804c-8f10982114f5.png) + - If unsure which nodes you’re interested in, then run MC on all nodes. + - Within the initial results returned, group the most interesting chart by node to see if the changes are across all nodes or a subset of nodes. + - If you see a subset of nodes clearly jump out when you group by node, then filter for just those nodes of interest and run the MC again. This will result in less aggregation needing to be done by Netdata and so should help give clearer results as you interact with the slider. +- Use the `Volume` algorithm for metrics with a lot of gaps (e.g., request latency when there are few requests), otherwise stick with `KS2`. + - By default, Netdata uses the `KS2` algorithm, which is a tried and tested method for change detection in a lot of domains. The [Wikipedia](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test) article gives a good overview of how this works. Basically, it is comparing, for each metric, its cumulative distribution in the highlight window with its cumulative distribution in the baseline window.
The statistical test then seeks to quantify the extent to which we can say these two distributions look similar enough to be considered the same or not. The `Volume` algorithm is a bit simpler than `KS2` in that it basically compares (with some edge cases sensibly handled) the average value of the metric across baseline and highlight and looks at the percentage change. Often both `KS2` and `Volume` will have significant agreement and return similar metrics. + - `Volume` might favor picking up more sparse metrics that were relatively flat and then came to life with some spikes (or vice versa). This is because for such metrics that just don't have that many different values in them, it is impossible to construct a cumulative distribution that can then be compared. So `Volume` might be useful in spotting examples of metrics turning on or off. ![example where volume captured network traffic turning on](https://user-images.githubusercontent.com/2178292/182336924-d02fd3d3-7f09-41da-9cfc-809d01396d9d.png) + - `KS2`, since it relies on the full distribution, might be better at highlighting more complex changes that `Volume` is unable to capture. For example, a change in the variation of a metric might be picked up easily by `KS2` but missed (or just much lower scored) by `Volume` since the averages might remain not all that different between baseline and highlight even if their variance has changed a lot. ![example where KS2 captured a change in entropy distribution that volume alone might not have picked up](https://user-images.githubusercontent.com/2178292/182338289-59b61e6b-089d-431c-bc8e-bd19ba6ad5a5.png) +- Use `Volume` and `Anomaly Rate` together to ask what metrics have turned most anomalous from baseline to a highlighted window. You can expand the embedded anomaly rate chart once you have results to see this more clearly.
![example where Volume and Anomaly Rate together help show what dimensions were most anomalous](https://user-images.githubusercontent.com/2178292/182338666-6d19fa92-89d3-4d61-804c-8f10982114f5.png) diff --git a/docs/netdata-agent/README.md b/docs/netdata-agent/README.md index ef538f2426b60f..8c505be4d31098 100644 --- a/docs/netdata-agent/README.md +++ b/docs/netdata-agent/README.md @@ -61,11 +61,11 @@ stateDiagram-v2 8. **Score**: a scoring engine for comparing and correlating metrics. 9. **Stream**: a mechanism to connect Netdata Agents and build Metrics Centralization Points (Netdata Parents). 10. **Visualize**: Netdata's fully automated dashboards for all metrics. -11. **Export**: export metric samples to 3rd party time-series databases, enabling the use of 3rd party tools for visualization, like Grafana. +11. **Export**: export metric samples to third-party time-series databases, enabling the use of third-party tools for visualization, like Grafana. ## Comparison to other observability solutions -1. **One moving part**: Other monitoring solution require maintaining metrics exporters, time-series databases, visualization engines. Netdata has everything integrated into one package, even when [Metrics Centralization Points](/docs/observability-centralization-points/metrics-centralization-points/README.md) are required, making deployment and maintenance a lot simpler. +1. **One moving part**: Other monitoring solutions require maintaining metrics exporters, time-series databases, and visualization engines. Netdata has everything integrated into one package, even when [Metrics Centralization Points](/docs/observability-centralization-points/metrics-centralization-points/README.md) are required, making deployment and maintenance a lot simpler. 2. **Automation**: Netdata is designed to automate most of the process of setting up and running an observability solution.
It is designed to instantly provide comprehensive dashboards and fully automated alerts, with zero configuration. diff --git a/docs/netdata-agent/configuration/anonymous-telemetry-events.md b/docs/netdata-agent/configuration/anonymous-telemetry-events.md index a5b4880c92141b..9558730f633650 100644 --- a/docs/netdata-agent/configuration/anonymous-telemetry-events.md +++ b/docs/netdata-agent/configuration/anonymous-telemetry-events.md @@ -17,7 +17,7 @@ Netdata collects usage information via two different channels: - **Agent dashboard**: We use the [PostHog JavaScript integration](https://posthog.com/docs/integrations/js-integration) (with sensitive event attributes overwritten to be anonymized) to send product usage events when you access an [Agent's dashboard](/docs/dashboards-and-charts/README.md). - **Agent backend**: The `netdata` daemon executes the [`anonymous-statistics.sh`](https://github.com/netdata/netdata/blob/6469cf92724644f5facf343e4bdd76ac0551a418/daemon/anonymous-statistics.sh.in) script when Netdata starts, stops cleanly, or fails. -You can opt-out from sending anonymous statistics to Netdata through three different [opt-out mechanisms](#opt-out). +You can opt out from sending anonymous statistics to Netdata through three different [opt-out mechanisms](#opt-out). ## Agent Dashboard - PostHog JavaScript @@ -68,7 +68,7 @@ directory`, which is usually `/usr/libexec/netdata/plugins.d`. 
## Opt-out -You can opt-out from sending anonymous statistics to Netdata through three different opt-out mechanisms: +You can opt out from sending anonymous statistics to Netdata through three different opt-out mechanisms: **Create a file called `.opt-out-from-anonymous-statistics`.** This empty file, stored in your Netdata configuration directory (usually `/etc/netdata`), immediately stops the statistics script from running, and works with any type of diff --git a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/README.md b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/README.md index af35c3c662b387..86c5128eaf26b7 100644 --- a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/README.md +++ b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/README.md @@ -1,7 +1,7 @@ # Running the Netdata Agent behind a reverse proxy -If you need to access a Netdata Agent's user interface or API in a production environment we recommend you put Netdata behind -another web server and secure access to the dashboard via SSL, user authentication and firewall rules. +If you need to access a Netdata Agent's user interface or API in a production environment, we recommend you put Netdata behind +another web server and secure access to the dashboard via SSL, user authentication, and firewall rules. A dedicated web server also provides more robustness and capabilities than the Agent's [internal web server](/src/web/README.md). @@ -12,8 +12,7 @@ We have documented running behind [Lighttpd](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-lighttpd.md), [Caddy](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-caddy.md), and [H2O](/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-h2o.md). 
-If you prefer a different web server, we suggest you follow the documentation for nginx and tell us how you did it - by adding your own "Running behind webserverX" document. +If you prefer a different web server, we suggest you follow the documentation for nginx and tell us how you did it by adding your own "Running behind webserverX" document. When you run Netdata behind a reverse proxy, we recommend you firewall protect all your Netdata servers, so that only the web server IP will be allowed to directly access Netdata. To do this, run this on each of your servers (or use your firewall manager): @@ -22,7 +21,7 @@ PROXY_IP="1.2.3.4" iptables -t filter -I INPUT -p tcp --dport 19999 \! -s ${PROXY_IP} -m conntrack --ctstate NEW -j DROP ``` -The above will prevent anyone except your web server to access a Netdata dashboard running on the host. +The above will prevent anyone except your web server from accessing a Netdata dashboard running on the host. You can also use `netdata.conf`: diff --git a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-apache.md b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-apache.md index 23e4ae233bfb3e..663998c0236589 100644 --- a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-apache.md +++ b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-apache.md @@ -1,6 +1,6 @@ # Running Netdata behind Apache's mod_proxy -Below you can find instructions for configuring an apache server to: +Below, you can find instructions for configuring an Apache server to: 1. Proxy a single Netdata via an HTTP and HTTPS virtual host. 2. Dynamically proxy any number of Netdata servers. @@ -148,12 +148,12 @@ _Assuming the main goal is to make Netdata run over HTTPS._ 1.
Make a subdomain for Netdata on which you enable and force HTTPS - You can use a free Let's Encrypt certificate 2. Go to "Apache & nginx Settings", and in the following section, add: - ```text - RewriteEngine on - RewriteRule (.*) http://localhost:19999/$1 [P,L] - ``` + ```text + RewriteEngine on + RewriteRule (.*) http://localhost:19999/$1 [P,L] + ``` -3. Optional: If your server is remote, then just replace "localhost" with your actual hostname or IP, it just works. +3. Optional: If your server is remote, then replace "localhost" with your actual hostname or IP; it just works. Repeat the operation for as many servers as you need. @@ -218,7 +218,7 @@ Note: Changes are applied by reloading or restarting Apache. ## Configuration of Content Security Policy -If you want to enable CSP within your Apache, you should consider some special requirements of the headers. Modify your configuration like that: +If you want to enable CSP within your Apache, you should consider some special requirements for the headers. Modify your configuration like this: ```text Header always set Content-Security-Policy "default-src http: 'unsafe-inline' 'self' 'unsafe-eval'; script-src http: 'unsafe-inline' 'self' 'unsafe-eval'; style-src http: 'self' 'unsafe-inline'" @@ -229,9 +229,9 @@ Note: Changes are applied by reloading or restarting Apache. ## Using Netdata with Apache's `mod_evasive` module The `mod_evasive` Apache module helps system administrators protect their web server from brute force and distributed -denial of service attack (DDoS) attacks. +denial-of-service (DDoS) attacks. -Because Netdata sends a request to the web server for every chart update, it's normal to create 20-30 requests per +Because Netdata sends a request to the web server for every chart update, it's normal to create 20–30 requests per second, per client.
If you're using `mod_evasive` on your Apache web server, this volume of requests will trigger the module's protection, and your dashboard will become unresponsive. You may even begin to see 403 errors. @@ -239,10 +239,10 @@ To mitigate this issue, you will need to change the value of the `DOSPageCount` which can typically be found at `/etc/httpd/conf.d/mod_evasive.conf` or `/etc/apache2/mods-enabled/evasive.conf`. The `DOSPageCount` option sets the limit of the number of requests from a single IP address for the same page per page -interval, which is usually 1 second. The default value is `2` requests per second. Clearly, Netdata's typical usage will +interval, which is usually 1 second. The default value is `2` requests per second. Netdata's typical usage will exceed that threshold, and `mod_evasive` will add your IP address to a blocklist. -Our users have found success by setting `DOSPageCount` to `30`. Try this, and raise the value if you continue to see 403 +Our users have found success by setting `DOSPageCount` to `30`. Try this and raise the value if you continue to see 403 errors while accessing the dashboard. ```text @@ -273,7 +273,7 @@ See issues [#2011](https://github.com/netdata/netdata/issues/2011) and ## Netdata configuration -You might edit `/etc/netdata/netdata.conf` to optimize your setup a bit. For applying these changes you need to restart Netdata. +You might edit `/etc/netdata/netdata.conf` to optimize your setup a bit. For applying these changes, you need to restart Netdata. ### Response compression @@ -316,14 +316,14 @@ You can also use a unix domain socket. 
This will also provide a faster route bet bind to = unix:/tmp/netdata.sock ``` -Apache 2.4.24+ can not read from `/tmp` so create your socket in `/var/run/netdata` +Apache 2.4.24+ can’t read from `/tmp`, so create your socket in `/var/run/netdata` ```text [web] bind to = unix:/var/run/netdata/netdata.sock ``` -At the apache side, prepend the 2nd argument to `ProxyPass` with `unix:/tmp/netdata.sock|`, like this: +On the Apache side, prepend the second argument to `ProxyPass` with `unix:/tmp/netdata.sock|`, like this: ```text ProxyPass "/netdata/" "unix:/tmp/netdata.sock|http://localhost:19999/" connectiontimeout=5 timeout=30 keepalive=on @@ -341,7 +341,7 @@ If your apache server is not on localhost, you can set: ## Prevent the double access.log -apache logs accesses and Netdata logs them too. You can prevent Netdata from generating its access log, by setting this in `/etc/netdata/netdata.conf`: +Apache logs accesses and Netdata logs them too. You can prevent Netdata from generating its access log by setting this in `/etc/netdata/netdata.conf`: ```text [logs] @@ -352,5 +352,5 @@ apache logs accesses and Netdata logs them too. You can prevent Netdata from gen Make sure the requests reach Netdata, by examining `/var/log/netdata/access.log`. -1. if the requests do not reach Netdata, your apache does not forward them. -2. if the requests reach Netdata but the URLs are wrong, you have not re-written them properly. +1. If the requests don’t reach Netdata, your Apache doesn’t forward them. +2. If the requests reach Netdata but the URLs are wrong, you haven’t rewritten them properly.
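Taken together, the Apache hunks above amount to a proxy setup like the following minimal sketch. This is an illustration only: the `/netdata/` subfolder, `netdata.example.com`, and the `localhost:19999` backend are placeholder values matching the examples in the hunks, not mandated settings.

```text
<VirtualHost *:80>
    ServerName netdata.example.com

    ProxyRequests Off
    ProxyPreserveHost On

    <Proxy *>
        Require all granted
    </Proxy>

    # Forward the /netdata/ subfolder to the local Netdata Agent
    ProxyPass "/netdata/" "http://localhost:19999/" connectiontimeout=5 timeout=30 keepalive=on
    ProxyPassReverse "/netdata/" "http://localhost:19999/"
</VirtualHost>
```

As the patch notes, the same `ProxyPass` line works with a unix domain socket backend by prepending `unix:/tmp/netdata.sock|` to the second argument.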
diff --git a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-h2o.md b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-h2o.md index f2dc45b828e71f..a165998479ae28 100644 --- a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-h2o.md +++ b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-h2o.md @@ -14,15 +14,15 @@ It is notable for having much simpler configuration than many popular HTTP serve ## H2O configuration file -On most systems, the H2O configuration is found under `/etc/h2o`. H2O uses [YAML 1.1](https://yaml.org/spec/1.1/), with a few special extensions, for it’s configuration files, with the main configuration file being `/etc/h2o/h2o.conf`. +On most systems, the H2O configuration is found under `/etc/h2o`. H2O uses [YAML 1.1](https://yaml.org/spec/1.1/), with a few special extensions, for its configuration files, with the main configuration file being `/etc/h2o/h2o.conf`. -You can edit the H2O configuration file with Nano, Vim or any other text editors with which you are comfortable. +You can edit the H2O configuration file with Nano, Vim or any other text editors with which you’re comfortable. After making changes to the configuration files, perform the following: - Test the configuration with `h2o -m test -c /etc/h2o/h2o.conf` -- Restart H2O to apply tha changes with `/etc/init.d/h2o restart` or `service h2o restart` +- Restart H2O to apply the changes with `/etc/init.d/h2o restart` or `service h2o restart` ## Ways to access Netdata via H2O @@ -63,7 +63,7 @@ hosts: ### As a subfolder for multiple Netdata servers, via one H2O instance -This is the recommended configuration when one H2O instance will be used to manage multiple Netdata servers via sub-folders. 
+This is the recommended configuration when one H2O instance will be used to manage multiple Netdata servers via subfolders. ```yaml hosts: @@ -87,9 +87,9 @@ hosts: proxy.reverse.url: http://198.51.100.2:19999 ``` -Of course you can add as many backend servers as you like. +Of course, you can add as many backend servers as you like. -Using the above, you access Netdata on the backend servers, like this: +Using the above, you access Netdata on the backend servers like this: - `http://netdata.example.com/netdata/server1/` to reach Netdata on `198.51.100.1:19999` - `http://netdata.example.com/netdata/server2/` to reach Netdata on `198.51.100.2:19999` diff --git a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-lighttpd.md b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-lighttpd.md index 48b9b2c935b972..e3fccbdcc2ca01 100644 --- a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-lighttpd.md +++ b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-lighttpd.md @@ -9,7 +9,7 @@ $HTTP["url"] =~ "^/netdata/" { } ``` -If you have older lighttpd you have to use a chain (such as below), as explained [at this Stack Overflow answer](http://stackoverflow.com/questions/14536554/lighttpd-configuration-to-proxy-rewrite-from-one-domain-to-another). +If you have older lighttpd, you have to use a chain (such as below), as explained [at this Stack Overflow answer](http://stackoverflow.com/questions/14536554/lighttpd-configuration-to-proxy-rewrite-from-one-domain-to-another). ```text $HTTP["url"] =~ "^/netdata/" { @@ -29,7 +29,7 @@ then you can get away with just proxy.server = ( "" => ( ( "host" => "127.0.0.1", "port" => 19999 ))) ``` -Though if it's public facing you might then want to put some authentication on it. 
`htdigest` support looks like: +Though if it's public facing, you might then want to put some authentication on it. `htdigest` support looks like: ```text auth.backend = "htdigest" diff --git a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-nginx.md b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-nginx.md index d38fbe8272e786..db35ef1bcba9ab 100644 --- a/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-nginx.md +++ b/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-nginx.md @@ -4,7 +4,7 @@ [Nginx](https://nginx.org/en/) is an HTTP and reverse proxy server, a mail proxy server, and a generic TCP/UDP proxy server used to host websites and applications of all sizes. -The software is known for its low impact on memory resources, high scalability, and its modular, event-driven architecture which can offer secure, predictable performance. +The software is known for its low impact on memory resources, high scalability, and its modular, event-driven architecture, which can offer secure, predictable performance. ## Why Nginx @@ -12,9 +12,9 @@ The software is known for its low impact on memory resources, high scalability, - Nginx is used and useful in cases when you want to access different instances of Netdata from a single server. -- Password-protect access to Netdata, until distributed authentication is implemented via the Netdata Cloud Sign In mechanism. +- Password-protect access to Netdata until distributed authentication is implemented via the Netdata Cloud Sign In mechanism. -- A proxy was necessary to encrypt the communication to Netdata, until v1.16.0, which provided TLS (HTTPS) support. +- A proxy was necessary to encrypt the communication to Netdata until v1.16.0, which provided TLS (HTTPS) support. 
## Nginx configuration file @@ -24,7 +24,7 @@ Configuration options in Nginx are known as directives. Directives are organized Depending on your installation source, you’ll find an example configuration file at `/etc/nginx/conf.d/default.conf` or `etc/nginx/sites-enabled/default`, in some cases you may have to manually create the `sites-available` and `sites-enabled` directories. -You can edit the Nginx configuration file with Nano, Vim or any other text editors you are comfortable with. +You can edit the Nginx configuration file with Nano, Vim or any other text editors you’re comfortable with. After making changes to the configuration files: @@ -112,7 +112,7 @@ server { ### As a subfolder for multiple Netdata servers, via one Nginx -This is the recommended configuration when one Nginx will be used to manage multiple Netdata servers via sub-folders. +This is the recommended configuration when one Nginx will be used to manage multiple Netdata servers via subfolders. ```text upstream backend-server1 { @@ -155,9 +155,9 @@ server { } ``` -Of course you can add as many backend servers as you like. +Of course, you can add as many backend servers as you like. -Using the above, you access Netdata on the backend servers, like this: +Using the above, you access Netdata on the backend servers like this: - `http://netdata.example.com/netdata/server1/` to reach `backend-server1` - `http://netdata.example.com/netdata/server2/` to reach `backend-server2` @@ -173,7 +173,7 @@ proxy_set_header X-Forwarded-Proto https; proxy_pass https://localhost:19999; ``` -Optionally it is also possible to [enable TLS/SSL on Nginx](http://nginx.org/en/docs/http/configuring_https_servers.html), this way the user will encrypt not only the communication between Nginx and Netdata but also between the user and Nginx. 
+Optionally, it is also possible to [enable TLS/SSL on Nginx](http://nginx.org/en/docs/http/configuring_https_servers.html); this way, the user encrypts not only the communication between Nginx and Netdata but also between the user and Nginx. If Nginx is not configured as described here, you will probably receive the error `SSL_ERROR_RX_RECORD_TOO_LONG`. @@ -214,7 +214,7 @@ You can also use a unix domain socket. This will also provide a faster route bet bind to = unix:/var/run/netdata/netdata.sock ``` -At the Nginx side, use something like this to use the same unix domain socket: +On the Nginx side, use something like this to use the same unix domain socket: ```text upstream backend { @@ -265,11 +265,11 @@ To disable Netdata's gzip compression, open `netdata.conf` and in the `[web]` se ## SELinux -If you get an 502 Bad Gateway error you might check your Nginx error log: +If you get a 502 Bad Gateway error, you might check your Nginx error log: ```sh # cat /var/log/nginx/error.log: 2016/09/09 12:34:05 [crit] 5731#5731: *1 connect() to 127.0.0.1:19999 failed (13: Permission denied) while connecting to upstream, client: 1.2.3.4, server: netdata.example.com, request: "GET / HTTP/2.0", upstream: "http://127.0.0.1:19999/", host: "netdata.example.com" ``` -If you see something like the above, chances are high that SELinux prevents nginx from connecting to the backend server. To fix that, just use this policy: `setsebool -P httpd_can_network_connect true`. +If you see something like the above, chances are high that SELinux prevents nginx from connecting to the backend server. To fix that, use this policy: `setsebool -P httpd_can_network_connect true`.
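The nginx fragments edited above fit together roughly as follows. This is a sketch under assumptions: `netdata.example.com`, the `/netdata/` subfolder, and the `127.0.0.1:19999` backend are placeholders taken from the examples in the hunks.

```text
upstream netdata {
    server 127.0.0.1:19999;
    keepalive 64;
}

server {
    listen 80;
    server_name netdata.example.com;

    # The trailing slash in proxy_pass strips the /netdata/ prefix
    # before the request is forwarded to the Agent.
    location /netdata/ {
        proxy_http_version 1.1;
        proxy_set_header Connection "keep-alive";
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://netdata/;
    }
}
```

If SELinux blocks the upstream connection as described in the SELinux hunk, `setsebool -P httpd_can_network_connect true` applies here as well.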
diff --git a/docs/netdata-agent/sizing-netdata-agents/README.md b/docs/netdata-agent/sizing-netdata-agents/README.md index 3880e214ca1aff..06fe70563f68b8 100644 --- a/docs/netdata-agent/sizing-netdata-agents/README.md +++ b/docs/netdata-agent/sizing-netdata-agents/README.md @@ -65,7 +65,7 @@ Check [RAM Requirements](/docs/netdata-agent/sizing-netdata-agents/ram-requireme Netdata uses a custom 32-bit floating-point format tailored for efficient storage of time-series data, along with an anomaly bit. This, combined with a fixed-step database design, enables efficient storage and retrieval of data. | Tier | Approximate Sample Size (bytes) | - |-----------------------------------|---------------------------------| + |-----------------------------------|---------------------------------| | High-resolution tier (per-second) | 0.6 | | Mid-resolution tier (per-minute) | 6 | | Low-resolution tier (per-hour) | 18 | diff --git a/docs/netdata-assistant.md b/docs/netdata-assistant.md index e01aa27741c6fc..ea803b88b86316 100644 --- a/docs/netdata-assistant.md +++ b/docs/netdata-assistant.md @@ -4,7 +4,7 @@ The Netdata Assistant is a feature that uses large language models and the Netda ## Using Netdata Assistant -- Navigate to the alerts tab +- Navigate to the Alerts tab - If there are active alerts, the `Actions` column will have an Assistant button ![actions column](https://github-production-user-asset-6210df.s3.amazonaws.com/24860547/253559075-815ca123-e2b6-4d44-a780-eeee64cca420.png) @@ -13,7 +13,7 @@ The Netdata Assistant is a feature that uses large language models and the Netda ![Netdata Assistant popup](https://github-production-user-asset-6210df.s3.amazonaws.com/24860547/253559645-62850c7b-cd1d-45f2-b2dd-474ecbf2b713.png) -- In case you need more information, or want to understand deeper, Netdata Assistant also provides useful web links to resources that can help. 
+- If you need more information or want to dig deeper into an issue, Netdata Assistant will also provide helpful web links to resources that can help. ![useful resources](https://github-production-user-asset-6210df.s3.amazonaws.com/24860547/253560071-e768fa6d-6c9a-4504-bb1f-17d5f4707627.png) diff --git a/docs/netdata-cloud/README.md b/docs/netdata-cloud/README.md index 73a0bcc658ebb8..74c27620ec1618 100644 --- a/docs/netdata-cloud/README.md +++ b/docs/netdata-cloud/README.md @@ -1,6 +1,6 @@ # Netdata Cloud -Netdata Cloud is a service that complements Netdata installations. It is a key component in achieving optimal cost structure for large scale observability. +Netdata Cloud is a service that complements Netdata installations. It is a key part of achieving an optimal cost structure for large-scale observability. Technically, Netdata Cloud is a thin control plane that allows the Netdata ecosystem to be a virtually unlimited scalable and flexible observability pipeline. With Netdata Cloud, this observability pipeline can span multiple teams, cloud providers, data centers and services, while remaining a uniform and highly integrated infrastructure, providing real-time and high-fidelity insights. @@ -45,7 +45,7 @@ Netdata Cloud provides the following features, on top of what the Netdata Agents 2. **Role Based Access Control (RBAC)**: Netdata Cloud has all the mechanisms for user-management and access control. It allows assigning all users a role, segmenting the infrastructure into rooms, and associating Rooms with roles and users. -3. **Access from anywhere**: Netdata Agents are installed on-prem and this is where all your data are always stored. +3. **Access from anywhere**: Netdata Agents are installed on-prem, and this is where all your data is always stored.
Netdata Cloud allows querying all the Netdata Agents (Standalone, Children and Parents) in real-time when dashboards are accessed via Netdata Cloud. This enables a much simpler access control, eliminating the complexities of setting up VPNs to access observability, and the bandwidth costs for centralizing all metrics to one place. @@ -61,11 +61,16 @@ Netdata Cloud provides the following features, on top of what the Netdata Agents ## Data Exposed to Netdata Cloud -Netdata is thin layer of top of Netdata Agents. It does not receive the samples collected, or the logs Netdata Agents maintain. +Netdata Cloud serves as a thin layer on top of Netdata Agents. It does not receive the samples collected, or the logs Netdata Agents maintain. -This is a key design decision for Netdata. If we were centralizing metric samples and logs, Netdata would have the same constrains and cost structure other observability solutions have, and we would be forced to lower metrics resolution, filter out metrics and eventually increase significantly the cost of observability. +Netdata's design deliberately avoids centralizing raw metrics and logs. This prevents the common constraints of traditional observability solutions: reduced metric resolution, forced data filtering, and higher costs. -Instead, Netdata Cloud receives and stores only metadata related to the metrics collected, such as the nodes collecting metrics and their labels, the metric names, their labels and their retention, the data collection plugins and modules running, the configured alerts and their transitions. 
+Instead, Netdata Cloud only stores metadata such as: + +- Node information and labels +- Metric names, labels, and retention periods +- Active collectors and modules +- Alert configurations and state changes This information is a small fraction of the total information maintained by Netdata Agents, allowing Netdata Cloud to remain high-resolution, high-fidelity and real-time, while being able to: @@ -78,15 +83,15 @@ Metric samples and logs are transferred via Netdata Cloud to your Web Browser, o You can subscribe to Netdata Cloud updates at the [Netdata Cloud Status](https://status.netdata.cloud/) page. -Netdata Cloud is a highly available, auto-scalable solution, however being a monitoring solution, we need to ensure dashboards are accessible during crisis. +Netdata Cloud is a highly available, auto-scalable solution; however, being a monitoring solution, we need to ensure dashboards are accessible during crisis. Netdata Agents provide the same dashboard Netdata Cloud provides, with the following limitations: 1. Netdata Agents (Children and Parents) dashboards are limited to their databases, while on Netdata Cloud the dashboard presents the entire infrastructure, from all Netdata Agents connected to it. -2. When you are not logged-in or the Agent is not connected to Netdata Cloud, certain features of the Netdata Agent dashboard will not be available. +2. When you are not logged in or the Agent is not connected to Netdata Cloud, certain features of the Netdata Agent dashboard will not be available. - When you are logged-in and the Agent is connected to Netdata Cloud, the dashboard has the same functionality as Netdata Cloud. + When you are logged in and the Agent is connected to Netdata Cloud, the dashboard has the same functionality as Netdata Cloud. To ensure dashboard high availability, Netdata Agent dashboards are available by directly accessing them, even when the connectivity between Children and Parents or Netdata Cloud faces issues. 
This allows the use of the individual Netdata Agents' dashboards during crisis, at different levels of aggregation. @@ -98,16 +103,16 @@ Netdata Cloud queries Netdata Agents, so it provides exactly the same fidelity a The Netdata Agent and Netdata Cloud have similar query performance, but there are additional network latencies involved when the dashboards are viewed via Netdata Cloud. -Accessing Netdata Agents on the same LAN has marginal network latency and their response time is only affected by the queries. However, accessing the same Netdata Agents via Netdata Cloud has a bigger network round-trip time, that looks like this: +Accessing Netdata Agents on the same LAN has marginal network latency, and their response time is only affected by the queries. However, accessing the same Netdata Agents via Netdata Cloud has a bigger network round-trip time that looks like this: 1. Your web browser makes a request to Netdata Cloud. 2. Netdata Cloud sends the request to your Netdata Agents. If multiple Netdata Agents are involved, they are queried in parallel. 3. Netdata Cloud receives their responses and aggregates them into a single response. 4. Netdata Cloud replies to your web browser. -If you are sitting on the same LAN as the Netdata Agents, the latency will be 2 times the round-trip network latency between this LAN and Netdata Cloud. +If you are sitting on the same LAN as the Netdata Agents, the latency will be two times the round-trip network latency between this LAN and Netdata Cloud. -However, when there are multiple Netdata Agents involved, the queries will be faster compared to a monitoring solution that has one centralization point. Netdata Cloud splits each query into multiple parts and each of the Netdata Agents involved will only perform a small part of the original query. 
So, when querying a large infrastructure, you enjoy the performance of the combined power of all your Netdata Agents, which is usually quite higher than any single-centralization-point monitoring solution. +With multiple Netdata Agents, queries are faster than single-point monitoring solutions. Netdata Cloud distributes each query across multiple Agents, where each Agent processes only a portion of the query. This distributed approach uses your infrastructure's combined processing power, delivering superior performance compared to centralized solutions. ## Does Netdata Cloud require Observability Centralization Points? @@ -115,20 +120,20 @@ No. Any or all Netdata Agents can be connected to Netdata Cloud. We recommend to create [observability centralization points](/docs/observability-centralization-points/README.md), as required for operational efficiency (ephemeral nodes, teams or services isolation, central control of alerts, production systems performance), security policies (internet isolation), or cost optimization (use existing capacities before allocating new ones). -We suggest to review the [Best Practices for Observability Centralization Points](/docs/observability-centralization-points/best-practices.md). +We suggest reviewing the [Best Practices for Observability Centralization Points](/docs/observability-centralization-points/best-practices.md). ## When I have Netdata Parents, do I need to connect Netdata Children to Netdata Cloud too? -No, it is not needed, but it provides high-availability. +No, it is unnecessary, but it provides high availability. -When Netdata Parents are connected to Netdata Cloud, all their Netdata Children are available, via these Parents. +When Netdata Parents are connected to Netdata Cloud, all their Netdata Children are available via these Parents. -When multiple Netdata Parents maintain a database for the same Netdata Children (e.g. 
clustered Parents, or Parents and Grandparents), Netdata Cloud is able to detect the unique nodes in an infrastructure and query each node only once, using one of the available Parents. +When multiple Parent nodes store data from the same Child nodes (in clusters or multi-level hierarchies), Netdata Cloud queries each unique node once through a single available Parent. Netdata Cloud prefers: -- The most distant (from the Child) Parent available, when doing metrics visualization queries (since usually these Parents have been added for this purpose). +- The most distant (from the Child) Parent available when doing metrics visualization queries (since usually these Parents have been added for this purpose). -- The closest (to the Child) Parent available, for [Top Monitoring](/docs/top-monitoring-netdata-functions.md) (since top-monitoring provides live data, like the processes running, the list of sockets open, etc). The streaming protocol of Netdata Parents and Children is able to forward such requests to the right child, via the Parents, to respond with live and accurate data. +- The closest (to the Child) Parent available for [Top Monitoring](/docs/top-monitoring-netdata-functions.md) (since top-monitoring provides live data, like the processes running, the list of sockets open, etc.). The streaming protocol of Netdata Parents and Children is able to forward such requests to the right child, via the Parents, to respond with live and accurate data. Netdata Children may be connected to Netdata Cloud for high-availability, in case the Netdata Parents are unreachable.
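The round-trip latency described earlier in this file can be made concrete with a small worked example. The 40 ms figure is hypothetical, chosen only for illustration:

```text
browser  → Netdata Cloud:            ~40 ms round trip
Netdata Cloud → Agents (parallel):   ~40 ms round trip
----------------------------------------------------------
network overhead ≈ 2 × 40 ms = 80 ms, plus query execution time
```

Because the Agents are queried in parallel, adding more Agents to a query adds no further round trips.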
diff --git a/docs/netdata-cloud/authentication-and-authorization/enterprise-sso-authentication.md b/docs/netdata-cloud/authentication-and-authorization/enterprise-sso-authentication.md index 184ff5db9d77e3..a800ecf76ef566 100644 --- a/docs/netdata-cloud/authentication-and-authorization/enterprise-sso-authentication.md +++ b/docs/netdata-cloud/authentication-and-authorization/enterprise-sso-authentication.md @@ -1,25 +1,25 @@ # Enterprise SSO Authentication Netdata provides you with means to streamline and control how your team connects and authenticates to Netdata Cloud. We provide - different Single Sign-On (SSO) integrations that allow you to connect with the tool that your organization is using to manage your - user accounts. +different Single Sign-On (SSO) integrations that allow you to connect with the tool that your organization is using to manage your +user accounts. - > **Note** This feature focus is on the Authentication flow, it doesn't support the Authorization with managing Users and Roles. +> **Note** This feature focuses on the Authentication flow; it doesn't support Authorization for managing Users and Roles. ## How to set it up? -If you want to setup your Netdata Space to allow user Authentication through an Enterprise SSO tool you need to: +If you want to set up your Netdata Space to allow user Authentication through an Enterprise SSO tool, you need to: * Confirm the integration to the tool you want is available ([Authentication integrations](https://learn.netdata.cloud/docs/netdata-cloud/authentication-&-authorization/cloud-authentication-&-authorization-integrations)) * Have a Netdata Cloud account -* Have Access to the Space as an administrator +* Have access to the Space as an administrator * Your Space needs to be on the Business plan or higher -Once you ensure the above prerequisites you need to: +Once you ensure the above prerequisites, you need to: 1. Click on the Space settings cog (located above your profile icon) 2.
Click on the Authentication tab -3. Select the card for the integration you are looking for, click on Configure +3. Select the card for the integration you’re looking for, click on Configure 4. Fill the required attributes need to establish the integration with the tool ## How to authenticate to Netdata? @@ -33,6 +33,7 @@ The **Value** can be found by clicking the **DNS TXT record** button in your spa Log into your domain provider’s website, and navigate to the DNS records section. Create a new TXT record with the following specifications: + - Value/Answer/Description: `"netdata-verification=[VERIFICATION CODE]"` - Name/Host/Alias: Leave this blank or type @ to include a subdomain. - Time to live (TTL): "86400" (this can also be inherited from the default configuration). @@ -43,5 +44,5 @@ Create a new TXT record with the following specifications: 2. Enter your email address 3. Complete the SSO flow -Note: If you're not authenticated on the Enterprise SSO tool you'll be prompted to authenticate there +Note: If you're not authenticated on the Enterprise SSO tool, you'll be prompted to authenticate there first before being allowed to proceed to Netdata Cloud. diff --git a/docs/netdata-cloud/authentication-and-authorization/role-based-access-model.md b/docs/netdata-cloud/authentication-and-authorization/role-based-access-model.md index 2226a1a0d8d29e..eb33fb087a3528 100644 --- a/docs/netdata-cloud/authentication-and-authorization/role-based-access-model.md +++ b/docs/netdata-cloud/authentication-and-authorization/role-based-access-model.md @@ -4,8 +4,8 @@ Netdata Cloud's role-based-access mechanism allows you to control what functiona ## What roles are available? -With the advent of the paid plans we revamped the roles to cover needs expressed by Netdata users, like providing more limited access to their customers, or -being able to join any Room. We also aligned the offered roles to the target audience of each plan. 
The end result is the following: +With the advent of the paid plans, we revamped the roles to cover needs expressed by Netdata users, like providing more limited access to their customers, or +being able to join any Room. We also aligned the offered roles to the target audience of each plan. The result is the following: | **Role** | **Community** | **Homelab** | **Business** | **Enterprise On-Premise** | |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------|:-------------------|:-------------------|:--------------------------| @@ -18,7 +18,7 @@ being able to join any Room. We also aligned the offered roles to the target aud ## Which functionalities are available for each role? -In more detail, you can find on the following tables which functionalities are available for each role on each domain. +In more detail, you can find on the following tables which functionalities are available for each role in each domain. ### Space Management diff --git a/docs/netdata-cloud/view-plan-and-billing.md b/docs/netdata-cloud/view-plan-and-billing.md index 2b1a342255fd8f..934409a281110f 100644 --- a/docs/netdata-cloud/view-plan-and-billing.md +++ b/docs/netdata-cloud/view-plan-and-billing.md @@ -6,7 +6,7 @@ For more info visit the [Netdata Cloud Pricing](https://netdata.cloud/pricing) p ## Plans -Plans define the features and customization options available within a Space. Different Spaces can have different plans, giving you flexibility based on your needs. +Plans define the features and customization options available within Space. Different Spaces can have different plans, giving you flexibility based on your needs. 
Netdata Cloud plans (excluding Community) involve: @@ -44,7 +44,7 @@ Please refer to the [Netdata Cloud Pricing](https://netdata.cloud/pricing) page ### Prerequisites - A Netdata Cloud account -- Admin or Billing user access to the Space +- Admin or Billing user access to the Space ### Steps @@ -53,26 +53,26 @@ Please refer to the [Netdata Cloud Pricing](https://netdata.cloud/pricing) page 1. Navigate to **Space settings** (the cog above your profile icon). 2. Select the **Plan & Billing** tab. 3. You'll see: - - **Credit** amount, if applicable, usable for future invoices or subscription changes. More on this at [Plan changes and credit balance](/docs/netdata-cloud/view-plan-and-billing.md#plan-changes-and-credit-balance). - - **Billing email** linked to your subscription, where all related notifications are sent. - - A link to the **Billing options and Invoices** in our billing provider's Customer Portal, where you can: - - Manage subscriptions and payment methods. - - Update billing information such as email, address, phone number, and Tax ID. - - View invoice history. - - The **Change plan** button, showing details of your current plan with options to upgrade or cancel. - - Your **Usage chart**, displaying daily and period counts of live nodes and how they relate to your billing. + - **Credit** amount, if applicable, usable for future invoices or subscription changes. More on this at [Plan changes and credit balance](/docs/netdata-cloud/view-plan-and-billing.md#plan-changes-and-credit-balance). + - **Billing email** linked to your subscription, where all related notifications are sent. + - A link to the **Billing options and Invoices** in our billing provider's Customer Portal, where you can: + - Manage subscriptions and payment methods. + - Update billing information such as email, address, phone number, and Tax ID. + - View invoice history. + - The **Change plan** button, showing details of your current plan with options to upgrade or cancel.
+ - Your **Usage chart**, displaying daily and period counts of live nodes and how they relate to your billing. #### Update a Subscription Plan 1. In the **Plan & Billing** tab, click **Change plan** to see: - - Billing frequency and committed nodes (if applicable). - - Current billing information, which must be updated through our billing provider's Customer Portal via **Change billing info and payment method** link. - - Options to enter a promotion code and a breakdown of charges, including subscription total, applicable discounts, credit usage, tax details, and total payable amount. + - Billing frequency and committed nodes (if applicable). + - Current billing information, which must be updated through our billing provider's Customer Portal via **Change billing info and payment method** link. + - Options to enter a promotion code and a breakdown of charges, including subscription total, applicable discounts, credit usage, tax details, and total payable amount. > **Note** > > - Checkout is performed directly if there's an active plan. -> - Plan changes, including downgrades or cancellations, may impact notification settings or user access. More details at [Plan changes and credit balance](/docs/netdata-cloud/view-plan-and-billing.md#plan-changes-and-credit-balance). +> - Plan changes, including downgrades or cancellations, may impact notification settings or user access. More details on [Plan changes and credit balance](/docs/netdata-cloud/view-plan-and-billing.md#plan-changes-and-credit-balance). 
## FAQ diff --git a/docs/observability-centralization-points/README.md b/docs/observability-centralization-points/README.md index ede2037ad42fe2..2547df503b5ad8 100644 --- a/docs/observability-centralization-points/README.md +++ b/docs/observability-centralization-points/README.md @@ -14,6 +14,5 @@ Observability Centralization Points are crucial for ensuring comprehensive monit When multiple independent centralization points are available: -- Netdata Cloud queries all of them in parallel, to provide a unified infrastructure view. - -- Without Netdata Cloud, the dashboards of each of the Netdata Parents provide unified views of the infrastructure aggregated to each of them (metrics and logs). +- Netdata Cloud provides a unified infrastructure view by querying all points in parallel. +- Parent nodes without Cloud access provide consolidated views of their connected infrastructure's metrics and logs. diff --git a/docs/observability-centralization-points/best-practices.md b/docs/observability-centralization-points/best-practices.md index 74a84da1251db3..2c77b0567b4d9f 100644 --- a/docs/observability-centralization-points/best-practices.md +++ b/docs/observability-centralization-points/best-practices.md @@ -2,17 +2,20 @@ When planning the deployment of Observability Centralization Points, the following factors need consideration: -1. **Volume of Monitored Systems**: The number of systems being monitored dictates the scaling and number of centralization points required. Larger infrastructures may necessitate multiple centralization points to manage the volume of data effectively and maintain performance. +1. **Volume of Monitored Systems**: The number of systems being monitored dictates the scaling and number of centralization points required. Larger infrastructures may require multiple centralization points to manage the volume of data effectively and maintain performance. 2. 
**Cost of Data Transfer**: Particularly in multi-cloud or hybrid environments, the location of centralization points can significantly impact egress bandwidth costs. Strategically placing centralization points in each data center or cloud region can minimize these costs by reducing the need for cross-network data transfer. 3. **Usability without Netdata Cloud**: When not using Netdata Cloud, observability with Netdata is simpler when there are fewer centralization points, making it easier to remember where observability is and how to access it. -4. When Netdata Cloud is used, infrastructure level views are provided independently of the centralization points, so it is preferable to centralize as required for security (e.g. internet access), cost control (e.g. egress bandwidth, dedicated resources) and operational efficiency (regions, services or teams isolation). +4. Netdata Cloud provides infrastructure-wide views regardless of centralization points, allowing you to optimize your setup based on: + - Security requirements (such as internet access controls) + - Cost management (including bandwidth and resource allocation) + - Operational needs (like regional, service, or team isolation) ## Cost Optimization -Netdata has been designed for observability cost optimization. For optimal cost we recommend using Netdata Cloud and multiple independent observability centralization points: +Netdata has been designed for observability cost optimization. For optimal cost, we recommend using Netdata Cloud and multiple independent observability centralization points: - **Scale out**: add more, smaller centralization points to distribute the load. This strategy provides the least resource consumption per unit of workload, maintaining optimal performance and resource efficiency across your observability infrastructure. @@ -20,13 +23,15 @@ Netdata has been designed for observability cost optimization. 
For optimal cost - **Unified or separate centralization for logs and metrics**: Netdata allows centralizing metrics and logs together or separately. Consider factors such as access frequency, data retention policies, and compliance requirements to enhance performance and reduce costs. -- **Decentralized configuration management**: each Netdata centralization point can have its own unique configuration for retention and alerts. This enables 1) finer control on infrastructure costs and 2) localized control for separate services or teams. +- **Decentralized configuration management**: each Netdata centralization point can have its own unique configuration for retention and alerts. This enables: + - Finer control on infrastructure costs + - Localized control for separate services or teams ## Pros and Cons Compared to other observability solutions, the design of Netdata offers: -- **Enhanced Scalability and Flexibility**: Netdata's support for multiple independent observability centralization points allows for a more scalable and flexible architecture. This feature is particularly advantageous in distributed and complex environments, enabling tailored observability strategies that can vary by region, service, or team requirements. +- **Enhanced Scalability and Flexibility**: Netdata's support for multiple independent observability centralization points allows for a more scalable and flexible architecture. This feature is particularly helpful in distributed and complex environments, enabling tailored observability strategies that can vary by region, service, or team requirements. - **Resilience and Fault Tolerance**: The ability to deploy multiple centralization points also contributes to greater system resilience and fault tolerance. Replication is a native feature of Netdata centralization points, so in the event of a failure at one centralization point, others can continue to function, ensuring continuous observability. 
diff --git a/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/README.md b/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/README.md index e40396a7eca940..4ad9602abc3076 100644 --- a/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/README.md +++ b/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/README.md @@ -50,6 +50,6 @@ stateDiagram-v2 Logs centralization points can be built using the `systemd-journald` methodologies, by configuring `systemd-journal-remote` (on the centralization point) and `systemd-journal-upload` (on the production system). -The logs centralization points and the metrics centralization points do not need to be the same. For clarity and simplicity however, when not otherwise required for operational or regulatory reasons, we recommend to have unified centralization points for both metrics and logs. +The logs centralization points and the metrics centralization points do not need to be the same. For clarity and simplicity, however, when not otherwise required for operational or regulatory reasons, we recommend having unified centralization points for both metrics and logs. -A Netdata running at the logs centralization point, will automatically detect and present the logs of all servers aggregated to it in a unified way (i.e. logs from all servers multiplexed in the same view). This Netdata may or may not be a Netdata Parent for metrics. +A Netdata running at the logs centralization point will automatically detect and present the logs of all servers aggregated to it in a unified way (i.e., logs from all servers multiplexed in the same view). This Netdata may or may not be a Netdata Parent for metrics.
diff --git a/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/active-journal-source-without-encryption.md b/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/active-journal-source-without-encryption.md index 8abccad01af0f6..9cb22ff4b85ce2 100644 --- a/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/active-journal-source-without-encryption.md +++ b/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/active-journal-source-without-encryption.md @@ -2,7 +2,7 @@ This page will guide you through creating an active journal source without the use of encryption. -Once you enable an active journal source on a server, `systemd-journal-gatewayd` will expose an REST API on TCP port 19531. This API can be used for querying the logs, exporting the logs, or monitoring new log entries, remotely. +Once you enable an active journal source on a server, `systemd-journal-gatewayd` will expose a REST API on TCP port 19531. This API can be used for querying the logs, exporting the logs, or monitoring new log entries remotely. > ⚠️ **IMPORTANT**
> These instructions will expose your logs to the network, without any encryption or authorization.
@@ -97,7 +97,7 @@ You can repeat this process to create as many `systemd-journal-remote` services, ## Verify it works -To verify the central server is receiving logs, run this on the central server: +To verify that the central server is receiving logs, run this on the central server: ```bash sudo ls -l /var/log/journal/remote/ diff --git a/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/passive-journal-centralization-with-encryption-using-self-signed-certificates.md b/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/passive-journal-centralization-with-encryption-using-self-signed-certificates.md index 8509a33da99b54..23f14fced11e49 100644 --- a/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/passive-journal-centralization-with-encryption-using-self-signed-certificates.md +++ b/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/passive-journal-centralization-with-encryption-using-self-signed-certificates.md @@ -2,11 +2,11 @@ This page will guide you through creating a **passive** journal centralization setup using **self-signed certificates** for encryption and authorization. -Once you centralize your infrastructure logs to a server, Netdata will automatically detect all the logs from all servers and organize them in sources. With the setup described in this document, on recent systemd versions, Netdata will automatically name all remote sources using the names of the clients, as they are described at their certificates (on older versions, the names will be IPs or reverse DNS lookups of the IPs). +Once you centralize your infrastructure logs to a server, Netdata will automatically detect all the logs from all servers and organize them in sources. 
With the setup described in this document, on recent systemd versions, Netdata will automatically name all remote sources using the names of the clients, as they’re described in their certificates (on older versions, the names will be IPs or reverse DNS lookups of the IPs). A **passive** journal server waits for clients to push their metrics to it, so in this setup we will: -1. configure a certificates authority and issue self-signed certificates for your servers. +1. configure a certificate authority and issue self-signed certificates for your servers. 2. configure `systemd-journal-remote` on the server, to listen for incoming connections. 3. configure `systemd-journal-upload` on the clients, to push their logs to the server. @@ -16,7 +16,7 @@ Keep in mind that the authorization involved works like this: So, **the server will accept logs from any client having a valid certificate**. 2. The client (`systemd-journal-upload`) validates that the receiver (`systemd-journal-remote`) uses a trusted certificate (like the server does) and it also checks that the hostname or IP of the URL specified to its configuration, matches one of the names or IPs of the server it gets connected to. So, **the client does a validation that it connected to the right server**, using the URL hostname against the names and IPs of the server on its certificate. -This means, that if both certificates are issued by the same certificate authority, only the client can potentially reject the server. +This means that if both certificates are issued by the same certificate authority, only the client can potentially reject the server. ## Self-signed certificates @@ -24,7 +24,7 @@ To simplify the process of creating and managing self-signed certificates, we ha This helps to also automate the distribution of the certificates to your servers (it generates a new bash script for each of your servers, which includes everything required, including the certificates).
-We suggest to keep this script and all the involved certificates at the journals centralization server, in the directory `/etc/ssl/systemd-journal`, so that you can make future changes as required. If you prefer to keep the certificate authority and all the certificates at a more secure location, just use the script on that location. +We suggest keeping this script and all the involved certificates at the journal centralization server, in the directory `/etc/ssl/systemd-journal`, so that you can make future changes as required. If you prefer to keep the certificate authority and all the certificates at a more secure location, use the script in that location. On the server that will issue the certificates (usually the centralization server), do the following: @@ -52,7 +52,7 @@ Where: Repeat this process to create the certificates for all your servers. You can add servers as required, at any time in the future. -Existing certificates are never re-generated. Typically certificates need to be revoked and new ones to be issued. But `systemd-journal-remote` tools do not support handling revocations. So, the only option you have to re-issue a certificate is to delete its files in `/etc/ssl/systemd-journal` and run the script again to create a new one. +Existing certificates are never re-generated. Typically, certificates need to be revoked and new ones issued. But `systemd-journal-remote` tools don’t support handling revocations. So, the only option you have to re-issue a certificate is to delete its files in `/etc/ssl/systemd-journal` and run the script again to create a new one. Once you run the script of each of your servers, in `/etc/ssl/systemd-journal` you will find shell scripts named `runme-on-XXX.sh`, where `XXX` are the canonical names of your servers.
@@ -64,13 +64,13 @@ You can copy and paste (or `scp`) these scripts on your server and each of your sudo scp /etc/ssl/systemd-journal/runme-on-XXX.sh XXX:/tmp/ ``` -For the rest of this guide, we assume that you have copied the right `runme-on-XXX.sh` at the `/tmp` of all the servers for which you issued certificates. +For the rest of this guide, we assume that you’ve copied the right `runme-on-XXX.sh` to the `/tmp` of all the servers for which you issued certificates. ### note about certificates file permissions It is worth noting that `systemd-journal` certificates need to be owned by `systemd-journal-remote:systemd-journal`. -Both the user `systemd-journal-remote` and the group `systemd-journal` are automatically added by the `systemd-journal-remote` package. However, `systemd-journal-upload` (and `systemd-journal-gatewayd` - that is not used in this guide) use dynamic users. Thankfully they are added to the `systemd-journal` remote group. +Both the user `systemd-journal-remote` and the group `systemd-journal` are automatically added by the `systemd-journal-remote` package. However, `systemd-journal-upload` (and `systemd-journal-gatewayd` - that is not used in this guide) use dynamic users. Thankfully, they’re added to the `systemd-journal` remote group. So, by having the certificates owned by `systemd-journal-remote:systemd-journal`, satisfies both `systemd-journal-remote` which is not in the `systemd-journal` group, and `systemd-journal-upload` (and `systemd-journal-gatewayd`) which use dynamic users.
@@ -200,7 +200,7 @@ Here it is in action, in Netdata: ## Verify it works -To verify the central server is receiving logs, run this on the central server: +To verify that the central server is receiving logs, run this on the central server: ```bash sudo ls -l /var/log/journal/remote/ diff --git a/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/passive-journal-centralization-without-encryption.md b/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/passive-journal-centralization-without-encryption.md index a89379e4b41de8..483932a1b0d355 100644 --- a/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/passive-journal-centralization-without-encryption.md +++ b/docs/observability-centralization-points/logs-centralization-points-with-systemd-journald/passive-journal-centralization-without-encryption.md @@ -2,7 +2,7 @@ This page will guide you through creating a passive journal centralization setup without the use of encryption. -Once you centralize your infrastructure logs to a server, Netdata will automatically detects all the logs from all servers and organize them in sources. +Once you centralize your infrastructure logs to a server, Netdata will automatically detect all the logs from all servers and organize them in sources. With the setup described in this document, journal files are identified by the IPs of the clients sending the logs. Netdata will automatically do reverse DNS lookups to find the names of the server and name the sources on the dashboard accordingly. 
@@ -101,7 +101,7 @@ sudo systemctl start systemd-journal-upload ## Verify it works -To verify the central server is receiving logs, run this on the central server: +To verify that the central server is receiving logs, run this on the central server: ```bash sudo ls -l /var/log/journal/remote/ diff --git a/docs/observability-centralization-points/metrics-centralization-points/README.md b/docs/observability-centralization-points/metrics-centralization-points/README.md index 812b493d77b5a9..c8ee2ba5561e6d 100644 --- a/docs/observability-centralization-points/metrics-centralization-points/README.md +++ b/docs/observability-centralization-points/metrics-centralization-points/README.md @@ -1,4 +1,3 @@ - # Metrics Centralization Points (Netdata Parents) ```mermaid @@ -16,7 +15,7 @@ Netdata **Streaming and Replication** copies the recent past samples (replicatio Each production system (Netdata Child) can stream to **only one** Netdata Parent at a time. The configuration allows configuring multiple Netdata Parents for high availability, but only the first found working will be used. -Netdata Parents receive metric samples **from multiple** production systems (Netdata Children) and have the option to re-stream them to another Netdata Parent. This allows building an infinite hierarchy of Netdata Parents. It also enables the configuration of Netdata Parents Clusters, for high availability. +Netdata Parents receive metric samples **from multiple** production systems (Netdata Children) and can re-stream them to another Netdata Parent. This allows building an infinite hierarchy of Netdata Parents. It also enables the configuration of Netdata Parents Clusters, for high availability. 
| Feature | Netdata Child (production system) | Netdata Parent (centralization point) | |:---------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------:| @@ -25,7 +24,7 @@ Netdata Parents receive metric samples **from multiple** production systems (Net | Alerts & Notifications | Can be disabled (enabled by default). | Runs health checks and sends notifications for all systems aggregated to it. | | API and Dashboard | Can be disabled (enabled by default). | Serves the dashboard for all systems aggregated to it, using its own retention. | | Exporting Metrics | Not required (enabled by default). | Exports the samples of all metrics collected by the systems aggregated to it. | -| Netdata Functions | Netdata Child must be online. | Forwards Functions requests to the Children connected to it. | +| Netdata Functions | Netdata Child must be online. | Forwards Functions requests to the Children connected to it. | | Connection to Netdata Cloud | Not required. | Each Netdata Parent registers to Netdata Cloud all systems aggregated to it. | ## Supported Configurations @@ -37,11 +36,11 @@ For Netdata Children: For Netdata Parents: -1. **Standalone**: The Parent is standalone, either the only Parent available in the infrastructure, or the top-most of an hierarchy of Parents. -2. **Cluster**: The Parent is part of a cluster of Parents, all having the same data, from the same Children. A Cluster of Parents offers high-availability. +1. **Standalone**: The Parent is standalone, either the only Parent available in the infrastructure, or the top-most of a hierarchy of Parents. +2. **Cluster**: The Parent is part of a cluster of Parents, all having the same data from the same Children. A Cluster of Parents offers high-availability. 3. 
**Proxy**: The Parent receives metrics and stores them locally, but it also forwards them to a Grand Parent. -A Cluster is configured as a number of circular **Proxies**, ie. each of the nodes in a cluster has all the others configured as its Parents. So, if multiple levels of metrics centralization points (Netdata Parents) are required, only the top-most level can be a cluster. +A Cluster consists of nodes configured as circular **Proxies**, where each node acts as a Parent to all others. When using multiple levels of centralization, only the top level can be configured as a cluster. ## Best Practices diff --git a/docs/observability-centralization-points/metrics-centralization-points/clustering-and-high-availability-of-netdata-parents.md b/docs/observability-centralization-points/metrics-centralization-points/clustering-and-high-availability-of-netdata-parents.md index 412263bebac4db..c4e3b32fc02ff0 100644 --- a/docs/observability-centralization-points/metrics-centralization-points/clustering-and-high-availability-of-netdata-parents.md +++ b/docs/observability-centralization-points/metrics-centralization-points/clustering-and-high-availability-of-netdata-parents.md @@ -13,18 +13,18 @@ flowchart BT P2 .->|failover| P1 ``` -Netdata supports building Parent clusters of 2+ nodes. Clustering and high availability works like this: +Netdata supports building Parent clusters of 2+ nodes. Clustering and high availability work like this: -1. All Netdata Children are configured to stream to all Netdata Parents. The first one found working will be used by each Netdata Child and the others will be automatically used if and when this connection is interrupted. -2. The Netdata Parents are configured to stream to all other Netdata Parents. For each of them, the first found working will be used and the others will be automatically used if and when this connection is interrupted. +1. All Netdata Children are configured to stream to all Netdata Parents. 
The first one found working will be used by each Netdata Child, and the others will be automatically used if and when this connection is interrupted. +2. The Netdata Parents are configured to stream to all other Netdata Parents. For each of them, the first one found working will be used and the others will be automatically used if and when this connection is interrupted. All the Netdata Parents in such a cluster will receive all the metrics of all Netdata Children connected to any of them. They will also receive the metrics all the other Netdata Parents have. -In case there is a failure on any of the Netdata Parents, the Netdata Children connected to it will automatically failover to another available Netdata Parent, which now will attempt to re-stream all the metrics it receives to the other available Netdata Parents. +If a Parent node fails, its Child nodes automatically connect to another available Parent node, which then re-streams metrics to all other Parent nodes. Netdata Cloud will receive registrations for all Netdata Children from all the Netdata Parents. As long as at least one of the Netdata Parents is connected to Netdata Cloud, all the Netdata Children will be available on Netdata Cloud. -Netdata Children need to maintain a retention only for the time required to switch Netdata Parents. When Netdata Children connect to a Netdata Parent, they negotiate the available retention and any missing data on the Netdata Parent are replicated from the Netdata Children. +Netdata Children need to maintain retention only for the time required to switch Netdata Parents. When Netdata Children connect to a Netdata Parent, they negotiate the available retention and any missing data on the Netdata Parent are replicated from the Netdata Children.
## Restoring a Netdata Parent after maintenance @@ -41,7 +41,7 @@ To block access from Netdata Children, and still allow access from other Netdata The easiest way is to `rsync` the directory `/var/cache/netdata` from the existing Netdata Parent to the new Netdata Parent. -> Important: Starting the new Netdata Parent with default settings, may delete the new files in `/var/cache/netdata` to apply the default disk size constraints. Therefore it is important to set the right retention settings in the new Netdata Parent before starting it up with the copied files. +> Important: Starting the new Netdata Parent with default settings may delete the new files in `/var/cache/netdata` to apply the default disk size constraints. Therefore, it is important to set the right retention settings in the new Netdata Parent before starting it up with the copied files. To configure retention at the new Netdata Parent, set in `netdata.conf` the following to at least the values the old Netdata Parent has: diff --git a/docs/observability-centralization-points/metrics-centralization-points/configuration.md b/docs/observability-centralization-points/metrics-centralization-points/configuration.md index 2ba5d9b070a714..5f23cdd1ee846e 100644 --- a/docs/observability-centralization-points/metrics-centralization-points/configuration.md +++ b/docs/observability-centralization-points/metrics-centralization-points/configuration.md @@ -4,10 +4,10 @@ Metrics streaming configuration for both Netdata Children and Parents is done vi `netdata.conf` and `stream.conf` have the same `ini` format, but `netdata.conf` is considered a non-sensitive file, while `stream.conf` contains API keys, IPs and other sensitive information that enable communication between Netdata Agents.
-`stream.conf` has 2 main sections: +`stream.conf` has two main sections: -- The `[stream]` section includes options for the **sending Netdata** (ie Netdata Children, or Netdata Parents that stream to Grand Parents, or to other sibling Netdata Parents in a cluster). -- The rest includes multiple sections that define API keys for the **receiving Netdata** (ie. Netdata Parents). +- The `[stream]` section includes options for the **sending Netdata** (i.e., Netdata Children, or Netdata Parents that stream to Grand Parents, or to other sibling Netdata Parents in a cluster). +- The rest includes multiple sections that define API keys for the **receiving Netdata** (i.e., Netdata Parents). ## Edit `stream.conf` @@ -61,8 +61,8 @@ While encrypting the connection between your parent and child nodes is recommend This example uses self-signed certificates. > **Note** -> This section assumes you have read the documentation on [how to edit the Netdata configuration files](/docs/netdata-agent/configuration/README.md). - +> This section assumes you have read the documentation on [how to edit the Netdata configuration files](/docs/netdata-agent/configuration/README.md). + 1. **Parent node** To generate an SSL key and certificate using `openssl`, take a look at the related section around [Securing Netdata Agents](/src/web/server/README.md#enable-httpstls-support) in our Documentation. @@ -78,7 +78,7 @@ This example uses self-signed certificates. api key = 11111111-2222-3333-4444-555555555555 ``` -3. Restart the Netdata Agent on both the parent and child nodes, to stream encrypted metrics using TLS/SSL. +3. Restart the Netdata Agent on both the parent and child nodes to stream encrypted metrics using TLS/SSL. 
## Troubleshooting Streaming Connections diff --git a/docs/observability-centralization-points/metrics-centralization-points/faq.md b/docs/observability-centralization-points/metrics-centralization-points/faq.md index 917b8088a4c9d1..f3d3ece0f0e423 100644 --- a/docs/observability-centralization-points/metrics-centralization-points/faq.md +++ b/docs/observability-centralization-points/metrics-centralization-points/faq.md @@ -12,10 +12,10 @@ No. When you set up an active-active cluster, even if child nodes connect random ## How much retention do the child nodes need? -Child nodes need to have only the retention required in order to connect to another Parent if one fails or stops for maintenance. +Child nodes need to have only the retention required to connect to another Parent if one fails or stops for maintenance. - If you have a cluster of parents, 5 to 10 minutes in `alloc` mode is usually enough. -- If you have only 1 parent, it would be better to run the child nodes with `dbengine` so that they will have enough retention to back-fill the parent node if it stops for maintenance. +- If you have only one parent, it would be better to run the child nodes with `dbengine` so that they will have enough retention to back-fill the parent node if it stops for maintenance. ## Does streaming between child nodes and parents support encryption? @@ -29,7 +29,7 @@ No. The streaming protocol works on the same port as the internal web server of Although this can be done and for streaming between child and parent nodes it could work, we recommend not doing it. It can lead to several kinds of problems. -It is better to configure all the parent nodes directly in the child nodes `stream.conf`. The child nodes will do everything in their power to find a parent node to connect and they will never give up. +It is better to configure all the parent nodes directly in the child nodes' `stream.conf`.
The child nodes will do everything in their power to find a parent node to connect to, and they will never give up. ## When I have multiple parents for the same children, will I receive alert notifications from all of them? @@ -41,7 +41,7 @@ We recommend using Netdata Cloud to avoid receiving duplicate alert notification Yes. Function requests will be received by the Parents and forwarded to the Child via their streaming connection. Function requests are propagated between parents, so this will work even if multiple levels of Netdata Parents are involved. -## If I have a cluster of parents and get one out for maintenance for a few hours, will it have missing data when it returns back online? +## If I have a cluster of parents and take one out for maintenance for a few hours, will it have missing data when it returns online? Check [Restoring a Netdata Parent after maintenance](/docs/observability-centralization-points/metrics-centralization-points/clustering-and-high-availability-of-netdata-parents.md). @@ -61,9 +61,9 @@ Yes. When configuring the Parents at the Children `stream.conf`, configure them It depends on the ephemerality setting of each Netdata Child. -1. **Permanent nodes**: These are nodes that should be available permanently and if they disconnect an alert should be triggered to notify you. By default, all nodes are considered permanent (not ephemeral). +1. **Permanent nodes**: These are nodes that should be available permanently, and if they disconnect, an alert should be triggered to notify you. By default, all nodes are considered permanent (not ephemeral). -2. **Ephemeral nodes**: These are nodes that are ephemeral by nature and they may shutdown at any point in time without any impact on the services you run. +2. **Ephemeral nodes**: These are nodes that are ephemeral by nature, and they may shut down at any point in time without any impact on the services you run.
To set the ephemeral flag on a node, edit its `netdata.conf` and, in the `[global]` section, set `is ephemeral node = yes`. This setting is propagated to parent nodes and Netdata Cloud. @@ -75,4 +75,4 @@ A node can be forced into this "forgotten" state with the Netdata CLI tool on th netdatacli remove-stale-node ``` -When using Netdata Cloud (via a parent or directly) and a permanent node gets disconnected, Netdata Cloud sends node disconnection notifications. +When using Netdata Cloud (via a parent or directly), and a permanent node gets disconnected, Netdata Cloud sends node disconnection notifications. diff --git a/docs/observability-centralization-points/metrics-centralization-points/replication-of-past-samples.md b/docs/observability-centralization-points/metrics-centralization-points/replication-of-past-samples.md index e0c60e89fe2f17..25ed40e0f0da0c 100644 --- a/docs/observability-centralization-points/metrics-centralization-points/replication-of-past-samples.md +++ b/docs/observability-centralization-points/metrics-centralization-points/replication-of-past-samples.md @@ -6,7 +6,7 @@ The same replication mechanism is used between Netdata Parents (the sending Netd ## Replication Limitations -The current implementation is optimized to replicate small durations and have minimal impact during reconnects. As a result it has the following limitations: +The current implementation is optimized to replicate small durations and have minimal impact during reconnections. As a result, it has the following limitations: 1. Replication can only append samples to metrics. Only missing samples at the end of each time-series are replicated. @@ -49,9 +49,9 @@ On the receiving side (Netdata Parent): On the sending side (Netdata Children, or Netdata Parent when parents are clustered): -- `[db].replication threads` controls how many concurrent threads will be replicating metrics. The default is 1.
Usually the performance is about 2 million samples per second per thread, so increasing this number may allow replication to progress faster between Netdata Parents. +- `[db].replication threads` controls how many concurrent threads will be replicating metrics. The default is 1. Usually the performance is about two million samples per second per thread, so increasing this number may allow replication to progress faster between Netdata Parents. -- `[db].cleanup obsolete charts after` controls for how much time after metrics stop being collected will not be available for replication. The default is 1 hour (3600 seconds). If you plan to have scheduled maintenance on Netdata Parents of more than 1 hour, we recommend increasing this setting. Keep in mind however, that increasing this duration in highly ephemeral environments can have an impact on RAM utilization, since metrics will be considered as collected for longer durations. +- `[db].cleanup obsolete charts after` controls how long metrics remain available for replication after they stop being collected. The default is 1 hour (3600 seconds). If you plan to have scheduled maintenance on Netdata Parents of more than 1 hour, we recommend increasing this setting. Keep in mind, however, that increasing this duration in highly ephemeral environments can have an impact on RAM utilization, since metrics will be considered as collected for longer durations. ## Monitoring Replication Progress diff --git a/docs/security-and-privacy-design/README.md b/docs/security-and-privacy-design/README.md index 5333087a9fcd3f..a6766c90e3149a 100644 --- a/docs/security-and-privacy-design/README.md +++ b/docs/security-and-privacy-design/README.md @@ -30,8 +30,8 @@ the [OSSF guidelines](https://bestpractices.coreinfrastructure.org/en/projects/2 Netdata Cloud boasts of comprehensive end-to-end automated testing, encompassing the UI, back-end, and Agents, where involved.
In addition, the Netdata Agent uses an array of third-party services for static code analysis, -security analysis, and CI/CD integrations to ensure code quality on a per pull request basis. Tools like Github's -CodeQL, Github's Dependabot, our own unit tests, various types of linters, +security analysis, and CI/CD integrations to ensure code quality on a per pull request basis. Tools like GitHub's +CodeQL, GitHub's Dependabot, our own unit tests, various types of linters, and [Coverity](https://scan.coverity.com/projects/netdata-netdata?tab=overview) are utilized to this end. Moreover, each PR requires two code reviews from our senior engineers before being merged. We also maintain two @@ -43,7 +43,7 @@ stress-testing our entire solution. This robust pipeline ensures the delivery of While Netdata doesn't have a dedicated internal security team, the open-source Netdata Agent undergoes regular testing by third parties. Any security reports received are addressed immediately. In contrast, Netdata Cloud operates in a fully automated and isolated environment with Infrastructure as Code (IaC), ensuring no direct access to production -applications. Monitoring and reporting is also fully automated. +applications. Monitoring and reporting are also fully automated. ### Security Vulnerability Response @@ -51,7 +51,7 @@ Netdata has a transparent and structured process for handling security vulnerabi contributions of security researchers and users who report vulnerabilities to us. All reports are thoroughly investigated, and any identified vulnerabilities trigger a Security Release Process. -We aim to fully disclose any bugs as soon as a user mitigation is available, typically within a week of the report. In +We aim to fully disclose any bugs as soon as user mitigation is available, typically within a week of the report. In case of security fixes, we promptly release a new version of the software. 
Users can subscribe to our releases on GitHub to stay updated about all security incidents. More details about our vulnerability response process can be found [here](https://github.com/netdata/netdata/security/policy). @@ -60,28 +60,22 @@ found [here](https://github.com/netdata/netdata/security/policy). In line with our commitment to security, we uphold the best practices as outlined by the Open Source Security Foundation. This commitment reflects in every aspect of our operations, from the design phase to the release process, -ensuring the delivery of a secure and reliable product to our users. For more information -check [here](https://bestpractices.coreinfrastructure.org/en/projects/2231). +ensuring the delivery of a secure and reliable product to our users. For more information, check [here](https://bestpractices.coreinfrastructure.org/en/projects/2231). ## Compliance with Regulations -Netdata is committed to ensuring the security, privacy, and integrity of user data. It complies with both the General -Data Protection Regulation (GDPR), a regulation in EU law on data protection and privacy, and the California Consumer -Privacy Act (CCPA), a state statute intended to enhance privacy rights and consumer protection for residents of -California. +Netdata is committed to the highest standards of data security and privacy, complying with the EU's General Data Protection Regulation (GDPR) and California's Consumer Privacy Act (CCPA). ### Compliance with GDPR and CCPA Compliance with GDPR and CCPA are self-assessment processes, and Netdata has undertaken thorough internal audits and controls to ensure it meets all requirements. -As per request basis, any customer may enter with Netdata into a data processing addendum (DPA) governing customer’s -ability to load and permit Netdata to process any personal data or information regulated under applicable data -protection laws, including the GDPR and CCPA. 
+Netdata offers Data Processing Agreements (DPAs) upon request, allowing customers to process personal data in compliance with applicable privacy regulations, including GDPR and CCPA. ### Data Transfers -While Netdata Agent itself does not engage in any cross-border data transfers, certain **observability metadata** (e.g. +While Netdata Agent itself does not engage in any cross-border data transfers, certain **observability metadata** (e.g., hostnames, metric names, alert names, and alert transitions) is transferred to Netdata Cloud solely to provide routing and alert notifications. **Observability data**, consisting of metric values (time series) and log events, stays strictly within the user's infrastructure, mitigating cross-border data transfer concerns. @@ -92,7 +86,7 @@ maintains only necessary metadata, while full control of observability data rema Netdata Cloud only stores Netdata Cloud users' identification data (such as observability users' email addresses) and infrastructure metadata (such as infrastructure hostnames) necessary for Netdata Cloud's operation. All this metadata -are stored in data centers in the United States, using compliant infrastructure providers such as Google Cloud and +is stored in data centers in the United States, using compliant infrastructure providers such as Google Cloud and Amazon Web Services. These transfers and storage are carried out in full compliance with applicable data protection laws, including GDPR and CCPA. @@ -105,7 +99,7 @@ into and accessing their profile, at the bottom left ### Regular Review and Updates -Netdata is dedicated to keeping its practices up-to-date with the latest developments in data protection regulations. +Netdata is dedicated to keeping its practices up to date with the latest developments in data protection regulations.
@@ -127,12 +121,12 @@ improvement. The purpose of collecting these statistics and telemetry data is to guide the development of the open-source Agent, focusing on areas that are most beneficial to users. -Users have the option to opt out of this data collection during the installation of the Agent, or at any time by +Users can opt out of this data collection during the installation of the Agent, or at any time by removing a specific file from their system. -Netdata retains this data indefinitely in order to track changes and trends within the community over time. +Netdata retains this data indefinitely to track changes and trends within the community over time. -Netdata does not share these anonymous statistics or telemetry data with any third parties. +Netdata doesn’t share these anonymous statistics or telemetry data with any third parties. By collecting this data, Netdata is able to continuously improve their service and identify any issues or areas for improvement, while respecting user privacy and maintaining transparency. @@ -146,41 +140,41 @@ include: Netdata Cloud securely handles observability metadata in isolated environments, while observability data remains exclusively within user premises, stored locally and managed by the user. This distinction ensures that only minimal metadata is required for routing and system identification. -3. **Infrastructure as Code (IaC)** : +2. **Infrastructure as Code (IaC)** : Netdata Cloud follows the IaC model, which means it is a microservices environment that is completely isolated. All changes are managed through Terraform, an open-source IaC software tool that provides a consistent CLI workflow for managing cloud services. -4. **TLS Termination and IAM Service** : +3. **TLS Termination and IAM Service** : At the edge of Netdata Cloud, there is a TLS termination, which provides the decryption point for incoming TLS connections. 
Additionally, an Identity Access Management (IAM) service validates JWT tokens included in request cookies or denies access to them. -5. **Session Identification** : +4. **Session Identification** : Once inside the microservices environment, all requests are associated with session IDs that identify the user making the request. This approach provides additional layers of security and traceability. -6. **Data Storage** : +5. **Data Storage** : Data is stored in various NoSQL and SQL databases and message brokers. The entire environment is fully isolated, providing a secure space for data management. -7. **Authentication** : +6. **Authentication** : Netdata Cloud does not store credentials. It offers three types of authentication: GitHub Single Sign-On (SSO), Google SSO, and email validation. -8. **DDoS Protection** : +7. **DDoS Protection** : Netdata Cloud has multiple protection mechanisms against Distributed Denial of Service (DDoS) attacks, including rate-limiting and automated blacklisting. -9. **Security-Focused Development Process** : +8. **Security-Focused Development Process** : To ensure a secure environment, Netdata employs a security-focused development process. This includes the use of static code analyzers to identify potential security vulnerabilities in the codebase. -10. **High Security Standards** : - Netdata Cloud maintains high security standards and can provide additional customization on a per contract basis. -11. **Employee Security Practices** : - Netdata ensures its employees follow security best practices, including role-based access, periodic access review, - and multi-factor authentication. This helps to minimize the risk of unauthorized access to sensitive data. -12. **Experienced Developers** : +9. **High Security Standards** : + Netdata Cloud maintains high security standards and can provide additional customization on a per-contract basis. +10. 
**Employee Security Practices** : + Netdata ensures its employees follow security best practices, including role-based access, periodic access review, + and multifactor authentication. This helps to minimize the risk of unauthorized access to sensitive data. +11. **Experienced Developers** : Netdata hires senior developers with vast experience in security-related matters. It enforces two code reviews for every Pull Request (PR), ensuring that any potential issues are identified and addressed promptly. -13. **DevOps Methodologies** : - Netdata's DevOps methodologies use the highest standards in access control in all places, utilizing the best +12. **DevOps Methodologies** : + Netdata's DevOps methodologies use the highest standards in access control in all places, using the best practices available. -14. **Risk-Based Security Program** : +13. **Risk-Based Security Program** : Netdata has a risk-based security program that continually assesses and mitigates risks associated with data security. This program helps maintain a secure environment for user data. @@ -193,10 +187,10 @@ effectively. PCI DSS (Payment Card Industry Data Security Standard) is a set of security standards designed to ensure that all companies that accept, process, store or transmit credit card information maintain a secure environment. -Netdata is committed to providing secure and privacy-respecting services, and it aligns its practices with many of the -key principles of the PCI DSS. However, it's important to clarify that Netdata is not officially certified as PCI -DSS-compliant. While Netdata follows practices that align with PCI DSS's key principles, the company itself has not -undergone the formal certification process for PCI DSS compliance. +Netdata is committed to secure, privacy-focused services that align with key PCI DSS (Payment Card Industry Data Security Standard) principles. 
+ +However, it's important to clarify that Netdata is not officially certified as PCI DSS-compliant. While Netdata follows practices that align with PCI DSS's key principles, the company itself has not undergone the formal certification process for PCI DSS compliance. PCI DSS compliance is not just about the technical controls but also involves a range of administrative and procedural safeguards that go beyond the scope of Netdata's services. These include, among other things, maintaining a secure @@ -222,10 +216,7 @@ HIPAA compliance is not just about technical controls but also involves a range safeguards that go beyond the scope of Netdata's services. These include, among other things, employee training, physical security, and contingency planning. -Therefore, while Netdata can support HIPAA-regulated entities with their data security needs and is prepared to sign a -Business Associate Agreement (BAA), it is ultimately the responsibility of the healthcare entity to ensure full HIPAA -compliance across all of their operations. Entities should always consult with a legal expert or a HIPAA compliance -consultant to ensure that their use of any product, including Netdata, aligns with HIPAA regulations. +While Netdata supports HIPAA compliance and offers Business Associate Agreements (BAAs), healthcare entities are responsible for ensuring their overall HIPAA compliance, including their use of Netdata. We recommend consulting HIPAA compliance experts for comprehensive guidance.
## SOC 2 Compliance @@ -243,7 +234,7 @@ Netdata's commitment to system monitoring and troubleshooting ensures the availa ### Processing Integrity -Although Netdata primarily focuses on system monitoring and does not typically process customer data in a way that alters it, our commitment to accurate, timely, and valid delivery of services aligns with the processing integrity principle of SOC 2. +Although Netdata primarily focuses on system monitoring and doesn’t typically process customer data in a way that alters it, our commitment to accurate, timely, and valid delivery of services aligns with the processing integrity principle of SOC 2. ### Confidentiality @@ -255,7 +246,7 @@ Aligning with the privacy principle of SOC 2, Netdata adheres to GDPR and CCPA r ### Continuous Improvement and Future Considerations -Netdata is committed to continuous improvement in security and privacy. While we are not currently SOC 2 certified, we understand the importance of this framework and are continuously evaluating our processes and controls against industry best practices. As Netdata grows and evolves, we remain open to pursuing SOC 2 certification or other similar standards to further demonstrate our dedication to data security and privacy. +Netdata is committed to continuous improvement in security and privacy. While we aren’t currently SOC 2 certified, we understand the importance of this framework and are continuously evaluating our processes and controls against industry best practices. As Netdata grows and evolves, we remain open to pursuing SOC 2 certification or other similar standards to further demonstrate our dedication to data security and privacy. ## Conclusion @@ -272,7 +263,7 @@ The use of advanced encryption techniques, role-based access control, and robust strengthen the security of user data. 
Netdata Cloud also maintains transparency in its data handling practices, giving users control over their data and the ability to easily access, retrieve, correct, and delete their personal data. -Netdata's approach to anonymous statistics collection respects user privacy while enabling the company to improve its +Netdata's approach to an anonymous statistics collection respects user privacy while enabling the company to improve its product based on real-world usage data. Even in such cases, users have the choice to opt-out, underlining Netdata's respect for user autonomy. diff --git a/docs/security-and-privacy-design/netdata-agent-security.md b/docs/security-and-privacy-design/netdata-agent-security.md index aa6b3be12dd8e7..1b82c647cdb945 100644 --- a/docs/security-and-privacy-design/netdata-agent-security.md +++ b/docs/security-and-privacy-design/netdata-agent-security.md @@ -67,6 +67,6 @@ with a nice priority to protect production applications in case the system is st agents are configured by default to be the first processes to be killed by the operating system in case the operating system starves for memory resources (OS-OOM - Operating System Out Of Memory events). -## User Customizable Security Settings +## User-Customizable Security Settings Netdata provides users with the flexibility to customize the Agent's security settings. Users can configure TLS across the system, and the Agent provides extensive access control lists on all its interfaces to limit access to its endpoints based on IP. Additionally, users can configure the CPU and Memory priority of Netdata Agents. 
diff --git a/docs/security-and-privacy-design/netdata-cloud-security.md b/docs/security-and-privacy-design/netdata-cloud-security.md index 13270e7ec90fce..148202f3359341 100644 --- a/docs/security-and-privacy-design/netdata-cloud-security.md +++ b/docs/security-and-privacy-design/netdata-cloud-security.md @@ -3,7 +3,7 @@ Netdata Cloud is designed with a security-first approach to ensure the highest level of protection for user data. When using Netdata Cloud in environments that require compliance with standards like PCI DSS, SOC 2, or HIPAA, users can be confident that all collected data is stored within their infrastructure. Data viewed on dashboards and alert -notifications travel over Netdata Cloud, but are not stored—instead, they're transformed in transit, aggregated from +notifications travel over Netdata Cloud, but aren’t stored—instead, they're transformed in transit, aggregated from multiple Agents and parents (centralization points), to appear as one data source in the user's browser. ## User Identification and Authorization @@ -11,11 +11,11 @@ multiple Agents and parents (centralization points), to appear as one data sourc Netdata Cloud requires only an email address to create an account and use the service. User identification and authorization are conducted either via third-party integrations (Google, GitHub accounts) or through short-lived access tokens sent to the user’s email account. Email addresses are stored securely in our production database on AWS and are -also used for product and marketing communications. Netdata Cloud does not store user credentials. +also used for product and marketing communications. Netdata Cloud doesn’t store user credentials. ## Data Storage and Transfer -Although Netdata Cloud does not store metric data, it does keep some metadata for each node connected to user spaces. +Although Netdata Cloud doesn’t store metric data, it does keep some metadata for each node connected to user spaces. 
This metadata includes the hostname, information from the `/api/v1/info` endpoint, metric metadata from `/api/v1/contexts`, and alerts configurations from `/api/v1/alarms`. This data is securely stored in our production database on AWS and copied to Google BigQuery for analytics purposes. @@ -26,10 +26,7 @@ their node. Data in transit between a user and Netdata Cloud is encrypted using ## Data Retention and Erasure -Netdata Cloud maintains backups of customer content for approximately 90 days following a deletion. Users have the -ability to access, retrieve, correct, and delete personal data stored in Netdata Cloud. In case a user is unable to -delete personal data via self-services functionality, Netdata will delete personal data upon the customer's written -request, in accordance with applicable data protection law. +Netdata Cloud retains deleted customer content for 90 days. Users can access, modify, and delete their personal data through self-service tools. If needed, users can request data deletion in writing, which Netdata will process in accordance with data protection laws. ## Infrastructure and Authentication @@ -41,7 +38,7 @@ Netdata Cloud does not store user credentials. ## Security Features and Response -Netdata Cloud offers a variety of security features, including infrastructure-level dashboards, centralized alert notifications, auditing logs, and role-based access to different segments of the infrastructure. It employs several protection mechanisms against DDoS attacks, such as rate-limiting and automated blacklisting. It also uses static code analyzers to prevent other types of attacks. +Netdata Cloud offers a variety of security features, including infrastructure-level dashboards, centralized alert notifications, auditing logs, and role-based access to different segments of the infrastructure. It employs several protection mechanisms against DDoS attacks, such as rate-limiting and automated blocklisting. 
It also uses static code analyzers to prevent other types of attacks. In the event of potential security vulnerabilities or incidents, Netdata Cloud follows the same process as the Netdata agent. Every report is acknowledged and analyzed by the Netdata team within three working days, and the team keeps the @@ -52,7 +49,7 @@ reporter updated throughout the process. Netdata Cloud uses the highest level of security. There is no user customization available out of the box. Its security settings are designed to provide maximum protection for all users. We are offering customization (like custom SSO integrations, custom data retention policies, advanced user access controls, tailored audit logs, integration with other -security tools, etc.) on a per contract basis. +security tools, etc.) on a per-contract basis. ## Deleting Personal Data @@ -61,7 +58,7 @@ Users who wish to remove all personal data (including email and activities) can ## User Privacy and Data Protection Netdata Cloud is built with an unwavering commitment to user privacy and data protection. We understand that our users' -data is both sensitive and valuable, and we have implemented stringent measures to ensure its safety. +data is both sensitive and valuable, and we’ve implemented stringent measures to ensure its safety. ### Data Collection @@ -74,7 +71,7 @@ Additionally, the IP address used to access Netdata Cloud is stored in web proxy The collected email addresses are stored in our production database on Amazon Web Services (AWS) and copied to Google BigQuery, our data lake, for analytics purposes. These analytics are crucial for our product development process. If a user accepts the use of analytical cookies, their email address and IP are stored in the systems we use to track -application usage (Google Analytics, Posthog, and Gainsight PX). Subscriptions and Payments data are handled by Stripe. +application usage (Google Analytics, Posthog, and Gainsight PX). 
Stripe handles subscriptions and payments data. ### Data Sharing @@ -84,18 +81,16 @@ its infrastructure, Stripe for payment processing, Google Analytics, Posthog and ### Data Protection -We use state-of-the-art security measures to protect user data from unauthorized access, use, or disclosure. All +We use up-to-date security measures to protect user data from unauthorized access, use, or disclosure. All infrastructure data visible on Netdata Cloud passes through the Agent-Cloud Link (ACLK) mechanism, which securely connects a Netdata Agent to Netdata Cloud. The ACLK is encrypted, safe, and is only established if the user connects their node. All data in transit between a user and Netdata Cloud is encrypted using TLS. ### User Control over Data -Netdata provides its users with the ability to access, retrieve, correct, and delete their personal data stored in -Netdata Cloud. This ability may occasionally be limited due to temporary service outages for maintenance or other -updates to Netdata Cloud, or when it is technically not feasible. If a customer is unable to delete personal data via -the self-services functionality, Netdata deletes the data upon the customer's written request, within the timeframe -specified in the Data Protection Agreement (DPA), and in accordance with applicable data protection laws. +Netdata provides its users with the ability to access, retrieve, correct, and delete their personal data stored in Netdata Cloud. +This ability may occasionally be limited due to temporary service outages for maintenance or other updates to Netdata Cloud, or when it is technically not possible. +If self-service data deletion isn't possible, Netdata will process written deletion requests within DPA-specified timeframes, in compliance with data protection laws.
### Compliance with Data Protection Laws diff --git a/docs/top-monitoring-netdata-functions.md b/docs/top-monitoring-netdata-functions.md index a2a8d9ac30150e..9ee6d77bf5a4cc 100644 --- a/docs/top-monitoring-netdata-functions.md +++ b/docs/top-monitoring-netdata-functions.md @@ -1,20 +1,10 @@ # Top Monitoring (Netdata Functions) -Netdata Agent collectors are able to expose functions that can be executed in run-time and on-demand. These will be -executed on the node/host where the function is made available. +Netdata Agent collectors can expose functions that are executed on demand, at runtime, on the host where the collector runs. Functions are available since Netdata v1.37.1. ## What is a function? -Collectors besides the metric collection, storing, and/or streaming work are capable of executing specific routines on request. These routines will bring additional information to help you troubleshoot or even trigger some action to happen on the node itself. - -For more details please check out documentation on how we use our internal collector to get this from the first collector that exposes functions - [plugins.d](/src/plugins.d/README.md#function). - -## Prerequisites - -The following is required to be able to run Functions from Netdata Cloud. - -- At least one of the nodes connected to your Space should be on a Netdata Agent version higher than `v1.37.1` -- Ensure that the node has the collector that exposes the function you want enabled +Beyond their primary roles of collecting metrics, collectors can execute specific routines when requested. These routines provide additional diagnostic information or trigger actions directly on the host node. ## What functions are currently available? @@ -25,7 +15,7 @@ The following is required to be able to run Functions from Netdata Cloud. | Ipmi-sensors | Readings and status of IPMI sensors.
| `ipmi-sensors` | no | [freeipmi](https://github.com/netdata/netdata/tree/master/src/collectors/freeipmi.plugin#readme) | | Mount-points | Disk usage for each mount point, including used and available space, both in terms of percentage and actual bytes, as well as used and available inode counts. | `df` | no | [diskspace](https://github.com/netdata/netdata/tree/master/src/collectors/diskspace.plugin#readme) | | Network-interfaces | Network traffic, packet drop rates, interface states, MTU, speed, and duplex mode for all network interfaces. | `bmon`, `bwm-ng` | no | [proc](https://github.com/netdata/netdata/tree/master/src/collectors/proc.plugin#readme) | -| Processes | Real-time information about the system's resource usage, including CPU utilization, memory consumption, and disk IO for every running process. | `top`, `htop` | yes | [apps](/src/collectors/apps.plugin/README.md) | +| Processes | Real-time information about the system's resource usage, including CPU utilization, memory consumption, and disk IO for every running process. | `top`, `htop` | yes | [apps](/src/collectors/apps.plugin/README.md) | | Systemd-journal | Viewing, exploring and analyzing systemd journal logs. | `journalctl` | yes | [systemd-journal](https://github.com/netdata/netdata/tree/master/src/collectors/systemd-journal.plugin#readme) | | Systemd-list-units | Information about all systemd units, including their active state, description, whether or not they are enabled, and more. | `systemctl list-units` | yes | [systemd-journal](https://github.com/netdata/netdata/tree/master/src/collectors/systemd-journal.plugin#readme) | | Systemd-services | System resource utilization for all running systemd services: CPU, memory, and disk IO. | `systemd-cgtop` | no | [cgroups](https://github.com/netdata/netdata/tree/master/src/collectors/cgroups.plugin#readme) | @@ -34,13 +24,11 @@ The following is required to be able to run Functions from Netdata Cloud. ## How do functions work with streaming? 
-Via streaming, the definitions of functions are transmitted to a parent node, so it knows all the functions available on any children connected to it. If the parent node is the one connected to Netdata Cloud it is capable of triggering the call to the respective children node to run the function. +When streaming is enabled, function definitions propagate from Child nodes to their Parent node. If this Parent node is connected to Netdata Cloud, it can trigger function execution on any of its connected Child nodes. ## Why are some functions only available on Netdata Cloud? -Since these functions are able to execute routines on the node and due to the potential use cases that they can cover, our concern is to ensure no sensitive information or disruptive actions are exposed through the Agent's API. - -With the communication between the Netdata Agent and Netdata Cloud being through [ACLK](/src/aclk/README.md) this concern is addressed. +Some functions are exclusively available through Netdata Cloud for security reasons. Since functions can execute node-level routines that may access sensitive information, we restrict their exposure through the Agent's API. This security concern is addressed by our [ACLK](/src/aclk/README.md) protocol, which provides secure communication between Netdata Agent and Netdata Cloud. ## Feedback
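For functions that are available locally (the ones not restricted to Netdata Cloud), the Agent's v1 API can list and run them. A hedged sketch, assuming a running Agent on the default port `19999`; `processes` is one of the functions from the table above:

```shell
# Base URL of a local Netdata Agent (assumption: default port 19999).
NETDATA="${NETDATA:-http://localhost:19999}"

echo "Listing functions from $NETDATA"

# List every function registered by the collectors on this Agent.
curl -s --max-time 5 "$NETDATA/api/v1/functions" || echo "(no Agent reachable at $NETDATA)"

# Run one locally available function, e.g. "processes" from apps.plugin.
curl -s --max-time 10 "$NETDATA/api/v1/function?function=processes" || true
```

Functions marked as Cloud-only will not be served this way; they are reachable only through the ACLK, for the security reasons described above.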