diff --git a/docs/category-overview-pages/developer-and-contributor-corner.md b/docs/category-overview-pages/developer-and-contributor-corner.md new file mode 100644 index 00000000000000..d4d86382ac0ca8 --- /dev/null +++ b/docs/category-overview-pages/developer-and-contributor-corner.md @@ -0,0 +1,3 @@ +# Developer and Contributor Corner + +In this section of our Documentation you will find more advanced information, suited for developers and contributors alike. \ No newline at end of file diff --git a/docs/category-overview-pages/installation-overview.md b/docs/category-overview-pages/installation-overview.md index e60dd442c09699..703ca26b9bfc4e 100644 --- a/docs/category-overview-pages/installation-overview.md +++ b/docs/category-overview-pages/installation-overview.md @@ -1,7 +1,7 @@ # Installation In this category you can find instructions on all the possible ways you can install Netdata on the -[supported platforms](https://github.com/netdata/netdata/blob/master/packaging/PLATFORM_SUPPORT.md). +[supported platforms](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/versions-and-platforms.md). If this is your first time using Netdata, we recommend that you first start with the [quick installation guide](https://github.com/netdata/netdata/edit/master/packaging/installer/README.md) and then diff --git a/docs/contributing/style-guide.md b/docs/contributing/style-guide.md index b77927a9c1a04b..2487f2eb1cd033 100644 --- a/docs/contributing/style-guide.md +++ b/docs/contributing/style-guide.md @@ -305,7 +305,7 @@ Don't include full paths, beginning from the system's root (`/`), as these might | | | |-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Not recommended | Use `edit-config` to edit Netdata's configuration: `sudo /etc/netdata/edit-config netdata.conf`. | -| **Recommended** | Use `edit-config` to edit Netdata's configuration by first navigating to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory), which is typically at `/etc/netdata`, then running `sudo edit-config netdata.conf`. | +| **Recommended** | Use `edit-config` to edit Netdata's configuration by first navigating to your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/configuration.md#the-netdata-config-directory), which is typically at `/etc/netdata`, then running `sudo edit-config netdata.conf`. | ### `sudo` diff --git a/docs/deployment-guides/README.md b/docs/deployment-guides/README.md new file mode 100644 index 00000000000000..18f5788573f8d8 --- /dev/null +++ b/docs/deployment-guides/README.md @@ -0,0 +1,25 @@ +# Deployment Guides + +Netdata can be used to monitor all kinds of infrastructure, from stand-alone tiny IoT devices to complex hybrid setups combining on-premise and cloud infrastructure, mixing bare-metal servers, virtual machines and containers. + +There are 3 components to structure your Netdata ecosystem: + +1. **Netdata Agents** + + To monitor the physical or virtual nodes of your infrastructure, including all applications and containers running on them. + + Netdata Agents are Open-Source, licensed under GPL v3+. + +2. 
**Netdata Parents**
+
+   To create [observability centralization points](https://github.com/netdata/netdata/blob/master/docs/observability-centralization-points/README.md) within your infrastructure, to offload Netdata Agent functions from your production systems, to provide high availability of your data, increased data retention and isolation of your nodes.
+
+   Netdata Parents are implemented using the Netdata Agent software. Any Netdata Agent can be an Agent for a node and a Parent for other Agents, at the same time.
+
+   It is recommended to set up multiple Netdata Parents. They will all seamlessly be integrated by Netdata Cloud into one monitoring solution.
+
+3. **Netdata Cloud**
+
+   Our SaaS, combining all your infrastructure, all your Netdata Agents and Parents, into one uniform, distributed, scalable monitoring database, offering advanced data slicing and dicing capabilities, custom dashboards, advanced troubleshooting tools, user management, centralized management of alerts, and more.
+
+The Netdata Agent is a highly modular piece of software, providing data collection via numerous plugins, an in-house crafted time-series database, a query engine, health monitoring and alerts, machine learning and anomaly detection, and metrics exporting to third-party systems.
diff --git a/docs/deployment-guides/deployment-with-centralization-points.md b/docs/deployment-guides/deployment-with-centralization-points.md
new file mode 100644
index 00000000000000..b3e2b40dc6fc5d
--- /dev/null
+++ b/docs/deployment-guides/deployment-with-centralization-points.md
@@ -0,0 +1,121 @@
+# Deployment with Centralization Points
+
+An observability centralization point can centralize both metrics and logs. The sending systems are called Children, while the receiving systems are called Parents.
+
+When metrics and logs are centralized, the Children are never queried for metrics and logs. The Netdata Parents have all the data needed to satisfy queries.
+
+- **Metrics** are centralized by Netdata, with a feature we call **Streaming**. The Parents listen for incoming connections and permit access only to Children that connect to them with the right API key. Children are configured to push their metrics to the Parents, and they initiate the connections to do so.
+
+- **Logs** are centralized with the methodologies provided by `systemd-journald`. This involves installing `systemd-journal-remote` on both the Parent and the Children, and configuring the keys required for this communication.
+
+| Feature                                        | How it works                                                                                                   |
+|:----------------------------------------------:|:--------------------------------------------------------------------------------------------------------------:|
+| Unified infrastructure dashboards for metrics  | Yes, at Netdata Cloud                                                                                          |
+| Unified infrastructure dashboards for logs     | All logs are accessible via the same dashboard at Netdata Cloud, although they are unified per Netdata Parent |
+| Centrally configured alerts                    | Yes, at Netdata Parents                                                                                        |
+| Centrally dispatched alert notifications       | Yes, at Netdata Cloud                                                                                          |
+| Data are exclusively on-prem                   | Yes, Netdata Cloud queries Netdata Agents to satisfy dashboard queries.
| + +A configuration with 2 observability centralization points, looks like this: + +```mermaid +flowchart LR + WEB[["One unified + dashboard + for all nodes"]] + NC(["Netdata Cloud + decides which agents + need to be queried"]) + SA1["Netdata at AWS + A1"] + SA2["Netdata at AWS + A2"] + SAN["Netdata at AWS + AN"] + PA["Netdata Parent A + at AWS + having all metrics & logs + for all Ax nodes"] + SB1["Netdata On-Prem + B1"] + SB2["Netdata On-Prem + B2"] + SBN["Netdata On-Prem + BN"] + PB["Netdata Parent B + On-Prem + having all metrics & logs + for all Bx nodes"] + WEB -->|query| NC -->|query| PA & PB + PA ---|stream| SA1 & SA2 & SAN + PB ---|stream| SB1 & SB2 & SBN +``` + +Netdata Cloud queries the Netdata Parents to provide aggregated dashboard views. + +For alerts, the dispatch of notifications looks like in the following chart: + +```mermaid +flowchart LR + NC(["Netdata Cloud + applies silencing + & user settings"]) + SA1["Netdata at AWS + A1"] + SA2["Netdata at AWS + A2"] + SAN["Netdata at AWS + AN"] + PA["Netdata Parent A + at AWS + having all metrics & logs + for all Ax nodes"] + SB1["Netdata On-Prem + B1"] + SB2["Netdata On-Prem + B2"] + SBN["Netdata On-Prem + BN"] + PB["Netdata Parent B + On-Prem + having all metrics & logs + for all Bx nodes"] + EMAIL{{"e-mail + notifications"}} + MOBILEAPP{{"Netdata Mobile App + notifications"}} + SLACK{{"Slack + notifications"}} + OTHER{{"Other + notifications"}} + PA & PB -->|alert transitions| NC -->|notification| EMAIL & MOBILEAPP & SLACK & OTHER + SA1 & SA2 & SAN ---|stream| PA + SB1 & SB2 & SBN ---|stream| PB +``` + +### Configuration steps for deploying Netdata with Observability Centralization Points + +For Metrics: + +- Install Netdata agents on all systems and the Netdata Parents. + +- Configure `stream.conf` at the Netdata Parents to enable streaming access with an API key. + +- Configure `stream.conf` at the Netdata Children to enable streaming to the configured Netdata Parents. + +For Logs: + +- Install `systemd-journal-remote` on all systems and the Netdata Parents. + +- Configure `systemd-journal-remote` at the Netdata Parents to enable logs reception. + +- Configure `systemd-journal-upload` at the Netdata Children to enable transmission of their logs to the Netdata Parents. + +Optionally: + +- Disable ML, health checks and dashboard access at Netdata Children to save resources and avoid duplicate notifications. + +When using Netdata Cloud: + +- Optionally: disable dashboard access on all Netdata agents (including Netdata Parents). +- Optionally: disable alert notifications on all Netdata agents (including Netdata Parents). diff --git a/docs/deployment-guides/standalone-deployment.md b/docs/deployment-guides/standalone-deployment.md new file mode 100644 index 00000000000000..5baef805a95e6a --- /dev/null +++ b/docs/deployment-guides/standalone-deployment.md @@ -0,0 +1,139 @@ +# Standalone Deployment + +To help our users have a complete experience of Netdata when they install it for the first time, a Netdata Agent with default configuration is a complete monitoring solution out of the box, having all its features enabled and available. + +So, each Netdata agent acts as a standalone monitoring system by default. 
+ +## Standalone agents, without Netdata Cloud + +| Feature | How it works | +|:---------------------------------------------:|:----------------------------------------------------:| +| Unified infrastructure dashboards for metrics | No, each Netdata agent provides its own dashboard | +| Unified infrastructure dashboards for logs | No, each Netdata agent exposes its own logs | +| Centrally configured alerts | No, each Netdata has its own alerts configuration | +| Centrally dispatched alert notifications | No, each Netdata agent sends notifications by itself | +| Data are exclusively on-prem | Yes | + +When using Standalone Netdata agents, each of them offers an API and a dashboard, at its own unique URL, that looks like `http://agent-ip:19999`. + +So, each of the Netdata agents has to be accessed individually and independently of the others: + +```mermaid +flowchart LR + WEB[["Multiple + Independent + Dashboards"]] + S1["Standalone + Netdata + 1"] + S2["Standalone + Netdata + 2"] + SN["Standalone + Netdata + N"] + WEB -->|URL 1| S1 + WEB -->|URL 2| S2 + WEB -->|URL N| SN +``` + +The same is true for alert notifications. Each of the Netdata agents runs its own alerts and sends notifications by itself, according to its configuration: + +```mermaid +flowchart LR + S1["Standalone + Netdata + 1"] + S2["Standalone + Netdata + 2"] + SN["Standalone + Netdata + N"] + EMAIL{{"e-mail + notifications"}} + SLACK{{"Slack + notifications"}} + OTHER{{"Other + notifications"}} + S1 & S2 & SN .-> SLACK + S1 & S2 & SN ---> EMAIL + S1 & S2 & SN ==> OTHER +``` + +### Configuration steps for standalone Netdata agents without Netdata Cloud + +No special configuration needed. + +- Install Netdata agents on all your systems, then access each of them via its own unique URL, that looks like `http://agent-ip:19999/`. + +## Standalone agents, with Netdata Cloud + +| Feature | How it works | +|:---------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| Unified infrastructure dashboards for metrics | Yes, via Netdata Cloud, all charts aggregate metrics from all servers. | +| Unified infrastructure dashboards for logs | All logs are accessible via the same dashboard at Netdata Cloud, although they are not unified (ie. logs from different servers are not multiplexed into a single view) | +| Centrally configured alerts | No, each Netdata has its own alerts configuration | +| Centrally dispatched alert notifications | Yes, via Netdata Cloud | +| Data are exclusively on-prem | Yes, Netdata Cloud queries Netdata Agents to satisfy dashboard queries. | + +By [connecting all Netdata agents to Netdata Cloud](https://github.com/netdata/netdata/blob/master/src/claim/README.md), you can have a unified infrastructure view of all your nodes, with aggregated charts, without configuring [observability centralization points](https://github.com/netdata/netdata/blob/master/docs/observability-centralization-points/README.md). 
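+
+As a hedged example, connecting (claiming) an agent while installing it can look like the sketch below. The claim token and room id are placeholders obtained from your Netdata Cloud space, and the claiming guide linked above remains the authoritative reference:
+
+```bash
+# A sketch: install Netdata and connect it to Netdata Cloud in one step.
+# YOUR_CLAIM_TOKEN and YOUR_ROOM_ID are placeholders from your Netdata Cloud space.
+wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh
+sh /tmp/netdata-kickstart.sh \
+   --claim-token YOUR_CLAIM_TOKEN \
+   --claim-rooms YOUR_ROOM_ID \
+   --claim-url https://app.netdata.cloud
+```
+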
+ +```mermaid +flowchart LR + WEB[["One unified + dashboard + for all nodes"]] + NC(["Netdata Cloud + decides which agents + need to be queried"]) + S1["Standalone + Netdata + 1"] + S2["Standalone + Netdata + 2"] + SN["Standalone + Netdata + N"] + WEB -->|queries| NC + NC -->|queries| S1 & S2 & SN +``` + +Similarly for alerts, Netdata Cloud receives all alert transitions from all agents, decides which notifications should be sent and how, applies silencing rules, maintenance windows and based on each Netdata Cloud space and user settings, dispatches notifications: + +```mermaid +flowchart LR + EMAIL{{"e-mail + notifications"}} + MOBILEAPP{{"Netdata Mobile App + notifications"}} + SLACK{{"Slack + notifications"}} + OTHER{{"Other + notifications"}} + NC(["Netdata Cloud + applies silencing + & user settings"]) + S1["Standalone + Netdata + 1"] + S2["Standalone + Netdata + 2"] + SN["Standalone + Netdata + N"] + NC -->|notification| EMAIL & MOBILEAPP & SLACK & OTHER + S1 & S2 & SN -->|alert transition| NC +``` + +> Note that alerts are still triggered by Netdata agents. Netdata Cloud takes care of the notifications only. + +### Configuration steps for standalone Netdata agents with Netdata Cloud + +- Install Netdata agents using the commands given by Netdata Cloud, so that they will be automatically added to your Netdata Cloud space. Otherwise, install Netdata agents and then claim them via the command line or their dashboard. + +- Optionally: disable their direct dashboard access to secure them. + +- Optionally: disable their alert notifications to avoid receiving email notifications directly from them (email notifications are automatically enabled when a working MTA is found on the systems Netdata agents are installed). diff --git a/docs/export/enable-connector.md b/docs/export/enable-connector.md index dd3a55a4bf7b12..f81074c88b2d30 100644 --- a/docs/export/enable-connector.md +++ b/docs/export/enable-connector.md @@ -24,7 +24,7 @@ Once you understand the process of enabling a connector, you can translate that ## Enable the exporting engine Use `edit-config` from your -[Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) +[Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/configuration.md#the-netdata-config-directory) to open `exporting.conf`: ```bash diff --git a/docs/guides/monitor/raspberry-pi-anomaly-detection.md b/docs/guides/monitor/raspberry-pi-anomaly-detection.md index c04643958698e4..e2d28c2d112abc 100644 --- a/docs/guides/monitor/raspberry-pi-anomaly-detection.md +++ b/docs/guides/monitor/raspberry-pi-anomaly-detection.md @@ -24,7 +24,7 @@ Read on to learn all the steps and enable unsupervised anomaly detection on your First make sure Netdata is using Python 3 when it runs Python-based data collectors. Next, open `netdata.conf` using [`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) -from within the [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). Scroll down to the +from within the [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/configuration.md#the-netdata-config-directory). Scroll down to the `[plugin:python.d]` section to pass in the `-ppython3` command option. 
```conf diff --git a/docs/metrics-storage-management/enable-streaming.md b/docs/metrics-storage-management/enable-streaming.md index 49c798804a5a71..82443bedf26a62 100644 --- a/docs/metrics-storage-management/enable-streaming.md +++ b/docs/metrics-storage-management/enable-streaming.md @@ -114,7 +114,7 @@ itself while initiating a streaming connection. Copy that into a separate text f > Find out how to [install `uuidgen`](https://command-not-found.com/uuidgen) on your node if you don't already have it. Next, open `stream.conf` using [`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) -from within the [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). +from within the [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/configuration.md#the-netdata-config-directory). ```bash cd /etc/netdata diff --git a/docs/netdata-agent/README.md b/docs/netdata-agent/README.md new file mode 100644 index 00000000000000..faf262fd4451fc --- /dev/null +++ b/docs/netdata-agent/README.md @@ -0,0 +1,84 @@ +# Netdata Agent + +The Netdata Agent is the main building block in a Netdata ecosystem. It is installed on all monitored systems to monitor system components, containers and applications. + +The Netdata Agent is an **observability pipeline in a box** that can either operate standalone, or blend into a bigger pipeline made by more Netdata Agents (Children and Parents). + +## Distributed Observability Pipeline + +The Netdata observability pipeline looks like in the following graph. + +The pipeline is extended by creating Metrics Observability Centralization Points that are linked all together (`from a remote Netdata`, `to a remote Netdata`), so that all Netdata installed become a vast integrated observability pipeline. + +```mermaid +stateDiagram-v2 + classDef userFeature fill:#f00,color:white,font-weight:bold,stroke-width:2px,stroke:yellow + classDef usedByNC fill:#090,color:white,font-weight:bold,stroke-width:2px,stroke:yellow + Local --> Discover + Local: Local Netdata + [*] --> Detect: from a remote Netdata + Others: 3rd party time-series DBs + Detect: Detect Anomalies + Dashboard:::userFeature + Dashboard: Netdata Dashboards + 3rdDashboard:::userFeature + 3rdDashboard: 3rd party Dashboards + Notifications:::userFeature + Notifications: Alert Notifications + Alerts: Alert Transitions + Discover --> Collect + Collect --> Detect + Store: Store + Store: Time-Series Database + Detect --> Store + Store --> Learn + Store --> Check + Store --> Query + Store --> Score + Store --> Stream + Store --> Export + Query --> Visualize + Score --> Visualize + Check --> Alerts + Learn --> Detect: trained ML models + Alerts --> Notifications + Stream --> [*]: to a remote Netdata + Export --> Others + Others --> 3rdDashboard + Visualize --> Dashboard + Score:::usedByNC + Query:::usedByNC + Alerts:::usedByNC +``` + +1. **Discover**: auto-detect metric sources on localhost, auto-discover metric sources on Kubernetes. +2. **Collect**: query data sources to collect metric samples, using the optimal protocol for each data source. 800+ integrations supported, including dozens of native application protocols, OpenMetrics and StatsD. +3. **Detect Anomalies**: use the trained machine learning models for each metric, to detect in real-time if each sample collected is an outlier (an anomaly), or not. +4. 
**Store**: keep the collected samples and their anomaly status in the time-series database (database mode `dbengine`) or a ring buffer (database modes `ram` and `alloc`).
+5. **Learn**: train multiple machine learning models for each metric collected, learning behaviors and patterns for detecting anomalies.
+6. **Check**: a health engine, triggering alerts and sending notifications. Netdata comes with hundreds of alert configurations that are automatically attached to metrics when they get collected, detecting errors, common configuration problems and performance issues.
+7. **Query**: a query engine for querying time-series data.
+8. **Score**: a scoring engine for comparing and correlating metrics.
+9. **Stream**: a mechanism to connect Netdata agents and build Metrics Centralization Points (Netdata Parents).
+10. **Visualize**: Netdata's fully automated dashboards for all metrics.
+11. **Export**: export metric samples to 3rd party time-series databases, enabling the use of 3rd party tools for visualization, like Grafana.
+
+## Comparison to other observability solutions
+
+1. **One moving part**: Other monitoring solutions require maintaining metrics exporters, time-series databases and visualization engines. Netdata has everything integrated into one package, even when [Metrics Centralization Points](https://github.com/netdata/netdata/blob/master/docs/observability-centralization-points/metrics-centralization-points/README.md) are required, making deployment and maintenance a lot simpler.
+
+2. **Automation**: Netdata is designed to automate most of the process of setting up and running an observability solution. It is designed to instantly provide comprehensive dashboards and fully automated alerts, with zero configuration.
+
+3. **High Fidelity Monitoring**: Netdata was born from our need to kill the console for observability. So, it provides metrics and logs with the same granularity and fidelity console tools do, but also comes with tools that go beyond metrics and logs, to provide a holistic view of the monitored infrastructure (e.g. check [Top Monitoring](https://github.com/netdata/netdata/blob/master/docs/cloud/netdata-functions.md)).
+
+4. **Minimal impact on monitored systems and applications**: Netdata has been designed to have a minimal impact on the monitored systems and their applications. There are [independent studies](https://www.ivanomalavolta.com/files/papers/ICSOC_2023.pdf) reporting that Netdata excels in CPU usage, RAM utilization, execution time and the impact it has on monitored applications and containers.
+
+5. **Energy efficiency**: [The University of Amsterdam conducted research on the energy efficiency of monitoring tools](https://twitter.com/IMalavolta/status/1734208439096676680). They tested Netdata, Prometheus and ELK, among other tools. The study concluded that **Netdata is the most energy efficient monitoring tool**.
+
+## Dashboard Versions
+
+The Netdata agents (Standalone, Children and Parents) **share the dashboard** of Netdata Cloud. However, when the user is logged in and the Netdata agent is connected to Netdata Cloud, the following are enabled (which are otherwise disabled):
+
+1. **Access to Sensitive Data**: Some data, like systemd-journal logs and several [Top Monitoring](https://github.com/netdata/netdata/blob/master/docs/cloud/netdata-functions.md) features, expose sensitive data, like IPs, ports, process command lines and more.
To access all these when the dashboard is served directly from a Netdata agent, Netdata Cloud is required to verify that the user accessing the dashboard has the required permissions. + +2. **Dynamic Configuration**: Netdata agents are configured via configuration files, manually or through some provisioning system. The latest Netdata includes a feature to allow users change some of the configuration (collectors, alerts) via the dashboard. This feature is only available to users of paid Netdata Cloud plan. diff --git a/docs/netdata-agent/configuration.md b/docs/netdata-agent/configuration.md new file mode 100644 index 00000000000000..85319984312740 --- /dev/null +++ b/docs/netdata-agent/configuration.md @@ -0,0 +1,43 @@ +# Netdata Agent Configuration + +The main Netdata agent configuration is `netdata.conf`. + +## The Netdata config directory + +On most Linux systems, by using our [recommended one-line installation](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md#install-on-linux-with-one-line-installer), the **Netdata config +directory** will be `/etc/netdata/`. The config directory contains several configuration files with the `.conf` extension, a +few directories, and a shell script named `edit-config`. + +> Some operating systems will use `/opt/netdata/etc/netdata/` as the config directory. If you're not sure where yours +> is, navigate to `http://NODE:19999/netdata.conf` in your browser, replacing `NODE` with the IP address or hostname of +> your node, and find the `# config directory = ` setting. The value listed is the config directory for your system. + +All of Netdata's documentation assumes that your config directory is at `/etc/netdata`, and that you're running any scripts from inside that directory. + + +## edit `netdata.conf` + +To edit `netdata.conf`, run this on your terminal: + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` + +Your editor will open. + +## downloading `netdata.conf` + +The running version of `netdata.conf` can be downloaded from a running Netdata agent, at this URL: + +``` +http://agent-ip:19999/netdata.conf +``` + +You can save and use this version, using these commands: + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +curl -ksSLo /tmp/netdata.conf.new http://localhost:19999/netdata.conf && sudo mv -i /tmp/netdata.conf.new netdata.conf +``` + diff --git a/docs/netdata-agent/sizing-netdata-agents/README.md b/docs/netdata-agent/sizing-netdata-agents/README.md new file mode 100644 index 00000000000000..22437c8b9d35b7 --- /dev/null +++ b/docs/netdata-agent/sizing-netdata-agents/README.md @@ -0,0 +1,87 @@ +# Sizing Netdata Agents + +Netdata automatically adjusts its resources utilization based on the workload offered to it. + +This is a map of how Netdata **features impact resources utilization**: + +| Feature | CPU | RAM | Disk I/O | Disk Space | Retention | Bandwidth | +|-----------------------------:|:---:|:---:|:--------:|:----------:|:---------:|:---------:| +| Metrics collected | X | X | X | X | X | - | +| Samples collection frequency | X | - | X | X | X | - | +| Database mode and tiers | - | X | X | X | X | - | +| Machine learning | X | X | - | - | - | - | +| Streaming | X | X | - | - | - | X | + +1. **Metrics collected**: The number of metrics collected affects almost every aspect of resources utilization. + + When you need to lower the resources used by Netdata, this is an obvious first step. + +2. 
**Samples collection frequency**: By default, Netdata collects metrics with 1-second granularity, unless the metrics collected are not updated that frequently, in which case Netdata collects them at the frequency they are updated. This is controlled per data collection job.
+
+   Lowering the data collection frequency from every second to every 2 seconds will cut Netdata's CPU utilization in half. So, CPU utilization is proportional to the data collection frequency.
+
+3. **Database Mode and Tiers**: By default, Netdata stores metrics in 3 database tiers: high-resolution, mid-resolution and low-resolution. All database tiers are updated in parallel during data collection, and depending on the query duration, Netdata may consult one or more tiers to optimize the resources required to satisfy it.
+
+   The number of database tiers affects the memory requirements of Netdata. Going from 3 tiers to 1 tier will make Netdata use half the memory. Of course, metrics retention will also be limited to 1 tier.
+
+4. **Machine Learning**: By default, Netdata trains multiple machine learning models for every metric collected, to learn its behavior and detect anomalies. Machine learning is a CPU-intensive process and affects the overall CPU utilization of Netdata.
+
+5. **Streaming Compression**: When using Netdata in Parent-Child configurations to create Metrics Centralization Points, the compression algorithm used greatly affects CPU utilization and bandwidth consumption.
+
+   Netdata supports multiple streaming compression algorithms, allowing the optimization of either CPU utilization or network bandwidth. The default algorithm, `zstd`, provides the best balance between them.
+
+## Minimizing the resources used by Netdata Agents
+
+To minimize the resources used by Netdata Agents, we suggest configuring Netdata Parents to centralize metric samples, and disabling most of the features on Netdata Children. This provides minimal resource utilization at the edge, while all the features of Netdata remain available at the Netdata Parents.
+
+The following guides provide instructions on how to do this.
+
+## Maximizing the scale of Netdata Parents
+
+Netdata Parents automatically adjust their resource utilization based on the workload they receive. The only option for improving query performance is to dedicate more RAM to them, increasing their cache efficiency.
+
+Check [RAM Requirements](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md) for more information.
+
+## Innovations Netdata has for optimal performance and scalability
+
+The following are some of the innovations of the open-source Netdata agent that contribute to its excellent performance and scalability.
+
+1. **Minimal disk I/O**
+
+   When Netdata saves data on disk, it stores it at its final place, eliminating the need to reorganize this data later.
+
+   Netdata organizes its data structures in such a way that samples are committed to disk as evenly as possible across time, without affecting its memory requirements.
+
+   Furthermore, Netdata Agents use direct I/O for saving and loading metric samples. This prevents Netdata from polluting system caches with metric data. Netdata maintains its own caches for this data.
+
+   All these features make Netdata a nice partner and a polite citizen for production applications running on the same systems Netdata runs on.
+
+2. 
**4 bytes per sample uncompressed**
+
+   To achieve optimal memory and disk footprint, Netdata uses a custom 32-bit floating point number we have developed. This floating point number is used to store the samples collected, together with their anomaly bit. The database of Netdata is fixed-step, so it has predefined slots for every sample, allowing Netdata to store timestamps once every several hundred samples, minimizing both its memory requirements and the disk footprint.
+
+3. **Query priorities**
+
+   Alerting, Machine Learning, Streaming and Replication rely on metric queries. When multiple queries are running in parallel, Netdata assigns priorities to all of them, favoring interactive queries over background tasks. This means that queries do not compete equally for resources. Machine learning or replication may slow down when interactive queries are running and the system starves for resources.
+
+4. **A pointer per label**
+
+   Apart from metric samples, metric labels and their cardinality are the biggest memory consumers, especially in highly ephemeral environments, like Kubernetes. Netdata uses a single pointer for any label key-value pair that is reused. Keys and values are also deduplicated, providing the best possible memory footprint for metric labels.
+
+5. **Streaming Protocol**
+
+   The streaming protocol of Netdata allows minimizing the resources consumed on production systems by delegating features to other Netdata agents (Parents), without compromising monitoring fidelity or responsiveness, enabling the creation of a highly distributed observability platform.
+
+## Netdata vs Prometheus
+
+Netdata outperforms Prometheus in every aspect: -35% CPU utilization, -49% RAM usage, -12% network bandwidth, -98% disk I/O and -75% disk footprint for high-resolution data, while providing more than a year of retention.
+
+Read the [full comparison here](https://blog.netdata.cloud/netdata-vs-prometheus-performance-analysis/).
+
+## Energy Efficiency
+
+The University of Amsterdam conducted research on the impact monitoring systems have on Docker-based systems.
+
+The study found that Netdata excels in CPU utilization, RAM usage and execution time, and concluded that **Netdata is the most energy efficient tool**.
+
+Read the [full study here](https://www.ivanomalavolta.com/files/papers/ICSOC_2023.pdf).
diff --git a/docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md b/docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md
new file mode 100644
index 00000000000000..092c8da169cb54
--- /dev/null
+++ b/docs/netdata-agent/sizing-netdata-agents/bandwidth-requirements.md
@@ -0,0 +1,47 @@
+# Bandwidth Requirements
+
+## On Production Systems, Standalone Netdata
+
+Standalone Netdata may use network bandwidth under the following conditions:
+
+1. You configured data collection jobs that are fetching data from remote systems. No such jobs are enabled by default.
+2. You use the Netdata dashboard.
+3. [Netdata Cloud communication](#netdata-cloud-communication) (see below).
+
+## On Metrics Centralization Points, between Netdata Children & Parents
+
+Netdata supports multiple compression algorithms for streaming communication. Netdata Children offer all their compression algorithms when connecting to a Netdata Parent, and the Netdata Parent decides which one to use based on algorithm availability and user configuration.
+
+| Algorithm | Best for                                                                                                                             |
+|:---------:|:-------------------------------------------------------------------------------------------------------------------------------------:|
+| `zstd`    | The best balance between CPU utilization and compression efficiency. This is the default.                                            |
+| `lz4`     | The fastest of the algorithms. Use this when CPU utilization is more important than bandwidth.                                       |
+| `gzip`    | The best compression efficiency, at the expense of CPU utilization. Use this when bandwidth is more important than CPU utilization.  |
+| `brotli`  | The most CPU-intensive algorithm, providing the best compression.                                                                    |
+
+The expected bandwidth consumption using `zstd` for 1 million samples per second is 84 Mbps, or 10.5 MiB/s.
+
+The order in which compression algorithms are selected is configured in `stream.conf`, per `[API KEY]`, like this:
+
+```
+    compression algorithms order = zstd lz4 brotli gzip
+```
+
+The first available algorithm on both the Netdata Child and the Netdata Parent, from left to right, is chosen.
+
+Compression can also be disabled in `stream.conf` at either Netdata Children or Netdata Parents.
+
+## Netdata Cloud Communication
+
+When Netdata Agents connect to Netdata Cloud, they communicate metadata about the metrics being collected, but they do not stream the samples collected for each metric.
+
+The information transferred to Netdata Cloud is:
+
+1. Information and **metadata about the system itself**, like its hostname, architecture, virtualization technologies used and, generally, the labels associated with the system.
+2. Information about the **running data collection plugins, modules and jobs**.
+3. Information about the **metrics available and their retention**.
+4. Information about the **configured alerts and their transitions**.
+
+This is not a constant stream of information. Netdata Agents update Netdata Cloud only about status changes in all the above (e.g. an alert being triggered, or a metric no longer being collected). So, there is an initial handshake and exchange of information when Netdata starts, and then there are only updates when required.
+
+Of course, when you view Netdata Cloud dashboards that need to query the database a Netdata agent maintains, this query is forwarded to an agent that can satisfy it. This means that Netdata Cloud receives metric samples only when a user is accessing a dashboard, and the samples transferred are usually aggregations to allow rendering the dashboards.
diff --git a/docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md b/docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md
new file mode 100644
index 00000000000000..021a35fb297860
--- /dev/null
+++ b/docs/netdata-agent/sizing-netdata-agents/cpu-requirements.md
@@ -0,0 +1,65 @@
+# CPU Requirements
+
+Netdata's CPU consumption is affected by the following factors:
+
+1. The number of metrics collected
+2. The frequency at which metrics are collected
+3. Machine Learning
+4. Streaming compression (streaming of metrics to Netdata Parents)
+5. Database Mode
+
+## On Production Systems, Netdata Children
+
+On production systems, where Netdata is running with default settings, monitoring the system it is installed on and its containers and applications, CPU utilization should usually be about 1% to 5% of a single CPU core.
+
+This includes 3 database tiers, machine learning, per-second data collection, alerts, and streaming to a Netdata Parent.
+
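+
+To verify this on a node, you can query the agent's self-monitoring CPU chart. This is a sketch, assuming the chart id `netdata.server_cpu` (the agent's own CPU chart on typical installs) and a locally reachable dashboard port:
+
+```bash
+# A sketch: average CPU time consumed by the Netdata daemon over the last minute.
+# Values are in milliseconds of CPU time per second (1000 ms/s equals one full core).
+curl -s "http://localhost:19999/api/v1/data?chart=netdata.server_cpu&after=-60&points=1&group=average&format=json"
+```
+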
+## On Metrics Centralization Points, Netdata Parents
+
+For Metrics Centralization Points (Netdata Parents) running on modern server hardware, we **estimate CPU utilization per million samples collected per second** as follows:
+
+| Feature              | Depends On                                           | Expected Utilization                                           | Key Reasons                                                                |
+|:--------------------:|:----------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------------------:|
+| Metrics Ingestion    | Number of samples received per second                | 2 CPU cores per million samples per second                     | Decompress and decode received messages, update database.                 |
+| Metrics re-streaming | Number of samples resent per second                  | 2 CPU cores per million samples per second                     | Encode and compress messages towards the next Netdata Parent.             |
+| Machine Learning     | Number of unique time-series concurrently collected  | 2 CPU cores per million unique metrics concurrently collected  | Train machine learning models, query existing models to detect anomalies. |
+
+We recommend keeping the total CPU utilization below 60% when a Netdata Parent is steadily ingesting metrics, training machine learning models and running health checks. This will leave enough CPU resources available for queries.
+
+## I want to minimize CPU utilization. What should I do?
+
+You can control Netdata's CPU utilization with these parameters:
+
+1. **Data collection frequency**: Going from per-second metrics to every-2-seconds metrics will halve the CPU utilization of Netdata.
+2. **Number of metrics collected**: Netdata by default collects every metric available on the systems it runs on. Review the metrics collected and disable the data collection plugins and modules you do not need.
+3. **Machine Learning**: Disable machine learning to save CPU cycles.
+4. **Number of database tiers**: Netdata updates database tiers in parallel, during data collection. This affects both CPU utilization and memory requirements.
+5. **Database Mode**: The default database mode is `dbengine`, which compresses and commits data to disk. If you have a Netdata Parent where metrics are aggregated and saved to disk, and there is a reliable connection between the Netdata agent you want to optimize and its Parent, switch to database mode `ram` or `alloc`. This disables saving to disk, so your Netdata will also not use any disk I/O.
+
+## I see increased CPU consumption when a busy Netdata Parent starts, why?
+
+When a Netdata Parent starts and Netdata Children connect to it, there are several operations that temporarily affect CPU utilization, network bandwidth and disk I/O.
+
+The general flow looks like this:
+
+1. **Back-filling of higher tiers**: Usually this means calculating the aggregates of the last hour of `tier2` and of the last minute of `tier1`, ensuring that higher tiers reflect all the information `tier0` has. If Netdata was stopped abnormally (e.g. due to a system failure or crash), higher tiers may have to be back-filled for longer durations.
+2. **Metadata synchronization**: The metadata of all metrics each Netdata Child maintains are negotiated between the Child and the Parent and are synchronized.
+3. **Replication**: If the Parent is missing samples the Child has, these samples are transferred to the Parent before transferring new samples.
+4. Once all these finish, the normal **streaming of new metric samples** starts.
+5. At the same time, **machine learning** initializes, loads saved trained models and prepares anomaly detection.
+6. 
After a few moments, the **health engine starts checking metrics** and triggering alerts.
+
+The above process is per metric. So, while one metric back-fills, another replicates and a third one streams.
+
+At the same time:
+
+- the compression algorithm learns the patterns of the data exchanged and optimizes its dictionaries for optimal compression and CPU utilization,
+- the database engine adjusts the page size of each metric, so that samples are committed to disk as evenly as possible across time.
+
+So, when looking for the "steady CPU consumption during ingestion" of a busy Netdata Parent, we recommend letting it stabilize for a few hours before checking.
+
+Keep in mind that Netdata has been designed so that, even if the system lacks CPU resources during the initialization phase and the connection of hundreds of Netdata Children, the Netdata Parent will complete all the operations and eventually enter a steady CPU consumption during ingestion, without affecting the quality of the metrics stored. So, it is OK if CPU consumption spikes to 100% during the initialization of a busy Netdata Parent.
+
+Important: the above initialization process is not as intense when new nodes connect to a Netdata Parent for the first time (e.g. ephemeral nodes), since several of the steps involved are not required.
+
+Especially in cases where Children disconnect and reconnect to the Parent due to network-related issues (i.e. both the Netdata Child and the Netdata Parent have not been restarted and less than 1 hour has passed since the last disconnection), the re-negotiation phase is minimal and metrics instantly enter the normal streaming phase.
diff --git a/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md b/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md
new file mode 100644
index 00000000000000..0625ed6ed378d9
--- /dev/null
+++ b/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md
@@ -0,0 +1,131 @@
+# Disk Requirements & Retention
+
+## Database Modes and Tiers
+
+Netdata comes with 3 database modes:
+
+1. `dbengine`: the default high-performance multi-tier database of Netdata. Metric samples are cached in memory and saved to disk in multiple tiers, with compression.
+2. `ram`: metric samples are stored in ring buffers in memory, with increments of 1024 samples. Metric samples are not committed to disk. Kernel-Same-Page merging (KSM) can be used to deduplicate Netdata's memory.
+3. `alloc`: metric samples are stored in ring buffers in memory, with flexible increments. Metric samples are not committed to disk.
+
+## `ram` and `alloc`
+
+Modes `ram` and `alloc` can help when Netdata should not introduce any disk I/O at all. In both of these modes, metric samples exist only in memory, and only while they are collected.
+
+When Netdata is configured to stream its metrics to a Metrics Observability Centralization Point (a Netdata Parent), metric samples are forwarded in real-time to that Netdata Parent. The ring buffers available in these modes are used to cache the collected samples for some time, in case there are network issues, or the Netdata Parent is restarted for maintenance.
+
+The memory required per sample in these modes is 4 bytes:
+
+- `ram` mode uses `mmap()` behind the scenes, and can be incremented in steps of 1024 samples (4KiB). Mode `ram` allows the use of the Linux kernel memory dedupper (Kernel-Same-Page merging, or KSM) to deduplicate Netdata ring buffers and save memory.
+- `alloc` mode can be sized for any number of samples per metric. KSM cannot be used in this mode. + +To configure database mode `ram` or `alloc`, in `netdata.conf`, set the following: + +- `[db].mode` to either `ram` or `alloc`. +- `[db].retention` to the number of samples the ring buffers should maintain. For `ram` if the value set is not a multiple of 1024, the next multiple of 1024 will be used. + +## `dbengine` + +`dbengine` supports up to 5 tiers. By default, 3 tiers are used, like this: + +| Tier | Resolution | Uncompressed Sample Size | +|:--------:|:--------------------------------------------------------------------------------------------:|:------------------------:| +| `tier0` | native resolution (metrics collected per-second as stored per-second) | 4 bytes | +| `tier1` | 60 iterations of `tier0`, so when metrics are collected per-second, this tier is per-minute. | 16 bytes | +| `tier2` | 60 iterations of `tier1`, so when metrics are collected per second, this tier is per-hour. | 16 bytes | + +Data are saved to disk compressed, so the actual size on disk varies depending on compression efficiency. + +`dbegnine` tiers are overlapping, so higher tiers include a down-sampled version of the samples in lower tiers: + +```mermaid +gantt + dateFormat YYYY-MM-DD + tickInterval 1week + axisFormat + todayMarker off + tier0, 14d :a1, 2023-12-24, 7d + tier1, 60d :a2, 2023-12-01, 30d + tier2, 365d :a3, 2023-11-02, 59d +``` + +## Disk Space and Metrics Retention + +You can find information about the current disk utilization of a Netdata Parent, at . The output of this endpoint is like this: + +```json +{ + // more information about the agent + // near the end: + "db_size": [ + { + "tier": 0, + "disk_used": 1677528462156, + "disk_max": 1677721600000, + "disk_percent": 99.9884881, + "from": 1706201952, + "to": 1707401946, + "retention": 1199994, + "expected_retention": 1200132, + "currently_collected_metrics": 2198777 + }, + { + "tier": 1, + "disk_used": 838123468064, + "disk_max": 838860800000, + "disk_percent": 99.9121032, + "from": 1702885800, + "to": 1707401946, + "retention": 4516146, + "expected_retention": 4520119, + "currently_collected_metrics": 2198777 + }, + { + "tier": 2, + "disk_used": 334329683032, + "disk_max": 419430400000, + "disk_percent": 79.710408, + "from": 1679670000, + "to": 1707401946, + "retention": 27731946, + "expected_retention": 34790871, + "currently_collected_metrics": 2198777 + } + ] +} +``` + +In this example: + +- `tier` is the database tier. +- `disk_used` is the currently used disk space in bytes. +- `disk_max` is the configured max disk space in bytes. +- `disk_percent` is the current disk space utilization for this tier. +- `from` is the first (oldest) timestamp in the database for this tier. +- `to` is the latest (newest) timestamp in the database for this tier. +- `retention` is the current retention of the database for this tier, in seconds (divide by 3600 for hours, divide by 86400 for days). +- `expected_retention` is the expected retention in seconds when `disk_percent` will be 100 (divide by 3600 for hours, divide by 86400 for days). +- `currently_collected_metrics` is the number of unique time-series currently being collected for this tier. 
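+
+As a hedged example, the per-tier retention can be summarized from this output with `jq`. This is a sketch, assuming the JSON above is served by the agent's `/api/v2/info` endpoint and that `jq` is installed:
+
+```bash
+# A sketch: print, per tier, the current retention in days and the disk quota usage.
+curl -s http://localhost:19999/api/v2/info | jq '
+  .db_size[] | {
+    tier,
+    retention_days: (.retention / 86400 | floor),
+    expected_retention_days: (.expected_retention / 86400 | floor),
+    disk_percent
+  }'
+```
+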
+
+The estimated number of samples on each tier can be calculated as follows:
+
+```
+estimated number of samples = retention / sample duration * currently_collected_metrics
+```
+
+So, for our example above:
+
+| Tier    | Sample Duration (seconds) | Estimated Number of Samples | Disk Space Used | Current Retention (days) | Expected Retention (days) | Bytes Per Sample |
+|:-------:|:-------------------------:|:---------------------------:|:---------------:|:------------------------:|:-------------------------:|:----------------:|
+| `tier0` | 1                         | 2.64 trillion samples       | 1.56 TiB        | 13.8                     | 13.9                      | 0.64             |
+| `tier1` | 60                        | 165.5 billion samples       | 780 GiB         | 52.2                     | 52.3                      | 5.01             |
+| `tier2` | 3600                      | 16.9 billion samples        | 311 GiB         | 320.9                    | 402.7                     | 19.73            |
+
+Note: as you can see in this example, the disk footprint per sample of `tier2` is bigger than the uncompressed sample size (19.73 bytes vs 16 bytes). This is due to the fact that samples are organized into pages and pages into extents. When Netdata is restarted frequently, it saves all data prematurely, before filling up entire pages and extents, leading to increased overhead per sample.
+
+To configure retention, in `netdata.conf`, set the following:
+
+- `[db].mode` to `dbengine`.
+- `[db].dbengine multihost disk space MB`: this is the max disk space for `tier0`. The default is 256MiB.
+- `[db].dbengine tier 1 multihost disk space MB`: this is the max disk space for `tier1`. The default is 50% of `tier0`.
+- `[db].dbengine tier 2 multihost disk space MB`: this is the max disk space for `tier2`. The default is 50% of `tier1`.
diff --git a/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md b/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md
new file mode 100644
index 00000000000000..159c979a9b1d15
--- /dev/null
+++ b/docs/netdata-agent/sizing-netdata-agents/ram-requirements.md
@@ -0,0 +1,60 @@
+# RAM Requirements
+
+With the default database tier configuration, Netdata needs about 16KiB per unique metric collected, independently of the data collection frequency.
+
+Netdata supports memory ballooning and automatically sizes and limits the memory used, based on the metrics concurrently being collected.
+
+## On Production Systems, Netdata Children
+
+With default settings, Netdata should run with 100MB to 200MB of RAM, depending on the number of metrics being collected.
+
+This number can be lowered by limiting the number of database tiers or switching database modes. For more information check [Disk Requirements and Retention](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/sizing-netdata-agents/disk-requirements-and-retention.md).
+
+## On Metrics Centralization Points, Netdata Parents
+
+The general formula, with the default configuration of database tiers, is:
+
+```
+memory = UNIQUE_METRICS x 16KiB + CONFIGURED_CACHES
+```
+
+The default `CONFIGURED_CACHES` is 32MiB.
+
+For 1 million concurrently collected time-series (independently of their data collection frequency), the memory required is:
+
+```
+UNIQUE_METRICS = 1000000
+CONFIGURED_CACHES = 32MiB
+
+(UNIQUE_METRICS * 16KiB / 1024 in MiB) + CONFIGURED_CACHES =
+( 1000000 * 16KiB / 1024 in MiB) + 32 MiB =
+15657 MiB =
+about 16 GiB
+```
+
+There are 2 cache sizes that can be configured in `netdata.conf`:
+
+1. `[db].dbengine page cache size MB`: this is the main cache that keeps metric data in memory. When data are not found in it, the extent cache is consulted, and if not found there either, they are loaded from disk.
+2. 
`[db].dbengine extent cache size MB`: this is the compressed extent cache. It keeps compressed data blocks in memory, as they appear on disk, to avoid reading them again. Data found in the extent cache but not in the main cache have to be uncompressed to be queried.
+
+Both of them are dynamically adjusted to use some of the total memory computed above. The configuration in `netdata.conf` allows providing additional memory to them, increasing their caching efficiency.
+
+## I have a Netdata Parent that is also a systemd-journal logs centralization point, what should I know?
+
+Logs usually require significantly more disk space and I/O bandwidth than metrics. For optimal performance, we recommend storing metrics and logs on separate, independent disks.
+
+Netdata uses direct I/O for its database, so that it does not pollute the system caches with its own data. We want Netdata to be a nice citizen when it runs side-by-side with production applications, so this was required to guarantee that Netdata does not affect the operation of databases or other sensitive applications running on the same servers.
+
+To optimize disk I/O, Netdata maintains its own private caches. The default settings of these caches are automatically adjusted to the minimum required size for acceptable metrics query performance.
+
+`systemd-journal`, on the other hand, relies on operating system caches for improving the query performance of logs. When the system lacks free memory, querying logs leads to increased disk I/O.
+
+If you are experiencing slow responses and increased disk reads when metrics queries run, we suggest dedicating some more RAM to Netdata.
+
+We frequently see that the following strategy gives the best results:
+
+1. Start the Netdata Parent, send all the load you expect it to have and let it stabilize for a few hours. Netdata will now use the minimum memory it believes is required for smooth operation.
+2. Check the available system memory.
+3. Set the page cache in `netdata.conf` to use 1/3 of the available memory.
+
+This will allow Netdata queries to have more caches, while leaving plenty of available memory for logs and the operating system.
diff --git a/docs/netdata-agent/versions-and-platforms.md b/docs/netdata-agent/versions-and-platforms.md
new file mode 100644
index 00000000000000..787874d6ea128f
--- /dev/null
+++ b/docs/netdata-agent/versions-and-platforms.md
@@ -0,0 +1,70 @@
+# Netdata Agent Versions & Platforms
+
+Netdata is evolving rapidly and new features are added at a constant pace. Therefore, we maintain a frequent release cadence to deliver all these features to users as soon as possible.
+
+ +Netdata Agents are available in 2 versions: + +| Release Channel | Release Frequency | Support Policy & Features | Support Duration | Backwards Compatibility | +|:---------------:|:---------------------------------------------:|:---------------------------------------------------------:|:----------------------------------------:|:---------------------------------------------------------------------------------:| +| Stable | At most once per month, usually every 45 days | Receiving bug fixes and security updates between releases | Up to the 2nd stable release after them | Previous configuration semantics and data are supported by newer releases | +| Nightly | Every night at 00:00 UTC | Latest pre-released features | Up to the 2nd nightly release after them | Configuration and data of unreleased features may change between nightly releases | + +> "Support Duration" defines the time we consider the release as actively used by users in production systems, so that all features of Netdata should be working like the day they were released. However, after the latest release, previous releases stop receiving bug fixes and security updates. All users are advised to update to the latest release to get the latest bug fixes. + +## Binary Distribution Packages + +Binary distribution packages are provided by Netdata, via CI integration, for the following platforms and architectures: + +| Platform | Platform Versions | Released Packages Architecture | Format | +|:-----------------------:|:--------------------------------:|:------------------------------------------------:|:------------:| +| Docker under Linux | 19.03 and later | `x86_64`, `i386`, `ARMv7`, `AArch64`, `POWER8+` | docker image | +| Static Builds | - | `x86_64`, `ARMv6`, `ARMv7`, `AArch64`, `POWER8+` | .gz.run | +| Alma Linux | 8.x, 9.x | `x86_64`, `AArch64` | RPM | +| Amazon Linux | 2, 2023 | `x86_64`, `AArch64` | RPM | +| Centos | 7.x | `x86_64` | RPM | +| Debian | 10.x, 11.x, 12.x | `x86_64`, `i386`, `ARMv7`, `AArch64` | DEB | +| Fedora | 37, 38, 39 | `x86_64`, `AArch64` | RPM | +| OpenSUSE | Leap 15.4, Leap 15.5, Tumbleweed | `x86_64`, `AArch64` | RPM | +| Oracle Linux | 8.x, 9.x | `x86_64`, `AArch64` | RPM | +| Redhat Enterprise Linux | 7.x | `x86_64` | RPM | +| Redhat Enterprise Linux | 8.x, 9.x | `x86_64`, `AArch64` | RPM | +| Ubuntu | 20.04, 22.04, 23.10 | `x86_64`, `i386`, `ARMv7` | DEB | + +> IMPORTANT: Linux distributions frequently provide binary packages of Netdata. However, the packages you will find at the distributions' repositories may be outdated, incomplete, missing significant features or completely broken. We recommend to use the packages we provide. + +## Third party Supported Binary Packages + +The following distributions always provide the latest stable version of Netdata: + +| Platform | Platform Versions | Released Packages Architecture | +|:----------:|:-----------------:|:------------------------------------:| +| Arch Linux | Latest | All the Arch supported architectures | +| MacOS Brew | Latest | All the Brew supported architectures | + + +## Builds from Source + +We guarantee Netdata builds from source for the platforms we provide automated binary packages. These platforms are automatically checked via our CI, and fixes are always applied to allow merging new code into the nightly versions. 
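+
+As a hedged sketch of what a source build typically looks like (assuming the build dependencies are already installed; the installer documentation under `packaging/installer` is the authoritative reference):
+
+```bash
+# A sketch: build and install Netdata from a git checkout.
+git clone --recursive https://github.com/netdata/netdata.git
+cd netdata
+sudo ./netdata-installer.sh --dont-wait
+```
+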
+ +The following builds from source should usually work, although we don't regularly monitor if there are issues: + +| Platform | Platform Versions | +|:-----------------------------------:|:--------------------------:| +| Linux Distributions | Latest unreleased versions | +| FreeBSD and derivatives | 13-STABLE | +| Gentoo and derivatives | Latest | +| Arch Linux and derivatives | latest from AUR | +| MacOS | 11, 12, 13 | +| Linux under Microsoft Windows (WSL) | Latest | + +## Static Builds and Unsupported Linux Versions + +The static builds of Netdata can be used on any Linux platform of the supported architectures. The only requirement these static builds have is a working Linux kernel, any version. Everything else required for Netdata to run, is inside the package itself. + +Static builds usually miss certain features that require operating-system support and cannot be provided in a generic way. These features include: + +- IPMI hardware sensors support +- systemd-journal features +- eBPF related features + +When platforms are removed from the [Binary Distribution Packages](https://github.com/netdata/netdata/blob/master/packaging/makeself/README.md) list, they default to install or update Netdata to a static build. This may mean that after platforms become EOL, Netdata on them may lose some of its features. We recommend to upgrade the operating system before it becomes EOL, to continue using all the features of Netdata. diff --git a/docs/netdata-cloud/README.md b/docs/netdata-cloud/README.md new file mode 100644 index 00000000000000..acf8e42fa7a4dd --- /dev/null +++ b/docs/netdata-cloud/README.md @@ -0,0 +1,134 @@ +# Netdata Cloud + +Netdata Cloud is a service that complements Netdata installations. It is a key component in achieving optimal cost structure for large scale observability. + +Technically, Netdata Cloud is a thin control plane that allows the Netdata ecosystem to be a virtually unlimited scalable and flexible observability pipeline. With Netdata Cloud, this observability pipeline can span multiple teams, cloud providers, data centers and services, while remaining a uniform and highly integrated infrastructure, providing real-time and high-fidelity insights. + +```mermaid +flowchart TB + NC("☁️ Netdata Cloud + access from anywhere, + horizontal scalability, + role based access, + custom dashboards, + central notifications") + Users[["✨ Unified Dashboards + across the infrastructure, + multi-cloud, hybrid-cloud"]] + Notifications["🔔 Alert Notifications + Slack, e-mail, Mobile App, + PagerDuty, and more"] + Users <--> NC + NC -->|deduplicated| Notifications + subgraph On-Prem Infrastructure + direction TB + Agents("🌎 Netdata Agents + Standalone, + Children, Parents + (possibly overlapping)") + TimeSeries[("Time-Series + metric samples + database")] + PrivateAgents("🔒 Private + Netdata Agents") + Agents <--> TimeSeries + Agents ---|stream| PrivateAgents + end + NC <-->|secure connection| Agents +``` + +Netdata Cloud provides the following features, on top of what the Netdata agents already provide: + +1. **Horizontal scalability**: Netdata Cloud allows scaling the observability infrastructure horizontally, by adding more independent Netdata Parents and Children. It can aggregate such, otherwise independent, observability islands into one uniform and integrated infrastructure. + + Netdata Cloud is a fundamental component for achieving an optimal cost structure and flexibility, in structuring observability the way that is best suited for each case. + +2. 
**Role Based Access Control (RBAC)**: Netdata Cloud has all the mechanisms for user management and access control. It allows assigning all users a role, segmenting the infrastructure into rooms, and associating rooms with roles and users. + +3. **Access from anywhere**: Netdata agents are installed on-prem and this is where all your data are always stored. Netdata Cloud allows querying all the Netdata agents (Standalone, Children and Parents) in real-time when dashboards are accessed via Netdata Cloud. + + This enables much simpler access control, eliminating the complexity of setting up VPNs to access observability, and the bandwidth costs of centralizing all metrics in one place. + +4. **Central dispatch of alert notifications**: Netdata Cloud allows controlling the dispatch of alert notifications centrally. By default, all Netdata agents (Standalone, Children and Parents) send their own notifications. This becomes increasingly complex as the infrastructure grows. So, Netdata Cloud steps in to simplify this process and provide central control of all notifications. + + Netdata Cloud also enables the use of the **Netdata Mobile App**, offering mobile push notifications to all users on commercial plans. + +5. **Custom Dashboards**: Netdata Cloud enables the creation, storage and sharing of custom dashboards. + + Custom dashboards are created directly from the UI, without the need to learn a query language. Netdata Cloud provides the Netdata dashboards with all the APIs needed to store, browse and retrieve custom dashboards created by all users. + +6. **Advanced Customization**: Netdata Cloud provides all the APIs for the dashboard to have different default settings per space, per room and per user, allowing administrators and users to customize the Netdata dashboards and charts the way they see fit. + +## Data Exposed to Netdata Cloud + +Netdata Cloud is a thin layer on top of the Netdata agents. It does not receive the collected samples, or the logs the Netdata agents maintain. + +This is a key design decision for Netdata. If we were centralizing metric samples and logs, Netdata would have the same constraints and cost structure other observability solutions have, and we would be forced to lower metrics resolution, filter out metrics and eventually significantly increase the cost of observability. + +Instead, Netdata Cloud receives and stores only metadata related to the metrics collected, such as the nodes collecting metrics and their labels, the metric names, their labels and their retention, the data collection plugins and modules running, the configured alerts and their transitions. + +This information is a small fraction of the total information maintained by Netdata agents, allowing Netdata Cloud to remain high-resolution, high-fidelity and real-time, while being able to: + +- dispatch alerts centrally for all alert transitions. +- know which Netdata agents to query when users view the dashboards. + +Metric samples and logs are transferred through Netdata Cloud to your web browser only when you view them via Netdata Cloud. And even then, Netdata Cloud does not store this information. It only aggregates the responses of multiple Netdata agents into a single response for your web browser to visualize. + +## High-Availability + +You can subscribe to Netdata Cloud updates at the [Netdata Cloud Status](https://status.netdata.cloud/) page. + +Netdata Cloud is a highly available, auto-scalable solution; however, being a monitoring solution, we need to ensure that dashboards remain accessible during a crisis.
+ +Netdata agents provide the same dashboard as Netdata Cloud, with the following limitations: + +1. The dashboards of Netdata agents (Children and Parents) are limited to their own databases, while on Netdata Cloud the dashboard presents the entire infrastructure, from all Netdata agents connected to it. + +2. When you are not logged in, or the agent is not connected to Netdata Cloud, certain features of the Netdata agent dashboard will not be available. + + When you are logged in and the agent is connected to Netdata Cloud, the agent dashboard has the same functionality as Netdata Cloud. + +To ensure dashboard high availability, Netdata agent dashboards remain available by accessing them directly, even when the connectivity between Children, Parents or Netdata Cloud faces issues. This allows using the individual Netdata agents' dashboards during a crisis, at different levels of aggregation. + +## Fidelity and Insights + +Netdata Cloud queries Netdata agents, so it provides exactly the same fidelity and insights Netdata agents provide. Dashboards have the same resolution, the same number of metrics, exactly the same data. + +## Performance + +The Netdata agent and Netdata Cloud have similar query performance, but there are additional network latencies involved when the dashboards are viewed via Netdata Cloud. + +Accessing Netdata agents on the same LAN has marginal network latency, and their response time is affected only by the queries. However, accessing the same Netdata agents via Netdata Cloud involves a bigger network round-trip, which looks like this: + +1. Your web browser makes a request to Netdata Cloud. +2. Netdata Cloud sends the request to your Netdata agents. If multiple Netdata agents are involved, they are queried in parallel. +3. Netdata Cloud receives their responses and aggregates them into a single response. +4. Netdata Cloud replies to your web browser. + +If you are sitting on the same LAN as the Netdata agents, the latency will be twice the round-trip network latency between this LAN and Netdata Cloud. + +However, when there are multiple Netdata agents involved, the queries will be faster compared to a monitoring solution that has one centralization point. Netdata Cloud splits each query into multiple parts, and each of the Netdata agents involved performs only a small part of the original query. So, when querying a large infrastructure, you enjoy the combined power of all your Netdata agents, which is usually considerably higher than that of any single-centralization-point monitoring solution. + +## Does Netdata Cloud require Observability Centralization Points? + +No. Any or all Netdata agents can be connected to Netdata Cloud. + +We recommend creating [observability centralization points](https://github.com/netdata/netdata/blob/master/docs/observability-centralization-points/README.md), as required for operational efficiency (ephemeral nodes, teams or services isolation, central control of alerts, production systems performance), security policies (internet isolation), or cost optimization (use existing capacities before allocating new ones). + +We suggest reviewing the [Best Practices for Observability Centralization Points](https://github.com/netdata/netdata/blob/master/docs/observability-centralization-points/best-practices.md). + +## When I have Netdata Parents, do I need to connect Netdata Children to Netdata Cloud too? + +No, it is not needed, but it provides high availability.
+ +When Netdata Parents are connected to Netdata Cloud, all their Netdata Children are available, via these Parents. + +When multiple Netdata Parents maintain a database for the same Netdata Children (e.g. clustered Parents, or Parents and Grandparents), Netdata Cloud is able to detect the unique nodes in an infrastructure and query each node only once, using one of the available Parents. + +Netdata Cloud prefers: + +- The most distant (from the Child) Parent available, when doing metrics visualization queries (since usually these Parents have been added for this purpose). + +- The closest (to the Child) Parent available, for [Top Monitoring](https://github.com/netdata/netdata/blob/master/docs/cloud/netdata-functions.md) (since top-monitoring provides live data, like the processes running, the list of sockets open, etc). The streaming protocol of Netdata Parents and Children is able to forward such requests to the right child, via the Parents, to respond with live and accurate data. + +Netdata Children may be connected to Netdata Cloud for high-availability, in case the Netdata Parents are unreachable. diff --git a/docs/netdata-cloud/netdata-cloud-on-prem/README.md b/docs/netdata-cloud/netdata-cloud-on-prem/README.md new file mode 100644 index 00000000000000..29601686d9d862 --- /dev/null +++ b/docs/netdata-cloud/netdata-cloud-on-prem/README.md @@ -0,0 +1,77 @@ +# Netdata Cloud On-Prem + +Netdata Cloud is built as microservices and is orchestrated by a Kubernetes cluster, providing a highly available and auto-scaled observability platform. + +The overall architecture looks like this: + +```mermaid +flowchart TD + agents("🌍 Netdata Agents
Users' infrastructure
Netdata Children & Parents") + users[["🔥 Unified Dashboards
Integrated Infrastructure
Dashboards"]] + ingress("🛡️ Ingress Gateway
TLS termination") + traefik((("🔒 Traefik
Authentication &
Authorization"))) + emqx(("📤 EMQX
Agents Communication
Message Bus
MQTT")) + pulsar(("⚡ Pulsar
Internal Microservices
Message Bus")) + frontend("🌐 Front-End
Static Web Files") + auth("👨‍💼 Users & Agents
Authorization
Microservices") + spaceroom("🏡 Spaces, Rooms,
Nodes, Settings

Microservices for
managing Spaces,
Rooms, Nodes and
related settings") + charts("📈 Metrics & Queries
Microservices for
dispatching queries
to Netdata agents") + alerts("🔔 Alerts & Notifications
Microservices for
tracking alert
transitions and
deduplicating alerts") + sql[("✨ PostgreSQL
Users, Spaces, Rooms,
Agents, Nodes, Metric
Names, Metrics Retention,
Custom Dashboards,
Settings")] + redis[("🗒️ Redis
Caches needed
by Microservices")] + elk[("🗞️ Elasticsearch
Feed Events Database")] + bridges("🤝 Input & Output
Microservices bridging
agents to internal
components") + notifications("📢 Notifications Integrations
Dispatch alert
notifications to
3rd party services") + feed("📝 Feed & Events
Microservices for
managing the events feed") + users --> ingress + agents --> ingress + ingress --> traefik + traefik ==>|agents
websockets| emqx + traefik -.- auth + traefik ==>|http| spaceroom + traefik ==>|http| frontend + traefik ==>|http| charts + traefik ==>|http| alerts + spaceroom o-...-o pulsar + spaceroom -.- redis + spaceroom x-..-x sql + spaceroom -.-> feed + charts o-.-o pulsar + charts -.- redis + charts x-.-x sql + charts -..-> feed + alerts o-.-o pulsar + alerts -.- redis + alerts x-.-x sql + alerts -..-> feed + auth o-.-o pulsar + auth -.- redis + auth x-.-x sql + auth -.-> feed + feed <--> elk + alerts ----> notifications + %% auth ~~~ spaceroom + emqx <.-> bridges o-..-o pulsar +``` + +## Requirements + +The following components are required to run Netdata Cloud On-Prem: + +- **Kubernetes cluster** version 1.23+ +- **Kubernetes metrics server** (for autoscaling) +- **TLS certificate** for secure connections. A single endpoint is required, but there is an option to split the frontend, api, and MQTT endpoints. The certificate must be trusted by all entities connecting to it. +- Default **storage class configured and working** (persistent volumes based on SSDs are preferred) + +The following 3rd party components are used, which can be pulled with the `netdata-cloud-dependency` package we provide: + +- **Ingress controller** supporting HTTPS +- **PostgreSQL** version 13.7 (main database for all metadata Netdata Cloud maintains) +- **EMQX** version 5.11 (MQTT broker that allows Agents to send messages to the On-Prem Cloud) +- **Apache Pulsar** version 2.10+ (message broker for inter-container communication) +- **Traefik** version 2.7.x (internal API Gateway) +- **Elasticsearch** version 8.8.x (stores the feed of events) +- **Redis** version 6.2 (caching) +- imagePullSecret (our ECR repos are secured) + +Keep in mind, though, that the pulled versions are not properly configured for production use. Customers of Netdata Cloud On-Prem are expected to configure these applications according to their needs and policies for production use. Netdata Cloud On-Prem can be configured to use all these applications as a shared resource from other existing production installations. diff --git a/docs/netdata-cloud-onprem/infrastructure.jpeg b/docs/netdata-cloud/netdata-cloud-on-prem/infrastructure.jpeg similarity index 100% rename from docs/netdata-cloud-onprem/infrastructure.jpeg rename to docs/netdata-cloud/netdata-cloud-on-prem/infrastructure.jpeg diff --git a/docs/netdata-cloud/netdata-cloud-on-prem/installation.md b/docs/netdata-cloud/netdata-cloud-on-prem/installation.md new file mode 100644 index 00000000000000..9dd2daf9d06dfd --- /dev/null +++ b/docs/netdata-cloud/netdata-cloud-on-prem/installation.md @@ -0,0 +1,212 @@ +# Netdata Cloud On-Prem Installation + +This installation guide assumes that the prerequisites for installing Netdata Cloud On-Prem are satisfied. For more information, please refer to the [requirements documentation](Netdata-Cloud-On-Prem.md#requirements). + +## Installation Requirements + +The following components are required to install Netdata Cloud On-Prem: + +- **AWS** CLI +- **Helm** version 3.12+ with OCI Configuration (explained in the installation section) +- **Kubectl** + +## Preparations for Installation + +### Configure AWS CLI + +Install the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html). + +There are two options for configuring `aws cli` to work with the provided credentials.
The first one is to set the environment variables: + +```bash +export AWS_ACCESS_KEY_ID= +export AWS_SECRET_ACCESS_KEY= +``` + +The second one is to use an interactive shell: + +```bash +aws configure +``` + +### Configure helm to use secured ECR repository + +Using `aws` command we will generate a token for helm to access the secured ECR repository: + +```bash +aws ecr get-login-password --region us-east-1 | helm registry login --username AWS --password-stdin 362923047827.dkr.ecr.us-east-1.amazonaws.com/netdata-cloud-onprem +``` + +After this step you should be able to add the repository to your helm or just pull the helm chart: + +```bash +helm pull oci://362923047827.dkr.ecr.us-east-1.amazonaws.com/netdata-cloud-dependency --untar #optional +helm pull oci://362923047827.dkr.ecr.us-east-1.amazonaws.com/netdata-cloud-onprem --untar +``` + +Local folders with the newest versions of helm charts should appear on your working dir. + +## Installation + +Netdata provides access to two helm charts: + +1. `netdata-cloud-dependency` - required applications for `netdata-cloud-onprem`. +2. `netdata-cloud-onprem` - the application itself + provisioning + +### netdata-cloud-dependency + +This helm chart is designed to install the necessary applications: + +- Redis +- Elasticsearch +- EMQX +- Apache Pulsar +- PostgreSQL +- Traefik +- Mailcatcher +- k8s-ecr-login-renew +- kubernetes-ingress + +Although we provide an easy way to install all these applications, we expect users of Netdata Cloud On-Prem to provide production quality versions for them. Therefore, every configuration option is available through `values.yaml` in the folder that contains your netdata-cloud-dependency helm chart. All configuration options are described in `README.md` which is a part of the helm chart. + +Each component can be enabled/disabled individually. It is done by true/false switches in `values.yaml`. This way, it is easier to migrate to production-grade components gradually. + +Unless you prefer otherwise, `k8s-ecr-login-renew` is responsible for calling out the `AWS API` for token regeneration. This token is then injected into the secret that every node is using for authentication with secured ECR when pulling the images. + +The default setting in `values.yaml` of `netdata-cloud-onprem` - `.global.imagePullSecrets` is configured to work out of the box with the dependency helm chart. + +For helm chart installation - save your changes in `values.yaml` and execute: + +```shell +cd [your helm chart location] +helm upgrade --wait --install netdata-cloud-dependency -n netdata-cloud --create-namespace -f values.yaml . +``` + +Keep in mind that `netdata-cloud-dependency` is provided only as a proof of concept. Users installing Netdata Cloud On-Prem should properly configure these components. + +### netdata-cloud-onprem + +Every configuration option is available in `values.yaml` in the folder that contains your `netdata-cloud-onprem` helm chart. All configuration options are described in the `README.md` which is a part of the helm chart. + +#### Installing Netdata Cloud On-Prem + +```shell +cd [your helm chart location] +helm upgrade --wait --install netdata-cloud-onprem -n netdata-cloud --create-namespace -f values.yaml . +``` + +##### Important notes + +1. Installation takes care of provisioning the resources with migration services. + +2. During the first installation, a secret called the `netdata-cloud-common` is created. It contains several randomly generated entries. 
Deleting the helm chart, or reinstalling the whole On-Prem installation, will not delete this secret; it is only removed when a Kubernetes administrator deletes it manually. The content of this secret is critical: the strings it contains are essential parts of the encryption. Losing or changing the data it contains will result in data loss. + +## Short description of Netdata Cloud microservices + +#### cloud-accounts-service + +Responsible for user registration & authentication. Manages user account information. + +#### cloud-agent-data-ctrl-service + +Forwards requests from the cloud to the relevant agents. +The requests include: +- Fetching chart metadata from the agent +- Fetching chart data from the agent +- Fetching function data from the agent + +#### cloud-agent-mqtt-input-service + +Forwards MQTT messages emitted by the agent related to the agent entities to the internal Pulsar broker. These include agent connection state updates. + +#### cloud-agent-mqtt-output-service + +Forwards Pulsar messages emitted in the cloud related to the agent entities to the MQTT broker. From there, the messages reach the relevant agent. + +#### cloud-alarm-config-mqtt-input-service + +Forwards MQTT messages emitted by the agent related to the alarm-config entities to the internal Pulsar broker. These include the data for the alarm configuration as seen by the agent. + +#### cloud-alarm-log-mqtt-input-service + +Forwards MQTT messages emitted by the agent related to the alarm-log entities to the internal Pulsar broker. These contain data about the alarm transitions that occurred in an agent. + +#### cloud-alarm-mqtt-output-service + +Forwards Pulsar messages emitted in the cloud related to the alarm entities to the MQTT broker. From there, the messages reach the relevant agent. + +#### cloud-alarm-processor-service + +Persists the latest alert statuses received from the agent in the cloud. +Aggregates alert statuses from relevant node instances. +Exposes API endpoints to fetch alert data for visualization on the cloud. +Determines if notifications need to be sent when alert statuses change and emits relevant messages to Pulsar. +Exposes API endpoints to store and return notification-silencing data. + +#### cloud-alarm-streaming-service + +Responsible for starting the alert stream between the agent and the cloud. +Ensures that messages are processed in the correct order, and starts a reconciliation process between the cloud and the agent if out-of-order processing occurs. + +#### cloud-charts-mqtt-input-service + +Forwards MQTT messages emitted by the agent related to the chart entities to the internal Pulsar broker. These include the chart metadata that is used to display relevant charts on the cloud. + +#### cloud-charts-mqtt-output-service + +Forwards Pulsar messages emitted in the cloud related to the chart entities to the MQTT broker. From there, the messages reach the relevant agent. + +#### cloud-charts-service + +Exposes API endpoints to fetch the chart metadata. +Forwards data requests via the `cloud-agent-data-ctrl-service` to the relevant agents to fetch chart data points. +Exposes API endpoints to call various other endpoints on the agent, for instance, functions. + +#### cloud-custom-dashboard-service + +Exposes API endpoints to fetch and store custom dashboard data. + +#### cloud-environment-service + +Serves as the first contact point between the agent and the cloud. +Returns authentication and MQTT endpoints to connecting agents.
+ +#### cloud-feed-service + +Processes incoming feed events and stores them in Elasticsearch. +Exposes API endpoints to fetch feed events from Elasticsearch. + +#### cloud-frontend + +Contains the on-prem cloud website. Serves static content. + +#### cloud-iam-user-service + +Acts as a middleware for authentication on most of the API endpoints. Validates incoming token headers, injects the relevant ones, and forwards the requests. + +#### cloud-metrics-exporter + +Exports various metrics from an On-Prem Cloud installation. Uses the Prometheus metric exposition format. + +#### cloud-netdata-assistant + +Exposes API endpoints to fetch a human-friendly explanation of various Netdata configuration options, namely the alerts. + +#### cloud-node-mqtt-input-service + +Forwards MQTT messages emitted by the agent related to the node entities to the internal Pulsar broker. These include the node metadata as well as their connectivity state, either direct or via parents. + +#### cloud-node-mqtt-output-service + +Forwards Pulsar messages emitted in the cloud related to the node entities to the MQTT broker. From there, the messages reach the relevant agent. + +#### cloud-notifications-dispatcher-service + +Exposes API endpoints to handle integrations. +Handles incoming notification messages and uses the relevant channels (email, Slack, ...) to notify the relevant users. + +#### cloud-spaceroom-service + +Exposes API endpoints to fetch and store relations between agents, nodes, spaces, users, and rooms. +Acts as a provider of authorization for other cloud endpoints. +Exposes API endpoints to authenticate agents connecting to the cloud. diff --git a/docs/netdata-cloud/netdata-cloud-on-prem/poc-without-k8s.md b/docs/netdata-cloud/netdata-cloud-on-prem/poc-without-k8s.md new file mode 100644 index 00000000000000..6be4066bd25e8a --- /dev/null +++ b/docs/netdata-cloud/netdata-cloud-on-prem/poc-without-k8s.md @@ -0,0 +1,70 @@ +# Netdata Cloud On-Prem PoC without k8s + +These instructions describe installing a light version of Netdata Cloud, for clients who do not have a Kubernetes cluster installed. This setup is **only for demonstration purposes**, as it has no built-in resiliency to failures of any kind. + +## Requirements + +- Ubuntu 22.04 (a clean installation will work best). +- 10 CPU cores and 24 GiB of memory. +- Shell access with `sudo` privileges. +- TLS certificate for the Netdata Cloud On-Prem PoC. A single endpoint is required. The certificate must be trusted by all entities connecting to this installation. +- AWS ID and License Key - we should have provided these to you; if not, contact us: . + +To install the whole environment, log in to the designated host and run: + +```bash +curl https://netdata-cloud-netdata-static-content.s3.amazonaws.com/provision.sh -o provision.sh +chmod +x provision.sh +sudo ./provision.sh install \ + -key-id "" \ + -access-key "" \ + -onprem-license-key "" \ + -onprem-license-subject "" \ + -onprem-url "" \ + -certificate-path "" \ + -private-key-path "" +``` + +What does the script do during installation? + +1. Prompts the user to provide: + - `-key-id` - AWS ECR access key ID. + - `-access-key` - AWS ECR Access Key. + - `-onprem-license-key` - Netdata Cloud On-Prem license key. + - `-onprem-license-subject` - Netdata Cloud On-Prem license subject. + - `-onprem-url` - URL for the On-Prem installation (without the http(s) protocol). + - `-certificate-path` - path to your PEM encoded certificate. + - `-private-key-path` - path to your PEM encoded key. + +2. After all of the above, the installation will begin.
The script will install: + - Helm + - Kubectl + - AWS CLI + - K3s cluster (single node) + +3. When all the required software is installed, the script starts provisioning the K3s cluster with the gathered data. + +After the cluster is provisioned, the installation is ready to use. + +> WARNING: +> This script will automatically expose not only Netdata Cloud On-Prem, but also Mailcatcher under `/mailcatcher`. + +## How to log in? + +Only login by email works without further configuration. Every email this Netdata Cloud On-Prem installation sends will appear in Mailcatcher, which acts as the SMTP server and provides a simple GUI to read the emails. + +Steps: + +1. Open the Netdata Cloud On-Prem PoC in your web browser at the URL you specified. +2. Provide your email and use the button to confirm. +3. Mailcatcher will catch all the emails, so go to `/mailcatcher`, find yours and click the link. +4. You are now logged into Netdata Cloud. Add your first nodes! + +## How to remove Netdata Cloud On-Prem PoC? + +To uninstall the whole PoC, use the same script that installed it, with the `uninstall` switch. + +```shell +cd