[pull] master from netdata:master #30

Merged 6 commits on Feb 26, 2024

Changes from all commits
7 changes: 3 additions & 4 deletions CHANGELOG.md
@@ -6,6 +6,9 @@

**Merged pull requests:**

- updated sizing netdata [\#17057](https://github.com/netdata/netdata/pull/17057) ([ktsaou](https://github.com/ktsaou))
- fix zpool state chart family [\#17054](https://github.com/netdata/netdata/pull/17054) ([ilyam8](https://github.com/ilyam8))
- DYNCFG: call the interceptor when a test is made on a new job [\#17052](https://github.com/netdata/netdata/pull/17052) ([ktsaou](https://github.com/ktsaou))
- fix alerts jsonschema prototype for latest dyncfg [\#17047](https://github.com/netdata/netdata/pull/17047) ([ktsaou](https://github.com/ktsaou))
- Do not use backtrace when sentry is enabled. [\#17043](https://github.com/netdata/netdata/pull/17043) ([vkalintiris](https://github.com/vkalintiris))
- Keep a count of metrics and samples collected [\#17042](https://github.com/netdata/netdata/pull/17042) ([stelfrag](https://github.com/stelfrag))
@@ -311,7 +314,6 @@
- code cleanup [\#16542](https://github.com/netdata/netdata/pull/16542) ([ktsaou](https://github.com/ktsaou))
- Assorted kickstart script fixes. [\#16537](https://github.com/netdata/netdata/pull/16537) ([Ferroin](https://github.com/Ferroin))
- wip documentation about functions table [\#16535](https://github.com/netdata/netdata/pull/16535) ([ktsaou](https://github.com/ktsaou))
- Remove openSUSE 15.4 from CI [\#16449](https://github.com/netdata/netdata/pull/16449) ([tkatsoulas](https://github.com/tkatsoulas))

## [v1.44.3](https://github.com/netdata/netdata/tree/v1.44.3) (2024-02-12)

@@ -402,9 +404,6 @@
- set journal path for logging [\#16457](https://github.com/netdata/netdata/pull/16457) ([ktsaou](https://github.com/ktsaou))
- add sbindir\_POST to PATH of bash scripts that use `systemd-cat-native` [\#16456](https://github.com/netdata/netdata/pull/16456) ([ilyam8](https://github.com/ilyam8))
- add LogNamespace to systemd units [\#16454](https://github.com/netdata/netdata/pull/16454) ([ilyam8](https://github.com/ilyam8))
- Update non-zero uuid key + child conf. [\#16452](https://github.com/netdata/netdata/pull/16452) ([vkalintiris](https://github.com/vkalintiris))
- Add missing argument. [\#16451](https://github.com/netdata/netdata/pull/16451) ([vkalintiris](https://github.com/vkalintiris))
- log flood protection to 1000 log lines / 1 minute [\#16450](https://github.com/netdata/netdata/pull/16450) ([ilyam8](https://github.com/ilyam8))

## [v1.43.2](https://github.com/netdata/netdata/tree/v1.43.2) (2023-10-30)

8 changes: 4 additions & 4 deletions README.md
@@ -41,19 +41,19 @@ It scales nicely from just a single server to thousands of servers, even in comp
Operating system metrics, container metrics, virtual machines, hardware sensors, applications metrics, OpenMetrics exporters, StatsD, and logs.

- :muscle: **Real-Time, Low-Latency, High-Resolution**<br/>
All metrics are collected per second and are on the dashboard immediately after data collection. Netdata is designed to be fast.
All metrics are collected per second and are on the dashboard immediately after data collection.

- :face_in_clouds: **Unsupervised Anomaly Detection**<br/>
Trains multiple Machine-Learning (ML) models for each metric collected and detects anomalies based on the past behavior of each metric individually.
Trains multiple Machine-Learning (ML) models for each metric and uses AI to detect anomalies based on the past behavior of each metric.

- :fire: **Powerful Visualization**<br/>
Clear and precise visualization that allows you to quickly understand any dataset, but also to filter, slice and dice the data directly on the dashboard, without the need to learn any query language.
Clear and precise visualization that lets you understand any dataset at first sight, and also filter, slice and dice the data directly on the dashboard, without the need to learn a query language.

- :bell: **Out-of-the-box Alerts**<br/>
Comes with hundreds of alerts out of the box to detect common issues and pitfalls, revealing issues that can easily go unnoticed. It supports several notification methods to let you know when your attention is needed.

- 📖 **systemd Journal Logs Explorer**<br/>
Provides a `systemd` journal logs explorer, to view, filter and analyze system and applications logs by directly accessing `systemd` journal files on individual hosts and infrastructure-wide logs centralization servers.
System and application logs of all servers are available in real-time, for filtering and analysis, on both individual nodes and infrastructure-wide logs centralization servers.

- :sunglasses: **Low Maintenance**<br/>
Fully automated in every aspect: automated dashboards, out-of-the-box alerts, auto-detection and auto-discovery of metrics, zero-touch machine-learning, easy scalability and high availability, and CI/CD friendly.
4 changes: 3 additions & 1 deletion docs/netdata-agent/sizing-netdata-agents/README.md
@@ -58,7 +58,9 @@ The following are some of the innovations the open-source Netdata agent has, tha

2. **4 bytes per sample uncompressed**

To achieve optimal memory and disk footprint, Netdata uses a custom 32-bit floating point number we have developed. This floating point number is used to store the samples collected, together with their anomaly bit. The database of Netdata is fixed-step, so it has predefined slots for every sample, allowing Netdata to store timestamps once every several hundreds samples, minimizing both its memory requirements and the disk footprint.
To achieve optimal memory and disk footprint, Netdata uses a custom 32-bit floating point number. This floating point number is used to store the samples collected, together with their anomaly bit. The database of Netdata is fixed-step, so it has predefined slots for every sample, allowing Netdata to store timestamps once every several hundred samples, minimizing both its memory requirements and the disk footprint.
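
To make this concrete, here is a minimal sketch of one way such an encoding could work — not Netdata's actual format — assuming the anomaly bit is carved out of the least-significant mantissa bit of an IEEE-754 `float`; `sample_pack()` and `sample_value()` are hypothetical helpers:

```c
/* Illustrative sketch only - not Netdata's real encoding. It shows how a
 * value and its anomaly bit can share 32 bits: repurpose the least
 * significant mantissa bit of an IEEE-754 float as the anomaly flag,
 * at the cost of a negligible amount of precision. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define ANOMALY_BIT 0x00000001u

static uint32_t sample_pack(float value, int anomalous) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof(bits)); /* type-pun safely via memcpy */
    bits &= ~ANOMALY_BIT;                /* clear the repurposed bit   */
    if (anomalous)
        bits |= ANOMALY_BIT;
    return bits;
}

static float sample_value(uint32_t s) {
    uint32_t bits = s & ~ANOMALY_BIT;
    float value;
    memcpy(&value, &bits, sizeof(value));
    return value;
}

int main(void) {
    uint32_t s = sample_pack(98.6f, 1);
    printf("value=%f anomalous=%u\n", sample_value(s), s & ANOMALY_BIT);
    return 0;
}
```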

The final disk footprint of Netdata varies due to compression efficiency. It is usually about 0.6 bytes per sample for the high-resolution tier (per-second), 6 bytes per sample for the mid-resolution tier (per-minute) and 18 bytes per sample for the low-resolution tier (per-hour).
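
As a back-of-the-envelope example using these typical figures (actual compression ratios vary), the disk needed to keep a hypothetical 2,000 concurrently collected metrics for 30 days could be estimated like this:

```c
/* Back-of-the-envelope disk estimate for 30 days of retention,
 * using the typical compressed per-sample sizes quoted above. */
#include <stdio.h>

int main(void) {
    double metrics = 2000; /* concurrently collected metrics (assumed) */
    double days    = 30;

    double tier0 = metrics * days * 86400.0        * 0.6;  /* per-second */
    double tier1 = metrics * days * (86400.0 / 60) * 6.0;  /* per-minute */
    double tier2 = metrics * days * 24.0           * 18.0; /* per-hour   */

    printf("tier0: %.1f GiB\n", tier0 / (1024.0 * 1024 * 1024)); /* ~2.9 GiB  */
    printf("tier1: %.1f MiB\n", tier1 / (1024.0 * 1024));        /* ~494 MiB  */
    printf("tier2: %.1f MiB\n", tier2 / (1024.0 * 1024));        /* ~24.7 MiB */
    return 0;
}
```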

3. **Query priorities**

@@ -28,11 +28,11 @@ To configure database mode `ram` or `alloc`, in `netdata.conf`, set the following:

`dbengine` supports up to 5 tiers. By default, 3 tiers are used, like this:

| Tier | Resolution | Uncompressed Sample Size |
|:--------:|:--------------------------------------------------------------------------------------------:|:------------------------:|
| `tier0` | native resolution (metrics collected per-second as stored per-second) | 4 bytes |
| `tier1` | 60 iterations of `tier0`, so when metrics are collected per-second, this tier is per-minute. | 16 bytes |
| `tier2` | 60 iterations of `tier1`, so when metrics are collected per second, this tier is per-hour. | 16 bytes |
| Tier | Resolution | Uncompressed Sample Size | Usually On Disk |
|:--------:|:--------------------------------------------------------------------------------------------:|:------------------------:|:---------------:|
| `tier0` | native resolution (metrics collected per-second are stored per-second) | 4 bytes | 0.6 bytes |
| `tier1` | 60 iterations of `tier0`, so when metrics are collected per-second, this tier is per-minute. | 16 bytes | 6 bytes |
| `tier2` | 60 iterations of `tier1`, so when metrics are collected per-second, this tier is per-hour. | 16 bytes | 18 bytes |

Data are saved to disk compressed, so the actual size on disk varies depending on compression efficiency.
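
For illustration, a higher-tier slot might summarize 60 lower-tier samples as in the sketch below — the 16-byte layout (three floats plus two 16-bit counters) is an assumption for the example, not the exact on-disk format:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 16-byte higher-tier sample. Each slot
 * summarizes up to 60 lower-tier samples. */
typedef struct {
    float    sum;           /* sum of the source samples    */
    float    min;           /* smallest source sample       */
    float    max;           /* largest source sample        */
    uint16_t count;         /* samples actually collected   */
    uint16_t anomaly_count; /* samples flagged as anomalous */
} tier_sample_t;

static tier_sample_t aggregate(const float *v, const int *anomalous, int n) {
    tier_sample_t out = { .min = v[0], .max = v[0] };
    for (int i = 0; i < n; i++) {
        out.sum += v[i];
        if (v[i] < out.min) out.min = v[i];
        if (v[i] > out.max) out.max = v[i];
        out.count++;
        if (anomalous[i]) out.anomaly_count++;
    }
    return out;
}

int main(void) {
    float v[3] = { 1.0f, 5.0f, 3.0f };
    int   a[3] = { 0, 1, 0 };
    tier_sample_t t = aggregate(v, a, 3);
    printf("avg=%.2f min=%.2f max=%.2f anomalous=%u/%u\n",
           t.sum / t.count, t.min, t.max, t.anomaly_count, t.count);
    return 0;
}
```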

@@ -56,40 +56,46 @@ You can find information about the current disk utilization of a Netdata Parent,
```json
{
// more information about the agent
// near the end:
// then, near the end:
"db_size": [
{
"tier": 0,
"disk_used": 1677528462156,
"disk_max": 1677721600000,
"disk_percent": 99.9884881,
"from": 1706201952,
"to": 1707401946,
"retention": 1199994,
"expected_retention": 1200132,
"currently_collected_metrics": 2198777
"metrics": 43070,
"samples": 88078162001,
"disk_used": 41156409552,
"disk_max": 41943040000,
"disk_percent": 98.1245269,
"from": 1705033983,
"to": 1708856640,
"retention": 3822657,
"expected_retention": 3895720,
"currently_collected_metrics": 27424
},
{
"tier": 1,
"disk_used": 838123468064,
"disk_max": 838860800000,
"disk_percent": 99.9121032,
"from": 1702885800,
"to": 1707401946,
"retention": 4516146,
"expected_retention": 4520119,
"currently_collected_metrics": 2198777
"metrics": 72987,
"samples": 5155155269,
"disk_used": 20585157180,
"disk_max": 20971520000,
"disk_percent": 98.1576785,
"from": 1698287340,
"to": 1708856640,
"retention": 10569300,
"expected_retention": 10767675,
"currently_collected_metrics": 27424
},
{
"tier": 2,
"disk_used": 334329683032,
"disk_max": 419430400000,
"disk_percent": 79.710408,
"from": 1679670000,
"to": 1707401946,
"retention": 27731946,
"expected_retention": 34790871,
"currently_collected_metrics": 2198777
"metrics": 148234,
"samples": 314919121,
"disk_used": 5957346684,
"disk_max": 10485760000,
"disk_percent": 56.8136853,
"from": 1667808000,
"to": 1708856640,
"retention": 41048640,
"expected_retention": 72251324,
"currently_collected_metrics": 27424
}
]
}
@@ -98,6 +104,8 @@ You can find information about the current disk utilization of a Netdata Parent,
In this example:

- `tier` is the database tier.
- `metrics` is the number of unique time-series in the database.
- `samples` is the number of samples in the database.
- `disk_used` is the currently used disk space in bytes.
- `disk_max` is the configured max disk space in bytes.
- `disk_percent` is the current disk space utilization for this tier.
@@ -107,21 +107,13 @@ In this example:
- `expected_retention` is the expected retention in seconds when `disk_percent` will be 100 (divide by 3600 for hours, divide by 86400 for days).
- `currently_collected_metrics` is the number of unique time-series currently being collected for this tier.

The estimated number of samples on each tier can be calculated as follows:

```
estimated number of samples = retention / sample duration * currently_collected_metrics
```

So, for our example above:

| Tier | Sample Duration (seconds) | Estimated Number of Samples | Disk Space Used | Current Retention (days) | Expected Retention (days) | Bytes Per Sample |
|:-------:|:-------------------------:|:---------------------------:|:---------------:|:------------------------:|:-------------------------:|:----------------:|
| `tier0` | 1 | 2.64 trillion samples | 1.56 TiB | 13.8 | 13.9 | 0.64 |
| `tier1` | 60 | 165.5 billion samples | 780 GiB | 52.2 | 52.3 | 5.01 |
| `tier2` | 3600 | 16.9 billion samples | 311 GiB | 320.9 | 402.7 | 19.73 |

Note: as you can see in this example, the disk footprint per sample of `tier2` is bigger than the uncompressed sample size (19.73 bytes vs 16 bytes). This is due to the fact that samples are organized into pages and pages into extents. When Netdata is restarted frequently, it saves all data prematurely, before filling up entire pages and extents, leading to increased overheads per sample.
| Tier | # Of Metrics | # Of Samples | Disk Used | Disk Free | Current Retention | Expected Retention | Sample Size |
|-----:|-------------:|--------------:|----------:|----------:|------------------:|-------------------:|------------:|
| 0 | 43.1K | 88.1 billion | 38.4Gi | 1.88% | 44.2 days | 45.0 days | 0.46 B |
| 1 | 73.0K | 5.2 billion | 19.2Gi | 1.84% | 122.3 days | 124.6 days | 3.99 B |
| 2 | 148.3K | 315.0 million | 5.6Gi | 43.19% | 475.1 days | 836.2 days | 18.91 B |
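
The derived columns in this table can be reproduced from the `db_size` fields of the JSON output above; here is a sketch with the three tiers' values hard-coded (rounding may differ slightly from the table):

```c
/* Re-derives the table's columns from the "db_size" fields shown above. */
#include <stdio.h>

typedef struct {
    int    tier;
    double samples, disk_used, disk_max, retention, expected_retention;
} db_size_t;

int main(void) {
    db_size_t t[3] = {
        { 0, 88078162001.0, 41156409552.0, 41943040000.0,  3822657.0,  3895720.0 },
        { 1,  5155155269.0, 20585157180.0, 20971520000.0, 10569300.0, 10767675.0 },
        { 2,   314919121.0,  5957346684.0, 10485760000.0, 41048640.0, 72251324.0 },
    };

    for (int i = 0; i < 3; i++) {
        printf("tier%d: %.2f bytes/sample, %.2f%% free, "
               "%.1f days kept, %.1f days expected\n",
               t[i].tier,
               t[i].disk_used / t[i].samples,                 /* sample size   */
               100.0 * (1.0 - t[i].disk_used / t[i].disk_max),/* disk free     */
               t[i].retention / 86400.0,                      /* current days  */
               t[i].expected_retention / 86400.0);            /* expected days */
    }
    return 0;
}
```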

To configure retention, in `netdata.conf`, set the following:

2 changes: 1 addition & 1 deletion packaging/version
@@ -1 +1 @@
v1.44.0-406-nightly
v1.44.0-412-nightly
2 changes: 1 addition & 1 deletion src/collectors/proc.plugin/proc_spl_kstat_zfs.c
@@ -272,7 +272,7 @@ int update_zfs_pool_state_chart(const DICTIONARY_ITEM *item, void *pool_p, void
"zfspool",
chart_id,
NULL,
name,
"state",
"zfspool.state",
"ZFS pool state",
"boolean",
2 changes: 1 addition & 1 deletion src/daemon/config/dyncfg-intercept.c
@@ -180,7 +180,7 @@ static int dyncfg_intercept_early_error(struct rrd_function_execute *rfe, int rc
return rc;
}

static const DICTIONARY_ITEM *dyncfg_get_template_of_new_job(const char *job_id) {
const DICTIONARY_ITEM *dyncfg_get_template_of_new_job(const char *job_id) {
char id_copy[strlen(job_id) + 1];
memcpy(id_copy, job_id, sizeof(id_copy));

2 changes: 2 additions & 0 deletions src/daemon/config/dyncfg-internals.h
@@ -76,6 +76,8 @@ const DICTIONARY_ITEM *dyncfg_add_internal(RRDHOST *host, const char *id, const
int dyncfg_function_intercept_cb(struct rrd_function_execute *rfe, void *data);
void dyncfg_cleanup(DYNCFG *v);

const DICTIONARY_ITEM *dyncfg_get_template_of_new_job(const char *job_id);

bool dyncfg_is_user_disabled(const char *id);

RRDHOST *dyncfg_rrdhost_by_uuid(UUID *uuid);
46 changes: 38 additions & 8 deletions src/daemon/config/dyncfg-tree.c
@@ -204,31 +204,57 @@ static int dyncfg_config_execute_cb(struct rrd_function_execute *rfe, void *data
action = path;
path = NULL;

if(id && *id && dyncfg_cmds2id(action) == DYNCFG_CMD_REMOVE) {
const DICTIONARY_ITEM *item = dictionary_get_and_acquire_item(dyncfg_globals.nodes, id);
if(item) {
DYNCFG *df = dictionary_acquired_item_value(item);
DYNCFG_CMDS cmd = dyncfg_cmds2id(action);
const DICTIONARY_ITEM *item = dictionary_get_and_acquire_item(dyncfg_globals.nodes, id);
if(!item)
item = dyncfg_get_template_of_new_job(id);

if(!rrd_function_available(host, string2str(df->function)))
df->current.status = DYNCFG_STATUS_ORPHAN;
if(item) {
DYNCFG *df = dictionary_acquired_item_value(item);

if(!rrd_function_available(host, string2str(df->function)))
df->current.status = DYNCFG_STATUS_ORPHAN;

if(cmd == DYNCFG_CMD_REMOVE) {
bool delete = (df->current.status == DYNCFG_STATUS_ORPHAN);
dictionary_acquired_item_release(dyncfg_globals.nodes, item);
item = NULL;

if(delete) {
if(!http_access_user_has_enough_access_level_for_endpoint(rfe->user_access, df->edit_access)) {
code = dyncfg_default_response(
rfe->result.wb, HTTP_RESP_FORBIDDEN,
"dyncfg: you don't have enough edit permissions to execute this command");
goto cleanup;
}

dictionary_del(dyncfg_globals.nodes, id);
dyncfg_file_delete(id);
code = dyncfg_default_response(rfe->result.wb, 200, "");
goto cleanup;
}
}
else if(cmd == DYNCFG_CMD_TEST && df->type == DYNCFG_TYPE_TEMPLATE && df->current.status != DYNCFG_STATUS_ORPHAN) {
const char *old_rfe_function = rfe->function;
char buf2[2048];
snprintfz(buf2, sizeof(buf2), "config %s %s", dictionary_acquired_item_name(item), action);
rfe->function = buf2;
dictionary_acquired_item_release(dyncfg_globals.nodes, item);
item = NULL;
code = dyncfg_function_intercept_cb(rfe, data);
rfe->function = old_rfe_function;
return code;
}

if(item)
dictionary_acquired_item_release(dyncfg_globals.nodes, item);
}

code = HTTP_RESP_NOT_FOUND;
nd_log(NDLS_DAEMON, NDLP_ERR,
"DYNCFG: unknown config id '%s' in call: '%s'. "
"This can happen if the plugin that registered the dynamic configuration is not running now.",
action, rfe->function);
id, rfe->function);

rrd_call_function_error(
rfe->result.wb,
@@ -248,7 +274,11 @@ static int dyncfg_config_execute_cb(struct rrd_function_execute *rfe, void *data
// for which there is no id overloaded.

void dyncfg_host_init(RRDHOST *host) {
// IMPORTANT:
// This function needs to be async, although it is internal.
// The reason is that it may itself call another function that may or may not be internal (sync).

rrd_function_add(host, NULL, PLUGINSD_FUNCTION_CONFIG, 120,
1000, "Dynamic configuration", "config", HTTP_ACCESS_ANONYMOUS_DATA,
true, dyncfg_config_execute_cb, host);
false, dyncfg_config_execute_cb, host);
}