Balance streaming parents (netdata#18945)

* recreate the circular buffer from time to time

* do not update cloud url if the node id is not updated

* remove deadlock and optimize pipe size

* removed const

* finer control on randomized delays

* restore children re-connecting to parents

* handle partial pipe reads; sender_commit() now checks if the sender is still connected to avoid bombarding it with data that cannot be sent

* added commented code about optimizing the array of pollfds

* improve interactivity of sender; code cleanup

* do not use the pipe for sending messages; instead use an in-memory queue that can never be full (a queue sketch follows this list)

* fix dictionaries families

* do not destroy aral on replication exit - it crashes the senders

* support multiple dispatchers and connectors; code cleanup

* more cleanup

* Add serde support for KMeans models.

- Serialization/Deserialization support of KMeans models.
- Send/receive ML models between a child/parent.
- Fix some rare and old crash reports.
- Reduce allocations by a couple thousand per second when training.
- Enable ML statistics temporarily which might increase CPU consumption.

* fix ml models streaming

* up to 10 dispatchers and 2 connectors

* experiment: limit the number of receivers to the number of cores - 2

* reworked compression at the receiver to minimize read operations

* multi-core receivers

* use slot 0 on receivers

* use slot 0 on receivers

* use half the cores for receivers with a minimum of 4

* cancel receiver threads

* use offsets instead of pointers in the compressed buffer; track last reads

* fix crash on using freed decompressor; core re-org

* fix incorrect job registration

* fix send_to_plugin() for SSL

* add reason to disconnect message

* fix signaling receivers to stop

* added --dev option to netdata-installer.sh to prevent it from removing the build directory

* Fix serde of double values.

NaNs and +/- infinities are encoded as strings (a sketch follows this list).

* unused param

* reset max cbuffer size when it is recreated

* struct receiver_state is now private

* 1 dispatcher, 1 connector, 2/3 cores for receivers

* all replication requests are served by replication threads - never the dispatcher threads

* optimize partitions and cache lines for dbengine cache

* fix crash on receiver shutdown

* rw spinlock now prioritizes writers (a sketch follows this list)

* backfill all higher tiers

* extent cache to 10%

* automatic sizing of replication threads

* add more replication threads

* configure cache eviction parameters to avoid running in aggressive mode all the time

* run evictions and flushes every 100ms

* add missing initialization

* add missing initialization - again

* add evictors for all caches

* add dedicated evict thread per cache

* destroy the completion

* avoid sending too many signals to eviction threads

* alternative way to make sure there are data to evict

* measure inline cache events

* disable inline evictions and flushing for open and extent cache

* use a spinlock to avoid sending too many signals

* batch evictions are now in steps of pages

* fix wanted cache size when there are no clean entries in it

* fix wanted cache size when there are no clean entries in it

* fix wanted cache size again

* adaptive batch evictions; batch evictions first try all partitions

* move waste events to waste chart

* added evict_traversed

* evict in smaller steps

* removed obsolete code

* disabled inlining of evictions and flushing; added timings for evictions

* more detailed timings for evictions

* use inline evictors

* use aral for gorilla pages of 512 bytes, when they are loaded from disk

* use aral for all gorilla page sizes loaded from disk

* disable inlining again to test it after the memory optimization

* timings for dbengine evictions

* added timing names

* detailed timings

* detailed timings - again

* removed timings and restored inline evictions

* eviction on release only under critical pressure

* cleanup and replication tuning

* tune cache size calculation

* tune replication threads calculation

* make streaming receiver exit

* Do not allocate/copy extent data twice.

* Build/link mimalloc

Just for testing, it will be reverted.

* lower memory requirements

* Link mimalloc statically

* run replication with synchronous queries

* added missing worker jobs in sender dispatcher

* enable batch evictions in pgc

* fix sender-dispatcher workers

* set max dispatchers to 2

* increase the default replication threads

* log stream_info errors

* increase replication threads

* log the json text when we fail to parse json response of stream_info

* stream info response may come back in multiple steps

* print the socket error of stream info

* added debug to stream info socket error

* loop while the payload received is smaller than content-length (a read-loop sketch follows this list)

* Revert "Link mimalloc statically"

This reverts commit c98e482.

* Revert "Build/link mimalloc"

This reverts commit 8aae22a.

* Remove NEED_PROTOBUF

* Use mimalloc

* Revert "Use mimalloc"

This reverts commit 9a68034.

* Use mimalloc

* support 256 bytes gorilla pages, when they are loaded from disk

* added os_mem_available() (a sketch follows this list)

* test memory protection

* use protection only on one cache

* use the free memory of the main cache in the other caches too

* use the free memory of the main cache in the open cache too

* Batch gorilla writes by tracking the last written number.

In a setup with 200 children, `perf` shows that
the worst offender is the gorilla write operation,
reporting ~17% overhead.

With this change `perf` reports ~4% overhead and
netdata's CPU consumption decreased by ~16%.

* make buffered_reader_next_line() a couple times faster

* flushing open cache

* Use re2c for the line splitting pluginsd.

Function gets optimized around 3x.

We should delete the old code and use re2c for the rest of the functions, but we need to keep the PR size as minimal as possible. Will do in follow-up PRs.

* use cores - 1 for receivers, use only 1 sender

* move sender processing to a separate function

* Revert "Batch gorilla writes by tracking the last written number."

This reverts commit 2e72a5c.

* Batch gorilla writes only from writers

This reapplies df79be2f01145bd79091a8934d7c80b4b3eb915b
and introduces a couple of changes to remove writes

* log information for buffer overflow

* fix heap use after free

* added comments to the main stream receiver loop

* 3 dispatchers

* single threaded receiver and sender

* code cleanup

* de-associate hosts from streaming threads when both the receiver and sender stop, so that the threads are re-balanced every time

* fix heap use after free

* properly get the slot number of pollfd

* fixes

* fixes

* revert worker changes

* reuse streaming threads

* backfilling should be synchronous

* remove the node last

* do not keep a pointer to a relocatable buffer

* give pgc the right page size, not less

* restore spreading metrics size across time

* use the calculated slots for gorilla pages

* accurately track gorilla page size changes

* check the sth pointer for validity

* code cleanup, files re-org and renames to reflect the new structure of streaming

* updated referenced size when the size of a page changes; removed flush spins - a cancelled flush is a waste event

* improve families in netdata statistics

* page size histogram per cache

* page size histogram per cache queue (hot, dirty, clean)

* fix heap-use-after-free in pdc.c

* rw_spinlocks: when preferring a writer, yield so that the writer has a chance to get the lock

* do not balloon open and extent caches more than needed (it fragments memory and there is not enough memory for the main cache)

* fixed typo

* enable trace allocations to work

* Skip adding kmeans model when ML dimension has not been created.

* PGD is now entirely on ARAL for all types of pages

* 2 partitions for PGD

* Check for ML queue prior to pushing as well.

* merge multiple arals, to avoid wasting memory

* significantly less arals; proper calculation of gorilla efficiency

* report pgd buffers separately from pgc

* aral only for sizes less than 512 bytes

* tune aral caches

* log the functions using the streaming buffer when concurrent use is detected

* aral supporting different pages for collected pages and clean pages - an attempt to minimize fragmentation at high performance

* fix misuse of sender thread buffers

* select the right buffer, based on the receiver tid

* no more rrdpush, renamed to stream

* lower aral max page size to 16KiB - in an attempt to lower fragmentation under memory pressure

* update opcode handling

* automatic sizing of aral limiting its size to 200 items per page or 4 x system pages

* tune cache eviction strategy

* renamed global statistics to telemetry and split it into multiple files

* leftover renames of global statistics to telemetry

* added heatmap to chart types

* note about re-balancing a parents cluster

* fix formatting

* added aral telemetry to find the fragmentation per aral

* experiment with a different strategy when making clean pages: always append so that the cache is being constantly rotated; aral telemetry reports utilization instead of fragmentation

* aral now takes into account waiting deallocators when it creates new pages

* split netdata-conf functions into multiple files; added settings for dbengine to use all caches and for dbengine out-of-memory protection

* tune cache eviction strategy

* cache parameters cleanup

* rename mem_available to system_memory

* Fix variable type.

* Add fuzzer for pluginsd line splitter.

* use cgroups v1 and v2 to detect memory protection; log the detected memory on start

* fixed typo

* added logs about system memory detection

* remove debug logs from system memory detection

* move the rest of dbengine config to netdata-conf

* respect the configured streaming buffer size

* add workers to pgc eviction threads

* renamed worker

* fixed flip-flop in size and entries conversions

* use aral_by_size when we actually aggregate stats to aral by size

* use keyword definitions

* move opcode definitions to stream-thread.h

* swap struct pollfd slots to make sure all the sockets have an equal chance of being processed

* Revert "Add fuzzer for pluginsd line splitter."

This reverts commit 454cbcf.

* Revert "Use re2c for the line splitting pluginsd."

This reverts commit 2b2f9d3.

* stream threads use judy arrays instead of linked lists and pre-allocated arrays

* added comment about pfd structure on sender and receiver

* fixed logs and made the default sender timeout 5 seconds

* Spawn ML worker threads based on number of CPUs.

* Add statistics for ML allocations/deallocations.

* Add host flag to check for pending alert transitions to save
Remove precompiled statements
Offload processing of alerts in the event loop
Queue alert transitions to the metadata event loop to be saved
Run metadata checks every 5 seconds

* do not block doing socket retries when errno indicates EWOULDBLOCK; insist on sending data in send_to_plugin()

* Revert "Add host flag to check for pending alert transitions to save"

This reverts commit 86ade0e.

* fix error reasons

* Disable ML memory statistics when using mimalloc

* add reason when ml cannot acquire the dimension

* added ML memory and, depending on the DICT_WITH_STATS define, aral by size too

* do not stream ML when the parent does not have ML enabled

* nd_poll() to overcome the starvation of poll() and use epoll() under Linux (a sketch follows this list)

* nd_poll() optimization to minimize the number of system calls

* nd_poll() fix

* nd_poll() fix again

* make glibc release memory to the system when the system is critically low on memory (a malloc_trim() sketch follows this list)

* try bigger aral pages, to enable releasing memory back to the system

* Queue alert transitions to the metadata event loop (global list not per host)
Add host count to check for pending alert transitions to save
Remove precompiled statements
Offload processing of alerts in the event loop
Run metadata checks every 5 seconds

* round robin aral allocations

* fix aral round robin

* ask glibc to release memory when the allocations are aggressive

* tinysleep yields the processor instead of waiting

* run malloc_trim() more frequently

* Add reference count on alarm_entry

* selective tinysleep and processor yielding

* revert gorilla batch writes

* codacy fixes
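
Below are a few illustrative sketches for some of the changes above. They are rough sketches under assumed names, not Netdata's actual code.

First, the in-memory sender queue: a heap-backed FIFO can always accept a message, unlike a pipe whose kernel buffer can fill up and block. Names like `MSG_QUEUE`, `mq_push()` and `mq_pop()` are hypothetical:

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct msg {
    struct msg *next;
    void *payload;
} MSG;

typedef struct {
    MSG *head, *tail;
    pthread_mutex_t lock;
    pthread_cond_t cond;
} MSG_QUEUE;

// push never blocks and never fails for being "full" - the queue grows on the heap
void mq_push(MSG_QUEUE *q, void *payload) {
    MSG *m = malloc(sizeof(*m));
    m->payload = payload;
    m->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = m;
    else q->head = m;
    q->tail = m;
    pthread_cond_signal(&q->cond);   // wake the dispatcher, like the old pipe write did
    pthread_mutex_unlock(&q->lock);
}

// pop blocks until a message is available
void *mq_pop(MSG_QUEUE *q) {
    pthread_mutex_lock(&q->lock);
    while (!q->head)
        pthread_cond_wait(&q->cond, &q->lock);
    MSG *m = q->head;
    q->head = m->next;
    if (!q->head) q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    void *payload = m->payload;
    free(m);
    return payload;
}
```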
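
A sketch of the double-value serde rule: finite values are emitted as numbers, NaN and infinities as quoted strings so the stream stays parseable. The exact string tokens are an assumption:

```c
#include <math.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

// finite doubles are emitted as numbers; NaN and infinities as quoted strings
void serialize_double(char *out, size_t len, double v) {
    if (isnan(v))               snprintf(out, len, "\"nan\"");
    else if (isinf(v) && v > 0) snprintf(out, len, "\"+inf\"");
    else if (isinf(v))          snprintf(out, len, "\"-inf\"");
    else                        snprintf(out, len, "%.17g", v);  // %.17g round-trips a double
}

double deserialize_double(const char *in) {
    if (strcmp(in, "\"nan\"") == 0)  return NAN;
    if (strcmp(in, "\"+inf\"") == 0) return INFINITY;
    if (strcmp(in, "\"-inf\"") == 0) return -INFINITY;
    return strtod(in, NULL);
}
```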
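
A sketch of writer preference in a reader-writer spinlock, combined with the later commit's yield-to-writer behavior: readers back off while a writer is waiting. Illustrative only, not the actual rw_spinlock implementation:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

typedef struct {
    atomic_int readers;         // active readers
    atomic_bool writer;         // a writer holds the lock
    atomic_int writers_waiting; // writers queued - readers must yield to these
} RW_SPINLOCK;

void rw_read_lock(RW_SPINLOCK *l) {
    for (;;) {
        // prefer writers: do not take a read lock while one is waiting or active
        while (atomic_load(&l->writers_waiting) > 0 || atomic_load(&l->writer))
            sched_yield();   // yield so the writer has a chance to get the lock
        atomic_fetch_add(&l->readers, 1);
        if (!atomic_load(&l->writer) && atomic_load(&l->writers_waiting) == 0)
            return;          // acquired, and no writer raced us
        atomic_fetch_sub(&l->readers, 1);  // a writer appeared - back off and retry
    }
}

void rw_read_unlock(RW_SPINLOCK *l) { atomic_fetch_sub(&l->readers, 1); }

void rw_write_lock(RW_SPINLOCK *l) {
    atomic_fetch_add(&l->writers_waiting, 1);  // blocks new readers immediately
    bool expected = false;
    while (!atomic_compare_exchange_weak(&l->writer, &expected, true)) {
        expected = false;
        sched_yield();
    }
    while (atomic_load(&l->readers) > 0)       // wait for active readers to drain
        sched_yield();
    atomic_fetch_sub(&l->writers_waiting, 1);
}

void rw_write_unlock(RW_SPINLOCK *l) { atomic_store(&l->writer, false); }
```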
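
A sketch of the stream_info read loop: keep receiving until the collected payload reaches Content-Length, since the response may arrive in multiple steps. The helper name is hypothetical:

```c
#include <sys/types.h>
#include <sys/socket.h>

// keep receiving until the full Content-Length body has arrived
ssize_t read_full_body(int fd, char *buf, size_t content_length) {
    size_t received = 0;
    while (received < content_length) {
        ssize_t r = recv(fd, buf + received, content_length - received, 0);
        if (r <= 0)
            return -1;   // peer closed or socket error - the caller should log it
        received += (size_t)r;
    }
    return (ssize_t)received;
}
```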
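
A guess at the shape of os_mem_available() on Linux: read MemAvailable from /proc/meminfo. The real implementation (src/libnetdata/os/system_memory.c in this PR) may differ:

```c
#include <stdio.h>
#include <string.h>
#include <inttypes.h>

uint64_t os_mem_available(void) {
    FILE *fp = fopen("/proc/meminfo", "r");
    if (!fp) return 0;

    char line[256];
    uint64_t kb = 0;
    while (fgets(line, sizeof(line), fp)) {
        if (strncmp(line, "MemAvailable:", 13) == 0) {
            sscanf(line + 13, "%" SCNu64, &kb);   // the field is reported in KiB
            break;
        }
    }
    fclose(fp);
    return kb * 1024;   // return bytes
}
```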
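
A sketch of the nd_poll() idea: wrap epoll on Linux so each cycle only touches ready sockets, avoiding the starvation that scanning a large pollfd array with poll() can cause. Names are illustrative:

```c
#include <sys/epoll.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct nd_poll {
    int epfd;
} nd_poll_t;

nd_poll_t *nd_poll_create(void) {
    nd_poll_t *p = calloc(1, sizeof(*p));
    p->epfd = epoll_create1(0);
    return p;
}

int nd_poll_add(nd_poll_t *p, int fd, uint32_t events, void *data) {
    struct epoll_event ev = { .events = events, .data.ptr = data };
    return epoll_ctl(p->epfd, EPOLL_CTL_ADD, fd, &ev);
}

// epoll_wait() returns only the ready fds, so a few busy sockets cannot
// starve the rest the way a repeatedly-scanned pollfd array can
int nd_poll_wait(nd_poll_t *p, struct epoll_event *events, int max, int timeout_ms) {
    return epoll_wait(p->epfd, events, max, timeout_ms);
}
```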
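
Finally, a sketch of releasing glibc's freed memory back to the OS under memory pressure, guarded by the HAVE_C_MALLOC_TRIM compile check this PR adds to CMakeLists.txt. The trigger condition shown is an assumption:

```c
#include <stdint.h>
#ifdef HAVE_C_MALLOC_TRIM
#include <malloc.h>
#endif

extern uint64_t os_mem_available(void);   // see the /proc/meminfo sketch above

void maybe_release_memory(uint64_t critical_bytes) {
#ifdef HAVE_C_MALLOC_TRIM
    if (os_mem_available() < critical_bytes)
        malloc_trim(0);   // hand trimmable free heap back to the kernel
#else
    (void)critical_bytes; // not glibc - nothing to trim
#endif
}
```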

---------

Co-authored-by: vkalintiris <[email protected]>
Co-authored-by: Stelios Fragkakis <[email protected]>
3 people authored Dec 5, 2024
1 parent da18678 commit 6b8c6ba
Showing 281 changed files with 21,049 additions and 14,613 deletions.
148 changes: 122 additions & 26 deletions CMakeLists.txt
@@ -438,6 +438,7 @@ check_function_exists(backtrace HAVE_BACKTRACE)
check_function_exists(arc4random_buf HAVE_ARC4RANDOM_BUF)
check_function_exists(arc4random_uniform HAVE_ARC4RANDOM_UNIFORM)
check_function_exists(getrandom HAVE_GETRANDOM)
check_function_exists(sysinfo HAVE_SYSINFO)

#
# check source compilation
@@ -475,6 +476,14 @@ int main() {
}
" HAVE_C_MALLOPT)

check_c_source_compiles("
#include <malloc.h>
int main() {
malloc_trim(0);
return 0;
}
" HAVE_C_MALLOC_TRIM)

check_c_source_compiles("
#define _GNU_SOURCE
#include <stdio.h>
@@ -920,6 +929,21 @@ set(LIBNETDATA_FILES
src/libnetdata/xxHash/xxhash.h
src/libnetdata/os/random.c
src/libnetdata/os/random.h
src/libnetdata/socket/nd-sock.c
src/libnetdata/socket/nd-sock.h
src/libnetdata/socket/listen-sockets.c
src/libnetdata/socket/listen-sockets.h
src/libnetdata/socket/poll-events.c
src/libnetdata/socket/poll-events.h
src/libnetdata/socket/connect-to.c
src/libnetdata/socket/connect-to.h
src/libnetdata/socket/socket-peers.c
src/libnetdata/socket/socket-peers.h
src/libnetdata/libjudy/judyl-typed.h
src/libnetdata/os/system_memory.c
src/libnetdata/os/system_memory.h
src/libnetdata/socket/nd-poll.c
src/libnetdata/socket/nd-poll.h
)

set(LIBH2O_FILES
@@ -1013,8 +1037,8 @@ set(DAEMON_FILES
src/daemon/daemon.h
src/daemon/libuv_workers.c
src/daemon/libuv_workers.h
src/daemon/global_statistics.c
src/daemon/global_statistics.h
src/daemon/telemetry/telemetry.c
src/daemon/telemetry/telemetry.h
src/daemon/analytics.c
src/daemon/analytics.h
src/daemon/main.c
@@ -1035,15 +1059,59 @@
src/daemon/pipename.h
src/daemon/unit_test.c
src/daemon/unit_test.h
src/daemon/config/dyncfg.c
src/daemon/config/dyncfg.h
src/daemon/config/dyncfg-files.c
src/daemon/config/dyncfg-unittest.c
src/daemon/config/dyncfg-inline.c
src/daemon/config/dyncfg-echo.c
src/daemon/config/dyncfg-internals.h
src/daemon/config/dyncfg-intercept.c
src/daemon/config/dyncfg-tree.c
src/daemon/dyncfg/dyncfg.c
src/daemon/dyncfg/dyncfg.h
src/daemon/dyncfg/dyncfg-files.c
src/daemon/dyncfg/dyncfg-unittest.c
src/daemon/dyncfg/dyncfg-inline.c
src/daemon/dyncfg/dyncfg-echo.c
src/daemon/dyncfg/dyncfg-internals.h
src/daemon/dyncfg/dyncfg-intercept.c
src/daemon/dyncfg/dyncfg-tree.c
src/daemon/telemetry/telemetry-http-api.c
src/daemon/telemetry/telemetry-http-api.h
src/daemon/telemetry/telemetry-queries.c
src/daemon/telemetry/telemetry-queries.h
src/daemon/telemetry/telemetry-ingestion.c
src/daemon/telemetry/telemetry-ingestion.h
src/daemon/telemetry/telemetry-ml.c
src/daemon/telemetry/telemetry-ml.h
src/daemon/telemetry/telemetry-gorilla.c
src/daemon/telemetry/telemetry-gorilla.h
src/daemon/telemetry/telemetry-daemon.c
src/daemon/telemetry/telemetry-daemon.h
src/daemon/telemetry/telemetry-daemon-memory.c
src/daemon/telemetry/telemetry-daemon-memory.h
src/daemon/telemetry/telemetry-sqlite3.c
src/daemon/telemetry/telemetry-sqlite3.h
src/daemon/telemetry/telemetry-dbengine.c
src/daemon/telemetry/telemetry-dbengine.h
src/daemon/telemetry/telemetry-string.c
src/daemon/telemetry/telemetry-string.h
src/daemon/telemetry/telemetry-heartbeat.c
src/daemon/telemetry/telemetry-heartbeat.h
src/daemon/telemetry/telemetry-dictionary.c
src/daemon/telemetry/telemetry-dictionary.h
src/daemon/telemetry/telemetry-workers.c
src/daemon/telemetry/telemetry-workers.h
src/daemon/telemetry/telemetry-trace-allocations.c
src/daemon/telemetry/telemetry-trace-allocations.h
src/daemon/telemetry/telemetry-aral.c
src/daemon/telemetry/telemetry-aral.h
src/daemon/config/netdata-conf-db.c
src/daemon/config/netdata-conf-db.h
src/daemon/config/netdata-conf.h
src/daemon/config/netdata-conf-backwards-compatibility.c
src/daemon/config/netdata-conf-backwards-compatibility.h
src/daemon/config/netdata-conf-web.c
src/daemon/config/netdata-conf-web.h
src/daemon/config/netdata-conf-directories.c
src/daemon/config/netdata-conf-directories.h
src/daemon/config/netdata-conf-logs.c
src/daemon/config/netdata-conf-logs.h
src/daemon/config/netdata-conf-global.c
src/daemon/config/netdata-conf-global.h
src/daemon/config/netdata-conf.c
)

set(H2O_FILES
@@ -1227,15 +1295,34 @@ if(ENABLE_ML)
set(ML_FILES
src/ml/ad_charts.h
src/ml/ad_charts.cc
src/ml/Config.cc
src/ml/dlib/dlib/all/source.cpp
src/ml/ml.h
src/ml/ml.cc
src/ml/ml-private.h
src/ml/ml_calculated_number.h
src/ml/ml_host.h
src/ml/ml_config.h
src/ml/ml_config.cc
src/ml/ml_dimension.h
src/ml/ml_enums.h
src/ml/ml_enums.cc
src/ml/ml_features.h
src/ml/ml_features.cc
src/ml/ml_kmeans.h
src/ml/ml_kmeans.cc
src/ml/ml_queue.h
src/ml/ml_worker.h
src/ml/ml_string_wrapper.h
src/ml/ml_queue.cc
src/ml/ml_private.h
src/ml/ml_public.h
src/ml/ml_public.cc
)

if(NOT ENABLE_MIMALLOC)
list(APPEND ML_FILES src/ml/ml_memory.cc)
endif()
else()
set(ML_FILES
src/ml/ml.h
src/ml/ml_public.h
src/ml/ml-dummy.c
)
endif()
@@ -1338,6 +1425,8 @@ set(RRD_PLUGIN_FILES
src/database/rrdfunctions-exporters.h
src/database/rrdfunctions-internals.h
src/database/rrdcollector-internals.h
src/database/rrd-database-mode.h
src/database/rrd-database-mode.c
)

if(ENABLE_DBENGINE)
@@ -1405,7 +1494,7 @@ set(SYSTEMD_JOURNAL_PLUGIN_FILES
)

set(STREAMING_PLUGIN_FILES
src/streaming/rrdpush.h
src/streaming/stream.h
src/streaming/stream-compression/compression.c
src/streaming/stream-compression/compression.h
src/streaming/stream-compression/brotli.c
@@ -1416,8 +1505,8 @@
src/streaming/stream-compression/lz4.h
src/streaming/stream-compression/zstd.c
src/streaming/stream-compression/zstd.h
src/streaming/receiver.c
src/streaming/sender.c
src/streaming/stream-receiver.c
src/streaming/stream-sender.c
src/streaming/replication.c
src/streaming/replication.h
src/streaming/h2o-common.h
@@ -1429,11 +1518,11 @@
src/streaming/stream-path.h
src/streaming/stream-capabilities.c
src/streaming/stream-capabilities.h
src/streaming/sender-connect.c
src/streaming/sender-internals.h
src/streaming/sender-execute.c
src/streaming/sender-commit.c
src/streaming/sender-destinations.c
src/streaming/stream-connector.c
src/streaming/stream-sender-internals.h
src/streaming/stream-sender-execute.c
src/streaming/stream-sender-commit.c
src/streaming/stream-parents.c
src/streaming/stream-handshake.c
src/streaming/protocol/command-function.c
src/streaming/protocol/command-host-labels.c
@@ -1443,11 +1532,17 @@
src/streaming/stream-conf.c
src/streaming/stream-conf.h
src/streaming/stream-handshake.h
src/streaming/sender.h
src/streaming/sender-destinations.h
src/streaming/stream-parents.h
src/streaming/rrdhost-status.c
src/streaming/rrdhost-status.h
src/streaming/receiver.h
src/streaming/stream-sender-api.c
src/streaming/stream-receiver-internals.h
src/streaming/stream-receiver-api.c
src/streaming/stream-thread.c
src/streaming/stream-thread.h
src/streaming/stream-receiver-connection.c
src/streaming/stream-sender-commit.h
src/streaming/stream-traffic-types.h
)

set(WEB_PLUGIN_FILES
Expand All @@ -1459,6 +1554,7 @@ set(WEB_PLUGIN_FILES
src/web/server/static/static-threaded.h
src/web/server/web_client_cache.c
src/web/server/web_client_cache.h
src/web/api/v3/api_v3_stream_info.c
src/web/api/v3/api_v3_stream_path.c
)

2 changes: 1 addition & 1 deletion docs/developer-and-contributor-corner/python-collector.txt
@@ -115,7 +115,7 @@ context, charttype]`, where:
- `family`: An identifier used to group charts together (can be null).
- `context`: An identifier used to group contextually similar charts together. The best practice is to provide a context
that is `A.B`, with `A` being the name of the collector, and `B` being the name of the specific metric.
- `charttype`: Either `line`, `area`, or `stacked`. If null line is the default value.
- `charttype`: Either `line`, `area`, `stacked` or `heatmap`. If null line is the default value.

You can read more about `family` and `context` in the [Netdata Charts](/docs/dashboards-and-charts/netdata-charts.md) doc.

2 changes: 1 addition & 1 deletion docs/diagrams/data_structures/web.svg
(binary SVG diff not displayed)
2 changes: 2 additions & 0 deletions packaging/cmake/config.cmake.h.in
@@ -73,6 +73,7 @@
#cmakedefine HAVE_ARC4RANDOM_UNIFORM
#cmakedefine HAVE_RAND_S
#cmakedefine HAVE_GETRANDOM
#cmakedefine HAVE_SYSINFO

#cmakedefine HAVE_BACKTRACE
#cmakedefine HAVE_CLOSE_RANGE
@@ -95,6 +96,7 @@
#cmakedefine STRERROR_R_CHAR_P
#cmakedefine HAVE_C__GENERIC
#cmakedefine HAVE_C_MALLOPT
#cmakedefine HAVE_C_MALLOC_TRIM
#cmakedefine HAVE_SETNS
#cmakedefine HAVE_STRNDUP
#cmakedefine SSL_HAS_PENDING
8 changes: 4 additions & 4 deletions src/aclk/aclk.c
@@ -201,7 +201,7 @@ static int wait_till_agent_claim_ready()
// We trap the impossible NULL here to keep the linter happy without using a fatal() in the code.
const char *cloud_base_url = cloud_config_url_get();
if (cloud_base_url == NULL) {
netdata_log_error("Do not move the \"url\" out of post_conf_load!!");
netdata_log_error("Do not move the \"url\" out of netdata_conf_section_global_run_as_user!!");
return 1;
}

@@ -559,7 +559,7 @@ static int aclk_attempt_to_connect(mqtt_wss_client client)
while (service_running(SERVICE_ACLK)) {
aclk_cloud_base_url = cloud_config_url_get();
if (aclk_cloud_base_url == NULL) {
error_report("Do not move the \"url\" out of post_conf_load!!");
error_report("Do not move the \"url\" out of netdata_conf_section_global_run_as_user!!");
aclk_status = ACLK_STATUS_NO_CLOUD_URL;
return -1;
}
@@ -868,7 +868,7 @@ void aclk_host_state_update(RRDHOST *host, int cmd, int queryable)
create_query->data.bin_payload.topic = ACLK_TOPICID_CREATE_NODE;
create_query->data.bin_payload.msg_name = "CreateNodeInstance";
nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Registering host=%s, hops=%u", host->machine_guid, host->system_info->hops);
"Registering host=%s, hops=%d", host->machine_guid, host->system_info->hops);

aclk_execute_query(create_query);
return;
@@ -892,7 +892,7 @@ void aclk_host_state_update(RRDHOST *host, int cmd, int queryable)
query->data.bin_payload.payload = generate_node_instance_connection(&query->data.bin_payload.size, &node_state_update);

nd_log(NDLS_DAEMON, NDLP_DEBUG,
"Queuing status update for node=%s, live=%d, hops=%u, queryable=%d",
"Queuing status update for node=%s, live=%d, hops=%d, queryable=%d",
(char*)node_state_update.node_id, cmd, host->system_info->hops, queryable);
freez((void*)node_state_update.node_id);
query->data.bin_payload.msg_name = "UpdateNodeInstanceConnection";
8 changes: 4 additions & 4 deletions src/aclk/aclk_capas.c
@@ -2,7 +2,7 @@

#include "aclk_capas.h"

#include "ml/ml.h"
#include "ml/ml_public.h"

#define HTTP_API_V2_VERSION 7

@@ -31,14 +31,14 @@ const struct capability *aclk_get_agent_capas()
agent_capabilities[3].version = metric_correlations_version;
agent_capabilities[3].enabled = 1;

agent_capabilities[7].enabled = localhost->health.health_enabled;
agent_capabilities[7].enabled = localhost->health.enabled;

return agent_capabilities;
}

struct capability *aclk_get_node_instance_capas(RRDHOST *host)
{
bool functions = (host == localhost || (host->receiver && stream_has_capability(host->receiver, STREAM_CAP_FUNCTIONS)));
bool functions = (host == localhost || receiver_has_capability(host, STREAM_CAP_FUNCTIONS));
bool dyncfg = (host == localhost || dyncfg_available_for_rrdhost(host));

struct capability ni_caps[] = {
@@ -48,7 +48,7 @@ struct capability *aclk_get_node_instance_capas(RRDHOST *host)
{ .name = "ctx", .version = 1, .enabled = 1 },
{ .name = "funcs", .version = functions ? 1 : 0, .enabled = functions ? 1 : 0 },
{ .name = "http_api_v2", .version = HTTP_API_V2_VERSION, .enabled = 1 },
{ .name = "health", .version = 2, .enabled = host->health.health_enabled },
{ .name = "health", .version = 2, .enabled = host->health.enabled},
{ .name = "req_cancel", .version = 1, .enabled = 1 },
{ .name = "dyncfg", .version = 2, .enabled = dyncfg },
{ .name = NULL, .version = 0, .enabled = 0 }
2 changes: 1 addition & 1 deletion src/aclk/https_client.c
@@ -6,7 +6,7 @@

#include "aclk_util.h"

#include "daemon/global_statistics.h"
#include "daemon/telemetry/telemetry.h"

static const char *http_req_type_to_str(http_req_type_t req) {
switch (req) {
2 changes: 1 addition & 1 deletion src/claim/claim.c
@@ -189,7 +189,7 @@ CLOUD_STATUS claim_reload_and_wait_online(void) {
cloud_conf_load(0);
bool claimed = load_claiming_state();
registry_update_cloud_base_url();
rrdpush_sender_send_claimed_id(localhost);
stream_sender_send_claimed_id(localhost);
nd_log_limits_reset();

CLOUD_STATUS status = cloud_status();
4 changes: 2 additions & 2 deletions src/claim/cloud-status.c
@@ -30,8 +30,8 @@ CLOUD_STATUS cloud_status(void) {
return CLOUD_STATUS_ONLINE;

if(localhost->sender &&
rrdhost_flag_check(localhost, RRDHOST_FLAG_RRDPUSH_SENDER_READY_4_METRICS) &&
stream_has_capability(localhost->sender, STREAM_CAP_NODE_ID) &&
rrdhost_flag_check(localhost, RRDHOST_FLAG_STREAM_SENDER_READY_4_METRICS) &&
stream_sender_has_capabilities(localhost, STREAM_CAP_NODE_ID) &&
!UUIDiszero(localhost->node_id) &&
!UUIDiszero(localhost->aclk.claim_id_of_parent))
return CLOUD_STATUS_INDIRECT;
2 changes: 1 addition & 1 deletion src/collectors/apps.plugin/apps_pid.c
@@ -53,7 +53,7 @@ size_t all_pids_count(void) {
}

void apps_pids_init(void) {
pids.all_pids.aral = aral_create("pid_stat", sizeof(struct pid_stat), 1, 65536, NULL, NULL, NULL, false, true);
pids.all_pids.aral = aral_create("pid_stat", sizeof(struct pid_stat), 1, 0, NULL, NULL, NULL, false, true);
simple_hashtable_init_PID(&pids.all_pids.ht, 1024);
}

2 changes: 1 addition & 1 deletion src/collectors/ebpf.plugin/ebpf.c
@@ -739,7 +739,7 @@ ARAL *ebpf_allocate_pid_aral(char *name, size_t size)
}

return aral_create(name, size,
0, max_elements,
0, 0,
NULL, NULL, NULL, false, false);
}

4 changes: 2 additions & 2 deletions src/collectors/statsd.plugin/statsd.c
@@ -2654,7 +2654,7 @@ void *statsd_main(void *ptr) {
RRDSET *st_pcharts = NULL;
RRDDIM *rd_pcharts = NULL;

if(global_statistics_enabled) {
if(telemetry_enabled) {
st_metrics = rrdset_create_localhost(
"netdata",
"statsd_metrics",
@@ -2851,7 +2851,7 @@
if(unlikely(!service_running(SERVICE_COLLECTORS)))
break;

if(global_statistics_enabled) {
if(telemetry_enabled) {
rrddim_set_by_pointer(st_metrics, rd_metrics_gauge, (collected_number)statsd.gauges.metrics);
rrddim_set_by_pointer(st_metrics, rd_metrics_counter, (collected_number)statsd.counters.metrics);
rrddim_set_by_pointer(st_metrics, rd_metrics_timer, (collected_number)statsd.timers.metrics);