in_process_exporter_metrics: implement process exporter metrics #7943

Merged
edsiper merged 3 commits into master from cosmo0920-implement-process_exporter_metrics on Nov 8, 2023

Conversation

cosmo0920
Contributor

@cosmo0920 cosmo0920 commented Sep 19, 2023

The official node_exporter does not provide a process-level metrics exporter. A third-party exporter exists, but it reports metrics per process group rather than per individual process.
This PR aims to provide a genuine process-level exporter on Linux.

Closes #7870


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
# Retrieve metrics for all processes
$ bin/fluent-bit -i process_exporter_metrics -o stdout
# Exclude processes that match a pattern
$ bin/fluent-bit -i process_exporter_metrics -p 'process_exclude_pattern=/chrome|kworker|firefox|gsd|Co/' -o stdout
# Include only processes that match a pattern
$ bin/fluent-bit -i process_exporter_metrics -p 'process_include_pattern=/fluent-bit/' -o stdout
  • Debug log output from testing the change

For example:

$ bin/fluent-bit -i process_exporter_metrics -p 'process_include_pattern=/memcheck-amd64/' -o stdout -vv 
Fluent Bit v2.2.0
* Copyright (C) 2015-2023 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/10/25 16:06:37] [ info] Configuration:
[2023/10/25 16:06:37] [ info]  flush time     | 1.000000 seconds
[2023/10/25 16:06:37] [ info]  grace          | 5 seconds
[2023/10/25 16:06:37] [ info]  daemon         | 0
[2023/10/25 16:06:37] [ info] ___________
[2023/10/25 16:06:37] [ info]  inputs:
[2023/10/25 16:06:37] [ info]      process_exporter_metrics
[2023/10/25 16:06:37] [ info] ___________
[2023/10/25 16:06:37] [ info]  filters:
[2023/10/25 16:06:37] [ info] ___________
[2023/10/25 16:06:37] [ info]  outputs:
[2023/10/25 16:06:37] [ info]      stdout.0
[2023/10/25 16:06:37] [ info] ___________
[2023/10/25 16:06:37] [ info]  collectors:
[2023/10/25 16:06:37] [ info] [fluent bit] version=2.2.0, commit=6708c732f1, pid=406585
[2023/10/25 16:06:37] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2023/10/25 16:06:37] [ info] [storage] ver=1.1.6, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/10/25 16:06:37] [ info] [cmetrics] version=0.6.4
[2023/10/25 16:06:37] [ info] [ctraces ] version=0.3.1
[2023/10/25 16:06:37] [ info] [input:process_exporter_metrics:process_exporter_metrics.0] initializing
[2023/10/25 16:06:37] [ info] [input:process_exporter_metrics:process_exporter_metrics.0] storage_strategy='memory' (memory only)
[2023/10/25 16:06:37] [debug] [input:process_exporter_metrics:process_exporter_metrics.0] enabled metrics cpu
[2023/10/25 16:06:37] [debug] [input:process_exporter_metrics:process_exporter_metrics.0] enabled metrics io
[2023/10/25 16:06:37] [debug] [input:process_exporter_metrics:process_exporter_metrics.0] enabled metrics memory
[2023/10/25 16:06:37] [debug] [input:process_exporter_metrics:process_exporter_metrics.0] enabled metrics state
[2023/10/25 16:06:37] [debug] [input:process_exporter_metrics:process_exporter_metrics.0] enabled metrics context_switches
[2023/10/25 16:06:37] [debug] [input:process_exporter_metrics:process_exporter_metrics.0] enabled metrics fd
[2023/10/25 16:06:37] [debug] [input:process_exporter_metrics:process_exporter_metrics.0] enabled metrics start_time
[2023/10/25 16:06:37] [debug] [input:process_exporter_metrics:process_exporter_metrics.0] enabled metrics thread_wchan
[2023/10/25 16:06:37] [debug] [input:process_exporter_metrics:process_exporter_metrics.0] enabled metrics thread
[2023/10/25 16:06:37] [ info] [input:process_exporter_metrics:process_exporter_metrics.0] path.procfs = /proc
[2023/10/25 16:06:37] [debug] [input:process_exporter_metrics:process_exporter_metrics.0] [thread init] initialization OK
[2023/10/25 16:06:37] [ info] [input:process_exporter_metrics:process_exporter_metrics.0] thread instance initialized
[2023/10/25 16:06:37] [debug] [process_exporter_metrics:process_exporter_metrics.0] created event channels: read=30 write=31
[2023/10/25 16:06:37] [debug] [stdout:stdout.0] created event channels: read=34 write=35
[2023/10/25 16:06:37] [ info] [sp] stream processor started
[2023/10/25 16:06:37] [ info] [output:stdout:stdout.0] worker #0 started
[2023/10/25 16:06:42] [debug] [input chunk] update output instances with new chunk size diff=6550, records=0, input=process_exporter_metrics.0
[2023/10/25 16:06:43] [trace] [task 0x7f880c020790] created (id=0)
[2023/10/25 16:06:43] [debug] [task] created task=0x7f880c020790 id=0 OK
[2023/10/25 16:06:43] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
2023-10-25T07:06:42.518438676Z process_cpu_seconds_total{name="fluent-bit",pid="406585",ppid="8832",mode="user"} = 0
2023-10-25T07:06:42.518438676Z process_cpu_seconds_total{name="fluent-bit",pid="406585",ppid="8832",mode="system"} = 0
2023-10-25T07:06:42.518438676Z process_read_bytes_total{name="fluent-bit",pid="406585",ppid="8832"} = 0
2023-10-25T07:06:42.518438676Z process_write_bytes_total{name="fluent-bit",pid="406585",ppid="8832"} = 0
2023-10-25T07:06:42.518438676Z process_major_page_faults_total{name="fluent-bit",pid="406585",ppid="8832"} = 0
2023-10-25T07:06:42.518438676Z process_minor_page_faults_total{name="fluent-bit",pid="406585",ppid="8832"} = 966
2023-10-25T07:06:42.518438676Z process_context_switches_total{name="fluent-bit",pid="406585",context_switch_type="voluntary_ctxt_switches"} = 6
2023-10-25T07:06:42.518438676Z process_context_switches_total{name="fluent-bit",pid="406585",context_switch_type="nonvoluntary_ctxt_switches"} = 2
2023-10-25T07:06:42.518438676Z process_thread_cpu_seconds_total{name="fluent-bit",threadname="flb-pipeline",thread_id="406586",mode="user"} = 0
2023-10-25T07:06:42.518438676Z process_thread_cpu_seconds_total{name="fluent-bit",threadname="flb-pipeline",thread_id="406586",mode="system"} = 0
2023-10-25T07:06:42.518438676Z process_thread_cpu_seconds_total{name="fluent-bit",threadname="flb-logger",thread_id="406587",mode="user"} = 0
2023-10-25T07:06:42.518438676Z process_thread_cpu_seconds_total{name="fluent-bit",threadname="flb-logger",thread_id="406587",mode="system"} = 0
2023-10-25T07:06:42.518438676Z process_thread_cpu_seconds_total{name="fluent-bit",threadname="flb-in-process_",thread_id="406588",mode="user"} = 0
2023-10-25T07:06:42.518438676Z process_thread_cpu_seconds_total{name="fluent-bit",threadname="flb-in-process_",thread_id="406588",mode="system"} = 0
2023-10-25T07:06:42.518438676Z process_thread_cpu_seconds_total{name="fluent-bit",threadname="flb-out-stdout.",thread_id="406589",mode="user"} = 0
2023-10-25T07:06:42.518438676Z process_thread_cpu_seconds_total{name="fluent-bit",threadname="flb-out-stdout.",thread_id="406589",mode="system"} = 0
2023-10-25T07:06:42.518438676Z process_thread_io_bytes_total{name="fluent-bit",threadname="flb-pipeline",thread_id="406586",iomode="read"} = 0
2023-10-25T07:06:42.518438676Z process_thread_io_bytes_total{name="fluent-bit",threadname="flb-pipeline",thread_id="406586",iomode="write"} = 0
2023-10-25T07:06:42.518438676Z process_thread_io_bytes_total{name="fluent-bit",threadname="flb-logger",thread_id="406587",iomode="read"} = 0
2023-10-25T07:06:42.518438676Z process_thread_io_bytes_total{name="fluent-bit",threadname="flb-logger",thread_id="406587",iomode="write"} = 0
2023-10-25T07:06:42.518438676Z process_thread_io_bytes_total{name="fluent-bit",threadname="flb-in-process_",thread_id="406588",iomode="read"} = 0
2023-10-25T07:06:42.518438676Z process_thread_io_bytes_total{name="fluent-bit",threadname="flb-in-process_",thread_id="406588",iomode="write"} = 0
2023-10-25T07:06:42.518438676Z process_thread_io_bytes_total{name="fluent-bit",threadname="flb-out-stdout.",thread_id="406589",iomode="read"} = 0
2023-10-25T07:06:42.518438676Z process_thread_io_bytes_total{name="fluent-bit",threadname="flb-out-stdout.",thread_id="406589",iomode="write"} = 0
2023-10-25T07:06:42.518438676Z process_thread_major_page_faults_total{name="fluent-bit",threadname="flb-pipeline",thread_id="406586"} = 0
2023-10-25T07:06:42.518438676Z process_thread_major_page_faults_total{name="fluent-bit",threadname="flb-logger",thread_id="406587"} = 0
2023-10-25T07:06:42.518438676Z process_thread_major_page_faults_total{name="fluent-bit",threadname="flb-in-process_",thread_id="406588"} = 0
2023-10-25T07:06:42.518438676Z process_thread_major_page_faults_total{name="fluent-bit",threadname="flb-out-stdout.",thread_id="406589"} = 0
2023-10-25T07:06:42.518438676Z process_thread_minor_page_faults_total{name="fluent-bit",threadname="flb-pipeline",thread_id="406586"} = 86
2023-10-25T07:06:42.518438676Z process_thread_minor_page_faults_total{name="fluent-bit",threadname="flb-logger",thread_id="406587"} = 1
2023-10-25T07:06:42.518438676Z process_thread_minor_page_faults_total{name="fluent-bit",threadname="flb-in-process_",thread_id="406588"} = 39
2023-10-25T07:06:42.518438676Z process_thread_minor_page_faults_total{name="fluent-bit",threadname="flb-out-stdout.",thread_id="406589"} = 3
2023-10-25T07:06:42.518438676Z process_memory_bytes{name="fluent-bit",pid="406585",ppid="8832",type="virtual_memory"} = 278450176
2023-10-25T07:06:42.518438676Z process_memory_bytes{name="fluent-bit",pid="406585",ppid="8832",type="rss"} = 3412
2023-10-25T07:06:42.518438676Z process_open_filedesc{name="fluent-bit",pid="406585",ppid="8832"} = 52
2023-10-25T07:06:42.518438676Z process_fd_ratio{name="fluent-bit",pid="406585",ppid="8832"} = 0.00079346913862821393
2023-10-25T07:06:42.518438676Z process_start_time_seconds{name="fluent-bit",pid="406585",ppid="8832"} = 1698217597
2023-10-25T07:06:42.518438676Z process_num_threads{name="fluent-bit",pid="406585",ppid="8832"} = 5
2023-10-25T07:06:42.518438676Z process_states{name="fluent-bit",pid="406585",ppid="8832",state="R"} = 0
2023-10-25T07:06:42.518438676Z process_states{name="fluent-bit",pid="406585",ppid="8832",state="S"} = 1
2023-10-25T07:06:42.518438676Z process_states{name="fluent-bit",pid="406585",ppid="8832",state="D"} = 0
2023-10-25T07:06:42.518438676Z process_states{name="fluent-bit",pid="406585",ppid="8832",state="Z"} = 0
2023-10-25T07:06:42.518438676Z process_states{name="fluent-bit",pid="406585",ppid="8832",state="T"} = 0
2023-10-25T07:06:42.518438676Z process_states{name="fluent-bit",pid="406585",ppid="8832",state="I"} = 0
2023-10-25T07:06:37.681669047Z process_thread_wchan{name="fluent-bit",pid="406585",wchan="ep_poll"} = 1
2023-10-25T07:06:42.518438676Z process_thread_wchan{name="fluent-bit",pid="406585",wchan="hrtimer_nanosleep"} = 1
[2023/10/25 16:06:43] [debug] [out flush] cb_destroy coro_id=0
[2023/10/25 16:06:43] [trace] [coro] destroy coroutine=0x7f8800001040 data=0x7f8800001060
[2023/10/25 16:06:43] [trace] [engine] [task event] task_id=0 out_id=0 return=OK
[2023/10/25 16:06:43] [debug] [task] destroy task=0x7f880c020790 (task_id=0)
^C[2023/10/25 16:06:43] [engine] caught signal (SIGINT)
[2023/10/25 16:06:43] [trace] [engine] flush enqueued data
[2023/10/25 16:06:43] [ warn] [engine] service will shutdown in max 5 seconds
[2023/10/25 16:06:43] [debug] [input:process_exporter_metrics:process_exporter_metrics.0] thread pause instance
[2023/10/25 16:06:44] [ info] [engine] service has stopped (0 pending tasks)
[2023/10/25 16:06:44] [debug] [input:process_exporter_metrics:process_exporter_metrics.0] thread pause instance
[2023/10/25 16:06:44] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2023/10/25 16:06:44] [ info] [output:stdout:stdout.0] thread worker #0 stopped
[2023/10/25 16:06:44] [debug] [input:process_exporter_metrics:process_exporter_metrics.0] thread exit instance
  • Attached Valgrind output that shows no leaks or memory corruption was found
$ valgrind --leak-check=full bin/fluent-bit -i process_exporter_metrics -p 'process_include_pattern=/memcheck-amd64/' -o stdout -vv
<snip>
==406479== 
==406479== HEAP SUMMARY:
==406479==     in use at exit: 0 bytes in 0 blocks
==406479==   total heap usage: 138,480 allocs, 138,480 frees, 11,935,269 bytes allocated
==406479== 
==406479== All heap blocks were freed -- no leaks are possible
==406479== 
==406479== For lists of detected and suppressed errors, rerun with: -s
==406479== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

$ valgrind --leak-check=full  bin/fluent-bit -i process_exporter_metrics -p 'process_exclude_pattern=/chrome|kworker|firefox|gsd|Co/' -o stdout -vv
<snip>
==406758== 
==406758== HEAP SUMMARY:
==406758==     in use at exit: 0 bytes in 0 blocks
==406758==   total heap usage: 1,318,556 allocs, 1,318,556 frees, 198,553,082,179 bytes allocated
==406758== 
==406758== All heap blocks were freed -- no leaks are possible
==406758== 
==406758== For lists of detected and suppressed errors, rerun with: -s
==406758== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

$ valgrind bin/fluent-bit -i process_exporter_metrics -o stdout -vv
==407194== 
==407194== HEAP SUMMARY:
==407194==     in use at exit: 0 bytes in 0 blocks
==407194==   total heap usage: 4,838,135 allocs, 4,838,135 frees, 1,770,625,735,478 bytes allocated
==407194== 
==407194== All heap blocks were freed -- no leaks are possible
==407194== 
==407194== For lists of detected and suppressed errors, rerun with: -s
==407194== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
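
The include and exclude patterns used in the commands above can be pictured as two regular-expression checks applied to each process. The following is a minimal Python sketch of that idea, not the plugin's C implementation. The names `compile_pattern` and `should_collect`, the choice to match against the process name, and the precedence (exclusion wins over inclusion) are all assumptions made for illustration.

```python
import re
from typing import Optional


def compile_pattern(raw: Optional[str]) -> Optional[re.Pattern]:
    """Compile a '/regex/' style option value; None means 'not set'."""
    if raw is None:
        return None
    # Drop the surrounding slashes used on the command line, if present.
    if len(raw) >= 2 and raw.startswith("/") and raw.endswith("/"):
        raw = raw[1:-1]
    return re.compile(raw)


def should_collect(name: str,
                   include: Optional[re.Pattern],
                   exclude: Optional[re.Pattern]) -> bool:
    """Decide whether a process name passes the filters (assumed semantics)."""
    if exclude is not None and exclude.search(name):
        return False  # assumption: exclusion takes precedence
    if include is not None:
        return bool(include.search(name))
    return True  # no include pattern set: collect everything not excluded


# The two patterns from the example commands above.
exclude = compile_pattern("/chrome|kworker|firefox|gsd|Co/")
include = compile_pattern("/fluent-bit/")

# Hypothetical process names used only to exercise the filters.
names = ["fluent-bit", "chrome", "kworker/0:1", "bash"]
print([n for n in names if should_collect(n, None, exclude)])
print([n for n in names if should_collect(n, include, None)])
```

With these sample names, the exclude-only run keeps `fluent-bit` and `bash`, and the include-only run keeps just `fluent-bit`.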

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

fluent/fluent-bit-docs#1198

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@cosmo0920 cosmo0920 temporarily deployed to pr September 19, 2023 07:20 — with GitHub Actions Inactive
@cosmo0920 cosmo0920 temporarily deployed to pr September 19, 2023 07:49 — with GitHub Actions Inactive
@cosmo0920 cosmo0920 force-pushed the cosmo0920-implement-process_exporter_metrics branch from a15854b to 8b7116f Compare September 19, 2023 09:37
@cosmo0920 cosmo0920 temporarily deployed to pr September 19, 2023 09:38 — with GitHub Actions Inactive
@cosmo0920 cosmo0920 temporarily deployed to pr September 19, 2023 10:09 — with GitHub Actions Inactive
@edsiper
Member

edsiper commented Oct 16, 2023

@cosmo0920 I think we had a conversation about this feature, but I don't remember the details. Since the official node exporter exposes this feature through a processes collector, why are we doing it in a separate plugin?

@cosmo0920
Contributor Author

cosmo0920 commented Oct 16, 2023

This is because the official node_exporter provides process metrics only at the system level. This plugin intends to provide metrics at the level of individual processes. For example, it is needed to observe the metrics (memory, threads, CPU seconds, etc.) of each fluent-bit process.
Also, process-level metrics are currently provided only by a third-party exporter: https://github.com/ncabatoff/process-exporter
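
To make "process level" concrete: on Linux, per-process values such as state, parent PID, page faults, and thread count are available in `/proc/<pid>/stat`. The sketch below parses one such line in Python and produces the one-hot `process_states` layout seen in the debug output of the PR description. It is an illustration under assumptions, not the plugin's code: `parse_stat_line`, `one_hot_states`, and the sample line are hypothetical, and the field positions follow the proc(5) man page.

```python
from typing import Dict

# Process state letters as they appear in the debug output.
STATES = ("R", "S", "D", "Z", "T", "I")


def parse_stat_line(line: str) -> Dict[str, object]:
    """Parse selected fields of a /proc/<pid>/stat line (see proc(5))."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    lparen = line.index("(")
    rparen = line.rindex(")")
    rest = line[rparen + 1:].split()
    return {
        "pid": int(line[:lparen]),
        "name": line[lparen + 1:rparen],
        "state": rest[0],               # field 3
        "ppid": int(rest[1]),           # field 4
        "minor_faults": int(rest[7]),   # field 10 (minflt)
        "major_faults": int(rest[9]),   # field 12 (majflt)
        "num_threads": int(rest[17]),   # field 20
        "vsize_bytes": int(rest[20]),   # field 23
    }


def one_hot_states(state: str) -> Dict[str, int]:
    """Return 1 for the current state and 0 for every other known state."""
    return {s: int(s == state) for s in STATES}


# Hypothetical sample line, shaped after the values in the debug output.
sample = ("406585 (fluent-bit) S 8832 406585 8832 34816 406585 4194304 "
          "966 0 0 0 0 0 0 0 20 0 5 0 12345678 278450176 3412")
info = parse_stat_line(sample)
print(info["name"], info["ppid"], info["num_threads"])
print(one_hot_states(info["state"]))
```

Each process directory under `/proc` yields its own record like this, which is what distinguishes a per-process exporter from one that only reports system-wide or per-group totals.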

@agup006
Member

agup006 commented Oct 17, 2023

@cosmo0920 is there a way to filter out what metrics are collected with each process? For example, only collecting CPU / Memory for all processes?

@cosmo0920
Contributor Author

@cosmo0920 is there a way to filter out what metrics are collected with each process? For example, only collecting CPU / Memory for all processes?

Currently, there is no way to select which types of metrics (CPU, memory, threads, and so on) are collected. Should we implement such a feature?

@cosmo0920 cosmo0920 force-pushed the cosmo0920-implement-process_exporter_metrics branch from 8b7116f to e86b866 Compare October 25, 2023 06:13
@cosmo0920 cosmo0920 temporarily deployed to pr October 25, 2023 06:14 — with GitHub Actions Inactive
@cosmo0920
Contributor Author

cosmo0920 commented Oct 25, 2023

I understand the requirement to filter by type of process metric. I added a parameter that turns each of the following on or off:

  • cpu
  • I/O
  • memory
  • state
  • context_switches
  • fd
  • start_time
  • thread_wchan
  • thread
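
One way to picture the on/off switch described above is a comma-separated list of metric group names that is validated against the nine groups. The Python sketch below is illustrative only. The helper `parse_enabled_metrics`, the comma-separated string form, and the "empty means everything" default are assumptions, since the comment does not name the parameter or its syntax. The group names are the ones printed on the `enabled metrics ...` lines of the debug log.

```python
from typing import FrozenSet

# Group names as printed by the "enabled metrics ..." debug lines.
KNOWN_GROUPS = (
    "cpu", "io", "memory", "state", "context_switches",
    "fd", "start_time", "thread_wchan", "thread",
)


def parse_enabled_metrics(value: str = "") -> FrozenSet[str]:
    """Turn a comma-separated string into a validated set of groups.

    An empty string enables every group (assumed default).
    """
    if not value.strip():
        return frozenset(KNOWN_GROUPS)
    requested = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [item for item in requested if item not in KNOWN_GROUPS]
    if unknown:
        raise ValueError(f"unknown metric group(s): {', '.join(unknown)}")
    return frozenset(requested)


# Collect only CPU and memory metrics, as asked for in the review.
enabled = parse_enabled_metrics("cpu, memory")
for group in KNOWN_GROUPS:
    print(f"{group}: {'on' if group in enabled else 'off'}")
```

Rejecting unknown names up front, rather than silently ignoring them, keeps a typo in the configuration from quietly disabling a metric group.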

@cosmo0920 cosmo0920 temporarily deployed to pr October 25, 2023 06:44 — with GitHub Actions Inactive
@edsiper edsiper merged commit cf4db83 into master Nov 8, 2023
43 of 44 checks passed
@edsiper edsiper deleted the cosmo0920-implement-process_exporter_metrics branch November 8, 2023 13:29
franciscovalentecastro pushed a commit to franciscovalentecastro/fluent-bit that referenced this pull request Nov 27, 2023
Development

Successfully merging this pull request may close these issues.

Process metrics for Linux
3 participants