-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pkg/stanza] make log emitter and entry conversion in adapter synchronous #35669
[pkg/stanza] make log emitter and entry conversion in adapter synchronous #35669
Conversation
Signed-off-by: Florian Bacher <[email protected]>
… log emitter Signed-off-by: Florian Bacher <[email protected]>
… log emitter Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this.
I am not very certain that the testbed benchmarks are a fair comparison for these changes, since they rely on so many external factors besides the adapter code itself. I left some suggestions about how we can update the adapter benchmarks to give us a better comparison.
Thanks for the thorough review @djaglowski and sorry for the late reply - I will go through all comments today and address them |
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description This PR introduces a benchmark test for the receiver adapter which tests the performance of handling the log entries sent to the `LogEmitter`, which are then received and converted by the receiver adapter. The changes have been made based on the comment linked here: #35669 (comment) <!-- Issue number (e.g. #1234) or full URL to issue, if applicable. --> #### Link to tracking issue Part of #35453 <!--Describe what testing was performed and which tests were added.--> #### Testing Added additional benchmark tests --------- Signed-off-by: Florian Bacher <[email protected]>
Looks great, thank you @bacherfl! I believe this implements exactly what the issue #35453 describes. 🚀 Can we remove the Also, is this pull request still "work in progress and only a proof of concept" as described in the description? If yes, please let me know when it's ready for a "final" review (perhaps by switching to Draft and then back to non-draft when it's ready). |
Thanks @andrzej-stencel! Sure, I will remove the |
Regarding the performance, I did a comparison between the synchronous and the asynchronous adapter using the updated benchmark tests: Sync adapter:
Async adapter:
|
# Conflicts: # testbed/tests/scenarios.go
@andrzej-stencel One question regarding the removal of the converter: I've found that this is currently used in two places other than
In both cases it seems like the converter is used to generate test data, and can likely be replaced easily with the |
Signed-off-by: Florian Bacher <[email protected]>
I agree it should be possible to refactor those tests to remove the converter. Seems like a good followup PR. Regarding the benchmarks, do you mind running them multiple times and using
Based on the one run it looks promising but given the impact here I think we should have a more statistically significant comparison. If these are running quickly, it'd be interesting to see another order of magnitude too (not required though). |
Sure, will post the updated results soon |
Alright, here is the result of the
|
@andrzej-stencel @djaglowski I think this is ready for review now, since the removal of the converter type will be done as a follow up |
@bacherfl, this appears to show a ~60% increase in CPU consumption, but the earlier (single run) result showed just about the opposite. Just double checking, is it possible before/after were switched? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-ran the manual test "Run the collector binary reading a 1 GiB file" described in detail in #35454 (comment), comparing v0.112.0 to this PR. Surprisingly, the results say pretty consistently that this change decreases the time for File Log receiver to read a 1 GB file from ~17 seconds to ~14-15 seconds. This is about 15% improvement in performance!
@djaglowski this does seem a bit odd, I will double check the results again I just re-ran the tests and got the same results - However I noticed that I posted the |
Co-authored-by: Andrzej Stencel <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this on @bacherfl.
The performance hit is unfortunate but it appears necessary to increase reliability (and maintainability).
Description
This PR changes the
LogEmitter
to accept a synchronous consumer callback function for processing a batch of log entries as an alternative to sending log entry batches to a channel.The components that use the
LogEmitter
(adapter and parser) have been adapted accordingly. Also, in the case of the adapter, the log entries are converted directly, rather than sending them over a channel to the converter and receiving the converted results over a different channel.Link to tracking issue
Fixes #35453
Testing
I did some initial performance tests using the
TestLogLargeFiles
load test to see how this change affects the performance. Below are the results:Before the change (i.e. with async log entry batch processing)
After the change (i.e. with sync log entry batch processing)
Those results seem to indicate comparable throughput, but with an increased resource consumption, especially in terms of memory.
I also did a test comparing the performance between the synchronous and asynchronous log emitter using the same methodology as in #35454.
The results were the following, and indicate an increase in the time it takes for reading the generated log file (see #35454 for details on how the file is generated and the test execution):
output-async.log
=== Step 3: Thu Oct 10 08:54:23 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 2.209674e+06 === Step 4: Thu Oct 10 08:54:25 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 5.428103e+06 === Step 5: Thu Oct 10 08:54:26 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 7.337017e+06 === Step 6: Thu Oct 10 08:54:27 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 9.258843e+06 === Step 7: Thu Oct 10 08:54:29 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.3082428e+07 === Step 8: Thu Oct 10 08:54:31 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.6519068e+07output-sync.log
=== Step 2: Thu Oct 10 08:51:27 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.580891e+06 === Step 3: Thu Oct 10 08:51:28 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 3.01034e+06 === Step 4: Thu Oct 10 08:51:29 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 4.434627e+06 === Step 5: Thu Oct 10 08:51:31 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 7.416612e+06 === Step 6: Thu Oct 10 08:51:34 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.0496072e+07 === Step 7: Thu Oct 10 08:51:36 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.3523882e+07 === Step 8: Thu Oct 10 08:51:37 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.4929707e+07 === Step 9: Thu Oct 10 08:51:39 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.6519105e+07