Huge fixture to test slow down #62
Conversation
Codecov Report

Attention: Patch coverage is
Additional details and impacted files:

```
@@ Coverage Diff @@
##             main      #62      +/-   ##
==========================================
- Coverage   94.28%   93.57%   -0.71%
==========================================
  Files          43       43
  Lines        5582     6071     +489
  Branches     5582     6071     +489
==========================================
+ Hits         5263     5681     +418
- Misses        296      360      +64
- Partials       23       30       +7
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Since I am having issues with perf, here is the sample code I am using:
Here is the output:
Caching the jobstats gives us a very low response time:
The memory stays stable at ~2.3 GB, with peaks at ~6 GB when processing jobstats:
Using latest commit, I was able to divide the parsing time by ~3:
It relies on the zero-copy parser from combine: https://docs.rs/combine/latest/combine/parser/range/index.html. The memory usage seems to be the same, but I will need to test the previous implementation with the same script:
Please find attached the samply profile from the latest commit. From the flamegraph, it seems most of the time is now spent deserializing the YAML blob. Next steps are:
Force-pushed from 0dfdd69 to 35a1320 (Compare)
Sub-10 s (on my test system) with the latest commit:
On a real system:
Good improvement. Is this compiled in release mode? Also, what does current memory usage look like?
Yes, actually it has to be compiled in release mode in order to work; there is a slight problem in debug mode. The memory consumption is down to 440–650 MB, which comes down to reading the source file into memory.
Which source file?
In the case of our fixture, it is the `read_to_string()` function, which reads ds86.txt.
Exactly. Raphael thought about using the official prometheus crate, which supports writing to a stream.
Link to the method for consideration?
https://docs.rs/prometheus-client/latest/prometheus_client/encoding/text/fn.encode.html
Caching of jobstats metrics is mandatory:
Here are the results of the latest commit running on a live system.

With jobstats enabled: after some time (~10 s) the jobstats cache is warm and the exporter starts outputting jobstats metrics.

With jobstats disabled: the average response time is ~0.02 s.

Summary
```rust
let mgs_fs_handle = thread::spawn(move || -> Result<Vec<Record>, LustreCollectorError> {
    let lctl_output = get_lctl_mgs_fs_output()?;
    let lctl_record = parse_mgs_fs_output(&lctl_output)?;

    Ok(lctl_record)
});

let lnetctl_stats_handle =
    thread::spawn(move || -> Result<Vec<Record>, LustreCollectorError> {
        let lnetctl_stats_output = get_lnetctl_stats_output()?;
        let lnetctl_stats_record = parse_lnetctl_stats(str::from_utf8(&lnetctl_stats_output)?)?;

        Ok(lnetctl_stats_record)
    });

let recovery_status_handle =
    thread::spawn(move || -> Result<Vec<Record>, LustreCollectorError> {
        let recovery_status_output = get_recovery_status_output()?;
        let recovery_statuses = parse_recovery_status_output(&recovery_status_output)?;

        Ok(recovery_statuses)
    });
```
Removed `thread::spawn` as the result is now referencing a variable local to the thread.
I tried to use `Arc` without success.
@jgrund any idea?

No longer relevant.
Draft PR that includes a big fixture from jobstats to investigate a performance slowdown.