
Memory consumption optimizations #356

Merged: 10 commits, Jun 24, 2024

Conversation

@pl0xz0rz (Contributor) commented Jun 5, 2024

Relates to #355

This should reduce the memory used for ND histogram bin caching when most, but not all, points are selected (saving 32 bits per selected point per histogram), and also reduce the memory used for intersection filters by storing actual packed bitmasks instead of a 32-bit integer per point (see the sketch below).
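A minimal sketch of the packed-bitmask idea, assuming one bit per point packed into 32-bit words; the class and method names are illustrative, not the actual repository identifiers:

```typescript
// Packed selection bitmask: 1 bit per point instead of a 32-bit flag,
// i.e. a 32x reduction in memory for the intersection filter.
class PackedSelection {
  private words: Uint32Array;

  constructor(nPoints: number) {
    // ceil(nPoints / 32) 32-bit words hold one bit per point.
    this.words = new Uint32Array((nPoints + 31) >>> 5);
  }

  set(i: number, selected: boolean): void {
    const word = i >>> 5;        // index of the 32-bit word
    const mask = 1 << (i & 31);  // bit position within that word
    if (selected) this.words[word] |= mask;
    else this.words[word] &= ~mask;
  }

  get(i: number): boolean {
    return (this.words[i >>> 5] & (1 << (i & 31))) !== 0;
  }
}
```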

To be added: more optimizations of the same kind to further reduce memory consumption.

In a test with multi weights, 5e5 points, 4 columns:
Before: 256 MB
After: 235 MB
After using a regular expression to remove unused variables from funCustom (see the sketch below): 215 MB
Columns are no longer a bottleneck in this test, but they should still be one in a realistic use case with 100+ columns.
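A hedged sketch of the funCustom pruning step: scan the user-supplied function body for column names and keep only the columns that actually appear, so unused columns never need to be expanded. The helper name and the word-boundary regex are assumptions; the real implementation may differ.

```typescript
// Return only the columns whose names appear in the funCustom body,
// so the remaining columns do not need to be expanded and cached.
function usedColumns(funCustomBody: string, allColumns: string[]): string[] {
  return allColumns.filter((name) => {
    // Escape regex metacharacters in the column name, then require word
    // boundaries so e.g. "pt" does not match inside "ptRatio".
    const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp(`\\b${escaped}\\b`).test(funCustomBody);
  });
}

// Example: only "pt" and "eta" would be kept for this body.
// usedColumns("return pt * Math.cosh(eta)", ["pt", "eta", "phi", "charge"]);
```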

However, computing histograms is slower now
5e5 points
Before: 57ms
After: 100ms
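One plausible contributor to the slowdown (an assumption, not profiled in this PR): with packed bitmasks, the histogram fill loop has to do a shift-and-mask bit test per point instead of reading a flag directly. A self-contained sketch:

```typescript
// Fill a 1D histogram over selected points, where selWords is the packed
// selection bitmask (one bit per point, 32 points per Uint32Array word).
function fillHistogram(
  values: Float64Array,
  selWords: Uint32Array,
  counts: Int32Array,
  low: number,
  binWidth: number
): void {
  for (let i = 0; i < values.length; i++) {
    // Bit test replaces a direct flag read: one extra shift + mask per point.
    if ((selWords[i >>> 5] & (1 << (i & 31))) === 0) continue;
    const bin = Math.floor((values[i] - low) / binWidth);
    if (bin >= 0 && bin < counts.length) counts[bin]++;
  }
}
```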

@miranov25 (Owner) commented

Tests were OK, but in a realistic use case we see that all columns are cached, which means all of them get expanded.
@pl0xz0rz - see my realistic test:
/lustre/alice/users/miranov/NOTES/alice-tpc-notes2/JIRA/ATO-650/perfScanSecITSW.html
As a consequence, a file with a compressed size of 260 MB reaches 2.8 GB of memory consumption after reading and loading the custom function.
In the console, we can see that all columns were expanded, while only a subset should be used.

@miranov25 (Owner) commented

Hi @pl0xz0rz,

Thanks for the update!

I've been testing memory consumption in a realistic scenario: dEdx calibration. For the compressed file "perfdEdx.html" (161 MB, last modified June 24, 2024, 10:44 AM), Chrome reports a significant improvement: memory usage is down to 450 MB, compared to the previous ~1.5 GB for similar files.

Potential Memory Usage Report:

I believe we can combine this information with the recent caching and uncaching changes (addressing user-defined cache columns can be a separate task) to create a comprehensive memory usage report.

What do you think? Should I merge now, with further development done in the next pull request?

@miranov25 (Owner) commented

Merging. Further development for #355 and #358 will follow in the next pull request.

@miranov25 merged commit 515b739 into miranov25:master on Jun 24, 2024