Reduce memory usage #2

PotatoesFall · 2024-06-04T21:07:59Z

Hi!

I hope it's cool that I made a PR. I really like this tool, but I noticed that it keeps entropy information for every line of a given file in memory and does one huge sort at the end.

Proposed change

With this change, we never keep more than the top n lines in memory.

The Entropies struct keeps the top n lines sorted in a slice.
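
For illustration, the idea can be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the names `Entropy`, `TopN`, `NewTopN`, and `Add` are hypothetical, and it assumes entries are ranked by a single `float64` value. It keeps at most n entries in descending order and uses `sort.Search` to find the insert position.

```go
package main

import (
	"fmt"
	"sort"
)

// Entropy is a hypothetical record for one scanned line.
type Entropy struct {
	File    string
	LineNum int
	Value   float64
}

// TopN keeps at most n entries, sorted by descending Value.
type TopN struct {
	n     int
	items []Entropy
}

// NewTopN returns an empty tracker; a negative n is treated as 0.
func NewTopN(n int) *TopN {
	if n < 0 {
		n = 0
	}
	return &TopN{n: n, items: make([]Entropy, 0, n+1)}
}

// Add inserts e if it ranks among the n highest values seen so far.
func (t *TopN) Add(e Entropy) {
	if t.n == 0 {
		return
	}
	if len(t.items) == t.n && e.Value <= t.items[t.n-1].Value {
		return // not better than the current lowest entry
	}
	// Binary search for the first position holding a lower value than e.
	i := sort.Search(len(t.items), func(i int) bool {
		return t.items[i].Value < e.Value
	})
	t.items = append(t.items, Entropy{}) // grow by one slot
	copy(t.items[i+1:], t.items[i:])     // shift the tail to the right
	t.items[i] = e
	if len(t.items) > t.n {
		t.items = t.items[:t.n] // drop the lowest entry
	}
}

func main() {
	top := NewTopN(2)
	for i, v := range []float64{1.5, 4.2, 3.1, 0.7} {
		top.Add(Entropy{File: "example.txt", LineNum: i + 1, Value: v})
	}
	for _, e := range top.items {
		fmt.Printf("%s:%d %.1f\n", e.File, e.LineNum, e.Value)
	}
}
```

Memory stays bounded by n regardless of how many lines are scanned. Each insert still shifts up to n elements, which is cheap for small values of `-top`.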

Testing with a medium-sized repository, I noticed that the old version climbed all the way to 1 GB of memory. After the change, it doesn't even show up among my top 100 memory-consuming programs.

Execution time

While I didn't see an improvement in execution time in my testing, this version does get rid of the large sort towards the end.

If we want to cut down execution time in the future, it may be wise to make an individual list per file and/or per directory (as before), and merge these together when done with each file/directory, to reduce locking on the main Entropies struct.
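
A rough sketch of that per-file idea, under stated assumptions: `Entry`, `Shared`, `Merge`, and `topN` are hypothetical names, and the file contents are hard-coded in memory to keep the example short. Each goroutine builds its own local list without locking, and the shared list is locked once per file instead of once per line. The helper `topN` sorts for brevity; a bounded structure could replace it.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Entry is a hypothetical (line, entropy) pair.
type Entry struct {
	Line  string
	Value float64
}

// topN sorts es in place by descending Value and returns at most n entries.
func topN(es []Entry, n int) []Entry {
	if n < 0 {
		n = 0
	}
	sort.Slice(es, func(i, j int) bool { return es[i].Value > es[j].Value })
	if len(es) > n {
		es = es[:n]
	}
	return es
}

// Shared holds the global top n behind a mutex.
type Shared struct {
	mu  sync.Mutex
	n   int
	top []Entry
}

// Merge folds one file's local list into the global list.
// The lock is taken once per file rather than once per line.
func (s *Shared) Merge(local []Entry) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.top = topN(append(s.top, local...), s.n)
}

func main() {
	// Hypothetical per-file results.
	files := map[string][]Entry{
		"a.go": {{"x", 1.0}, {"y", 3.5}},
		"b.go": {{"z", 2.0}, {"w", 4.1}},
	}
	shared := &Shared{n: 2}

	var wg sync.WaitGroup
	for _, lines := range files {
		wg.Add(1)
		go func(lines []Entry) {
			defer wg.Done()
			local := topN(lines, shared.n) // no locking while ranking one file
			shared.Merge(local)
		}(lines)
	}
	wg.Wait()

	for _, e := range shared.top {
		fmt.Printf("%s %.1f\n", e.Line, e.Value)
	}
}
```

Whether this helps in practice depends on how contended the lock actually is, so it would be worth profiling first.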

For very large values of -top, we could possibly also get very small performance gains using a max heap vs a slice, but that's probably premature optimization at this point.
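
For reference, a heap-based variant could look roughly like this, using the standard `container/heap` package. This sketch is an assumption about one possible shape, not a proposal for this PR: it uses a min-heap bounded to n entries, so the root is always the smallest retained value and is the one to evict. The names `minHeap` and `offer` are hypothetical, and only bare `float64` values are tracked.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// minHeap orders entropy values so that the smallest sits at the root.
type minHeap []float64

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(float64)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// offer keeps the n largest values seen so far in h.
func offer(h *minHeap, n int, v float64) {
	if n <= 0 {
		return
	}
	if h.Len() < n {
		heap.Push(h, v)
		return
	}
	if v > (*h)[0] {
		(*h)[0] = v    // replace the smallest retained value
		heap.Fix(h, 0) // restore the heap order in O(log n)
	}
}

func main() {
	h := &minHeap{}
	for _, v := range []float64{1.5, 4.2, 3.1, 0.7, 5.0} {
		offer(h, 3, v)
	}
	// The heap itself is not fully sorted, so sort a copy for display.
	out := append([]float64(nil), *h...)
	sort.Sort(sort.Reverse(sort.Float64Slice(out)))
	fmt.Println(out)
}
```

Each update costs O(log n) instead of the O(n) shift of a sorted slice, at the price of one final sort when printing the results.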

EwenQuim · 2024-06-04T22:05:02Z

Thanks for your contribution!

I'll look at it tomorrow ;)

EwenQuim · 2024-06-10T11:45:28Z

Can you please resolve the conflicts? Thanks!

EwenQuim · 2024-06-20T18:32:09Z

That is a very high-quality PR, @PotatoesFall, thank you!

Some tests needed to be updated, though; I did that and merged it.

PotatoesFall added 2 commits June 4, 2024 22:40

reduce memory usage

28f2e45

make memory safe

3cfe04c

AlexanderYastrebov reviewed Jun 4, 2024

main.go (outdated, resolved)

use binary search for entropy tracking

be572fb

AlexanderYastrebov reviewed Jun 5, 2024

main.go (resolved)

EwenQuim force-pushed the master branch 3 times, most recently from 5272ed5 to 975e612 June 5, 2024 12:05

reorder build steps to improve layer caching

a64e5a9

Currently, any change in the source code requires a rerun of `go mod download`. This commit moves that step earlier so that only changes in the dependencies require a rebuild of that layer.

Merge remote-tracking branch 'origin/master' into memory

8d198c9

EwenQuim changed the base branch from master to reduce-memory-usage June 20, 2024 18:15

EwenQuim merged commit b885946 into EwenQuim:reduce-memory-usage Jun 20, 2024
2 checks passed
