-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Debug: write memory profile if heap exceeds threshold #819
Conversation
Leaving this as a draft for now, since I need to double-check some the concurrency logic. And I'm not sure it will be useful now that we have GCP profiling for I took it for a spin while indexing top alloc_space
top inuse_space
|
Our use of git is quite vanilla, I wonder if we should instead rely on just shelling out to git? Maybe there is a wrapper which just relies on things like |
b29d86f
to
d95346c
Compare
d95346c
to
4111b90
Compare
@keegancsmith @stefanhengl this is ready for review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Left a question just for my understanding.
if idx%10_000 == 0 { | ||
b.CheckMemoryUsage() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we check based on doc count and not, for example, time based? If it's time based we wouldn't have to check in two places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! That approach could definitely work. What I liked about this: it gives fine-grained control over when we check memory. So we can check exactly when we are most concerned about an impending OOM, to maximize the chance we'll get useful data. In fact I just added another check after my latest round of research into memory usage (I found that loading the list of files to index can be very expensive).
This PR adds adds a debugging flag to periodically check memory usage against a threshold. If it exceeds the threshold, then a memory profile like
indexmemory.prof.1
is written to disk. No more than 10 profiles will be written.I've already found this more useful than the existing
-memprofile
flag, so I removed that. It's hard to get insights using that flag, since it only takes a single profile per shard, forces GC, and forces parallelism to 1.