-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Realm event cancellation increases memory usage indefinitely for long running systems #1812
Comments
Hi @serkanturgut. Thank you for the very comprehensive write up of the issue. I think your diagnosis of the problem is correct given the current Realm implementation.
To date this is accurate. For many users tracking event lifetimes is really hard (especially in distributed systems) and Realm tries to provide the illusion that you don't have to do that at the moment. We don't currently have a way to say that you can free up an event and never refer to it again. We effectively allow you to query the status of any event indefinitely, which as you have noticed, can lead to a very small memory "leak" where generational event data structures continue to accumulate and become immutable after they've exhausted their generations. I'll say this is more an artifact of the implementation and the origins of Realm's use in HPC (where jobs are usually 24-48 hours) and not necessarily something inherent to the Realm programming model. We've probably over-fit a bit in our current implementation because of this. More on this in a moment.
For most Realm applications, cancellations are pretty rare (e.g., in Legion predicated execution often resolves
A comment and three questions: I think we could introduce such a feature, but I think we would want it to be at the granularity of events and not the underlying generational event data structure to avoid exposing implementation details (existence of generational event data structures) through the API. Let's say we instead had an API where you could call
At the moment, I agree with you, but I don't think it has to be this way. As mentioned earlier, I think we've probably overfit for HPC a bit, but the programming model is amenable to such long running executions and I believe can be easily modified to run indefinitely, presuming the client is willing to take on some responsibility for recycling events. |
Hi @lightsighter thank you for your detailed response and happy new year!
I understand, things made more sense now, I also reread your original paper now I have a better understanding of the event lifetimes.
I think this was a mistake on our side which is designing computations around cancellation actions (we use that as a signal to waiters that particular operation completed but there was an abnormal completion of the task). However, there is a tight coupling with cancellation actions therefore refactoring the application code poses a big risk, that's why I was exploring the alternatives.
I think at the event instance level fits best for our use case. Let me elaborate the use case a bit. Let's say the system has 2 computations in it, these 2 computations disjoint with each but they are running on the same Realm runtime in the same linux process. These 2 computations may or may not overlap in terms of their start and finish times. Every computation has 100+ events created, at the end of every computation, we know that we will never need to inquire these event generations ever again therefore we can potentially pass a list of events to Realm to say these event and their generations can be recycles.
I think that makes sense and implementation towards this model should be on the mainline of the project if anybody else need a feature like this but it sounds little bit more complex then what we need. Currently, we're only interested in recycling the events hit the cancellation limit but what you describe actually solves the problem for generation counters as well which is great. There is 20 bits for generation counter if I am not wrong, so every event data structure can be re-used 1M times (without cancellation), every DT node is 36MB, event data structure size is 536 bytes, so it can host 70K event data structures every one of which can be reused 1M times. Let's just say every computation takes 100 events, that means we have ~700M computations runway before DT is forced to allocate another block of 36MB.
Yes, we can safely guarantee that "free" called only once for that event generation. Thanks to RAII :)
We can track such dependencies however our current use of Realm, we don't necessarily need such complexity with the "free" API. There is no merging or preconditions once we know that a computation finished. We actually prototyped something like following and tested, I think it lacks some sanity asserts such as checking whether the event already recycled with
|
I actually think that is a fine reason to use event poisoning. We've just overfit again a bit in the implementation for what we needed from Legion. There's no fundamental reason that you shouldn't be able to have lots of cancelled events.
That makes sense.
By all means you should do whatever is simplest and keeps your work progressing for now. I think we'll try to prototype a more robust solution for the master branch of Realm, but that should not preclude you from hacking in whatever you need for now.
It's configurable in id.h (e.g. if you wanted to support fewer nodes since 2^16 nodes is a lot and use more bits for encoding generations for each generational event data structure you can do that), but yes that is the default configuration.
That envelope math looks right to me.
I mainly just wanted to make sure we had a clear programming model of what users would need to do before recycling events. It's just a bit tricky because you have to make sure all the operations that used the event as a precondition before the freeing of the event are also done running before the event can actually be freed. It's a non-obvious mental model so I just wanted to call attention to it. |
Also @serkanturgut would it be possible to have a meeting with you to hear more about your use case for Realm? I think the Realm team would value knowing about another use case and some of its requirements. We also have some news about Realm we'd like to share with you as well. If that's amenable to you, go ahead and shoot me an email at my address from the Realm paper and I'll respond back. |
Hi,
We have been investigating some memory leaks in our system that uses Realm runtime events and user-space threads heavily. As part of the computation logic, we have to issue cancellation on certain events. We found out that due to the design choices how event objects are created and recycled, every event has a limit of 16 poisoned generation limit, beyond that, the event object can't be recycled anymore.
Here is the simple program that reproduces the behavior we observed;
And I am attaching the malloc profile report (jemalloc) as a demonstration of unbounded memory usage.
Please only focus on DynamicTable expansion code paths and the event cancellation in the picture. Event cancellation triggers the allocations of the
poisoned_generations
array to record the poisoned generations and DynamicTable expansion basically due to the fact that after 16 poisons per event, it does not getfree
'd anymore so new event creation eventually allocated from newly create table leafs.I understand that it is a design choice that events have to keep a record of the poisoned generations to be able to answer to the inquiries whether a particular event generation was poisoned or not. If this is the case, I feel like that Realm runtime would be unsuitable for systems that rarely goes offline such as database systems.
In our integration with Realm, the events actually have a lifespan and we do not actually care about poisoned generations after the event life ended (by our system logic). So potentially we can keep track of cancelled events in our system and instruct Realm to force-free them to be recycled. However, to my understanding, Realm currently does not have a mechanism to "reset" an event object (
free
the poisoned_generations array) and reset poisoned generation number, oncemax_poisons = true;
object remains in this state forever (please confirm this).Would it be an acceptable approach to introduce "reset" API to event class to achieve such objective so the application can decide to end a lifespan of a particular event object?
I may be mistaken with my findings above, please let me know your thoughts and suggestions.
Thank you,
Serkan
The text was updated successfully, but these errors were encountered: