RPC invocation_id conflict #103
Comments
I think a simple solution is to have a set holding all the pending |
Another option is to use a sequencer, which is easier to manage than a set.
I agree that the results_map in a RemoteInvoker should be garbage-collected once each invocation has completed; this seems like an oversight. Also, I don't see why invocation IDs need to be random; they could just as easily be sequential, since they don't have to be globally unique (just unique within a RemoteInvoker instance). You might want to check with Matthew about why he designed it that way, though.
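To make the sequencer idea concrete, here is a minimal sketch, assuming one counter per invoker instance. The class name InvocationIdSequencer is hypothetical and is not part of Derecho; the sketch only shows that an atomic counter hands out IDs that never repeat within that instance, even with concurrent callers.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical per-invoker ID source; not Derecho's actual class.
class InvocationIdSequencer {
public:
    // Returns the next ID. fetch_add is atomic, so two concurrent callers
    // can never receive the same value from the same sequencer.
    std::uint64_t next() {
        return counter_.fetch_add(1, std::memory_order_relaxed);
    }

private:
    std::atomic<std::uint64_t> counter_{0};
};
```

Note that two separate sequencers both start at 0, so IDs produced this way are unique only within one instance, not across instances.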
After discussions with Weijia, we decided to garbage collect the relevant data structures when all replies are available and the user has destroyed the reply objects. This is not very urgent right now as Weijia's fix to use a sequencer for invocation_ids allows him to dodge the bug.
All we need is uniqueness, not randomness. So as long as the sequencing is globally unique you should be fine. Also, we should tie the lifetime of the entry in that table to the lifetime of the results object we give the programmer. No need for invasive GC when we can track the actual usage.
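Why do we need a globally unique invocation_id? The target of the RPC call should know the caller's node id and the RPC signature. I had thought it should be fine as long as the invocation_id is unique in such a domain. Sagar and I had a sketch solution for GC like this: in the destructor of class QueryResults, we can test if the promises are all fulfilled. If they are, clear it; otherwise, we set a tombstone in the corresponding PendingResults entry in the map.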
That makes sense for the sketch. I'm actually super sick right now so I genuinely can't remember why I think there needs to be uniqueness.
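A rough sketch of the destructor-plus-tombstone idea, under simplifying assumptions: replies are just counted rather than stored, and ResultsTable, PendingEntry, and ResultsHandle are hypothetical stand-ins for the results_map, its PendingResults entries, and the results object handed to the programmer. They are not the real Derecho types.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical stand-in for a PendingResults entry.
struct PendingEntry {
    int replies_expected = 0;
    int replies_received = 0;
    bool tombstone = false;  // set when the caller dropped its handle early
    bool complete() const { return replies_received >= replies_expected; }
};

// Hypothetical stand-in for the results_map.
class ResultsTable {
public:
    void add(std::uint64_t id, int replies_expected) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_[id].replies_expected = replies_expected;
    }
    // Receive path: count a reply; reclaim the entry if the caller is
    // already gone and this was the last outstanding reply.
    void on_reply(std::uint64_t id) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(id);
        if(it == entries_.end()) return;
        ++it->second.replies_received;
        if(it->second.tombstone && it->second.complete()) entries_.erase(it);
    }
    // Called from the handle's destructor: erase if everything arrived,
    // otherwise leave a tombstone for on_reply to clean up later.
    void release(std::uint64_t id) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(id);
        if(it == entries_.end()) return;
        if(it->second.complete()) entries_.erase(it);
        else it->second.tombstone = true;
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<std::uint64_t, PendingEntry> entries_;
};

// Hypothetical analogue of the results object given to the programmer.
class ResultsHandle {
public:
    ResultsHandle(ResultsTable& table, std::uint64_t id) : table_(table), id_(id) {}
    ResultsHandle(const ResultsHandle&) = delete;
    ResultsHandle& operator=(const ResultsHandle&) = delete;
    ~ResultsHandle() { table_.release(id_); }

private:
    ResultsTable& table_;
    std::uint64_t id_;
};
```

With this shape, an entry disappears at whichever event happens last: the handle being destroyed or the final reply arriving. No separate GC pass is needed.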
The reply has to be matched with the specific call.
On Feb 22, 2019, at 9:39 PM, Weijia Song wrote:
Why do we need a globally unique invocation_id? The target of the RPC call should know the caller's node id and the RPC signature. I had thought it should be fine as long as the invocation_id is unique in such a domain.
Sagar and I had a sketch solution for GC like this: in the destructor of class QueryResults, we can test if the promises are all fulfilled. If they are, clear it; otherwise, we set a tombstone in the corresponding PendingResults entry in the map.
What about this: if the caller were to send the address of a location at which the RPC target can place the result, and this is in a region of previously pinned memory enabled for one-sided RDMA (in effect, in the SST), then the response can be written directly to the desired region of client memory. Plus, since that address would be playing a dedicated role while the RPC is underway, it can also serve as the unique request id!
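A toy, single-threaded sketch of this idea, with heavy assumptions: a plain std::array stands in for the pinned RDMA region, a function call stands in for the one-sided write, and the slot index (an offset into the region) is used in place of the raw address. ReplySlot and ReplySlotPool are hypothetical names, not Derecho or SST types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical reply slot living in the (simulated) pinned region.
struct ReplySlot {
    bool in_use = false;
    bool ready = false;
    std::uint64_t value = 0;
};

template <std::size_t N>
class ReplySlotPool {
public:
    // Claims a free slot. Its index doubles as the request ID, which is
    // unique for as long as the slot stays claimed.
    std::optional<std::size_t> acquire() {
        for(std::size_t i = 0; i < N; ++i) {
            if(!slots_[i].in_use) {
                slots_[i].in_use = true;
                slots_[i].ready = false;
                return i;
            }
        }
        return std::nullopt;  // every slot is busy
    }
    // Stands in for the RPC target's one-sided write into the caller's slot.
    void remote_write(std::size_t id, std::uint64_t value) {
        assert(id < N && slots_[id].in_use);
        slots_[id].value = value;
        slots_[id].ready = true;
    }
    // Caller side: returns the reply if it has been written.
    std::optional<std::uint64_t> poll(std::size_t id) const {
        assert(id < N);
        if(!slots_[id].ready) return std::nullopt;
        return slots_[id].value;
    }
    // Frees the slot, and with it the ID, for reuse.
    void release(std::size_t id) {
        assert(id < N);
        slots_[id] = ReplySlot{};
    }

private:
    std::array<ReplySlot, N> slots_{};
};
```

One consequence of this design is that reclaiming IDs comes for free: releasing the slot is the same act as retiring the ID. A real version would need synchronization and a policy for when all slots are busy.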
Oh that is pretty good actually --- it would certainly be unique enough, and it would entirely bypass the existing RDMC-style buffer management. Since we're overhauling that anyway, it makes sense to me to take advantage of these things! Re: Weijia's question: as of the last time I had a hand in this, the actual runtime process that ships around results to invocations is totally type-erased; the result datagram is only addressed based on the invocation ID, not based on the receiver types at all. This is why they needed to be globally unique; it would be enough for them to be unique within a single RDMC/top-level group, but so far we've only ever had one of those per process anyway.
It turns out my fix had an issue. On receiving a reply message, the p2p message loop sets the returned value in RemoteInvoker::results_map[invocation_id]. At the same time, an application thread may be calling p2p_send() and inserting a new entry into RemoteInvoker::results_map. Since the p2p message loop has given up the lock on the results_map, those two things can happen concurrently. However, in commit 6b7c1ac, I gave up on using |
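For illustration only, and not necessarily what the commit above does: one common way to avoid this kind of race is to hold a single mutex around both the insert done by the sending thread and the fulfill done by the message loop. GuardedResultsMap is a hypothetical, simplified stand-in for results_map that stores one std::promise<int> per invocation.

```cpp
#include <cstdint>
#include <future>
#include <map>
#include <mutex>
#include <stdexcept>

// Hypothetical, simplified results_map; the real one holds richer entries.
class GuardedResultsMap {
public:
    // Sending thread: register a pending call and get its future.
    // A duplicate ID is rejected instead of silently reusing the promise.
    std::future<int> insert(std::uint64_t invocation_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto [it, inserted] = promises_.try_emplace(invocation_id);
        if(!inserted) throw std::logic_error("duplicate invocation_id");
        return it->second.get_future();
    }
    // Message loop: deliver a reply and drop the entry.
    // Returns false if the ID is unknown (for example, already fulfilled).
    bool fulfill(std::uint64_t invocation_id, int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = promises_.find(invocation_id);
        if(it == promises_.end()) return false;
        it->second.set_value(value);
        promises_.erase(it);  // the future keeps the shared state alive
        return true;
    }

private:
    std::mutex mutex_;
    std::map<std::uint64_t, std::promise<int>> promises_;
};
```

Because both paths take the same lock, the map is never modified while another thread is reading or writing it. Erasing on fulfill also keeps used entries from piling up.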
When we construct an RPC message, we use a random number generated by Matt's library (mutils::long_rand()) as the invocation_id, which indexes into the results_map holding the promises of the RPC call. In the objectstore performance benchmark, I found duplicate invocation_ids within ~50K RPC calls, which threw an std::future already retrieved exception and then crashed. We need to fix this with non-conflicting invocation_ids. We also need to reclaim used invocation_ids. Don't leave used one-time garbage there forever!
Code location: send_return send(...) @derecho/remote_invocable.h:101.
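As a back-of-the-envelope check on why random IDs can collide this quickly, the birthday approximation gives the chance of at least one duplicate among n IDs drawn uniformly from 2^bits values. The function name collision_probability is made up for this sketch, and the thread does not say how many random bits mutils::long_rand() effectively provides, so the bit widths in the comments are assumptions, not measurements.

```cpp
#include <cmath>

// Approximate probability that at least two of n IDs collide when each is
// drawn uniformly at random from 2^bits values (birthday approximation):
//   P ~= 1 - exp(-n * (n - 1) / (2 * 2^bits))
double collision_probability(double n, int bits) {
    double space = std::ldexp(1.0, bits);  // 2^bits
    return -std::expm1(-n * (n - 1.0) / (2.0 * space));
}

// For n = 50000:
//   bits = 31 -> roughly 0.44
//   bits = 32 -> roughly 0.25
//   bits = 64 -> below 1e-10
```

If the IDs were uniform over a full 64 bits, a duplicate within ~50K calls would be very unlikely, so the observed duplicates are at least consistent with a much smaller effective range. That is a guess about the cause, not something established here.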