I'm trying to fix an issue in some code where I see margo_destroy failing in RPC handlers towards the end of the run, after margo_finalize has been called. Normally we track the number of pending RPCs so that, if finalize is called while RPCs are still in flight, the actual cleanup is delegated to the last pending RPC. However, in my code, some of the margo_destroy calls are failing with HG_OTHER_ERROR, indicating that margo has probably been finalized already.
One thing that we need for sure is a version of margo_finalize that blocks until it is actually finalized (margo_wait_for_finalize is not meant for that, as it does not actually request finalization). But beyond this, I think margo_finalize may have thread-safety issues, in particular here. We could imagine the following scenario:
1. The main ES calls margo_finalize, enters the locked section and reads pending = 1, for instance, and reaches this line.
2. At this exact moment, the pending RPC completes, calls its post-wrapper hooks and gets to this line, where it doesn't see that finalize was requested (because the main ES hasn't yet recorded the request), so it completes without re-calling margo_finalize.
3. The main ES proceeds to this line, and returns. At this point, nobody is left to actually finalize margo.
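To make the scenario above concrete, here is a minimal sketch of one way to close that window. It is not margo's actual code: it uses plain pthreads instead of Argobots, and instance_t, rpc_begin, rpc_end, instance_finalize and instance_finalize_and_wait are hypothetical names invented for illustration. The idea is that the pending count and the "finalize requested" flag are read and written under the same lock, so either the finalizing thread sees pending == 0 and cleans up, or the last RPC sees the flag and cleans up, never neither. The last function also sketches the blocking variant of finalize mentioned earlier.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the instance state; NOT margo's real struct. */
typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  finalized_cv;
    int  pending;            /* RPC handlers currently in flight */
    bool finalize_requested; /* set by instance_finalize */
    bool finalized;          /* set once cleanup has run */
    int  cleanup_count;      /* number of cleanups performed; should end at 1 */
} instance_t;

void instance_init(instance_t* in)
{
    pthread_mutex_init(&in->mtx, NULL);
    pthread_cond_init(&in->finalized_cv, NULL);
    in->pending            = 0;
    in->finalize_requested = false;
    in->finalized          = false;
    in->cleanup_count      = 0;
}

/* Must be called with in->mtx held. Real cleanup would go here. */
static void do_cleanup_locked(instance_t* in)
{
    in->cleanup_count++;
    in->finalized = true;
    pthread_cond_broadcast(&in->finalized_cv);
}

/* Called when an RPC handler starts. */
void rpc_begin(instance_t* in)
{
    pthread_mutex_lock(&in->mtx);
    in->pending++;
    pthread_mutex_unlock(&in->mtx);
}

/* Called when an RPC handler ends. The decrement and the check of
 * finalize_requested happen in the same critical section. */
void rpc_end(instance_t* in)
{
    pthread_mutex_lock(&in->mtx);
    in->pending--;
    if (in->pending == 0 && in->finalize_requested && !in->finalized)
        do_cleanup_locked(in);
    pthread_mutex_unlock(&in->mtx);
}

/* Non-blocking request: the flag is set in the same critical section
 * in which pending is inspected, so no RPC can slip in between. */
void instance_finalize(instance_t* in)
{
    pthread_mutex_lock(&in->mtx);
    in->finalize_requested = true;
    if (in->pending == 0 && !in->finalized)
        do_cleanup_locked(in);
    pthread_mutex_unlock(&in->mtx);
}

/* Blocking variant: request finalization, then wait until it happened. */
void instance_finalize_and_wait(instance_t* in)
{
    instance_finalize(in);
    pthread_mutex_lock(&in->mtx);
    while (!in->finalized)
        pthread_cond_wait(&in->finalized_cv, &in->mtx);
    pthread_mutex_unlock(&in->mtx);
}
```

Running the cleanup while holding the lock is a simplification to keep the sketch short; a real implementation would more likely claim the cleanup under the lock and perform it after unlocking. The finalized check is also what keeps cleanup from running twice if finalize is requested again.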
The problem I'm seeing in my code seems to be the opposite: margo is finalized before all the RPCs have truly completed, but I have yet to find how this can happen. At any rate, I would like some thoughts on at least the problem above.