margo_finalize maybe not thread-safe ? #209

Open
mdorier opened this issue Sep 14, 2022 · 0 comments
mdorier commented Sep 14, 2022

I'm trying to fix an issue in some code where I see margo_destroy failing in RPC handlers towards the end of the run, after margo_finalize has been called. Normally we track the number of pending RPCs so that, if finalize is called while some are still in flight, the actual cleanup is delegated to the last pending RPC. However, in my code, some of the margo_destroy calls fail with HG_OTHER_ERROR, indicating that margo has probably been finalized already.

One thing that we need for sure is a version of margo_finalize that blocks until it is actually finalized (margo_wait_for_finalize is not meant for that, as it does not actually request finalization). But beyond this, I think margo_finalize may have thread-safety issues, in particular here. We could imagine the following scenario:

  • Main ES calls margo_finalize, enters the locked section to get pending = 1, for instance, and reaches this line.
  • At this exact moment, the pending RPC completes, calls its post-wrapper hooks and gets to this line, where it doesn't see that finalize was requested (because the main ES hasn't yet recorded the request), so it completes without re-calling margo_finalize.
  • The main ES proceeds to this line, and returns. At this point, no one will be effectively finalizing margo.
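To make the interleaving above concrete, here is a minimal single-threaded model of it. This is only a sketch under assumptions: `struct model`, its fields, and every function name are hypothetical stand-ins, not margo's actual code, and the handler's "re-call margo_finalize" step is reduced to setting a `finalized` flag. It replays the three steps in the problematic order, then shows that the cleanup is not lost if reading the pending count and recording the request form one atomic step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of the instance state. */
struct model {
    int  pending;            /* number of in-flight RPC handlers        */
    bool finalize_requested; /* set when finalize defers the cleanup    */
    bool finalized;          /* set by whoever performs the cleanup     */
};

/* Step 1 (main ES): read the pending count inside the locked section. */
static int finalize_read_pending(const struct model *m)
{
    return m->pending;
}

/* Step 3 (main ES): after the lock is released, act on the value read. */
static void finalize_act(struct model *m, int pending_seen)
{
    if (pending_seen > 0)
        m->finalize_requested = true; /* defer cleanup to the last RPC */
    else
        m->finalized = true;          /* nothing pending: clean up now */
}

/* Step 2 (handler ES): the RPC completes and checks for a deferred finalize. */
static void handler_complete(struct model *m)
{
    assert(m->pending > 0);
    m->pending--;
    if (m->pending == 0 && m->finalize_requested)
        m->finalized = true; /* stands in for re-calling finalize */
}

/* Interleaving from the scenario: read, handler completes, then act. */
bool run_racy_interleaving(void)
{
    struct model m = { .pending = 1 };
    int seen = finalize_read_pending(&m);
    handler_complete(&m);   /* sees finalize_requested == false */
    finalize_act(&m, seen); /* acts on a stale count and defers */
    return m.finalized;
}

/* Assumed fix: read and act as one indivisible step (same critical section). */
static void finalize_atomic(struct model *m)
{
    finalize_act(m, finalize_read_pending(m));
}

bool run_atomic_finalize_then_handler(void)
{
    struct model m = { .pending = 1 };
    finalize_atomic(&m);
    handler_complete(&m);
    return m.finalized;
}

bool run_handler_then_atomic_finalize(void)
{
    struct model m = { .pending = 1 };
    handler_complete(&m);
    finalize_atomic(&m);
    return m.finalized;
}
```

In this model `run_racy_interleaving()` returns false (nobody performs the cleanup), while both orderings of the atomic variant return true. In real code the atomic step would presumably mean setting the request flag while still holding the mutex that protects the pending counter, and having the handler decrement and check under that same mutex.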

The problem I'm seeing in my code seems to be the opposite: margo is finalized before all the RPCs have truly completed, but I have yet to find how this can happen. At any rate, I would like some thoughts on at least the problem above.

mdorier self-assigned this Feb 28, 2023
mdorier added the bug (Something isn't working) label Feb 28, 2023