Fix osrf.py_common.process_utils.get_loop() implementation #70

Merged 2 commits on Jan 12, 2021


hidmic
Collaborator

@hidmic hidmic commented Jan 12, 2021

On Windows, avoid loops closing during garbage collection and reuse existing ones if possible.

Connected to ros2/launch#476.

On Windows, avoid loops closing during garbage collection
and reuse existing ones if possible.

Signed-off-by: Michel Hidalgo <[email protected]>
@hidmic hidmic requested a review from wjwwood January 12, 2021 19:25
Contributor

@clalancette clalancette left a comment

From a naive view, it looks like this shouldn't be necessary; the first time we come in here, the TLS loop_has_been_setup will be False, we'll unconditionally create the ProactorEventLoop, and then set loop_has_been_setup to True. Subsequent invocations should always just use the earlier get_event_loop. Could you explain a bit more why this is needed?
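
The flow described in this comment can be sketched roughly as follows. This is a minimal illustration, not the actual osrf_pycommon source: `_state`, `get_loop_before_fix`, and the `loop_factory` parameter are hypothetical names, and `asyncio.new_event_loop` stands in for the `asyncio.ProactorEventLoop` constructor so the sketch runs on any platform.

```python
import asyncio
import threading

# Hypothetical per-thread holder for the "loop_has_been_setup" flag.
_state = threading.local()


def get_loop_before_fix(loop_factory=asyncio.new_event_loop):
    """Sketch of the pre-fix flow: the first call always installs a new loop."""
    if not getattr(_state, "loop_has_been_setup", False):
        # Unconditionally create and install a loop, ignoring any loop
        # that was already set for this thread.
        asyncio.set_event_loop(loop_factory())
        _state.loop_has_been_setup = True
    # Every later call just returns the loop that is currently set.
    return asyncio.get_event_loop()
```

Note that nothing in this flow checks whether a loop had already been set for the thread before the first call, so such a loop is silently replaced.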

@hidmic
Collaborator Author

hidmic commented Jan 12, 2021

CI up to launch, test_launch_ros, test_communication, and osrf_pycommon:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@hidmic
Collaborator Author

hidmic commented Jan 12, 2021

> From a naive view, it looks like this shouldn't be necessary; the first time we come in here, the TLS loop_has_been_setup will be False, we'll unconditionally create the ProactorEventLoop, and then set loop_has_been_setup to True. Subsequent invocations should always just use the earlier get_event_loop. Could you explain a bit more why this is needed?

Thing is, the event loop for the current thread may have already been set (e.g. through regular asyncio API). On Unix, we simply use the one set. But on Windows, we were always overriding it. If my analysis is correct (and I believe so, because I don't get hangs anymore), this creates a fun scenario if the event loop that was previously set was an asyncio.ProactorEventLoop itself. Once we override it, if there are no references to it, it'll eventually be garbage collected. When it is garbage collected, and Python's garbage collector runs in the main thread, that loop gets closed, and boom! Signal wakeups are disabled, affecting the other asyncio.ProactorEventLoop instance that is running.

IMHO, this is both a bug in asyncio and a not so adequate use of asyncio in launch. If you look at the upcoming deprecations in asyncio, it's clear event loops are meant to be global, at most one-time set instances.
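
The garbage collection step in this explanation can be observed in isolation with a small experiment. The sketch below assumes CPython, where the finalizer of an unclosed, non-running event loop closes it. `RecordingLoop`, `close_calls`, and `override_and_collect` are hypothetical names, and a selector loop is used so the sketch runs on any platform. It does not reproduce the Windows-specific effect on signal wakeups.

```python
import asyncio
import gc
import warnings

close_calls = []  # records every close() call made on a RecordingLoop


class RecordingLoop(asyncio.SelectorEventLoop):
    """Hypothetical loop subclass that records when it gets closed."""

    def close(self):
        close_calls.append("close")
        super().close()


def override_and_collect():
    """Set a loop, override it, then drop the last reference to the first one."""
    first = RecordingLoop()
    asyncio.set_event_loop(first)
    replacement = asyncio.new_event_loop()
    asyncio.set_event_loop(replacement)  # the first loop is no longer the current one
    with warnings.catch_warnings():
        # The finalizer reports the unclosed loop as a ResourceWarning.
        warnings.simplefilter("ignore", ResourceWarning)
        del first
        gc.collect()  # the loop holds internal reference cycles, so force a collection
    return replacement
```

Nobody calls `close()` on the first loop explicitly here, yet `close_calls` ends up non-empty after `override_and_collect()` returns, which is the implicit close the comment refers to.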

@ivanpauno

> When it is garbage collected, and Python's garbage collector runs in the main thread, that loop gets closed, and boom! Signal wakeups are disabled, affecting the other asyncio.ProactorEventLoop instance that is running.

omg, wonderful finding 😅

@clalancette
Contributor

> Thing is, the event loop for the current thread may have already been set (e.g. through regular asyncio API). On Unix, we simply use the one set. But on Windows, we were always overriding it. If my analysis is correct (and I believe so, because I don't get hangs anymore), this creates a fun scenario if the event loop that was previously set was an asyncio.ProactorEventLoop itself. Once we override it, if there are no references to it, it'll eventually be garbage collected. When it is garbage collected, and Python's garbage collector runs in the main thread, that loop gets closed, and boom! Signal wakeups are disabled, affecting the other asyncio.ProactorEventLoop instance that is running.

Wow, deep stuff. Good find. So am I right in thinking that the main fix here is the close of the old loop and the open of a new ProactorEventLoop under Windows? If so, it might be worthwhile to put a comment explaining that, just because it is non-obvious from the context.

@wjwwood
Member

wjwwood commented Jan 12, 2021

I didn't know the context for the issue. I just assumed the closing in the gc might be an issue if it is actually a singleton but didn't ask after it. Nice to hear the explanation. I agree a comment detailing why this code is like this would be useful.

Signed-off-by: Michel Hidalgo <[email protected]>
@hidmic
Collaborator Author

hidmic commented Jan 12, 2021

@hidmic hidmic merged commit 6346ca9 into master Jan 12, 2021
@hidmic hidmic deleted the hidmic/fix-asyncio-get-loop branch January 12, 2021 21:49
hidmic added a commit that referenced this pull request Jun 16, 2021
On Windows, avoid loops closing during garbage collection
and reuse existing ones if possible.

Signed-off-by: Michel Hidalgo <[email protected]>
hidmic added a commit that referenced this pull request Jun 29, 2021
On Windows, avoid loops closing during garbage collection
and reuse existing ones if possible.

Signed-off-by: Michel Hidalgo <[email protected]>