-
-
Notifications
You must be signed in to change notification settings - Fork 718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow retire_workers to run concurrently #8056
Conversation
FWIW I believe we also have pretty good consensus to nuke replicate/rebalance. Happy either way (from my POV this was actually less work but I'm happy if somebody comes around deleting code) |
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 20 files - 1 20 suites - 1 9h 36m 3s ⏱️ - 1h 16m 47s For more details on these failures, see this check. Results for commit 9b63ce6. ± Comparison against base commit b14f6ae. ♻️ This comment has been updated with latest results. |
181b822
to
2eda738
Compare
I like this, but I think the synchronisation primitive is complex and hard to read enough to warrant a bit of encapsulation. I'm refactoring it now. |
276d15e
to
b8d1065
Compare
ready for review (assuming green tests) |
@fjetter all review comments have been addressed; ready for review again |
LGTM (I cannot approve since I'm the original author) |
Closes #8053
The added test is rather superficial but it is simple and conveys what we want to work. There is also a little heavier test already available that does a similar thing
distributed/distributed/tests/test_active_memory_manager.py
Lines 1230 to 1269 in 9a9d2c3
running those locally didn't reveal any problems. if CI is not sadder than usual, I'm fine with this
Regarding the conditions: This will prohibit running replicate/rebalance while retire_workers is running and will prohibit running anything else while replicate/rebalance is running.
The alternative is to nuke these APIs entirely but the way I see it this complexity is not too bad. I considered factoring this out but this abstraction would look aweful since the semantics we need here are hard to generalize. This just felt unnecessary. Happy to discuss either option.