More dask features #959
Code LGTM 👍
Is the DaskExecutor tested in the CI or did you test it manually?
Well, there were tests, but the CI didn't run them. Now it does.
Nice, LGTM 👍
@@ -328,7 +331,8 @@ def run_map(executor: cluster_tools.Executor) -> None:
    assert list(result) == [4, 9, 16]

for exc in get_executors():
    run_map(exc)
    if not isinstance(exc, cluster_tools.DaskExecutor):
Why is this specific test excluded for the DaskExecutor?
Futures of the DaskExecutor become invalid when the executor is closed. This makes this test invalid. I was thinking about removing this test or making this test fail for all executors (probably a bit of effort).
cc @philippotto
My thoughts:
- yes, we should strive for similar behavior between dask and slurm. --> (A) the futures should become invalid for the slurm context, too OR (B) (maybe I'd prefer this?) we wrap/copy the results in/into different future objects that survive the context termination. Or is there a benefit in letting the futures die? the copying could be done upon context exit.
- if we do (A), this would be a breaking change (and likely needs fixing in vx etc). therefore, I'd tackle this in a separate PR.
- either way, the test itself should not be removed without replacement. I think what the test intends to assert is that the iterator returned by `map` contains futures that were kicked off before the iterator is consumed (read the comment here). essentially, this covers an implementation detail, but the overall expected behavior is that the map call eagerly submits all futures, but lazily awaits their results (so that they don't need to be in RAM all at once). the test exploits the use-futures-after-context-was-shutdown behavior to test the eager submit (if it was not eager, the test would fail because the submit would happen after context exit). if you remove that behavior, the eager submits should still be checked for, in my opinion.
I hope this is somewhat comprehensible. If not, let's have a call 🤙
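To make the eager-submit / lazy-await idea concrete, here is a minimal sketch that checks eager submission without touching futures after the context has exited. It uses the standard library's `ThreadPoolExecutor` as a stand-in; `CountingPool` and `eager_map` are hypothetical names for illustration and are not part of cluster_tools.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


class CountingPool(ThreadPoolExecutor):
    """Hypothetical helper: counts how often submit() was called."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.submit_count = 0

    def submit(self, fn, /, *args, **kwargs):
        self.submit_count += 1
        return super().submit(fn, *args, **kwargs)


def eager_map(
    executor: ThreadPoolExecutor, fn: Callable[[int], int], args: Iterable[int]
) -> Iterator[int]:
    # Submit everything immediately. This function is deliberately NOT a
    # generator itself, otherwise nothing would run until the first next().
    futures: List[Future] = [executor.submit(fn, arg) for arg in args]

    def results() -> Iterator[int]:
        # Await lazily, one result at a time, dropping each future after use.
        while futures:
            yield futures.pop(0).result()

    return results()


def square(n: int) -> int:
    return n * n


with CountingPool(max_workers=2) as pool:
    result = eager_map(pool, square, [2, 3, 4])
    # All tasks must already be submitted before the iterator is consumed.
    count_before_consumption = pool.submit_count
    # Results are consumed while the executor is still open.
    values = list(result)
```

Because the submit count is read before the iterator is consumed, the check does not depend on futures staying valid after the executor is closed.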
I believe dask transfers the data lazily from the workers or scheduler. That doesn't work anymore, once the client closes. We could wrap the futures to eagerly collect the data.
> We could wrap the futures to eagerly collect the data.
Yes, either this, or try to hook into the closing client (so that collection is done in the last moment where it's possible).
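A rough sketch of the wrapping idea, assuming nothing about dask itself: each executor future gets a plain `concurrent.futures.Future` that receives a copy of its outcome as soon as it finishes, so the copy no longer depends on the executor. `detach` is a hypothetical helper name, and `ThreadPoolExecutor` only stands in for the real executor; whether this pattern carries over to dask's own future class is untested here.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def detach(source: Future) -> Future:
    """Return a plain Future that receives a copy of source's outcome."""
    target: Future = Future()

    def copy_outcome(done: Future) -> None:
        # Runs once the source future has finished (or was cancelled).
        if done.cancelled():
            target.cancel()
            return
        exc = done.exception()
        if exc is not None:
            target.set_exception(exc)
        else:
            target.set_result(done.result())

    source.add_done_callback(copy_outcome)
    return target


with ThreadPoolExecutor(max_workers=2) as pool:
    detached: List[Future] = [detach(pool.submit(pow, n, 2)) for n in [2, 3, 4]]

# The context has exited; only the detached copies are read from here on.
collected = [f.result() for f in detached]
```

Copying on completion collects the data eagerly; hooking into the closing client instead would defer the copy to the last possible moment, at the cost of needing access to the executor's shutdown path.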
I would leave that for a follow-up and merge this as is. OK?
sure 👍
Description:
- [...] (`mem`, `cpus`) to DaskExecutor
- Client can be configured via `DASK_ADDRESS` env var (should we rename that?)

Todos: