Don't pile up context_meter callbacks #7961
Conversation
```python
# As this may trigger calls to _start_async_instruction for more tasks,
# make sure we don't endlessly pile up context_meter callbacks by specifying
# the same key as in _start_async_instruction.
with ledger.record(key="async-instruction"):
```
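The keyed-registration pattern relied on here can be illustrated with a small self-contained toy (the `CallbackLedger` name and its body are made up for illustration; this is not the actual `distributed` implementation, only the shape of the fix): registering a callback under an existing key replaces it instead of stacking a new one, so a long chain of nested/recursive entries keeps O(1) callbacks.

```python
import contextlib
from typing import Callable, Dict


class CallbackLedger:
    """Toy stand-in for a metrics ledger (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # Callbacks are stored by key, so registering twice under the same key
        # replaces the previous entry instead of appending a new one.
        self._callbacks: Dict[str, Callable[[str, float], None]] = {}

    @contextlib.contextmanager
    def record(self, key: str):
        previous = self._callbacks.get(key)
        self._callbacks[key] = lambda label, value: None  # dummy metrics callback
        try:
            yield
        finally:
            # Restore whatever was registered before entering, so nesting with
            # the same key neither stacks nor leaks callbacks.
            if previous is None:
                self._callbacks.pop(key, None)
            else:
                self._callbacks[key] = previous

    def active(self) -> int:
        return len(self._callbacks)


ledger = CallbackLedger()
with ledger.record(key="async-instruction"):
    # Finishing one instruction may start the next one from inside this block;
    # because it reuses the same key, no second callback is added.
    with ledger.record(key="async-instruction"):
        assert ledger.active() == 1
    assert ledger.active() == 1
```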
As an aside, this whole incident would have been prevented by
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

20 files ±0   20 suites ±0   10h 39m 44s ⏱️ −50m 50s

For more details on these failures, see this check.

Results for commit 6a595fe. ± Comparison against base commit 9b9f948.
Well, this does affect real-world problems as well, regardless of root-ish settings. For instance, if you add a trivial dependency, nothing is classified as root-ish anymore and you are in the same worst-case situation, e.g.

```python
from time import time
from distributed import Client

c = Client()

def inc(x, foo):
    return x + 1

t0 = time()
N = 10_000
foo = c.submit(inc, 0, foo=0, pure=False)
futs = c.map(inc, range(N), foo=foo)
```

(Note: this example also suffers from a different problem, namely that the tasks are all assigned to the same worker. I'm working on that in parallel, but it also shows the executor overhead.)

I'm a bit disappointed that this was not detected by our benchmark suite. I guess our examples are too simple. Would you mind adding a parametrized version of […]?
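A sketch of what such a parametrization could look like (illustrative only: the test name and body are made up, and it assumes the queuing knob in question is the `distributed.scheduler.worker-saturation` config value, with `float("inf")` disabling scheduler-side queuing):

```python
import pytest
from dask import config
from distributed import Client, LocalCluster


def inc(x, foo):
    return x + 1


# Hypothetical test: the only point is the parametrization over the queuing
# setting, so the same latency measurement runs both with and without queuing.
@pytest.mark.parametrize("worker_saturation", [1.1, float("inf")])
def test_trivial_dependency_latency(worker_saturation):
    with config.set({"distributed.scheduler.worker-saturation": worker_saturation}):
        # Create the cluster inside the config context so the scheduler picks
        # up the parametrized setting.
        with LocalCluster(n_workers=2, threads_per_worker=1) as cluster, Client(cluster) as client:
            foo = client.submit(inc, 0, foo=0, pure=False)
            futs = client.map(inc, range(10_000), foo=foo)
            client.gather(futs)
```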
You are hitting an edge case where you should have two TaskGroups but end up with only one, because the dependency and the dependents accidentally share the same prefix AND the keys are not tuples. This causes the TaskGroup to have itself in its dependencies, which in turn prevents its tasks from being classified as root-ish.

If you were using dask.array, dask.dataframe, or dask.bag, OR the dependency used a different function, you'd have two different TaskGroups and all the tasks would be root-ish (because there are fewer than 5 dependencies). We could fix this specific use case by changing Client.map to generate tuple keys.
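To make the "different function" point concrete, here is a sketch of the workaround (illustrative only; `seed` is simply a differently named function, so the dependency gets its own key prefix and therefore its own TaskGroup, and the mapped tasks go back to being root-ish):

```python
from distributed import Client

c = Client()


def seed(x, foo):
    # Same body as inc, but the different function name produces a different
    # key prefix ("seed-..."), so the dependency lands in its own TaskGroup.
    return x + 1


def inc(x, foo):
    return x + 1


foo = c.submit(seed, 0, foo=0, pure=False)  # dependency in its own TaskGroup
futs = c.map(inc, range(10_000), foo=foo)   # dependents in a different group, root-ish again
```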
I don't think it's a good idea. Queuing impacts way, way more than just this specific issue, so if we actually care, the whole test suite should run both with and without queuing. I believe there was implicit consensus that that would be overkill, as users shouldn't tamper with that setting; it's only there to let users revert the change in a hurry in case we realize that we made a catastrophic design mistake that hamstrings a specific use case.
I disagree. I think the benchmark suite shows that this is a negligible problem if you don't go digging for edge cases (the big assumption here is that […]).
No point in arguing about this. This is a problem and it needs to be fixed. It is very possible to pile up tasks on workers, even with queuing.

I'm not willing to invest the time and money to run the entire test suite with this parametrized. This specific test was added to measure task latency, and it missed this regression. Disabling this config reinstates that measurement.
Reduce overhead of fine performance metrics when there is a very long chain of task finished -> task started events on the worker.
Note that the real-life impact is a lot more modest than what the benchmark would suggest, as you would only experience noticeable degradation when […], neither of which should be a common use case.
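For reference, a sketch of the kind of workload that hits the worst case: a long linear chain of dependent tasks on a single single-threaded worker, so that every task-finished event immediately triggers the next task-started event. This is illustrative only, not the benchmark used in this PR, and the sizes are arbitrary.

```python
from time import time
from distributed import Client


def inc(x):
    return x + 1


if __name__ == "__main__":
    # One single-threaded worker: each task can only start once the previous
    # one has finished, producing a long finished -> started chain.
    with Client(n_workers=1, threads_per_worker=1) as client:
        t0 = time()
        fut = client.submit(inc, 0)
        for _ in range(10_000):
            fut = client.submit(inc, fut)  # each task depends on the previous one
        fut.result()
        print(f"elapsed: {time() - t0:.1f}s")
```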