Client.map() keys as a list breaks Dask queue system #8671
Comments
Well, I think this is neither a bug nor a feature; it's a bit of both, I would say. Dask internally relies on the structure of the keys, and if you change this structure by providing your own keys (e.g. by passing your own list of names), some assumptions are broken. The task queuing unfortunately relies on task groups, which is why this isn't working. |
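To make the key-structure point concrete, here is a minimal sketch (not taken from the thread; `slow_task` and `my_names` are assumed names). By default, `client.map` generates keys that share the function name as a prefix, so the tasks fall into one task group; arbitrary custom keys that do not share such a prefix can each end up as their own group, which is what defeats the group-based queuing heuristics.

```python
from distributed import Client

def slow_task(x):
    return x * 2

if __name__ == "__main__":
    client = Client(n_workers=1, threads_per_worker=2)

    # Default keys look roughly like "slow_task-<token>", so they share one
    # prefix and therefore one task group.
    default_futures = client.map(slow_task, range(10))

    # Explicit keys: each string in the list is used verbatim as a task key.
    my_names = [f"job_{i}" for i in range(10)]
    custom_futures = client.map(slow_task, range(10), key=my_names)

    print(default_futures[0].key)  # e.g. "slow_task-<token>"
    print(custom_futures[0].key)   # "job_0"
```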
This is a great bit of information, thank you! I like your solution. I.e. if I have 1 worker with 200 cores, will the worker still be trying to process 200 tasks from each queue and be overloaded? What I was hoping for was for the 2 sets of tasks to have different priorities and be queued together, so that all the highest-priority tasks would be done first (from whichever set). |
Queuing is not exactly what you are imagining, I think. Queuing refers to when we submit a task to a worker, not to when that work is actually executed. To influence the order of execution, there is the `priority` keyword. |
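For reference, a small sketch of the `priority` keyword (the function and variable names are assumptions); higher values are executed earlier:

```python
from distributed import Client

def work(x):
    return x + 1

if __name__ == "__main__":
    client = Client()

    # Submitted first, but with a lower priority...
    low = client.map(work, range(100), priority=0)
    # ...so these higher-priority tasks are preferred when a worker
    # picks what to run next.
    high = client.map(work, range(100), priority=10)

    client.gather(high + low)
```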
Ah I see, I think I have misunderstood that then. From what you are saying, though, it sounds like the work is immediately scheduled to the least busy worker, even if there is technically no available resource on that worker. Is that right? If so, is there any workaround to make it behave like the above? Essentially I need a prioritised queue, where in/out order doesn't really matter but priority does, and, crucially, a higher-priority task should be picked up regardless of the time of submission or whether it has a different key prefix. If there is no way to do that, would ensuring all tasks have the same prefix achieve this behaviour? |
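As a point of reference for the prefix question above: `client.map` also accepts a plain string as `key`, which is used as a prefix for the generated task names, so all tasks from that call share it. A sketch with assumed names:

```python
from distributed import Client

def work(x):
    return x + 1

if __name__ == "__main__":
    client = Client()

    # A string `key` acts as a prefix, so every task in this call shares it.
    futures = client.map(work, range(100), key="batch_a")
    client.gather(futures)
```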
That is the default behaviour, yes. For most applications this should not concern you. This entire queuing mechanism was implemented to manage memory pressure for some very specific array workloads; in most situations a user will not care about it. There is also some logic that rebalances tasks between workers if some have too many tasks assigned while others sit idle.
Forget all about "task queuing". "Task queuing" is an internal mechanism that users should rarely bother with, and it is not what you are looking for. |
I think this was a key piece of information, thank you. So even if a worker has 'picked up' a task, that does not mean it will be the one to run it (the task can still be rebalanced). It also does not mean the task will be run imminently (it can still be affected by other, higher-priority tasks). This is very useful clarification; I appreciate you taking the time to explain it. |
Describe the issue:
This may not be a bug, but if it is not, it is unclear to me from the docs why Dask behaves like this.
When passing a list of values to client.map() to use as keys, the queuing system seems to break.
In the reproducible example:
Minimal Complete Verifiable Example:
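A minimal sketch along the lines described (names such as `slow_task` and the exact contents of `iterabs` are assumptions; `key=iterabs` is the line referred to below):

```python
import time
from distributed import Client

def slow_task(x):
    time.sleep(1)
    return x

if __name__ == "__main__":
    client = Client(n_workers=1, threads_per_worker=2)

    # A list of unique values, also usable as explicit task keys.
    iterabs = [f"item_{i}" for i in range(1_000)]

    futures = client.map(
        slow_task,
        iterabs,
        # key=iterabs,  # uncommenting this is the change discussed below
    )
    client.gather(futures)
```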
The above produces the following output on the dashboard. Notice the number of jobs queued vs. processing:
Uncommenting the line `key=iterabs` produces the following instead. Notice queued=0:

Anything else we need to know?:
Environment: