Remove idle and saturated sets from scheduler #8889
Draft
+82
−193
This is an attempt at removing the `idle`/`saturated`/`idle_task_count` containers on the scheduler. [...] `check_idle_saturated` has to be evaluated. [...] `occupancy`, which historically has put network load on equal footing with compute load, which caused plenty of confusion in the past. It also makes reliable testing notoriously difficult.

While ripping this out I simplified the stealing code and eradicated a couple of sources of non-determinism.
The most important change, which is also causing some of the tests to fail, is that in the current iteration I chose to define possible thieves on the basis of the number of threads that are available. Previously, all workers classified as idle, i.e. all workers with less than half of the average occupancy, were considered possible thieves. Therefore, this stealing code is much, much more conservative. That is much more reliable and predictable, but it is also much less aggressive and cannot enforce absolute homogeneity.
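To illustrate the difference between the two thief definitions, here is a minimal sketch. It is not the code in this PR: `Worker`, `thieves_by_occupancy`, and `thieves_by_free_threads` are hypothetical names, and "threads available" is assumed here to mean that a worker has fewer processing tasks than threads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    """Hypothetical stand-in for a worker; not distributed's WorkerState."""

    name: str
    nthreads: int
    n_processing: int  # number of tasks currently assigned to the worker
    occupancy: float  # estimated seconds of work queued on the worker


def thieves_by_occupancy(workers):
    """Occupancy rule: below half of the average occupancy counts as idle."""
    if not workers:
        return []
    average = sum(w.occupancy for w in workers) / len(workers)
    return [w.name for w in workers if w.occupancy < average / 2]


def thieves_by_free_threads(workers):
    """Thread rule (assumed): a thief needs at least one unused thread."""
    return [w.name for w in workers if w.n_processing < w.nthreads]


workers = [
    Worker("a", nthreads=2, n_processing=2, occupancy=1.0),
    Worker("b", nthreads=2, n_processing=4, occupancy=10.0),
    Worker("c", nthreads=2, n_processing=1, occupancy=0.5),
]

occupancy_thieves = thieves_by_occupancy(workers)
thread_thieves = thieves_by_free_threads(workers)
```

In this toy setup the occupancy rule treats both `a` and `c` as thieves, although `a` already has both threads busy. The thread rule only admits `c`, which is the more conservative behaviour described above.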
The lack of determinism often originates from the usage of sets or the lack of tie-breakers when sorting. I believe that even without removing the idle/saturated sets it makes sense to remove those sources of non-determinism, particularly since this can also subtly affect global task ordering.
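A small sketch of the tie-breaker point, using made-up `(name, load)` tuples rather than real scheduler state. Python's `sorted` is stable, so when two candidates have equal load their relative order is whatever the input order was. If the input came from iterating a set, that order is not something to rely on. Adding a unique secondary key (assumed here to be the worker name) makes the result independent of the input order.

```python
def order_by_load(candidates):
    """Sort by load only; ties keep their (possibly arbitrary) input order."""
    return sorted(candidates, key=lambda c: c[1])


def order_by_load_then_name(candidates):
    """Sort by load, breaking ties on the unique name for a stable result."""
    return sorted(candidates, key=lambda c: (c[1], c[0]))


# The same candidates arriving in two different orders, e.g. from set iteration
first = [("w2", 1.0), ("w1", 1.0), ("w3", 0.5)]
second = list(reversed(first))

# Without a tie-breaker the result depends on the input order
unstable_a = [name for name, _ in order_by_load(first)]
unstable_b = [name for name, _ in order_by_load(second)]

# With a tie-breaker both inputs give the same order
stable_a = [name for name, _ in order_by_load_then_name(first)]
stable_b = [name for name, _ in order_by_load_then_name(second)]
```

Here `unstable_a` and `unstable_b` differ in how they order `w1` and `w2`, while `stable_a` and `stable_b` are identical.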
I still have to run some actual tests on this. So far this is all rather theoretical.