Remove idle and saturated sets from scheduler #8889

fjetter · 2024-10-11T08:02:05Z

This is an attempt at removing the idle/saturated/idle_task_count containers on the scheduler.

They are expensive to maintain. They inherently rely on occupancy, a quantity that cannot be maintained accurately in constant time. A while ago we therefore fell back to decomposing occupancy into several quantities that can be maintained online, but occupancy itself has to be computed on demand, which scales linearly with the number of task prefixes. This can be expensive considering how often check_idle_saturated has to be evaluated.
Their definition is arguably fragile and hard to grasp, particularly since they rely on occupancy. Occupancy is itself a fragile quantity: it has historically put network load on an equal footing with compute load, which caused plenty of confusion in the past. This also makes reliable testing notoriously difficult.
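To illustrate why the on-demand computation is costly, here is a minimal sketch under a simplified model that is an assumption, not the scheduler's actual code. In this model a worker's occupancy is the sum over task prefixes of (number of tasks of that prefix on the worker × the prefix's average duration), plus a network term. The names `WorkerLoad`, `prefix_counts`, `avg_duration`, and `network_cost` are hypothetical. The per-prefix counts can be updated in O(1) as tasks come and go, but because a prefix's average duration can shift whenever any task of that prefix finishes, the total has to be recomputed in time linear in the number of prefixes.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerLoad:
    """Hypothetical, simplified per-worker bookkeeping (not the real WorkerState)."""

    # Number of tasks currently assigned to this worker, keyed by task prefix.
    prefix_counts: dict[str, int] = field(default_factory=dict)
    # Estimated seconds needed to fetch this worker's missing dependencies.
    network_cost: float = 0.0

    def add_task(self, prefix: str) -> None:
        # O(1) online update when a task is assigned to the worker.
        self.prefix_counts[prefix] = self.prefix_counts.get(prefix, 0) + 1

    def remove_task(self, prefix: str) -> None:
        # O(1) online update when a task finishes or is moved away.
        self.prefix_counts[prefix] -= 1
        if not self.prefix_counts[prefix]:
            del self.prefix_counts[prefix]

    def occupancy(self, avg_duration: dict[str, float]) -> float:
        # Average durations change as tasks complete anywhere in the cluster,
        # so the compute part is recomputed on demand: O(number of prefixes).
        compute = sum(
            count * avg_duration[prefix]
            for prefix, count in self.prefix_counts.items()
        )
        # Network load is added on an equal footing with compute load.
        return compute + self.network_cost
```

For example, a worker holding two `inc` tasks (0.1 s each on average), one `sum` task (2.0 s), and 0.5 s of pending transfers has an occupancy of about 2.7 s. Every call to `occupancy` walks all prefixes again, which is what makes frequent idle/saturated checks expensive.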
Ultimately, these sets are just maintained for performance reasons. Apart from performance optimization, their only functional purpose is to control the color of the worker bars. Saturated workers (a state that is incredibly difficult to attain) are colored yellow. That's it.

While ripping this out, I simplified the stealing code and eradicated a couple of sources of non-determinism.

The most important change, and the one causing some of the tests to fail, is that in the current iteration I chose to define possible thieves based on the number of available threads. Previously, all workers classified as idle, i.e. all workers with less than half of the average occupancy, were considered possible thieves. The stealing code is therefore much, much more conservative: it is much more reliable and predictable, but also much less aggressive, and it cannot enforce absolute homogeneity.
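A minimal sketch contrasting the two thief criteria described above. The old rule treats a worker as a possible thief when its occupancy is below half of the cluster-average occupancy. For the new rule, the sketch assumes that "available threads" means `nthreads` minus the number of tasks currently processing on the worker; the PR's actual definition may differ. The `Worker` type, its fields, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    # Hypothetical, simplified view of a worker for illustration only.
    address: str
    nthreads: int
    processing: int  # number of tasks currently assigned to the worker
    occupancy: float  # estimated seconds of queued work


def thieves_by_occupancy(workers: list[Worker]) -> list[Worker]:
    """Old-style rule: any worker below half of the average occupancy may steal."""
    if not workers:
        return []
    average = sum(w.occupancy for w in workers) / len(workers)
    return [w for w in workers if w.occupancy < average / 2]


def thieves_by_free_threads(workers: list[Worker]) -> list[Worker]:
    """New-style rule (assumed): only workers with at least one free thread may steal."""
    return [w for w in workers if w.processing < w.nthreads]


workers = [
    Worker("a", nthreads=4, processing=8, occupancy=1.0),
    Worker("b", nthreads=4, processing=40, occupancy=10.0),
    Worker("c", nthreads=4, processing=2, occupancy=0.5),
]
```

With these three workers the average occupancy is about 3.83 s, so the old rule lets both `a` and `c` steal, even though `a` already has twice as many tasks as threads. The thread-based rule only selects `c`, which illustrates why it is more conservative and cannot drive the cluster toward perfectly even occupancy.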

The lack of determinism often originates from the use of sets or the lack of tie-breakers when sorting. I believe that, even without removing the idle/saturated sets, it makes sense to remove these sources of non-determinism, particularly since they can also subtly affect global task ordering.
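A small sketch of the two sources of non-determinism mentioned above, using hypothetical worker names and scores. Iteration order of a `set` of strings depends on hash randomization (`PYTHONHASHSEED`), and `sorted` is stable, so workers with equal keys keep whatever order the set happened to yield. Adding an explicit tie-breaker (here the worker name, assumed to be a stable identifier) makes the result a pure function of the data.

```python
def rank_unstable(candidates: set[str], score: dict[str, float]) -> list[str]:
    # Tied workers keep their input order, which for a set of str
    # varies between interpreter runs unless PYTHONHASHSEED is fixed.
    return sorted(candidates, key=lambda name: score[name])


def rank_stable(candidates: set[str], score: dict[str, float]) -> list[str]:
    # Sort by score first, then by name, so ties are resolved
    # identically on every run regardless of set iteration order.
    return sorted(candidates, key=lambda name: (score[name], name))


score = {"w1": 1.0, "w2": 1.0, "w3": 3.0}
candidates = {"w3", "w2", "w1"}
```

Here `rank_stable(candidates, score)` always returns `["w1", "w2", "w3"]`, while `rank_unstable` may put `w2` before `w1` depending on the process. If the first element is the one that gets picked, for instance as a steal target, that difference propagates into scheduling decisions.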

I still have to run some actual tests on this. So far, this is all rather theoretical.

github-actions · 2024-10-11T08:54:20Z

Unit Test Results

See the test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

25 files (±0), 25 suites (±0), 10h 21m 40s ⏱️ (-3m 1s)
4 130 tests (±0): 3 988 ✅ (-28), 110 💤 (±0), 32 ❌ (+28)
47 708 runs (±0): 45 327 ✅ (-282), 2 095 💤 (±0), 286 ❌ (+282)

For more details on these failures, see this check.

Results for commit 393ee21. ± Comparison against base commit ecee9e8.

Commit 393ee21: Remove idle and saturated sets from scheduler
