
Support deterministic scheduling #890

Merged

Conversation

Zac-HD
Member

@Zac-HD Zac-HD commented Jan 28, 2019

See python-trio/pytest-trio#73 - this is important for Hypothesis, and arguably for debugging in general.

I've used the current time as the primary sort key, with task name and cancel/schedule-point counts as tie-breakers for tasks that started at the same time according to perf_counter. This is all deterministic; any remaining collisions are probably coming from random bitflips.

I've also designed the tests to be imported into pytest-trio's test suite, and the Hypothesis test added.
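The tuple-based sort key described above can be sketched like this. This is a hypothetical stand-in, not trio's actual `Task` class: the real one carries far more state, and the exact fields in its key are described only loosely in this thread. The point is that a lexicographically-compared tuple of (start time, name, monotonic counter) gives a total, deterministic order even when `perf_counter` values collide.

```python
import itertools

# Process-wide monotonic counter as the final tie-breaker.
_counter = itertools.count()

class Task:
    """Hypothetical stand-in for trio's Task, showing only the sort key."""
    def __init__(self, started_at, name):
        self.started_at = started_at      # perf_counter() at spawn time
        self.name = name
        self.tiebreak = next(_counter)    # unique, monotonically increasing

    def sort_key(self):
        # Primary: start time; then name; then the counter for tasks that
        # share both.  Tuples compare lexicographically, so this is total.
        return (self.started_at, self.name, self.tiebreak)

# Tasks live in a set (unordered), but sorting by the key is deterministic.
tasks = {Task(1.0, "b"), Task(1.0, "a"), Task(0.5, "z")}
batch = sorted(tasks, key=Task.sort_key)
print([t.name for t in batch])  # ['z', 'a', 'b']
```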

@codecov codecov bot commented Jan 28, 2019

Codecov Report

Merging #890 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #890      +/-   ##
==========================================
+ Coverage   99.53%   99.53%   +<.01%     
==========================================
  Files         101      102       +1     
  Lines       12379    12411      +32     
  Branches      910      916       +6     
==========================================
+ Hits        12321    12353      +32     
  Misses         36       36              
  Partials       22       22
Impacted Files Coverage Δ
trio/tests/test_scheduler_determinism.py 100% <100%> (ø)
trio/_core/_run.py 99.71% <100%> (ø) ⬆️

@Zac-HD Zac-HD force-pushed the optionally-deterministic-scheduler branch from c93347f to 5be15f0 Compare January 29, 2019 00:58
trio/_core/_run.py
# instance can make the scheduler deterministic, which is important
# for testing and debugging, especially with tools such as Hypothesis,
# without giving up the advantages of sets everywhere else.
batch = sorted(runner.runq, key=Task.sort_key)
Member

This is the very inner loop of trio's scheduler, so I'm slightly concerned about adding overhead. Of course the whole _r.shuffle thing is also dubious overhead... but maybe some simple measurements wouldn't be amiss? If it's non-trivial then we could hide it behind some flag that hypothesis (or someone) sets...

Member

Another possibly less costly option is to use a priority queue: https://docs.python.org/3.7/library/heapq.html

I guess this is the kind of change where you want to run a microbenchmark?
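The priority-queue idea floated here can be sketched with the stdlib `heapq` module. This is an illustration, not a proposal that landed: as the next comment argues, draining a heap built fresh each tick is still O(n log n), so it buys nothing over the `sorted` builtin unless tasks live in a heap from creation onward.

```python
import heapq

# (sort_key, task-name) stand-ins for entries on the run queue.
runq = [(3, "c"), (1, "a"), (2, "b")]

heap = list(runq)
heapq.heapify(heap)  # O(n) to build

# Draining the heap yields tasks in key order -- but n pops cost O(n log n),
# the same asymptotic cost as sorted(runq).
batch = [heapq.heappop(heap)[1] for _ in range(len(heap))]
print(batch)  # ['a', 'b', 'c']
```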

Member Author

I just don't have enough of a sense for typical workloads to write a good benchmark - all my performance instincts are formed for out-of-memory array workloads or other heavy throughput-dominated stuff. Very happy to run one and optimize accordingly though!

I don't think a heapq would help us here - to get a deterministic ordering we'd need to either run a n-log-n heapsort (and the sorted builtin is almost certainly faster), or else have the tasks stored in lists from creation time in which case we don't need to do anything (but we lose whatever benefits we currently get from sets).

Member

We don't really have good workloads either, which makes the whole question of performance a bit vague. But for this purpose I just meant something like a silly microbenchmark with 100 tasks that just reschedule themselves over and over, to get some kind of upper-bound on this. Maybe it doesn't matter at all.

@Zac-HD Zac-HD force-pushed the optionally-deterministic-scheduler branch from 5be15f0 to b675ef3 Compare January 29, 2019 09:44
@Zac-HD
Member Author

Zac-HD commented Jan 29, 2019

OK, here's a terrible little microbenchmark, which at least demonstrates that it doesn't make a noticeable difference:

Using the exact scheduler_trace function that's in the tests for this PR (b675ef3), I used the ipython %%timeit magic:

# In [4]: %%timeit
#   ...: trio.run(scheduler_trace)
# <with sorting and shuffling enabled>
7.77 ms ± 1.54 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.79 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.77 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.69 ms ± 47.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# <without sorting or shuffling>
7.08 ms ± 1 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.75 ms ± 337 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.46 ms ± 26.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.08 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

So it's in the noise at that scale. Now let's try with 1k tasks instead of five:

# Enabled, 1k instead of five tasks
808 ms ± 139 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
818 ms ± 134 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
763 ms ± 81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
777 ms ± 82.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Disabled, same scale
722 ms ± 98.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
666 ms ± 7.34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
668 ms ± 14.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
670 ms ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So we can at least construct workloads where the impact is noticeable (constant work per task and n-log-n scheduling make that easy), though still not pathological. Judgement time:

  • Is this even a useful benchmark? Is the "large" scale too large (or too small)? Does an IO-free, dependency-free benchmark make sense? I think this is within a small constant factor of the worst case for a given number of tasks.
  • Is a slight slowdown worth the maintenance cost of an internal "could be deterministic" flag and branch? Probably yes, but I'd appreciate any thoughts.
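The comparison above can also be made without running trio at all, by timing just the per-tick batch step the scheduler change touches. This is not the `scheduler_trace` benchmark from the PR; it is a standalone sketch (function names are made up) isolating the shuffle-only path versus the sort-then-shuffle path for a run queue of 1000 entries.

```python
import random
import timeit

def batch_shuffle(runq, rng=random.Random(0)):
    """Current behaviour: copy the set into a list and shuffle it."""
    batch = list(runq)
    rng.shuffle(batch)
    return batch

def batch_sorted(runq, rng=random.Random(0)):
    """Proposed behaviour: sort deterministically first, then shuffle."""
    batch = sorted(runq)   # stand-in for key=Task.sort_key; O(n log n)
    rng.shuffle(batch)     # O(n), as before
    return batch

runq = set(range(1000))
t_shuffle = timeit.timeit(lambda: batch_shuffle(runq), number=200)
t_sorted = timeit.timeit(lambda: batch_sorted(runq), number=200)
print(f"shuffle only: {t_shuffle:.4f}s")
print(f"sort+shuffle: {t_sorted:.4f}s")
```

With the same seed, the sorted variant is fully reproducible: identical inputs always give an identical batch order.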

@Zac-HD Zac-HD force-pushed the optionally-deterministic-scheduler branch from 3393003 to c2d6660 Compare January 30, 2019 02:11
@njsmith
Member

njsmith commented Jan 30, 2019

So it looks like sorting+shuffling adds ~15% slowdown in a pure-scheduling workload. (Actually I bet we can get that even higher if we switch from await checkpoint() to await cancel_shielded_checkpoint(), since the latter is more optimized.)

Did you try sorting+shuffling versus shuffling alone, like we currently do?

@Zac-HD
Member Author

Zac-HD commented Jan 30, 2019

Shuffling has a slight impact, but more importantly it is linear rather than log-linear in the number of tasks. Of course all of this depends on the number of tasks and the actual workload....

# 1k tasks, shuffling but not sorting
686 ms ± 6.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
698 ms ± 20.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
687 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
687 ms ± 12.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Since I expect pytest-trio to be the only consumer of determinism here, I've just gone and added a flag to enable the sorting step because it's basically free performance over the other option.

@njsmith
Member

njsmith commented Jan 30, 2019

Yeah, since this is a super-janky API with exactly one consumer, I'm OK with hacky things like this.

I do wonder if it would be better to make the contract slightly less janky somehow. Would it be possible to have like trio._core._run.hypothesis_mode() that encapsulates both seeding the RNG and turning on sorting? What does the hypothesis RNG seeding functionality actually do, is it just calling .seed(0) before each test run or something fancier?

@Zac-HD
Member Author

Zac-HD commented Jan 30, 2019

What does the hypothesis RNG seeding functionality actually do, is it just calling .seed(0) before each test run or something fancier?

It used to do that, but it's now a bit fancier! We eventually discovered that repeatedly seeding to zero meant that tests were affecting each other's state, and once we fixed that, there were some rare bugs that just never got triggered when the PRNG was "close" to the zeroed state. So now we restore the previous state of every PRNG we manage after each test case is executed.

The other thing we do is the random_module() strategy, which you can use to tell Hypothesis "this test should be affected by randomness". In that case, we actually draw a random seed value from an internal strategy, so test case executions are varied but still reproducible. I wrote HypothesisWorks/hypothesis#1741 so we could use this for Trio too without Hypothesis needing to know the details of every such library - we make an exception for Numpy but try to keep everything else generalised.
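The save-and-restore behaviour described two paragraphs up can be illustrated in pure stdlib terms. This is not Hypothesis's actual implementation, just a sketch of the contract: snapshot every managed PRNG's state before a test case runs and restore it afterwards, so cases cannot leak state into each other.

```python
import random

# PRNGs the harness "manages" -- Hypothesis tracks the global random module
# plus anything registered; here we manage one explicit instance.
managed = [random.Random(1234)]

def run_test_case(test, *args):
    """Run one test case, restoring all managed PRNG states afterwards."""
    states = [r.getstate() for r in managed]
    try:
        return test(*args)
    finally:
        for r, state in zip(managed, states):
            r.setstate(state)

r = managed[0]

def noisy_test():
    return r.random()   # consumes PRNG state during the test

a = run_test_case(noisy_test)
b = run_test_case(noisy_test)
print(a == b)  # True: state was restored between the two cases
```

Without the restore step, the second call would draw from an advanced PRNG state and return a different value, which is exactly the cross-test coupling described above.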

So I don't think it's worth giving this a nicer API here (at least for now); it's kinda janky, but I can add an autouse fixture to pytest-trio that does the monkeypatch for all and only tests with the Trio marker. Like everything in the Hypothesis, Trio, and Pytest ecosystems, it will be beautiful on top and... pragmatic underneath. Magic hath its price!

Very important for Hypothesis, and arguably for debugging in general.
@Zac-HD Zac-HD force-pushed the optionally-deterministic-scheduler branch from c2d6660 to f532ebf Compare January 30, 2019 08:13
@Zac-HD
Member Author

Zac-HD commented Feb 3, 2019

Ping @njsmith; I've fixed the changelog and comment - ready to merge?

@Zac-HD Zac-HD merged commit 319327d into python-trio:master Feb 3, 2019
@Zac-HD Zac-HD deleted the optionally-deterministic-scheduler branch February 3, 2019 11:46