
Support deterministic scheduling #890

Merged

Conversation

Zac-HD
Member

@Zac-HD Zac-HD commented Jan 28, 2019

See python-trio/pytest-trio#73 - this is important for Hypothesis, and arguably for debugging in general.

I've used the current time as the primary sort key, with task name and cancel/schedule-point counts as tie-breakers for tasks that started at the same time according to perf_counter. This is all deterministic; any remaining collisions are probably coming from random bitflips.

I've also designed the tests to be imported into pytest-trio's test suite, and the Hypothesis test added.
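The tuple-based sort key described above can be sketched like this. This is a hypothetical stand-in, not trio's actual `Task` class: the real one carries far more state, and the exact fields in its key are described only loosely in this thread. The point is that a lexicographically-compared tuple of (start time, name, monotonic counter) gives a total, deterministic order even when `perf_counter` values collide.

```python
import itertools

# Process-wide monotonic counter as the final tie-breaker.
_counter = itertools.count()

class Task:
    """Hypothetical stand-in for trio's Task, showing only the sort key."""
    def __init__(self, started_at, name):
        self.started_at = started_at      # perf_counter() at spawn time
        self.name = name
        self.tiebreak = next(_counter)    # unique, monotonically increasing

    def sort_key(self):
        # Primary: start time; then name; then the counter for tasks that
        # share both.  Tuples compare lexicographically, so this is total.
        return (self.started_at, self.name, self.tiebreak)

# Tasks live in a set (unordered), but sorting by the key is deterministic.
tasks = {Task(1.0, "b"), Task(1.0, "a"), Task(0.5, "z")}
batch = sorted(tasks, key=Task.sort_key)
print([t.name for t in batch])  # ['z', 'a', 'b']
```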

@codecov codecov bot commented Jan 28, 2019

Codecov Report

Merging #890 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #890      +/-   ##
==========================================
+ Coverage   99.53%   99.53%   +<.01%     
==========================================
  Files         101      102       +1     
  Lines       12379    12411      +32     
  Branches      910      916       +6     
==========================================
+ Hits        12321    12353      +32     
  Misses         36       36              
  Partials       22       22
Impacted Files Coverage Δ
trio/tests/test_scheduler_determinism.py 100% <100%> (ø)
trio/_core/_run.py 99.71% <100%> (ø) ⬆️

@Zac-HD Zac-HD force-pushed the optionally-deterministic-scheduler branch from c93347f to 5be15f0 Compare January 29, 2019 00:58
trio/_core/_run.py
# instance can make the scheduler deterministic, which is important
# for testing and debugging, especially with tools such as Hypothesis,
# without giving up the advantages of sets everywhere else.
batch = sorted(runner.runq, key=Task.sort_key)
Member

This is the very inner loop of trio's scheduler, so I'm slightly concerned about adding overhead. Of course the whole _r.shuffle thing is also dubious overhead... but maybe some simple measurements wouldn't be amiss? If it's non-trivial then we could hide it behind some flag that hypothesis (or someone) sets...

Member

Another possibly less costly option is to use a priority queue: https://docs.python.org/3.7/library/heapq.html

I guess this is the kind of change where you want to run a microbenchmark?
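The priority-queue idea floated here can be sketched with the stdlib `heapq` module. This is an illustration, not a proposal that landed: as the next comment argues, draining a heap built fresh each tick is still O(n log n), so it buys nothing over the `sorted` builtin unless tasks live in a heap from creation onward.

```python
import heapq

# (sort_key, task-name) stand-ins for entries on the run queue.
runq = [(3, "c"), (1, "a"), (2, "b")]

heap = list(runq)
heapq.heapify(heap)  # O(n) to build

# Draining the heap yields tasks in key order -- but n pops cost O(n log n),
# the same asymptotic cost as sorted(runq).
batch = [heapq.heappop(heap)[1] for _ in range(len(heap))]
print(batch)  # ['a', 'b', 'c']
```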

Member Author

I just don't have enough of a sense for typical workloads to write a good benchmark - all my performance instincts are formed for out-of-memory array workloads or other heavy throughput-dominated stuff. Very happy to run one and optimize accordingly though!

I don't think a heapq would help us here - to get a deterministic ordering we'd need to either run a n-log-n heapsort (and the sorted builtin is almost certainly faster), or else have the tasks stored in lists from creation time in which case we don't need to do anything (but we lose whatever benefits we currently get from sets).

Member

We don't really have good workloads either, which makes the whole question of performance a bit vague. But for this purpose I just meant something like a silly microbenchmark with 100 tasks that just reschedule themselves over and over, to get some kind of upper-bound on this. Maybe it doesn't matter at all.

@Zac-HD Zac-HD force-pushed the optionally-deterministic-scheduler branch from 5be15f0 to b675ef3 Compare January 29, 2019 09:44
@Zac-HD
Member Author

Zac-HD commented Jan 29, 2019

OK, here's a terrible little microbenchmark, which at least demonstrates that it doesn't make a noticeable difference:

Using the exact scheduler_trace function that's in the tests for this PR (b675ef3), I used the ipython %%timeit magic:

# In [4]: %%timeit
#   ...: trio.run(scheduler_trace)
# <with sorting and shuffling enabled>
7.77 ms ± 1.54 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.79 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.77 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.69 ms ± 47.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# <without sorting or shuffling>
7.08 ms ± 1 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.75 ms ± 337 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.46 ms ± 26.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.08 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

So it's in the noise at that scale. Now let's try with 1k tasks instead of five:

# Enabled, 1k instead of five tasks
808 ms ± 139 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
818 ms ± 134 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
763 ms ± 81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
777 ms ± 82.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Disabled, same scale
722 ms ± 98.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
666 ms ± 7.34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
668 ms ± 14.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
670 ms ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So we can at least construct workloads where the impact is noticeable (constant work per task and n-log-n scheduling make that easy), though still not pathological. Judgement time:

  • Is this even a useful benchmark? Is the "large" scale too large (or too small)? Does an IO-free, dependency-free benchmark make sense? I think this is within a small constant factor of the worst case for a given number of tasks.
  • Is a slight slowdown worth the maintenance cost of an internal "could be deterministic" flag and branch? Probably yes, but I'd appreciate any thoughts.
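The comparison above can also be made without running trio at all, by timing just the per-tick batch step the scheduler change touches. This is not the `scheduler_trace` benchmark from the PR; it is a standalone sketch (function names are made up) isolating the shuffle-only path versus the sort-then-shuffle path for a run queue of 1000 entries.

```python
import random
import timeit

def batch_shuffle(runq, rng=random.Random(0)):
    """Current behaviour: copy the set into a list and shuffle it."""
    batch = list(runq)
    rng.shuffle(batch)
    return batch

def batch_sorted(runq, rng=random.Random(0)):
    """Proposed behaviour: sort deterministically first, then shuffle."""
    batch = sorted(runq)   # stand-in for key=Task.sort_key; O(n log n)
    rng.shuffle(batch)     # O(n), as before
    return batch

runq = set(range(1000))
t_shuffle = timeit.timeit(lambda: batch_shuffle(runq), number=200)
t_sorted = timeit.timeit(lambda: batch_sorted(runq), number=200)
print(f"shuffle only: {t_shuffle:.4f}s")
print(f"sort+shuffle: {t_sorted:.4f}s")
```

With the same seed, the sorted variant is fully reproducible: identical inputs always give an identical batch order.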

@Zac-HD Zac-HD force-pushed the optionally-deterministic-scheduler branch from 3393003 to c2d6660 Compare January 30, 2019 02:11
@njsmith
Member

njsmith commented Jan 30, 2019

So it looks like sorting+shuffling adds ~15% slowdown in a pure-scheduling workload. (Actually I bet we can get that even higher if we switch from await checkpoint() to await cancel_shielded_checkpoint(), since the latter is more optimized.)

Did you try sorting+shuffling versus shuffling alone, like we currently do?

@Zac-HD
Member Author

Zac-HD commented Jan 30, 2019

Shuffling has a slight impact, but more importantly it is linear rather than log-linear in the number of tasks. Of course all of this depends on the number of tasks and the actual workload....

# 1k tasks, shuffling but not sorting
686 ms ± 6.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
698 ms ± 20.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
687 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
687 ms ± 12.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Since I expect pytest-trio to be the only consumer of determinism here, I've just gone and added a flag to enable the sorting step because it's basically free performance over the other option.

@njsmith
Member

njsmith commented Jan 30, 2019

Yeah, since this is a super-janky API with exactly one consumer, I'm OK with hacky things like this.

I do wonder if it would be better to make the contract slightly less janky somehow. Would it be possible to have like trio._core._run.hypothesis_mode() that encapsulates both seeding the RNG and turning on sorting? What does the hypothesis RNG seeding functionality actually do, is it just calling .seed(0) before each test run or something fancier?

@Zac-HD
Member Author

Zac-HD commented Jan 30, 2019

What does the hypothesis RNG seeding functionality actually do, is it just calling .seed(0) before each test run or something fancier?

It used to do that, but it's now a bit fancier! We eventually discovered that repeatedly seeding to zero meant that tests were affecting each other's state, and once we fixed that, there were some rare bugs that just never got triggered when the PRNG was "close" to the zeroed state. So now we restore the previous state of every PRNG we manage after each test case is executed.

The other thing we do is the random_module() strategy, which you can use to tell Hypothesis "this test should be affected by randomness". In that case, we actually draw a random seed value from an internal strategy, so test case executions are varied but still reproducible. I wrote HypothesisWorks/hypothesis#1741 so we could use this for Trio too without Hypothesis needing to know the details of every such library - we make an exception for Numpy but try to keep everything else generalised.
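The save-and-restore behaviour described two paragraphs up can be illustrated in pure stdlib terms. This is not Hypothesis's actual implementation, just a sketch of the contract: snapshot every managed PRNG's state before a test case runs and restore it afterwards, so cases cannot leak state into each other.

```python
import random

# PRNGs the harness "manages" -- Hypothesis tracks the global random module
# plus anything registered; here we manage one explicit instance.
managed = [random.Random(1234)]

def run_test_case(test, *args):
    """Run one test case, restoring all managed PRNG states afterwards."""
    states = [r.getstate() for r in managed]
    try:
        return test(*args)
    finally:
        for r, state in zip(managed, states):
            r.setstate(state)

r = managed[0]

def noisy_test():
    return r.random()   # consumes PRNG state during the test

a = run_test_case(noisy_test)
b = run_test_case(noisy_test)
print(a == b)  # True: state was restored between the two cases
```

Without the restore step, the second call would draw from an advanced PRNG state and return a different value, which is exactly the cross-test coupling described above.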

So I don't think it's worth giving this a nicer API here (at least for now); it's kinda janky, but I can add an autouse fixture to pytest-trio that does the monkeypatch for all and only tests with the Trio marker. Like everything in the Hypothesis, Trio, and Pytest ecosystems, it will be beautiful on top and... pragmatic underneath. Magic hath its price!

Very important for Hypothesis, and arguably for debugging in general.
@Zac-HD Zac-HD force-pushed the optionally-deterministic-scheduler branch from c2d6660 to f532ebf Compare January 30, 2019 08:13
@Zac-HD
Member Author

Zac-HD commented Feb 3, 2019

Ping @njsmith; I've fixed the changelog and comment - ready to merge?

@Zac-HD Zac-HD merged commit 319327d into python-trio:master Feb 3, 2019
@Zac-HD Zac-HD deleted the optionally-deterministic-scheduler branch February 3, 2019 11:46