Support deterministic scheduling #890
Conversation
(force-pushed from d305eba to c93347f)
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #890      +/-   ##
==========================================
+ Coverage   99.53%   99.53%   +<.01%
==========================================
  Files         101      102       +1
  Lines       12379    12411      +32
  Branches      910      916       +6
==========================================
+ Hits        12321    12353      +32
  Misses         36       36
  Partials      22       22
```
(force-pushed from c93347f to 5be15f0)
trio/_core/_run.py
Outdated
```python
# instance can make the scheduler deterministic, which is important
# for testing and debugging, especially with tools such as Hypothesis,
# without giving up the advantages of sets everywhere else.
batch = sorted(runner.runq, key=Task.sort_key)
```
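The idea under review can be reduced to a stdlib-only sketch (the names here are illustrative, not trio's actual internals): sort the unordered run queue into a canonical order first, then shuffle it with a seeded PRNG, so repeated runs with the same seed see the same batch order.

```python
import random

def deterministic_batch(runq, seed=0):
    # Sort the set into a canonical order first, then shuffle with a
    # seeded PRNG: two runs with the same seed produce the same batch
    # order, even though a set has no inherent ordering.
    r = random.Random(seed)
    batch = sorted(runq)
    r.shuffle(batch)
    return batch

runq = {"child-1", "child-2", "child-3", "child-4"}
assert deterministic_batch(runq) == deterministic_batch(runq)
```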
This is the very inner loop of trio's scheduler, so I'm slightly concerned about adding overhead. Of course the whole `_r.shuffle` thing is also dubious overhead... but maybe some simple measurements wouldn't be amiss? If it's non-trivial then we could hide it behind some flag that Hypothesis (or someone) sets...
Another possibly less costly option is to use a priority queue: https://docs.python.org/3.7/library/heapq.html
I guess this is the kind of change where you want to run a microbenchmark?
I just don't have enough of a sense for typical workloads to write a good benchmark - all my performance instincts are formed for out-of-memory array workloads or other heavy throughput-dominated stuff. Very happy to run one and optimize accordingly though!
I don't think a heapq would help us here: to get a deterministic ordering we'd need to either run an O(n log n) heapsort (and the `sorted` builtin is almost certainly faster), or else have the tasks stored in lists from creation time, in which case we don't need to do anything (but we lose whatever benefits we currently get from sets).
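To make that point concrete, here's a small stdlib illustration of why a heap doesn't beat `sorted` for producing a full deterministic ordering: heapify is cheap, but draining the heap costs the same asymptotically as sorting.

```python
import heapq
import random

items = random.Random(42).sample(range(10_000), 1_000)

# heapify is O(n), but draining the heap to recover a total order is
# still O(n log n) -- the same asymptotic cost as sorted(), which in
# CPython is a single optimized C call and usually wins in practice.
heap = list(items)
heapq.heapify(heap)
drained = [heapq.heappop(heap) for _ in range(len(heap))]
assert drained == sorted(items)
```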
We don't really have good workloads either, which makes the whole question of performance a bit vague. But for this purpose I just meant something like a silly microbenchmark with 100 tasks that just reschedule themselves over and over, to get some kind of upper-bound on this. Maybe it doesn't matter at all.
(force-pushed from 5be15f0 to b675ef3)
OK, here's a terrible little microbenchmark, which at least demonstrates that it doesn't make a noticeable difference: Using the exact
So it's in the noise at that scale. Now let's try with 1k tasks instead of five:
So we can at least construct workloads where the impact is noticeable (constant work per task and n-log-n scheduling make that easy), though still not pathological. Judgement time:
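The benchmark itself isn't reproduced above; a stdlib-only stand-in that isolates just the batch-construction step (a deliberately trio-free proxy, not a real scheduler run, with a hypothetical `Task` class) might look like:

```python
import random
import timeit

class Task:
    # Hypothetical stand-in for trio's Task; only what this step needs.
    __slots__ = ("_counter",)
    def __init__(self, counter):
        self._counter = counter
    def sort_key(self):
        return self._counter

_r = random.Random(0)
runq = {Task(i) for i in range(1_000)}

def shuffle_only():
    # Current behavior: O(n) copy plus O(n) shuffle.
    batch = list(runq)
    _r.shuffle(batch)
    return batch

def sort_then_shuffle():
    # Proposed behavior: O(n log n) sort, then the same shuffle.
    batch = sorted(runq, key=Task.sort_key)
    _r.shuffle(batch)
    return batch

t_shuffle = timeit.timeit(shuffle_only, number=1_000)
t_sorted = timeit.timeit(sort_then_shuffle, number=1_000)
print(f"shuffle only:      {t_shuffle:.3f}s")
print(f"sort then shuffle: {t_sorted:.3f}s")
```

The gap between the two timings grows with the number of runnable tasks, which matches the observation above that the impact only becomes visible at larger queue sizes.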
(force-pushed from 3393003 to c2d6660)
So it looks like sorting+shuffling adds ~15% slowdown in a pure-scheduling workload. (Actually I bet we can get that even higher if we switch from

Did you try sorting+shuffling versus shuffling alone, like we currently do?
Shuffling has a slight impact, but more importantly it's linear rather than log-linear in the number of tasks. Of course all of this depends on the number of tasks and the actual workload...
Since I expect
Yeah, since this is a super-janky API with exactly one consumer, I'm OK with hacky things like this. I do wonder if it would be better to make the contract slightly less janky somehow. Would it be possible to have like
It used to do that, but it's now a bit fancier! We eventually discovered that repeatedly seeding to zero meant that tests were affecting each other's state, and once we fixed that, there were some rare bugs that just never got triggered when the PRNG was "close" to the zeroed state. So now we restore the previous state of every PRNG we manage after each test case is executed. The other thing we do is the

So I don't think it's worth giving this a nicer API here (at least for now); it's kinda janky, but I can add an autouse fixture to pytest-trio that does the monkeypatch for all and only tests with the Trio marker. Like everything in the Hypothesis, Trio, and Pytest ecosystems, it will be beautiful on top and... pragmatic underneath: magic hath its price!
Very important for Hypothesis, and arguably for debugging in general.
(force-pushed from c2d6660 to f532ebf)
Ping @njsmith; I've fixed the changelog and comment - ready to merge?
See python-trio/pytest-trio#73 - this is important for Hypothesis, and arguably for debugging in general.
I've used the current time as primary sort key, with name and task cancel/schedule points to order tasks that started at the same time according to perf_counter. This is all deterministic, and any remaining collisions are probably coming from random bitflips.
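Illustratively, that kind of composite key might look like the following sketch (hypothetical names, not trio's actual `Task.sort_key`): creation time first, then name, then a monotonic counter so that two tasks created within the same clock tick still compare deterministically.

```python
import time
from itertools import count

_counter = count()

class Task:
    # Hypothetical sketch of the ordering described above: primary key is
    # creation time (perf_counter), with name and a monotonic per-process
    # counter breaking ties for tasks that started at the same instant.
    def __init__(self, name):
        self.start_time = time.perf_counter()
        self.name = name
        self._tiebreak = next(_counter)

    def sort_key(self):
        return (self.start_time, self.name, self._tiebreak)

a, b = Task("beta"), Task("alpha")
b.start_time = a.start_time  # force a timing collision
# With equal start times, the name breaks the tie: "alpha" sorts first.
assert sorted([a, b], key=Task.sort_key)[0].name == "alpha"
```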
I've also designed the tests to be imported into pytest-trio's test suite, and added the Hypothesis test.