Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use module-level Random to shuffle tasks #840

Merged
merged 1 commit into from
Jan 6, 2019

Conversation

Zac-HD
Copy link
Member

@Zac-HD Zac-HD commented Jan 6, 2019

This allows test helpers etc. to reproduce scheduling-dependent heisenbugs by seeding the Random instance. (or at least ensures that something else will prevent reproductions instead)

I have draft patches against pytest-trio and Hypothesis to take advantage of this, but since this has no observable impact except to enable those future patches I thought this was a good place to start.

This allows test helpers etc. to reproduce scheduling-dependent heisenbugs by seeding the Random instance.

Or at least ensures that *something else* will prevent reproductions instead.
@njsmith
Copy link
Member

njsmith commented Jan 6, 2019

Can you link to the issue thread where this was discussed, for context? (Not necessarily in the code, just here in the PR is fine.)

Actual change LGTM

@Zac-HD
Copy link
Member Author

Zac-HD commented Jan 6, 2019

Discussed here: HypothesisWorks/hypothesis#1709, and related to #239.

I considered a few other ways to make the scheduler deterministic-on-demand:

  • This option, allowing Hypothesis to seed a random shuffle, is the smallest change to Trio and therefore leaves our future options open - the public API doesn't change at all. It can be made flatly deterministic by default, or varied-but-reproducible with the random_module strategy (once the Hypothesis end lands, anyway).
  • Adding a seed= argument to trio.run would involve public API that we might not want to keep (e.g. if we do move to fair or other deterministic scheduling later). It would also be an enormous pain to connect it to Hypothesis for the varied-but-reproducible option.
  • Adding an order_tasks: callable=... argument to trio.run would be very, very flexible but have all the same problems as seed= for this purpose.

@codecov
Copy link

codecov bot commented Jan 6, 2019

Codecov Report

Merging #840 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #840      +/-   ##
==========================================
- Coverage   99.23%   99.22%   -0.01%     
==========================================
  Files         101      101              
  Lines       12182    12182              
  Branches      882      882              
==========================================
- Hits        12089    12088       -1     
- Misses         72       73       +1     
  Partials       21       21
Impacted Files Coverage Δ
trio/_core/_run.py 97.95% <100%> (-0.15%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b208624...73d397e. Read the comment docs.

@njsmith
Copy link
Member

njsmith commented Jan 6, 2019

Adding a seed= argument to trio.run would involve public API that we might not want to keep (e.g. if we do move to fair or other deterministic scheduling later). It would also be an enormous pain to connect it to Hypothesis for the varied-but-reproducible option.

I'm fine with the current PR, but noting here:

The public API is an issue, but not a huge one: in the worst case, if we remove randomization, then the argument would just become a no-op, which would be totally backwards compatible. ("You asked for determinism, and you got determinism!") We could then deprecate it as lazily as we felt like.

For the "enormous pain" part: are you imagining that Hypothesis will have hard-coded knowledge of Trio, and automatically patch the seed here whenever Hypothesis and Trio are both loaded in the same process? If so then I agree :-). But in the other design, where Hypothesis remains ignorant of Trio and we leave it to pytest-trio to hook them up, then a seed= argument seems like it should be pretty easy to handle – pytest-trio already knows when the test is being run under hypothesis, and pytest-trio is already responsible for calling trio.run, so it seems like it should be pretty straightforward to plumb them together.

...I guess the version where Hypothesis has hard-coded knowledge of Trio has compelling advantages though, if it's acceptable to the Hypothesis devs. Not everyone uses pytest-trio, and even if you use pytest-trio you might still have some tests where you call run by hand, and I have no idea how stateful tests would fit into this, etc.

Another advantage of not exposing seed is that it won't tempt anyone to use a fixed seed to "solve" race conditions in their code :-). Though I doubt this will be an issue in practice...

@Zac-HD
Copy link
Member Author

Zac-HD commented Jan 6, 2019

Our policy for Hypothesis is that the core engine cannot have any knowledge of other libraries; the interface layer can have generic hooks (e.g. I added one for pytest-trio which is now also used by pytest-asyncio), and the strategy layer can have whatever dependencies if it's under the hypothesis.extra namespace.

So pytest-trio could easily inject a static seed= argument, but the approach I'm going with also supports varied-but-reproducible seeds via the random_module strategy. From the Hypothesis side, it's also usable for simulations or anything else where an existing Random instance needs to be made deterministic.

This approach will also work without pytest-trio; you just need to do some (idempotent) setup by calling hypothesis.strategies.random_module.register(trio._core._run._r) before running the test. The pytest-trio implementation is literally just to do this setup when pytest imports the module 😄

@njsmith
Copy link
Member

njsmith commented Jan 6, 2019

So pytest-trio could easily inject a static seed= argument, but the approach I'm going with also supports varied-but-reproducible seeds via the random_module strategy.

Oh, I was imagining that it would sample the seed from some strategy, not use a static one.

But, anyway, I'm happy :-) Let's go ahead with this and we can always make it more complicated later if we need to.

@njsmith njsmith merged commit e5927e9 into python-trio:master Jan 6, 2019
@Zac-HD Zac-HD deleted the randomness branch January 21, 2019 05:09
@Zac-HD
Copy link
Member Author

Zac-HD commented Jan 21, 2019

Oh, I was imagining that it would sample the seed from some strategy, not use a static one.

It does that for the stdlib, Numpy, and any registered random instances iff you use the random_module strategy - otherwise we run into complications about how to display failing examples etc.

Otherwise, we use a static seed to ensure it's still reproducible. In principle it would be nice to make using the random module without the corresponding strategy into a healthcheck error, but when we did that a few years ago it was a terrible user experience for not much gain.

@njsmith
Copy link
Member

njsmith commented Jan 21, 2019

In principle it would be nice to make using the random module without the corresponding strategy into a healthcheck error, but when we did that a few years ago it was a terrible user experience for not much gain.

Huh, interesting! The more you know 🌠

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants