Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ploidy as attribute of species; rework samples specification #1361

Merged
merged 2 commits into from
Sep 26, 2022

Conversation

nspope
Copy link
Collaborator

@nspope nspope commented Sep 16, 2022

  • engine.simulate(samples=...) now takes a dict of the form {population : num_individuals}. engine.simulate uses a new function DemographicModel.get_sample_sets (that isn't part of the public API) to combine species ploidy with the demographic model.

  • The corresponding number of haploid samples (tree_sequence.num_samples) is num_individuals * species.ploidy

  • The CLI for sample specification is now population_name:num_samples, like HomSap -c chr1 -d OutOfAfrica_3G09 YRI:10 CEU:2 (same as for msprime)

  • The old way of specifying samples still works (via DemographicModel.get_samples in python API, and via positional arguments in the CLI), but throws a DeprecationWarning. I opted not to rename the samples=... argument to Engine.simulate to keep the API less cluttered -- so it now takes either a dict (new behavior) or a list of msprime.SampleSet (old behavior).

  • Fixes should samples be diploid? #1282.

  • Fixes adding ploidy as an attribute of species #1111 (by adding ploidy as an attribute of species) but doesn't alter the SLiM engine to do anything different for haploids (e.g. it still uses diploid recombination model, gives warning if odd number of samples is requested, etc).

@codecov
Copy link

codecov bot commented Sep 16, 2022

Codecov Report

Base: 99.84% // Head: 99.94% // Increases project coverage by +0.10% 🎉

Coverage data is based on head (eafaae4) compared to base (0b338dd).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1361      +/-   ##
==========================================
+ Coverage   99.84%   99.94%   +0.10%     
==========================================
  Files         113      109       -4     
  Lines        3838     3781      -57     
  Branches      524      512      -12     
==========================================
- Hits         3832     3779      -53     
+ Misses          3        1       -2     
+ Partials        3        1       -2     
Impacted Files Coverage Δ
stdpopsim/catalog/AedAeg/species.py 100.00% <100.00%> (ø)
stdpopsim/catalog/AnaPla/species.py 100.00% <100.00%> (ø)
stdpopsim/catalog/AnoCar/species.py 100.00% <100.00%> (ø)
stdpopsim/catalog/AnoGam/species.py 100.00% <100.00%> (ø)
stdpopsim/catalog/ApiMel/species.py 100.00% <100.00%> (ø)
stdpopsim/catalog/AraTha/species.py 100.00% <100.00%> (ø)
stdpopsim/catalog/BosTau/species.py 100.00% <100.00%> (ø)
stdpopsim/catalog/CaeEle/species.py 100.00% <100.00%> (ø)
stdpopsim/catalog/CanFam/species.py 100.00% <100.00%> (ø)
stdpopsim/catalog/ChlRei/species.py 100.00% <100.00%> (ø)
... and 25 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@grahamgower
Copy link
Member

Shouldn't this be a property of the chromosome rather than the species? I'm thinking of Mt, chrY, and plasmids.

@nspope
Copy link
Collaborator Author

nspope commented Sep 17, 2022

Yeah @grahamgower, that makes sense to me. So, store per-chromosome ploidy data as a dict in catalog/*/species.py, as is done for per-chromosome mean recombination rate?

@nspope
Copy link
Collaborator Author

nspope commented Sep 17, 2022

Hmm, when simulating a generic contig for a species, I think we'd want it to use a default per-species ploidy -- which we wouldn't have if ploidy is strictly a property of chromosome. So, should it be a property of both species and chromosome?

@grahamgower
Copy link
Member

Ah yes, good point! Well the species property as you have it in this PR looks good. How about having the chromosome property be an optional keyword arg of stdpopsim.Chromosome(), which is populated from info in the various genome_data.py:data dicts like the chromosome synonyms? The chromosome ploidy may then be omitted in most cases---maybe the maintenance scripts for generating species can detect "Mt" and "chrY" and set the ploidy=1 on these.

tests/test_dfes.py Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@nspope nspope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grahamgower -- makes sense, although I'm not sure about automated retrieval of ploidy per chromosome ... maybe we can punt that down the road for now. Is this mockup for HomSap along the lines of what you're thinking?

stdpopsim/catalog/HomSap/species.py Outdated Show resolved Hide resolved
stdpopsim/engines.py Outdated Show resolved Hide resolved
stdpopsim/catalog/HomSap/species.py Outdated Show resolved Hide resolved
@grahamgower
Copy link
Member

Regarding the sampling syntax for the CLI... This PR uses <name>=<n>, whereas msprime's msp ancestry command uses <name>:<n>. Perhaps stdpopsim should follow msprime here? I don't recall the argument for using a colon instead of equals, but maybe @jeromekelleher can enlighten us?
https://tskit.dev/msprime/docs/stable/cli.html#Positional%20Arguments

@jeromekelleher
Copy link
Member

Aha, yes, well remembered @grahamgower! Yes, it turns out ":" is a slightly better choice (discussion here: tskit-dev/msprime#1716), so may as well follow msprime's CLI here unless there's a compelling reason not to. Could borrow some code/tests as well if it's worthwhile - I had forgotten we did this for msprime.

@jeromekelleher
Copy link
Member

jeromekelleher commented Sep 21, 2022

This is getting to be a big PR - I wonder if we could decouple the ploidy changes from the sample specification bits (and make that a sepearate PR) to make it easier to review? No worries if they're too intertangled now.

@nspope
Copy link
Collaborator Author

nspope commented Sep 21, 2022

Yeah sorry @jeromekelleher -- it's a bit intertwined now especially in the tests, where ts.num_samples is frequently checked and depends on both ploidy and number of individuals.

@jeromekelleher
Copy link
Member

Fair enough - can you ping when you'd like a review please @nspope?

@nspope
Copy link
Collaborator Author

nspope commented Sep 21, 2022

Hey @andrewkern, any idea why we're getting periodic URL timeouts in the mac OS tests? I've been noticing this across different PRs over the past couple days, and sometimes it takes a few hours before a rerun will get it through.

@nspope
Copy link
Collaborator Author

nspope commented Sep 21, 2022

This is ready for review @jeromekelleher and/or @grahamgower -- thanks! A couple things,

  • I haven't added anything regarding ploidy to the development docs-- will do so in a separate PR
  • Sample specification in the CLI with the default constant demography seems kind of clunky, e.g. stdpopsim HomSap pop_0:1 -- I wish we could omit the pop_0: but then it's not distinguishable from the positional, haploid sample specification (which is retained with a deprecation warning).

@grahamgower
Copy link
Member

Hey @andrewkern, any idea why we're getting periodic URL timeouts in the mac OS tests? I've been noticing this across different PRs over the past couple days, and sometimes it takes a few hours before a rerun will get it through.

The macOS GHA runners just seem to have problems. One can find many reports about slowness (cpu, disk, and/or network). E.g. actions/runner-images#4896

Copy link
Member

@grahamgower grahamgower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

stdpopsim/models.py Outdated Show resolved Hide resolved
stdpopsim/slim_engine.py Show resolved Hide resolved
Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, this is fantastic work and a real step forward!

Some minor suggestions above.

docs/tutorial.rst Show resolved Hide resolved
stdpopsim/models.py Outdated Show resolved Hide resolved
stdpopsim/utils.py Outdated Show resolved Hide resolved
stdpopsim/models.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

should samples be diploid? adding ploidy as an attribute of species
3 participants