Add ploidy as attribute of species; rework samples specification #1361
Codecov Report
Base: 99.84% // Head: 99.94% // Increases project coverage by
Additional details and impacted files
@@            Coverage Diff            @@
##             main    #1361     +/-   ##
=========================================
+ Coverage   99.84%   99.94%   +0.10%
=========================================
  Files         113      109       -4
  Lines        3838     3781      -57
  Branches      524      512      -12
=========================================
- Hits         3832     3779      -53
+ Misses          3        1       -2
+ Partials        3        1       -2
Shouldn't this be a property of the chromosome rather than the species? I'm thinking of Mt, chrY, and plasmids.
Yeah @grahamgower, that makes sense to me. So, store per-chromosome ploidy data as a dict in
Hmm, when simulating a generic contig for a species, I think we'd want it to use a default per-species ploidy -- which we wouldn't have if ploidy is strictly a property of chromosome. So, should it be a property of both species and chromosome?
(Branch force-pushed from 9f7ac79 to ec53fff.)
Ah yes, good point! Well the species property as you have it in this PR looks good. How about having the chromosome property be an optional keyword arg of
@grahamgower -- makes sense, although I'm not sure about automated retrieval of ploidy per chromosome ... maybe we can punt that down the road for now. Is this mockup for HomSap along the lines of what you're thinking?
Regarding the sampling syntax for the CLI... This PR uses
Aha, yes, well remembered @grahamgower! Yes, it turns out ":" is a slightly better choice (discussion here: tskit-dev/msprime#1716), so may as well follow msprime's CLI here unless there's a compelling reason not to. Could borrow some code/tests as well if it's worthwhile - I had forgotten we did this for msprime.
This is getting to be a big PR - I wonder if we could decouple the ploidy changes from the sample specification bits (and make that a separate PR) to make it easier to review? No worries if they're too intertangled now.
Yeah sorry @jeromekelleher -- it's a bit intertwined now especially in the tests, where
Fair enough - can you ping when you'd like a review please @nspope?
Hey @andrewkern, any idea why we're getting periodic URL timeouts in the macOS tests? I've been noticing this across different PRs over the past couple of days, and sometimes it takes a few hours before a rerun will get it through.
This is ready for review @jeromekelleher and/or @grahamgower -- thanks! A couple things,
The macOS GHA runners just seem to have problems. One can find many reports about slowness (CPU, disk, and/or network). E.g. actions/runner-images#4896
LGTM!
Looks great, this is fantastic work and a real step forward!
Some minor suggestions above.
…n API; use population:integer pairs for CLI
(Branch force-pushed from c9ee587 to 3dd39e4.)
engine.simulate(samples=...) now takes a dict of the form {population: num_individuals}. engine.simulate uses a new function, DemographicModel.get_sample_sets (which isn't part of the public API), to combine species ploidy with the demographic model.
The corresponding number of haploid samples (tree_sequence.num_samples) is num_individuals * species.ploidy.
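As a rough illustration of the arithmetic above (not the code in this PR), here is a minimal sketch that expands a {population: num_individuals} dict into haploid sample counts. The helper name haploid_sample_counts and the plain-integer ploidy argument are assumptions made for this example only; they are not part of stdpopsim.

```python
def haploid_sample_counts(samples, ploidy):
    """Map {population: num_individuals} to {population: num_haploid_samples}.

    Hypothetical helper for illustration: each individual contributes
    `ploidy` haploid samples (sample nodes in the tree sequence).
    """
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    counts = {}
    for population, num_individuals in samples.items():
        if num_individuals < 0:
            raise ValueError(f"negative sample size for {population!r}")
        counts[population] = num_individuals * ploidy
    return counts


# Diploid species: 10 + 2 individuals give 20 + 4 haploid samples.
diploid = haploid_sample_counts({"YRI": 10, "CEU": 2}, ploidy=2)
total = sum(diploid.values())
```

With ploidy=1 the same dict would give one haploid sample per individual, which is the case the species-level ploidy attribute is meant to cover.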
The CLI for sample specification is now population_name:num_samples, like HomSap -c chr1 -d OutOfAfrica_3G09 YRI:10 CEU:2 (same as for msprime).
The old way of specifying samples still works (via DemographicModel.get_samples in the Python API, and via positional arguments in the CLI), but throws a DeprecationWarning. I opted not to rename the samples=... argument to Engine.simulate to keep the API less cluttered -- so it now takes either a dict (new behavior) or a list of msprime.SampleSet (old behavior).
Fixes "should samples be diploid?" #1282.
Fixes "adding ploidy as an attribute of species" #1111 (by adding ploidy as an attribute of species) but doesn't alter the SLiM engine to do anything different for haploids (e.g. it still uses the diploid recombination model, gives a warning if an odd number of samples is requested, etc.).
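To make the two CLI forms described above concrete, here is a small sketch of how population:count pairs might be parsed, with the old positional form kept working behind a DeprecationWarning. This is not the parser added in this PR: the function name parse_sample_args, its arguments, and the error messages are all hypothetical, and only the standard library is used.

```python
import warnings


def parse_sample_args(args, population_names):
    """Parse CLI sample arguments into a {population: count} dict.

    Hypothetical helper for illustration. Accepts the new "POP:N" form,
    or the old positional form (bare integers matched to populations in
    order), which emits a DeprecationWarning.
    """
    if args and all(":" not in arg for arg in args):
        # Old style: counts are assigned to populations by position.
        warnings.warn(
            "positional sample counts are deprecated; "
            "use population:count pairs instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if len(args) > len(population_names):
            raise ValueError("more sample counts than populations")
        return {pop: int(n) for pop, n in zip(population_names, args)}

    samples = {}
    for arg in args:
        name, sep, count = arg.partition(":")
        if not sep or not name or not count.isdigit():
            raise ValueError(f"expected population:count, got {arg!r}")
        if name not in population_names:
            raise ValueError(f"unknown population {name!r}")
        if name in samples:
            raise ValueError(f"population {name!r} given more than once")
        samples[name] = int(count)
    return samples


# New style, mirroring the example command line above.
new_style = parse_sample_args(["YRI:10", "CEU:2"], ["YRI", "CEU", "CHB"])
```

Using ":" as the separator keeps each argument a single shell token, and rejecting mixed or malformed arguments up front avoids silently assigning a count to the wrong population.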