-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
phylogenetic: revisit subsampling configuration #90
Comments
From a quick exploration, I propose to keep subsampling across years, otherwise we lose temporal sampling to infer a reasonable molecular clock rate
|
I'm slowly realizing that a straight up swap to subsampling configurable logic might not be straightforward when we are building multiple trees (e.g. genome has a different min-length filter than E gene). For example:
And I'm a bit vague on how we'd use the subsampling config if we want different layers of subsampling for serotypes (denv1-denv4) then the all trees. I'm open to suggestions, just logging some thoughts. |
I think most of our multi-build phylogenetic workflows use the same subsampling config across trees/builds/datasets so this hasn't been much of an issue. There are at least a few pathogen repos that have the same need, with varying implementation:
I'm not sure which is "better", but my sense is that all these repos have widely varying Snakemake logic even beyond subsampling. |
Context
Revisit subsampling configuration for dengue filter commands. Currently there're only ~1500 genomes from all serotypes, when there should probably be ~4000. This is due to the fact that the subsampling logic is currently split by groups
year
andregion
dengue/phylogenetic/defaults/config_dengue.yaml
Lines 13 to 22 in 8b784e8
Instead of backwards calculating the
all
sequences_per_group
to get ~4000 samples, it was suggested that we could setsubsample_max_sequence
to4000
.Since I'm revisiting this anyway, I'm considering if we should also swap to the subsampling_configurable logic we currently have in WNV pipelines. I believe we're planning to move toward that implementation in the future along with some config validation work.
https://github.com/nextstrain/WNV/blob/9c7ddac3baf76157412d58c7c76a0cf4a347cb77/phylogenetic/defaults/config.yaml#L67-L69
The above text will probably change as I solidify my opinions on a direction, I'm writing up a draft intention here.
Description
A clear and concise description of what you want to happen
Examples
Possible solution
(Optional)
The text was updated successfully, but these errors were encountered: