Phylogenetic: Use group-by with subsample-max-sequences
j23414 committed Jan 14, 2025
1 parent 05b014e commit 0d73b74
Showing 2 changed files with 3 additions and 8 deletions.
7 changes: 1 addition & 6 deletions phylogenetic/defaults/config_dengue.yaml
@@ -14,12 +14,7 @@ filter:
   min_length:
     genome: 5000
     E: 1000
-  sequences_per_group:
-    all: '36'
-    denv1: '36'
-    denv2: '36'
-    denv3: '36'
-    denv4: '36'
+  subsample_max_sequences: '4000'

 traits:
   sampling_bias_correction: '3'
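The config change above swaps a fixed per-group count (36 per group) for a single overall cap (4000) that is spread across whatever groups `group_by` produces. The sketch below illustrates the difference between the two strategies. It is not augur's implementation: the function names and group sizes are hypothetical, and the "largest per-group count that fits under the cap" rule is a simplifying assumption made only for illustration.

```python
def total_with_per_group_cap(group_sizes, per_group):
    """Sequences kept when every group is truncated to `per_group` entries."""
    return sum(min(size, per_group) for size in group_sizes)


def per_group_cap_for_total(group_sizes, max_sequences):
    """Largest integer per-group cap whose total stays within `max_sequences`.

    Assumption for illustration only: a uniform cap is applied to all groups.
    Returns 0 if even one sequence per group would exceed the limit.
    """
    lo, hi = 0, max(group_sizes, default=0)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if total_with_per_group_cap(group_sizes, mid) <= max_sequences:
            lo = mid          # mid fits, try a larger cap
        else:
            hi = mid - 1      # mid overshoots, shrink the cap
    return lo


# Hypothetical group sizes (e.g. sequences per year/region bucket).
group_sizes = [500, 120, 30, 5]

# Old style: a fixed count per group; the total depends on how many groups exist.
fixed_total = total_with_per_group_cap(group_sizes, 36)

# New style: fix the total; the per-group count is derived from it.
derived_cap = per_group_cap_for_total(group_sizes, 200)
derived_total = total_with_per_group_cap(group_sizes, derived_cap)

print(fixed_total, derived_cap, derived_total)
```

With a fixed per-group count, the dataset size grows with the number of groups; with an overall cap, the dataset size is bounded and the per-group count adapts instead.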
4 changes: 2 additions & 2 deletions phylogenetic/rules/prepare_sequences.smk
@@ -60,7 +60,7 @@ rule filter:
         sequences = "results/{gene}/filtered_{serotype}.fasta"
     params:
         group_by = config['filter']['group_by'],
-        sequences_per_group = lambda wildcards: config['filter']['sequences_per_group'][wildcards.serotype],
+        subsample_max_sequences = config['filter']['subsample_max_sequences'],
         min_length = lambda wildcard: config['filter']['min_length'][wildcard.gene],
         strain_id = config.get("strain_id_field", "strain"),
     shell:
Expand All @@ -73,7 +73,7 @@ rule filter:
--include {input.include} \
--output {output.sequences} \
--group-by {params.group_by} \
--sequences-per-group {params.sequences_per_group} \
--subsample-max-sequences {params.subsample_max_sequences} \
--min-length {params.min_length} \
--exclude-where country=? region=? date=? is_lab_host='true' \
--query-columns is_lab_host:str
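The rule change also simplifies how the parameter is resolved: the old value was a per-serotype mapping that needed a wildcard-dependent function, while the new value is a single scalar read directly from the config. A minimal sketch of the two lookups follows. The `config` dictionaries copy the values from the diff; `SimpleNamespace` is only a stand-in for Snakemake's wildcards object, and the helper names are hypothetical.

```python
from types import SimpleNamespace

# Before: one entry per serotype, so the lookup depends on the wildcard.
old_config = {
    "filter": {
        "sequences_per_group": {
            "all": "36", "denv1": "36", "denv2": "36",
            "denv3": "36", "denv4": "36",
        }
    }
}

# After: a single value shared by every serotype.
new_config = {"filter": {"subsample_max_sequences": "4000"}}


def old_sequences_per_group(wildcards):
    """Mirrors the removed lambda: index the mapping by serotype."""
    return old_config["filter"]["sequences_per_group"][wildcards.serotype]


def new_subsample_max_sequences():
    """Mirrors the added param: no wildcard needed."""
    return new_config["filter"]["subsample_max_sequences"]


# Stand-in for a Snakemake wildcards object (assumption for this sketch).
wildcards = SimpleNamespace(serotype="denv2", gene="genome")

print(old_sequences_per_group(wildcards))
print(new_subsample_max_sequences())
```

One consequence of this design is that a serotype missing from the old mapping would raise a `KeyError` at lookup time, whereas the scalar form applies to any serotype without further configuration.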
