Commit

Exclude PAT strains and outliers from phylogenetic analysis
Exclude several PAT strains that did not have the is_lab_host metadata field set to True.
Exclude several strains that were either putative recombinants or clustered below PAT FV537222.
j23414 committed Oct 16, 2024
1 parent c3d91b1 commit 0ca14f8
Showing 2 changed files with 56 additions and 1 deletion.
2 changes: 1 addition & 1 deletion phylogenetic/defaults/config.yaml
@@ -65,7 +65,7 @@ input_sequences: "data/sequences.fasta"
# This command excludes all strains by default and then forces the inclusion of
# the strains selected by the subsampling logic defined above.
subsampling:
- region: --query "is_lab_host != 'true'" --query-columns is_lab_host:str --min-length '9800' --group-by region year --subsample-max-sequences 3000
+ region: --query "is_lab_host != 'true'" --query-columns is_lab_host:str --min-length '9800' --group-by region year --subsample-max-sequences 3000 --exclude defaults/exclude.txt
force_include: --exclude-all --include defaults/include.txt

traits:
55 changes: 55 additions & 0 deletions phylogenetic/defaults/exclude.txt
@@ -0,0 +1,55 @@
HW816192 # 11029 bp PAT 27-MAY-2015
CS543188 # 11029 bp PAT 20-APR-2007
CS568914 # 11029 bp PAT 18-MAY-2007
CS568916 # 11029 bp PAT 18-MAY-2007
CS568917 # 11029 bp PAT 18-MAY-2007
CS568918 # 11029 bp PAT 18-MAY-2007
CS568919 # 11029 bp PAT 18-MAY-2007
FV537222 # 10962 bp PAT 18-MAR-2010
FV537223 # 10962 bp PAT 18-MAR-2010
FV537224 # 10962 bp PAT 18-MAR-2010
FV537225 # 10962 bp PAT 18-MAR-2010
LQ460608 # 8839 bp PAT 06-OCT-2016
LQ564350 # 8839 bp PAT 06-OCT-2016
LY683288 # 11062 bp PAT 04-DEC-2019
MA388207 # 8839 bp PAT 30-OCT-2018
HC467807 # 11029 bp PAT 21-APR-2010
HH961658 # 10975 bp PAT 31-OCT-2010
HH961659 # 11029 bp PAT 31-OCT-2010
HV572312 # 11029 bp PAT 31-MAY-2012
OP846974 # Suspected recombinant sequences from Mencattelli et al, 2023 https://www.nature.com/articles/s41467-023-42185-7
OK239667 # Suspected recombinant sequences from Mencattelli et al, 2023 https://www.nature.com/articles/s41467-023-42185-7
OM202920 # Clusters below PAT FV537222
OM202936 # Clusters below PAT FV537222
OM202914 # Clusters below PAT FV537222
OM202933 # Clusters below PAT FV537222
OM202907 # Clusters below PAT FV537222
OK573263 # Clusters below PAT FV537222
FV537224 # Clusters below PAT FV537222
OK573278 # Clusters below PAT FV537222
OM202917 # Clusters below PAT FV537222
OM202919 # Clusters below PAT FV537222
OM202910 # Clusters below PAT FV537222
OM202911 # Clusters below PAT FV537222
OM202922 # Clusters below PAT FV537222
OK573272 # Clusters below PAT FV537222
OK573262 # Clusters below PAT FV537222
OK573279 # Clusters below PAT FV537222
OK573269 # Clusters below PAT FV537222
OM202923 # Clusters below PAT FV537222
OM202906 # Clusters below PAT FV537222
OM202909 # Clusters below PAT FV537222
OM202930 # Clusters below PAT FV537222
OM202929 # Clusters below PAT FV537222
OM202904 # Clusters below PAT FV537222
OM202913 # Clusters below PAT FV537222
OM202908 # Clusters below PAT FV537222
OM202915 # Clusters below PAT FV537222
OM202912 # Clusters below PAT FV537222
OK572999 # Clusters below PAT FV537222
OK573277 # Clusters below PAT FV537222
FV537225 # Clusters below PAT FV537222
OM202905 # Clusters below PAT FV537222
OM202932 # Clusters below PAT FV537222
FV537223 # Clusters below PAT FV537222
FV537222 # Clusters below PAT FV537222
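
The exclude file above uses one strain name per line followed by a `#` comment, and FV537222 through FV537225 each appear twice (once in the PAT block and once in the "Clusters below" block). Repeated entries should not change which strains are excluded, but they are easy to check for. The following is a minimal Python sketch, not part of this commit; `read_exclude_list` and `find_duplicates` are hypothetical helper names, and the parsing rule (everything after `#` is a comment) is an assumption based on the layout of the file shown here.

```python
from collections import Counter


def read_exclude_list(text):
    """Return strain names from exclude-file text, skipping comments and blank lines."""
    strains = []
    for line in text.splitlines():
        # Assumption: everything after the first '#' is a free-text comment.
        name = line.split("#", 1)[0].strip()
        if name:
            strains.append(name)
    return strains


def find_duplicates(strains):
    """Return strain names listed more than once, in first-seen order."""
    counts = Counter(strains)
    return [name for name in dict.fromkeys(strains) if counts[name] > 1]


# A few lines copied from the exclude.txt diff above.
sample = """\
FV537222 # 10962 bp PAT 18-MAR-2010
OP846974 # Suspected recombinant sequences from Mencattelli et al, 2023
OM202920 # Clusters below PAT FV537222
FV537222 # Clusters below PAT FV537222
"""

strains = read_exclude_list(sample)
duplicates = find_duplicates(strains)
print(duplicates)
```

To check the real file, the same functions could be applied to the contents of `defaults/exclude.txt` read with `pathlib.Path.read_text()`.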
