missing species in GTDB 207 gte50comp-lt5cont.nwk tree #49

Open
chloelulu opened this issue Sep 3, 2024 · 2 comments

@chloelulu

Hello Developer,

I am currently using the database available at http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release207/kraken2/ for my analysis with Kraken2, and I've found it quite straightforward to use. According to the classification results, my samples show a significant abundance of Acetatifactor sp003612485 and 1XD42-69 sp003612565. Consequently, I plan to use the phylogenetic tree from http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release207/phylogeny/gte50comp-lt5cont.nwk as a reference as well.

However, I noticed that these two species are not present in the tree tips. May I know why they are missing and how I can obtain a complete tree that includes all the species listed in names.dmp?

Here are the commands I used to search for the species:

grep -E 'Acetatifactor sp003612485|1XD42-69 sp003612565' taxonomy/names.dmp
406383408	|	1XD42-69 sp003612565	|		|	scientific name	|
611307305	|	Acetatifactor sp003612485	|		|	scientific name	|
grep -E 'Acetatifactor sp003612485|1XD42-69 sp003612565' gte50comp-lt5cont.nwk

Your suggestions are much appreciated!

@chloelulu changed the title from "missing species in the gte50comp-lt5cont.nwk tree" to "missing species in GTDB 207 gte50comp-lt5cont.nwk tree" on Sep 3, 2024
@nick-youngblut
Contributor

I believe that the names Acetatifactor sp003612485 and 1XD42-69 sp003612565 will have been modified in the Newick file, since Newick generally does not allow special characters such as spaces in labels. You can use https://github.com/tjunier/newick_utils to extract the names from the tree for better searching via grep.
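
As a rough illustration of that kind of search, the sketch below converts a species name as written in names.dmp into the form a tree label might take, then checks for an exact match. It assumes the labels follow an `s__Genus_species` pattern (an `s__` prefix, with spaces replaced by underscores); that pattern is an assumption and should be checked against the actual label output. The helper names `to_tree_label` and `has_label` are hypothetical.

```shell
# Hypothetical helper: convert a names.dmp species name into the label form
# ASSUMED for the tree ("s__" prefix, spaces replaced by underscores).
to_tree_label() {
    printf 's__%s\n' "$1" | tr ' ' '_'
}

# Hypothetical helper: succeed if the exact label occurs among the labels
# read from standard input (one label per line).
# -F: fixed string, no regex; -x: match the whole line; -q: no output.
has_label() {
    grep -Fxq -- "$1"
}
```

A possible use, with tip labels extracted by `nw_labels -I` from newick_utils: `nw_labels -I gte50comp-lt5cont.nwk | has_label "$(to_tree_label 'Acetatifactor sp003612485')" && echo present || echo absent`. Matching whole lines avoids false hits on labels that merely share a prefix.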

@chloelulu
Author

Hi @nick-youngblut,
Thanks for the quick response and suggestions.
I used newick_utils, and some species from 1XD42-69 and Acetatifactor do show up. Unfortunately, I still cannot find Acetatifactor sp003612485 or 1XD42-69 sp003612565.
I am eager to hear your advice!

Please see the code and search results below.

nw_labels -I gte50comp-lt5cont.nwk | grep '1XD42-69'
s__1XD42-69_sp011959925
s__1XD42-69_sp910585825
s__1XD42-69_sp910586645
s__1XD42-69_sp910586725
s__1XD42-69_sp910586355
s__1XD42-69_sp910589105
s__1XD42-69_sp009911505
s__1XD42-69_sp910577065
s__1XD42-69_sp910588565
s__1XD42-69_sp014287635
s__1XD42-69_sp017625255
s__1XD42-69_sp017624495
nw_labels -I gte50comp-lt5cont.nwk | grep 'Acetatifactor'
s__Acetatifactor_sp910577235
s__Acetatifactor_sp910584375
s__Acetatifactor_sp011959105
s__Acetatifactor_sp910583845
s__Acetatifactor_sp910585015
s__Acetatifactor_sp017467845
s__Acetatifactor_sp017461775
s__Acetatifactor_sp017522685
s__Acetatifactor_stercoripullorum
s__Acetatifactor_sp017480445
s__Acetatifactor_sp902796105
s__Acetatifactor_sp002368865
s__Acetatifactor_sp017476665
s__Acetatifactor_sp017478245
s__Acetatifactor_sp017621075
s__Acetatifactor_sp900760705
s__Acetatifactor_sp017559435
s__Acetatifactor_sp017465845
s__Acetatifactor_sp017620975
s__Acetatifactor_sp009177215
s__Acetatifactor_sp900066565
s__Acetatifactor_sp900772845
s__Acetatifactor_sp900771995
s__Acetatifactor_sp900766575
s__Acetatifactor_sp002431915
s__Acetatifactor_sp003447295
s__Acetatifactor_intestinalis
s__Acetatifactor_sp015056915
s__Acetatifactor_sp017624835
s__Acetatifactor_sp015057005
s__Acetatifactor_sp017513625
s__Acetatifactor_sp017473665
s__Acetatifactor_sp910578215
s__Acetatifactor_sp910589655
s__Acetatifactor_sp910577665
s__Acetatifactor_sp910586215
s__Acetatifactor_sp016293615
s__Acetatifactor_sp018385425
s__Acetatifactor_sp910577035
s__Acetatifactor_sp910578815
s__Acetatifactor_sp910586485
s__Acetatifactor_sp016303085
s__Acetatifactor_sp910586835
s__Acetatifactor_sp900554205
s__Acetatifactor_sp900755865
s__Acetatifactor_sp904501885
s__Acetatifactor_sp017397645
s__Acetatifactor_sp910579755
s__Acetatifactor_sp910587755
s__Acetatifactor_muris
s__Acetatifactor_sp910588225
s__Acetatifactor_sp910584865
s__Acetatifactor_sp910580225
s__Acetatifactor_sp910578995
s__Acetatifactor_sp910587555
s__Acetatifactor_sp002490995
s__Acetatifactor_sp910585805
s__Acetatifactor_sp910586775
s__Acetatifactor_sp910586515
s__Acetatifactor_sp910577185
s__Acetatifactor_sp910585665
s__Acetatifactor_sp910576125
s__Acetatifactor_sp910584235
s__Acetatifactor_sp910585615
s__Acetatifactor_sp910585425
s__Acetatifactor_sp910589035
s__Acetatifactor_sp910578185
s__Acetatifactor_sp910584435
s__Acetatifactor_sp910586435
s__Acetatifactor_sp002314715
s__Acetatifactor_sp902766425
s__Acetatifactor_sp900320485
s__Acetatifactor_sp016290775
s__Acetatifactor_sp017527205
s__Acetatifactor_sp015056795
s__Acetatifactor_sp017552985
s__Acetatifactor_sp017400965
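
To see how many names are affected rather than checking two by hand, the comparison above could be run over every species name at once. The following is only a sketch under stated assumptions: tip labels are taken to be `s__` plus the name with spaces replaced by underscores (as in the output above), and a "species" in names.dmp is taken to be any scientific name that contains a space. Both helper names are hypothetical. The result only reports which names lack a tip; it does not explain why they are absent.

```shell
# Hypothetical helper: print scientific names from a names.dmp-style file.
# Fields are separated by <TAB>|<TAB>; field 2 is the name, field 4 the name class.
# ASSUMPTION: species names contain a space, higher ranks do not.
species_names() {
    awk -F '\t[|]\t' '$4 ~ /^scientific name/ && $2 ~ / / { print $2 }' "$1"
}

# Hypothetical helper: print every name in file $1 (one name per line) whose
# assumed label form is absent from file $2 (one tree label per line).
missing_from_tree() {
    awk 'NR == FNR { tips[$0] = 1; next }      # first file: remember all tip labels
         { label = "s__" $0                     # second file: build the assumed label
           gsub(/ /, "_", label)
           if (!(label in tips)) print }' "$2" "$1"
}
```

For example: `nw_labels -I gte50comp-lt5cont.nwk > labels.txt`, then `species_names taxonomy/names.dmp > species.txt`, then `missing_from_tree species.txt labels.txt`.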
