Combining fungi and viruses with GTDB #11
Comments
Sorry, but Struo2 currently supports only Bacteria & Archaea, given that the GTDB covers only those domains. I'm willing to add direct support for eukaryotic genomes, but it's not clear how best to integrate eukaryotic genes & taxonomy. See #7
https://github.com/nick-youngblut/gtdb_to_taxdump can potentially help create a hybrid taxdump file. If I have some time, I'll create a script for making a hybrid GTDB (archaea + bacteria) + NCBI (eukaryote) taxdump.
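Until such a script exists, the selection step for a hybrid taxdump could look roughly like the sketch below. It is not part of Struo2 or gtdb_to_taxdump; the function names `parse_nodes` and `subtree_taxids` are made up for illustration. It assumes the standard `nodes.dmp` layout (fields separated by `|`, with the taxid first and the parent taxid second) and only collects the taxids under a chosen NCBI root, such as 2759 for Eukaryota. It does not check whether the selected NCBI taxids collide with those in a GTDB-derived taxdump, which would need handling before appending anything.

```python
from collections import defaultdict


def parse_nodes(lines):
    """Map tax_id -> parent tax_id from nodes.dmp-style lines."""
    parents = {}
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        parents[int(fields[0])] = int(fields[1])
    return parents


def subtree_taxids(parents, root):
    """Return the set of taxids at or below `root` (empty if root is unknown)."""
    if root not in parents:
        return set()
    # Invert the child -> parent map so the tree can be walked downwards.
    children = defaultdict(list)
    for taxid, parent in parents.items():
        if taxid != parent:  # the NCBI root node lists itself as its parent
            children[parent].append(taxid)
    selected, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node in selected:
            continue
        selected.add(node)
        stack.extend(children[node])
    return selected
```

With `parents = parse_nodes(open("nodes.dmp"))`, calling `subtree_taxids(parents, 2759)` gives the taxids whose `nodes.dmp` and `names.dmp` lines would be candidates for appending to the GTDB taxdump.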
Hi! I was wondering if you have any updated advice for integrating fungal/viral taxonomy & genomes, now that Struo2 has switched over to the new taxdump pipeline. Thanks,
@zoey-rw I believe that https://github.com/shenwei356/gtdb-taxdump is focused on the GTDB, which only includes bacteria and archaea, so it is still a challenge to integrate other taxa.
Is the taxdump essential for HUMAnN?
No, it shouldn't be needed.
Hi,
I want to create a Kraken2 DB with GTDB data, since it's so much more curated and reliable than NCBI. However, I do need to be able to detect all domains of life, so I want to include NCBI's fungal, viral and human genomes that you can normally get with `kraken2-build`. The structure of the output is a bit different with these two approaches, though: Struo2 creates a folder per genome with the data inside that folder, while kraken2/NCBI just dumps the genomes into a common folder. Will this be a problem for building the DB? Should I make some sort of loop to stash each genome into its own folder?
I'm also not sure how to deal with this hybrid taxonomy, but I suppose I could select the archaeal, viral and mammalian nodes from the NCBI taxdump and append these to GTDB's?
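For reference, the kind of loop I have in mind would be something like the sketch below. It assumes the NCBI genomes are FASTA files sitting directly in one folder, and that a folder named after each file (minus its FASTA suffix) is an acceptable layout; I have not confirmed that Struo2 expects exactly this. The helper names and the suffix list are my own placeholders.

```python
import shutil
from pathlib import Path

# Assumed FASTA suffixes; adjust to match the downloaded files.
FASTA_SUFFIXES = (".fna", ".fa", ".fasta", ".fna.gz", ".fa.gz", ".fasta.gz")


def genome_name(path):
    """Return the file name without its FASTA suffix, or None if not a FASTA."""
    # Try longer suffixes first so ".fna.gz" wins over ".gz"-less matches.
    for suffix in sorted(FASTA_SUFFIXES, key=len, reverse=True):
        if path.name.endswith(suffix):
            return path.name[: -len(suffix)]
    return None


def stash_genomes(flat_dir, out_dir):
    """Move each FASTA in flat_dir into out_dir/<genome name>/ and report the moves."""
    moved = {}
    # sorted() materializes the listing before any files are moved.
    for path in sorted(Path(flat_dir).iterdir()):
        if not path.is_file():
            continue
        name = genome_name(path)
        if not name:
            continue  # not a FASTA file, leave it where it is
        target_dir = Path(out_dir) / name
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / path.name
        shutil.move(str(path), str(target))
        moved[name] = target
    return moved
```

Calling `stash_genomes("ncbi_genomes", "ncbi_by_genome")` (both folder names are just examples) would return a mapping from genome name to its new path, which could then be used to fill in a table of genome locations.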
Thank you very much for your time and this very nice package!