Combining fungi and viruses with GTDB #11

Open
luhugerth opened this issue Oct 25, 2021 · 6 comments
Comments

@luhugerth

Hi,

I want to create a Kraken2 DB with GTDB data, since it's so much more curated and reliable than NCBI. However, I do need to be able to detect all domains of life, so I want to include NCBI's fungal, viral and human genomes that you can normally get with kraken2-build. The structure of the output is a bit different with these two approaches, though; Struo2 creates a folder per genome with data within that folder, while kraken2/NCBI just dumps the genomes into a common folder. Will this be a problem for building the DB? Should I make some sort of loop to stash each genome into its folder?
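The "loop to stash each genome into its folder" idea could look roughly like the sketch below. It assumes the NCBI downloads sit as plain or gzipped FASTA files in one flat directory; the suffix list and the function names `genome_name` and `stash_genomes` are made up for illustration, and whether Struo2 actually requires a per-genome folder layout is not confirmed in this thread.

```python
import shutil
from pathlib import Path

# Assumed FASTA suffixes; adjust to whatever the download actually contains.
GENOME_SUFFIXES = (".fna", ".fna.gz", ".fa", ".fa.gz", ".fasta", ".fasta.gz")


def genome_name(path):
    """Return the file name without its FASTA suffix, or None for non-genome files."""
    for suffix in GENOME_SUFFIXES:
        if path.name.endswith(suffix):
            return path.name[: -len(suffix)]
    return None


def stash_genomes(flat_dir, out_dir):
    """Move each genome file in flat_dir to out_dir/<genome_name>/<file name>.

    Returns a dict mapping genome name -> new file path.
    """
    flat_dir, out_dir = Path(flat_dir), Path(out_dir)
    moved = {}
    # sorted() materialises the listing before any file is moved.
    for path in sorted(flat_dir.iterdir()):
        if not path.is_file():
            continue
        name = genome_name(path)
        if name is None:
            continue  # leave non-FASTA files where they are
        target_dir = out_dir / name
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / path.name
        shutil.move(str(path), str(target))
        moved[name] = target
    return moved
```

The returned mapping can then be used to write whatever genome-to-path table the downstream build step expects.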

I'm also not sure how to deal with this hybrid taxonomy, but I suppose I could select the archaeal, viral and mammalian nodes from the NCBI taxdump and append these to GTDB's?
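Selecting nodes from the NCBI taxdump could be sketched as below. This relies only on the standard `nodes.dmp` layout (fields separated by `|`, with tax ID, parent tax ID and rank as the first three columns). The function names `parse_nodes` and `subtree` are hypothetical, and the roots to pass in are up to you, for example 4751 (Fungi), 10239 (Viruses) or 9606 (Homo sapiens). It is only the selection step: the kept nodes would still need to be re-parented under the GTDB-derived root, checked for tax ID collisions, and matched with the corresponding `names.dmp` entries.

```python
from collections import defaultdict


def parse_nodes(lines):
    """Parse nodes.dmp lines into {tax_id: (parent_tax_id, rank)}."""
    nodes = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Fields are delimited by "|" and padded with tabs.
        fields = [field.strip() for field in line.split("|")]
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def subtree(nodes, roots):
    """Return the set of tax IDs at or below any of the given root tax IDs."""
    children = defaultdict(list)
    for tax_id, (parent, _rank) in nodes.items():
        if tax_id != parent:  # the NCBI root (tax ID 1) is its own parent
            children[parent].append(tax_id)
    keep = set()
    stack = [root for root in roots if root in nodes]
    while stack:  # iterative depth-first walk avoids recursion limits
        tax_id = stack.pop()
        if tax_id in keep:
            continue
        keep.add(tax_id)
        stack.extend(children[tax_id])
    return keep
```

With a real taxdump, `parse_nodes(open("nodes.dmp"))` followed by `subtree(nodes, [4751, 10239, 9606])` would give the tax IDs to keep when filtering both `nodes.dmp` and `names.dmp`.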

Thank you very much for your time and this very nice package!

@nick-youngblut
Contributor

Sorry, but Struo2 currently supports only Bacteria & Archaea, given that the GTDB only covers those domains. I'm willing to add direct support for eukaryotic genomes, but it's not clear how best to integrate eukaryotic genes & taxonomy. See #7

@nick-youngblut
Contributor

https://github.com/nick-youngblut/gtdb_to_taxdump can potentially help with creating a hybrid taxdump file. If I have some time, I'll create a script for making a hybrid GTDB (archaea + bacteria) + NCBI (eukaryote) taxdump.

@zoey-rw

zoey-rw commented Aug 25, 2022

Hi! I was wondering if you have any updated advice for integrating fungal/viral taxonomy & genomes, now that Struo2 has switched over to the new taxdump pipeline.

Thanks,
Zoey

@nick-youngblut
Contributor

@zoey-rw I believe that https://github.com/shenwei356/gtdb-taxdump is focused on the GTDB, which only includes bacteria and archaea, so integrating other taxa is still a challenge.

@jolespin

Is the taxdump essential for humann?

@nick-youngblut
Contributor

> Is the taxdump essential for humann?

No, it shouldn't be needed.


4 participants