Adding the new Greengenes2 database for classification #658
Comments
Hi there, Greengenes2 was discussed in https://nfcore.slack.com/archives/CEA7TBJGJ/p1690539708378009 and https://nfcore.slack.com/archives/CEA7TBJGJ/p1678204777328909. I am hoping for Greengenes2 to be integrated for DADA2 classification; that should solve all preprocessing and make the database integration relatively easy to add here, including an upload to Zenodo, which is much preferred to a university database. Greengenes2 was said to be provided "soon-ish" as a DADA2 database on Zenodo; see benjjneb/dada2#1680 and benjjneb/dada2#1829. |
Greengenes2 support for QIIME2 is now available in the dev branch and will be in the next release. I won't close this issue, though, because there is still no news for DADA2 (or I missed it). |
Hi Daniel.
Thank you for the update. I do see Greengenes2 on the GitHub page, but it
doesn't show as one of the parameter options on your Nextflow page. Just
informing you. I haven't used it yet, but I plan to very soon.
Best regards,
Ali
|
Hi @aimirza , it seems that |
My mistake, I was looking at |
How are you using qiime2 to classify ASVs with the greengenes2 database? Are you following the 'How to use it' guidelines from the link you shared or are you using a pre-trained classifier? |
The following files are used (ampliseq/conf/ref_databases.config, lines 373 to 378 in 0473e15) to extract sequences with primers and train the classifier. |
Wow, extracting reads ( |
Yes, I tested it and it takes a long time; see #666 (comment). |
Changing the time limit doesn't seem to work properly. I supplied new config rules to the
I also tried the code below, but it still failed after 1 day:
|
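For readers following along, per-process resource overrides in Nextflow are usually written as a small custom config file and passed with the standard `-c` option. The fragment below is only a generic sketch, not the configuration this comment tried: the file name `custom.config`, the selector `QIIME2_EXTRACT`, and the time value are assumptions for illustration, while the CPU and memory figures echo the "8 cpus of 10GB each" reported later in this thread.

```groovy
// custom.config (hypothetical file name), used as:
//   nextflow run nf-core/ampliseq -c custom.config ...
process {
    // The selector is an assumption: use the exact process name
    // printed in your own run log.
    withName: 'QIIME2_EXTRACT' {
        time   = 72.h    // illustrative wall-time limit
        cpus   = 8
        memory = 80.GB   // 8 CPUs x 10 GB each
    }
}
```

Whether such an override takes effect can also depend on pipeline-level resource caps and on the scheduler's own limits, so the resolved values in the run log are worth checking.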
What about the cpus and memory, are they altered successfully? If yes, check your |
I also had set
I also don't see multiple jobs running at the same time. The only related parameters I see listed in the log file is The supplied config file (
N E X T F L O W ~ version 23.04.3 |
I think I got it to work after increasing the number of CPUs, but now I have another problem. Apparently it is running out of space " |
Hi there, this is going well beyond the scope of this issue (adding the gg2 database). Your problems are not related to gg2, but to executing a large job on your HPC. The error about too little space is most likely related to your HPC settings for tmp/scratch data; please contact your sysadmin. |
I need to know a couple things about using the gg2 database. When using the process |
To reduce memory usage, I will add the parameter |
None of that worked, but I finally got it to work after weeks of trying, hurray! To address the issue, I created an additional configuration file that includes the following adjustments, passed as a file to the parameter
Binding the /scratch directory in Singularity:
This command explicitly binds the
Setting environment variables for temporary directories:
These environment variables (
Additionally, setting
Would you know which specific changes likely fixed the problem? With 8 CPUs of 10 GB each, I finished classifying my ASVs in 20 hours. |
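As a rough illustration of the two adjustments described in this comment (a Singularity bind for /scratch and temporary-directory environment variables), a Nextflow config fragment could look like the sketch below. It is not the poster's actual file: `singularity.runOptions` and the `env` scope are standard Nextflow configuration, but the variable name `TMPDIR` and the path `/scratch/tmp` are assumptions chosen for illustration.

```groovy
// Hypothetical extra config file, passed to Nextflow with -c.
singularity {
    enabled    = true
    // Make the host's /scratch visible inside the container.
    runOptions = '--bind /scratch:/scratch'
}

env {
    // Assumed variable name and path: point temporary files at
    // scratch space instead of a small default tmp directory.
    TMPDIR = '/scratch/tmp'
}
```

The target directory has to exist and be writable on the compute nodes; which variables a given tool honours varies, so this may need adapting per cluster.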
Thanks for detailing the solution! |
Actually, binding Singularity to the specified directory using |
Description of feature
Greengenes2 recently came out. Greengenes2 is a new release of the Greengenes database that has been redesigned from the ground up and is backed by whole genomes, focusing on harmonizing 16S rRNA and shotgun metagenomic datasets. Its phylogenetic coverage is also much larger than that of past resources such as SILVA, Greengenes and GTDB. It would be great to add this database as an optional feature for classifying sequences. Usage instructions are below. It has a QIIME 2 plugin. Note that the approach to classifying sequences differs between V4 and non-V4 sequences.
Paper: https://www.nature.com/articles/s41587-023-01845-1
How to use it: https://forum.qiime2.org/t/introducing-greengenes2-2022-10/25291