Add Kraken2 #36

Closed
edmundmiller opened this issue Apr 26, 2022 · 10 comments

@edmundmiller
Collaborator

Not sure what was going on with this one before, maybe @csawye01 or @drpatelh can shed some light here.

@edmundmiller edmundmiller added this to the v1.1.0 milestone Apr 26, 2022
@drpatelh
Member

It was to look for contamination in the reads in a quick and unbiased way across organisms.

@matthdsm
Collaborator

matthdsm commented Oct 7, 2022

I'm not really sure this is in the scope of a demultiplexing workflow. Sounds to me like that's more like extensive input QC to be done before starting a workflow.

@drpatelh
Member

drpatelh commented Oct 7, 2022

I guess the thought process behind this was that the raw reads come out of the demultiplexing workflow, so it would be good to add this as an option here as "pre-pipeline" QC (like FastQC). This sort of information could also be useful to sequencing facilities that may be interested in the results of the demultiplexing. I suspect it's more effort to add Kraken2 to all individual pipelines, but I'm also not fussed if this isn't added here. The biggest complication is sourcing a Kraken2 database that is comprehensive enough to contain "most" species for contamination detection.

@drpatelh
Member

drpatelh commented Oct 7, 2022

Be awesome to have contamination detection built-in here though.
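
As a rough illustration of what such contamination detection could look like downstream, here is a minimal Python sketch that reads a Kraken2 report and lists species other than the expected one. It assumes the standard six-column, tab-separated report layout (percentage, clade reads, directly assigned reads, rank code, NCBI taxid, indented name). The sample report, its numbers, the function names and the 1% threshold are all made up for the example and are not part of nf-core/demultiplex.

```python
from typing import Dict, List

# Hypothetical report content; the read counts and percentages are invented.
SAMPLE_REPORT = "\n".join([
    " 10.00\t100\t100\tU\t0\tunclassified",
    " 90.00\t900\t0\tR\t1\troot",
    " 85.00\t850\t850\tS\t9606\t      Homo sapiens",
    "  5.00\t50\t50\tS\t562\t      Escherichia coli",
])


def parse_report(text: str) -> List[Dict[str, object]]:
    """Parse a standard six-column Kraken2 report into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        pct, clade, direct, rank, taxid, name = line.split("\t")
        rows.append({
            "percent": float(pct),       # % of reads in the clade
            "clade_reads": int(clade),   # reads in the clade rooted here
            "direct_reads": int(direct), # reads assigned directly to this taxon
            "rank": rank.strip(),        # rank code, e.g. U, R, G, S
            "taxid": int(taxid),
            "name": name.strip(),        # names are indented by depth
        })
    return rows


def unexpected_species(rows, expected: str, min_pct: float = 1.0) -> List[str]:
    """Return species-level names that are not `expected` and reach min_pct."""
    return [
        r["name"]
        for r in rows
        if r["rank"] == "S" and r["name"] != expected and r["percent"] >= min_pct
    ]


rows = parse_report(SAMPLE_REPORT)
print(unexpected_species(rows, expected="Homo sapiens"))
```

Reports written with extra columns (for example minimizer data) would need a different unpacking step, so this only covers the default layout.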

@matthdsm
Collaborator

matthdsm commented Oct 7, 2022

No problem, we can keep the issue around for the future.
Will there be room on the refgenie server for a kraken database?

@matthdsm
Collaborator

matthdsm commented Oct 7, 2022

Also, does Kraken support remote databases, or does the database have to be local?
E.g., can it read its files straight from S3, or does it have to stage the entire thing locally?

@edmundmiller edmundmiller removed this from the v1.1.0 milestone Oct 13, 2022
@apeltzer
Member

Hey @matthdsm - I already have a subworkflow fleshed out that can even subsample FASTQs using seqtk (unbiased) to speed up the process, and then runs Kraken2 on them, creating a report that can be fed into MultiQC directly. We already use this internally; I will contribute it to subworkflows in modules now, and then we can simply take it from there 👍🏻
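
To make the "unbiased subsampling" idea concrete, here is a minimal pure-Python sketch of uniform random subsampling of FASTQ records using reservoir sampling. It is only an illustration of the concept, not the subworkflow itself and not seqtk's implementation; the function names, the seed value and the demo reads are assumptions made for the example.

```python
import io
import random
from typing import Iterator, List, TextIO, Tuple

FastqRecord = Tuple[str, str, str, str]  # header, sequence, separator, quality


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an open text handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the reader
            return
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, sep, qual


def subsample_fastq(handle: TextIO, n: int, seed: int = 42) -> List[FastqRecord]:
    """Reservoir-sample up to n records; every read is equally likely to be kept."""
    rng = random.Random(seed)
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(read_fastq(handle)):
        if i < n:
            # Fill the reservoir with the first n records.
            reservoir.append(record)
        else:
            # Replace a random slot with probability n / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = record
    return reservoir


# Hypothetical in-memory FASTQ with 100 identical-length reads.
demo = io.StringIO("".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(100)))
subset = subsample_fastq(demo, n=10)
print(len(subset))
```

In this sketch, running the function with the same seed on two paired files that contain the same number of records selects the same record positions in both, so read pairs stay matched.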

@apeltzer
Member

Here is a PR that adds a subworkflow for this. We already use this internally, but I'm attempting to contribute and recycle subworkflows from nf-core to make things easier 👍🏻

nf-core/modules#2397

@Aratz
Contributor

Aratz commented Oct 16, 2023

Hi!

We actually have a use case for this at our platform and would like to include @apeltzer's subworkflow in nf-core/demultiplex.

Is it OK if we start working on this during the Hackathon?

@Aratz Aratz self-assigned this Oct 17, 2023
@Aratz Aratz mentioned this issue Oct 18, 2023
10 tasks
@edmundmiller edmundmiller mentioned this issue Mar 27, 2024
@apeltzer apeltzer added this to the 1.5.0 milestone Aug 1, 2024
@apeltzer
Member

apeltzer commented Aug 8, 2024

This is now implemented via #220 and will be part of the 1.5.0 release.

@apeltzer apeltzer closed this as completed Aug 8, 2024
5 participants