Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial release review [DO NOT MERGE] #29

Closed
wants to merge 609 commits into from
Closed

Initial release review [DO NOT MERGE] #29

wants to merge 609 commits into from

Conversation

scwatts
Copy link
Collaborator

@scwatts scwatts commented May 13, 2024

Background

nf-core/oncoanalyser is a cancer DNA/RNA analysis and reporting pipeline that implements the hmftools workflow, which is developed by the Hartwig Medical Foundation.

The hmftools workflow provides comprehensive analysis of cancer DNA/RNA data where each component is fine-tuned and optimised to operate in an integrated manner to improve genomic characterisation.

There are certain expectations around reference and input data, and how the hmftools workflow should be used. As such, these expectations have guided certain design decisions in oncoanalyser with some described below to provide further context for the community review.

Overarching design choices

Separation of run modes into workflows. The hmftools workflow is flexible and can be used to analyse different types of data (e.g. WGTS, targeted sequencing), which require specific configuration and reference data. There are also additional modes planned for hmftools that will eventually be implemented in oncoanalyser.

Hence, I have structured oncoanalyser so that the majority of the logic corresponding to different run modes is lifted up into workflows rather than at the subworkflow or module level. While this introduces some code duplication, much of the mode-specific logic is simplified by dividing it into different workflows providing a better maintenance and development experience. Retaining this layout will ease the addition of new run modes in the future.

Arbitrarily run individual tools or enter at any point. A key feature of oncoanalyser is the ability to begin analysis from any point in the pipeline or run just a specific set of tools. The primary use-case for this is to skip variant calling and run new/supplementary/modified downstream analyses. For example, a user can run oncoanalyser from PURPLE so long as they provide the required inputs.

Accordingly, outputs from each tool can be provided in the samplesheet and are recognised by oncoanalyser. When a user provides input for a given tool, it is retrieved in the respective subworkflow and inserted into the appropriate input channel. The feature to arbitrarily select tools to run is implemented by wrapping each tool subworkflow call in a conditional statement that checks whether the corresponding tool should be run.

Strong reference genome recommendations. Hartwig support only the reference genomes they distribute for analysis with the hmftools workflows. Hence, oncoanalyser is configured to provide users with a choice of the Hartwig GRCh37 or GRCh38 genome. While not recommended, custom reference genomes can be used (including those from iGenomes) and oncoanalyser will build all necessary indexes for the target analysis. There is additionally an option to write/publish prepared reference data to the output directory.

No egress fee reference data hosting. Given the use of Hartwig-distributed genomes and size of reference data required to run oncoanalyser, a cloud hosting provider is needed to deliver this data to users. I chose to host all default reference data (genomes, hmftools/panel resource files, and other databases) on Cloudflare R2, which is an object storage service that doesn't charge for egress. I've used Cloudflare R2 for this purpose since Jan 2023 with the nominal hosting costs being covered by our organisation (UMCCR).

Other notes

  • I will make PRs outside this review to address each comment/request/etc made

.bumpversion.cfg Outdated Show resolved Hide resolved
Copy link

@FriederikeHanssen FriederikeHanssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally made it through :D . There are a few things that could probably be solved by already existing plugins or functions. I'd be curious to get your input. For me it is not a blocker for the first release but definitely something that should be added soon to make the pipeline components similar to the rest of nf-core.

Similarly for the test data.

after reviewing the pipeline and from the docs I think I would still not be certain when to use this pipeline, rnadnavar and when to use sarek. I left a note in the readme. Maybe you could clarify this a bit

conf/test.config Outdated Show resolved Hide resolved
lib/WorkflowMain.groovy Show resolved Hide resolved
lib/WorkflowOncoanalyser.groovy Show resolved Hide resolved
modules/nf-core/multiqc/main.nf Show resolved Hide resolved
nextflow.config Show resolved Hide resolved
nextflow.config Show resolved Hide resolved
README.md Show resolved Hide resolved
docs/usage.md Show resolved Hide resolved
@scwatts
Copy link
Collaborator Author

scwatts commented Aug 7, 2024

As discussed on the nf-core Slack, all reviewers have given their approval and so we can now conclude the community review. There are several topics additional of discussion / points of action spun out from the review that will progress independently to the initial release (see above comment).

An immense thank you to all of our reviewers - it's been a significant undertaking and I greatly appreciate all the time and feedback you've kindly given for this community review to be successful.

@scwatts
Copy link
Collaborator Author

scwatts commented Aug 7, 2024

Now closing this PR so I can open the 'Release 1.0.0' PR

@scwatts scwatts closed this Aug 7, 2024
@scwatts scwatts mentioned this pull request Aug 7, 2024
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.