Initial release review [DO NOT MERGE] #29

scwatts · 2024-05-13T10:02:57Z

Background

nf-core/oncoanalyser is a cancer DNA/RNA analysis and reporting pipeline that implements the hmftools workflow, which is developed by the Hartwig Medical Foundation.

The hmftools workflow provides comprehensive analysis of cancer DNA/RNA data where each component is fine-tuned and optimised to operate in an integrated manner to improve genomic characterisation.

There are certain expectations around reference and input data, and how the hmftools workflow should be used. These expectations have guided certain design decisions in oncoanalyser, some of which are described below to provide further context for the community review.

Overarching design choices

Separation of run modes into workflows. The hmftools workflow is flexible and can be used to analyse different types of data (e.g. WGTS, targeted sequencing), which require specific configuration and reference data. There are also additional modes planned for hmftools that will eventually be implemented in oncoanalyser.

Hence, I have structured oncoanalyser so that the majority of the logic corresponding to different run modes is lifted up into workflows rather than placed at the subworkflow or module level. While this introduces some code duplication, dividing the mode-specific logic into separate workflows simplifies much of it and provides a better maintenance and development experience. Retaining this layout will ease the addition of new run modes in the future.
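As a rough illustration of this layout (a language-neutral sketch in Python, not oncoanalyser's actual Nextflow code), each run mode can own a complete top-level workflow, with a single dispatch point selecting between them. All names here (`RunMode`, `wgts_workflow`, `targeted_workflow`, `panel_specific_step`) are hypothetical and chosen only for the example.

```python
from enum import Enum
from typing import Callable, Dict, List


class RunMode(Enum):
    """Hypothetical run modes; the real pipeline may name or extend these differently."""
    WGTS = "wgts"
    TARGETED = "targeted"


def wgts_workflow(samples: List[str]) -> List[str]:
    """Placeholder WGTS workflow: every mode-specific decision lives in this function."""
    steps = ["align", "call_variants", "report"]
    return [f"{sample}:{step}" for sample in samples for step in steps]


def targeted_workflow(samples: List[str]) -> List[str]:
    """Placeholder targeted workflow: repeats the skeleton and adds its own step."""
    steps = ["align", "call_variants", "panel_specific_step", "report"]
    return [f"{sample}:{step}" for sample in samples for step in steps]


# The only place that knows about every mode; adding a mode means adding one entry.
WORKFLOWS: Dict[RunMode, Callable[[List[str]], List[str]]] = {
    RunMode.WGTS: wgts_workflow,
    RunMode.TARGETED: targeted_workflow,
}


def run_pipeline(mode: str, samples: List[str]) -> List[str]:
    """Dispatch to the workflow for the requested mode."""
    try:
        run_mode = RunMode(mode)
    except ValueError:
        raise ValueError(f"unsupported run mode: {mode!r}") from None
    return WORKFLOWS[run_mode](samples)
```

The two workflow functions deliberately duplicate their shared steps, which mirrors the trade-off described above: some repetition in exchange for keeping mode-specific branching out of the shared lower-level components.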

Arbitrarily run individual tools or enter at any point. A key feature of oncoanalyser is the ability to begin analysis from any point in the pipeline or run just a specific set of tools. The primary use-case for this is to skip variant calling and run new/supplementary/modified downstream analyses. For example, a user can run oncoanalyser from PURPLE so long as they provide the required inputs.

Accordingly, outputs from each tool can be provided in the samplesheet and are recognised by oncoanalyser. When a user provides input for a given tool, it is retrieved in the respective subworkflow and inserted into the appropriate input channel. The feature to arbitrarily select tools to run is implemented by wrapping each tool subworkflow call in a conditional statement that checks whether the corresponding tool should be run.
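The mechanism described above can be sketched as follows. This is a simplified Python model rather than the pipeline's Nextflow implementation: it assumes a strictly linear chain of tools (the real dependency graph is not linear), assumes raw reads are always available to the first tool, and uses hypothetical names (`TOOL_ORDER`, `run_selected`) throughout.

```python
from typing import Dict, List, Optional, Set

# Hypothetical linear chain: each tool consumes the output of the one before it.
TOOL_ORDER: List[str] = ["alignment", "variant_calling", "purple", "reporting"]


def run_selected(selected: Set[str], samplesheet: Dict[str, str]) -> Dict[str, str]:
    """Run only the selected tools, substituting samplesheet-provided outputs.

    `samplesheet` maps a tool name to an existing output supplied by the user.
    Returns a mapping of tool name to the output that downstream tools would see.
    """
    outputs: Dict[str, str] = {}
    # Assumption for this sketch: the first tool can always start from raw reads.
    upstream: Optional[str] = "raw_reads"
    for tool in TOOL_ORDER:
        if tool in samplesheet:
            # A user-provided output is inserted in place of running the tool.
            outputs[tool] = samplesheet[tool]
        elif tool in selected:
            # The conditional wrapper: the tool runs only when it was requested.
            if upstream is None:
                raise ValueError(
                    f"{tool} has no input: run the preceding tool or provide its output"
                )
            outputs[tool] = f"{tool}({upstream})"
        # Skipped tools leave nothing for the next tool to consume.
        upstream = outputs.get(tool)
    return outputs
```

For example, `run_selected({"purple", "reporting"}, {"variant_calling": "sample.vcf"})` skips alignment and variant calling, feeds the supplied file to `purple`, and passes that result on to `reporting`. Requesting `purple` without supplying or running its upstream tool raises an error instead.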

Strong reference genome recommendations. Hartwig support only the reference genomes they distribute for analysis with the hmftools workflows. Hence, oncoanalyser is configured to provide users with a choice of the Hartwig GRCh37 or GRCh38 genome. While not recommended, custom reference genomes can be used (including those from iGenomes) and oncoanalyser will build all necessary indexes for the target analysis. There is additionally an option to write/publish prepared reference data to the output directory.

No egress fee reference data hosting. Given the use of Hartwig-distributed genomes and the size of the reference data required to run oncoanalyser, a cloud hosting provider is needed to deliver this data to users. I chose to host all default reference data (genomes, hmftools/panel resource files, and other databases) on Cloudflare R2, which is an object storage service that doesn't charge for egress. I've used Cloudflare R2 for this purpose since Jan 2023, with the nominal hosting costs being covered by our organisation (UMCCR).

Other notes

I will make PRs outside this review to address each comment/request/etc made

FriederikeHanssen

Finally made it through :D . There are a few things that could probably be solved by already existing plugins or functions. I'd be curious to get your input. For me it is not a blocker for the first release but definitely something that should be added soon to make the pipeline components similar to the rest of nf-core.

Similarly for the test data.

After reviewing the pipeline and the docs, I think I would still not be certain when to use this pipeline, when to use rnadnavar, and when to use sarek. I left a note in the README. Maybe you could clarify this a bit.

conf/test.config

subworkflows/local/utils_nfcore_oncoanalyser_pipeline/main.nf

lib/WorkflowMain.groovy

lib/WorkflowOncoanalyser.groovy

modules/local/custom/write_reference_data/main.nf

modules/nf-core/multiqc/main.nf

nextflow.config

README.md

docs/usage.md

scwatts · 2024-08-07T05:28:24Z

I've created issues to continue ongoing activities that do not block the initial release:

I've also marked the remaining corresponding threads as resolved to close them off cleanly. Let's continue discussing in the relevant GH issues or over Slack.

scwatts · 2024-08-07T05:29:20Z

As discussed on the nf-core Slack, all reviewers have given their approval and so we can now conclude the community review. There are several additional topics of discussion / points of action spun out from the review that will progress independently of the initial release (see above comment).

An immense thank you to all of our reviewers - it's been a significant undertaking and I greatly appreciate all the time and feedback you've kindly given for this community review to be successful.

scwatts · 2024-08-07T05:34:32Z

Now closing this PR so I can open the 'Release 1.0.0' PR

scwatts and others added 30 commits February 29, 2024 17:03

Adjust, fix bwa-mem2 index handling

de1aad0

Remove deprecated MarkDups -multi_bam argument

7e0fb7f

Template update for nf-core/tools version 2.13.1

60a62eb

Fix AMBER subworkflow TN mode

f47f33e

Fix COBALT subworkflow TN mode

08a58c8

Adjust indentation

c53ac44

Fix SAGE calling subworkflow TN mode

df1f5b7

Improving handling of 'no merge' RNA BAM scenarios

752f57a

Set outputs for alignment workflows

c648028

Add missing channel docs

db6bbf2

Further work on RNA BAM handling

c1f3142

Use explicit returns in .branch ops

bd45719

Do not index RNA BAMs prior to merge

8d06484

Update acknowledgements

361afb5

Bump PURPLE to 4.0.2

124b79b

Correct comment capitalisation

31a2d12

Handle backslashes in linxreport more consistently

22d3fd3

Correct typo in PAVE somatic module

8ced85b

Bump version: 0.3.0 → 0.3.1

0881ae9

Remove obsolete TODOs

c7e87c2

Fix Isofox singularity container URL

1653daf

Bump TSO500 data bundle version

f618283

* restrict target regions to canonical Ensembl transcripts

Remove -force_pathogenic_pass in PAVE somatic

2bd379c

Merge branch 'nf-core-template-merge-2.13.1' into dev

0d151bd

Apply prettier linting

3196720

Fix indenting

37a4022

Fix more linting failures

4085b4f

More indenting fixes

b4a37b3

Update pipeline description

05c3503

Update test config, adjust relevant arg schema

1b9a467

SPPearce reviewed Jul 30, 2024

modules/local/markdups/meta.yml (outdated, resolved)

FriederikeHanssen mentioned this pull request Jul 30, 2024

Shrink reference data to support execution on GHA runners #71

Open

edmundmiller reviewed Jul 30, 2024

subworkflows/local/virusbreakend_calling/main.nf (resolved)

nvnieuwk reviewed Jul 30, 2024

lib/Utils.groovy (resolved)

edmundmiller reviewed Jul 30, 2024

.bumpversion.cfg (outdated, resolved)

FriederikeHanssen reviewed Jul 30, 2024

scwatts mentioned this pull request Jul 31, 2024

[oncoanalyser] Add simulated test read data and samplesheets nf-core/test-datasets#1270

Merged

scwatts and others added 12 commits July 31, 2024 19:14

Add summary note on pipeline function in README.md

f142fa6

Correct Conda GRIDSS build version

c0d6b07

Correct Conda SAGE version

7394aa2

Comment out any hint of citations

b660853

Use the HMF abbreviation more in README.md

a6b5cc2

Update input samplesheet URL for tests

76cb780

Fix MarkDups URLs

3a835c6

Use stub samplesheet for test_stub profile

3f7d818

Switch to BAM inputs for test_stub samplesheet

f876e18

Merge pull request #72 from nf-core/initial-release-review-changes

41010dd

Apply the seventh set of reviewer recommendations

Allow RNA BAM from samplesheet as input to LILAC

7ec0f39

Merge pull request #76 from nf-core/lilac-samplesheet-bam-input

49872fc

Allow RNA BAM from samplesheet as LILAC input

scwatts closed this Aug 7, 2024

scwatts mentioned this pull request Aug 7, 2024

Release 1.0.0 #85

Merged

14 tasks
