🛠️ Modification to bin cobalt outputs and add purity forcing #3

Merged
merged 12 commits into from
Aug 13, 2024

Conversation

ddomenico
Contributor

Several modifications in order to improve workflow for routine use.

  • Added binning in order to decrease oversegmentation by grouping together combinations of probes with similar logR values.
  • Added option to specify min/max purity and regenerate purple solution.
  • Added check to not rerun amber/cobalt if results previously generated.
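As an illustration of the binning idea in the first bullet, here is a minimal Python sketch that merges adjacent segments on the same chromosome when their mean logR values differ by no more than a threshold. The `Segment` fields, the function name, and the merge rule are assumptions made for illustration only; the actual implementation in main.nf, and how it uses binProbes and binLogR, may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical segment record; field names are illustrative, not from the PR.
    chrom: str
    start: int
    end: int
    n_probes: int
    mean_logr: float


def merge_similar_segments(segments, max_logr_diff=0.5):
    """Merge neighbouring segments whose mean logR differs by <= max_logr_diff.

    Assumes `segments` is sorted by chromosome and start position.
    """
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.chrom == seg.chrom
            and abs(prev.mean_logr - seg.mean_logr) <= max_logr_diff
        ):
            # Combine the two segments, weighting the mean by probe count.
            total = prev.n_probes + seg.n_probes
            mean = (
                prev.mean_logr * prev.n_probes + seg.mean_logr * seg.n_probes
            ) / total
            merged[-1] = Segment(prev.chrom, prev.start, seg.end, total, mean)
        else:
            # Copy so the caller's objects are never modified.
            merged.append(
                Segment(seg.chrom, seg.start, seg.end, seg.n_probes, seg.mean_logr)
            )
    return merged
```

Weighting by probe count keeps a short segment from dragging the merged mean as much as a long one would. Segments on different chromosomes are never merged, whatever their logR values.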

Contributor

@juanesarango juanesarango left a comment

Awesome man! This will clean the profiles a lot. Great job.

-ref_genome_version ${params.genomeVersion}
if [ -f "${params.outdir}/amber/${tumor}.amber.baf.tsv.gz" ] && \
[ -f "${params.outdir}/amber/${tumor}.amber.baf.pcf" ] && \
[ -f "${params.outdir}/amber/${tumor}.amber.qc" ]; then
Contributor

These checks are managed by Nextflow. If Nextflow sees the outputs you expect from a previous run, it will cache these steps.

Was this not happening for you?

Contributor Author

This is part of the updates for the purity forcing. Because the workflow has already finished completely, the runDir has been copied to the publishDir, so a forced rerun would redo these steps since the output files don't exist in the new runDir.
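The shell check quoted in the diff can be sketched in Python as follows. This is only an illustration of the "skip if all expected outputs already exist" logic; the function name is hypothetical, and the three file suffixes are taken from the diff fragment above.

```python
from pathlib import Path

# Output suffixes checked in the diff fragment for AMBER results.
AMBER_SUFFIXES = (".amber.baf.tsv.gz", ".amber.baf.pcf", ".amber.qc")


def amber_outputs_present(outdir, tumor):
    """Return True only if every expected AMBER output exists in <outdir>/amber."""
    amber_dir = Path(outdir) / "amber"
    return all(
        (amber_dir / f"{tumor}{suffix}").is_file() for suffix in AMBER_SUFFIXES
    )
```

Requiring all files, rather than any one of them, avoids skipping the step after a partially written earlier run.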

main.nf Outdated
last_idx = cobalt_ratio_pcf_probes_logR.index[-1]

cobalt_ratio_pcf_probes_logR.to_csv("${tumor}.cobalt.ratio.pcf", sep='\\t', index=False)
"""
Contributor

Add a """.stripIndent() here so that the script is properly indented when it is written to a file.

See the other processes.
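To show why indentation matters for an embedded Python script, here is a small sketch using Python's `textwrap.dedent`, which is a close analogue of Groovy's `stripIndent()` (both remove the common leading whitespace). It is not Nextflow code, and the `compiles` helper and the sample script are hypothetical.

```python
import textwrap


def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<embedded script>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# A script body indented to match the surrounding process definition.
indented = """
    value = 1
    print(value)
"""

# Removing the common leading whitespace makes it valid top-level Python.
dedented = textwrap.dedent(indented)

print(compiles(indented))
print(compiles(dedented))
```

The indented form fails to compile because top-level Python statements cannot start with leading whitespace, which is the kind of error the suggestion above guards against.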

Contributor Author

Thanks, I missed this. I'm surprised it didn't cause an error.

@@ -1,4 +1,5 @@
params.cores = 4
params.cores = 1
params.memory = '4 GB'
Contributor

@juanesarango juanesarango Aug 7, 2024

I'm thinking about keeping 32 GB as the default and using 4 GB for testing in nf.test:

params {
    memory = '4 GB'
    tumor = "TEST"
    binProbes = 100
    binLogR = 0.5
    cobalt_ratio_pcf = "${projectDir}/tests/outdir/cobalt/TEST.cobalt.ratio.pcf"
    cobalt_ratio_tsv = "${projectDir}/tests/outdir/cobalt/TEST.cobalt.ratio.tsv.gz"
}

Unless you think it doesn't need that much memory by default?

Contributor Author

I ended up adding a memory parameter because memory use seems to be sample- and system-dependent. I prefer to keep the default low and configure it higher for samples or systems where necessary, but either way works.

@ddomenico ddomenico merged commit 1f9517c into main Aug 13, 2024
1 check passed
@ddomenico ddomenico deleted the binning-mod branch August 13, 2024 02:01