Running individual tools
Sometimes you may want to run only one or a few steps rather than the full pipeline (e.g., alignment only), or you may already have BAMs (aligned elsewhere) and want to run a specific variant-calling tool (e.g., mutect2).
Note: to process BAMs produced elsewhere, you MUST either use the identical reference that was used for alignment OR be prepared to subset each BAM to the contigs present in your desired reference file (e.g., you cannot process BAMs aligned by TGL using the hg38 reference on H4H [igenome-human/hg38]).
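One way to check whether a BAM produced elsewhere is compatible with your reference is to compare the contig names in the BAM header against the reference index. The sketch below is not part of the pipeline; it assumes you have saved the header with `samtools view -H` and that a `.fai` index exists for your reference. The function name and file names are hypothetical.

```shell
# List contigs that appear in a BAM header but are absent from a reference index.
# $1: text file holding the BAM header (e.g. written by `samtools view -H`)
# $2: reference .fai index (tab-separated; first column is the contig name)
missing_contigs() {
    awk -F'\t' '
        # First file (the .fai): remember every reference contig name.
        FILENAME == ARGV[1] { ref[$1] = 1; next }
        # Second file (the header): inspect the SN: tag of each @SQ line.
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) {
                    name = substr($i, 4)
                    if (!(name in ref)) print name
                }
            }
        }
    ' "$2" "$1"
}
```

For example, `samtools view -H sample.bam > header.sam` followed by `missing_contigs header.sam reference.fa.fai` prints nothing when every contig in the BAM is present in the reference. Note that matching names alone do not prove the references are identical; contig lengths and sequence content may still differ.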
In all cases, tools will write individual commands to file: /path/to/output/directory/TOOL/logs/run_tool_step_sample/script.sh
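After a dry-run it can be useful to review the generated scripts before submitting anything. The helper below is a minimal sketch that assumes the directory layout given above (`TOOL/logs/<step directory>/script.sh`); the function name is hypothetical and it relies on the `-mindepth`/`-maxdepth` options available in GNU and BSD `find`.

```shell
# Print the path of every generated command script for one tool, sorted.
# $1: pipeline output directory
# $2: name of the tool subdirectory (the TOOL component of the path)
list_step_scripts() {
    find "$1/$2/logs" -mindepth 2 -maxdepth 2 -type f -name 'script.sh' | sort
}
```

Calling `list_step_scripts /path/to/output/directory TOOL` lists one script per step and sample, which can then be inspected with a pager or `cat`.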
perl bwa.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/fastq_dna_config.yaml \
-o /path/to/output/directory \
-b /path/to/output/bam.yaml \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
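The `{ ... }` annotations in these commands are notes, not literal arguments: pass `--dry-run` for a dry-run, or `--no-wait` for a real submission you do not want to wait on. The sketch below shows one way to assemble the command line for either case. It only prints the command for review; the function name is hypothetical, and it assumes you always want `-c slurm --remove`, as in the examples on this page.

```shell
# Print the command line for one tool in either "dry" or "submit" mode.
# $1: tool script (e.g. bwa.pl); $2: "dry" or "submit";
# remaining arguments (-t, -d, -o, ...) are passed through unchanged.
# The body runs in a subshell so its variables do not leak into the caller.
build_tool_command() (
    script=$1
    mode=$2
    shift 2
    if [ "$mode" = "dry" ]; then
        set -- "$@" -c slurm --remove --dry-run
    else
        set -- "$@" -c slurm --remove --no-wait
    fi
    # Arguments are joined with spaces for display only (no re-quoting).
    echo "perl $script $*"
)
```

For example, `build_tool_command bwa.pl dry -t /path/to/dna_pipeline_config.yaml -d /path/to/fastq_dna_config.yaml -o /path/to/output/directory -b /path/to/output/bam.yaml` prints the dry-run form of the command above.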
perl gatk.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/bwa_bam_config.yaml \
-o /path/to/output/directory \
-b /path/to/output/bam.yaml \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
perl contest.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
perl get_sequencing_metrics.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
perl get_coverage.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# Run HaplotypeCaller:
perl haplotype_caller.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# Combine and Genotype GVCFs:
perl genotype_gvcfs.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# Annotate and Filter using CPSR:
perl annotate_germline.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-i /path/to/genotype_gvcfs/final/output/directory \
-o /path/to/output/directory \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# Create a panel of normals (germline calls + sequencing artefacts)
# will only run if normal samples are available
perl mutect.pl \
--create-panel-of-normals \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# Generate somatic SNV calls
# can be run on T/N pairs or tumour-only samples (using panel of normals)
perl mutect.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
--pon /path/to/panel_of_normals.vcf { optional if not using the one created here: can also be specified in dna_pipeline_config.yaml if created elsewhere } \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# Create a panel of normals (germline calls + sequencing artefacts)
# will only run if normal samples are available
perl mutect2.pl \
--create-panel-of-normals \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# Generate somatic SNV and INDEL calls
# can be run on T/N pairs or tumour-only samples (using panel of normals)
perl mutect2.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
--pon /path/to/panel_of_normals.vcf { optional if not using the one created here: can also be specified in dna_pipeline_config.yaml if created elsewhere } \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
A few notes about Mutect2: Some samples (WGS and some WXS samples) will have exceptionally long run times. One solution is to run Mutect2 per-chromosome; however, this alters the statistical models applied and thus may produce a different set of variants. Currently, stringent filtering and the final ensemble approach typically make these differences irrelevant, but it is something to be aware of.
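To illustrate what a per-chromosome run looks like in general, the sketch below prints one GATK4 `Mutect2` command per primary contig, restricting each with the `-L` interval option. This is an illustration only and does not describe how `mutect2.pl` implements the split; the function name, output file names, and the contig-name pattern (`1`-style or `chr1`-style autosomes plus X and Y) are assumptions.

```shell
# Print one Mutect2 command per primary contig listed in a reference index.
# $1: reference .fai index; $2: tumour BAM; $3: reference FASTA
# Commands are printed for review, not executed.
per_chromosome_commands() (
    fai=$1
    tumour_bam=$2
    ref=$3
    # Keep only autosomes and sex chromosomes, with or without a "chr" prefix.
    cut -f1 "$fai" | grep -E '^(chr)?([0-9]+|X|Y)$' | while read -r chrom; do
        echo "gatk Mutect2 -R $ref -I $tumour_bam -L $chrom -O mutect2_$chrom.vcf.gz"
    done
)
```

The per-chromosome outputs would still need to be merged and filtered afterwards, which is not shown here.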
# Run T/N pairs to generate CNA calls, plus germline and somatic SNV and INDEL calls.
# This will create a panel of normals from the germline calls (only if not provided) and
# finally, will run T-only samples with germline filtering using the panel of normals
perl varscan.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
--pon /path/to/panel_of_normals.vcf { optional if not using the one created here: can also be specified in dna_pipeline_config.yaml if created elsewhere } \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# Use VarScan output to run Sequenza with gamma tuning for optimized SCNA calls
# will only run on T/N pairs
perl run_sequenza_with_optimal_gamma.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
-c slurm \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# Create a panel of normals (germline calls)
# will only run if normal samples are available
perl strelka.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
--create-panel-of-normals \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# Generate somatic SNV and INDEL calls, as well as SV calls from Manta
# can be run on T/N pairs or tumour-only samples (using panel of normals)
perl strelka.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
--pon /path/to/panel_of_normals.vcf { optional if not using the one created here: can also be specified in dna_pipeline_config.yaml if created elsewhere } \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# will only run on T/N pairs
perl somaticsniper.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
--pon /path/to/panel_of_normals.vcf { optional; can also be specified in dna_pipeline_config.yaml if created elsewhere } \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
SomaticSniper will ONLY run on tumour samples with a matched normal, and will ONLY produce somatic SNV calls (no panel of normals will be generated for this caller). If you wish to perform additional germline filtering, you may provide a panel of normals developed elsewhere.
# for WXS or smaller, targeted-panel datasets
# can be run on T/N pairs or tumour-only samples (using panel of normals)
perl vardict.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
--pon /path/to/panel_of_normals.vcf { optional if not using the one created here; can also be specified in dna_pipeline_config.yaml if created elsewhere } \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# or, for WGS, will split by chromosome
# can be run on T/N pairs or tumour-only samples (using panel of normals)
perl vardict_wgs.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
--pon /path/to/panel_of_normals.vcf { optional if not using the one created here; can also be specified in dna_pipeline_config.yaml if created elsewhere } \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# can be run on T/N pairs or tumour-only samples BUT requires at least 1 normal to estimate baseline distribution
perl gatk_cnv.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
GATK:CNV will ONLY run if at least one normal sample is provided, but will then run all provided samples (T/N and tumour-only).
# Generate somatic SV calls
# can be run on T/N pairs or tumour-only samples
perl delly.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
# Generate germline SV calls
perl delly.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
--germline \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
Note: germline SV calling by Delly has not been thoroughly tested! Please report any issues encountered.
# can be run on T/N pairs or tumour-only samples
perl mavis.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
--manta /path/to/strelka/directory \
--delly /path/to/delly/directory \
--rna /path/to/gatk_rnaseq_bam_config.yaml { optional if pughlab_rnaseq_pipeline.pl was run previously } \
--starfusion /path/to/starfusion/directory { optional if pughlab_rnaseq_pipeline.pl was run previously } \
--fusioncatcher /path/to/fusioncatcher/directory { optional if pughlab_rnaseq_pipeline.pl was run previously } \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
Mavis will also accept fusion calls generated from RNA-Seq data (using STAR-Fusion or FusionCatcher), but the input data must be in the same format as would be output by pughlab_rnaseq_pipeline.pl.
# can be run on T/N pairs or tumour-only samples BUT requires at least 1 normal to estimate baseline distribution
perl msi_sensor.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/gatk_bam_config.yaml \
-o /path/to/output/directory \
-c slurm \
--remove \
--dry-run { if this is a dry-run } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
perl pughlab_pipeline_auto_report.pl \
--create_report \
-t /path/to/dna_pipeline_config.yaml \
-d DATE \
-c slurm \
--dry-run { if this is a dry-run; NOTE that this will fail if the above pipeline has not completed } \
--no-wait { if not a dry-run and you don't want to wait around for it to finish }
This report generator is still a work in progress and may not run correctly if only a subset of tools was run!