
Workflow (running a pipeline)

sprokopec edited this page Jul 6, 2021 · 12 revisions

Running a pipeline

  1. Download the latest version of the pipeline
cd /cluster/home/username/git/
git clone https://github.com/pughlab/pipeline-suite/
  2. Run FASTQC to verify fastq quality
module load perl

perl ~/git/pipeline-suite/collect_fastqc_metrics.pl \
-d /path/to/fastq_config.yaml \
-t /path/to/fastqc_tool_config.yaml \
-c slurm
# optional: add --rna and/or --dry-run to the command above

Be sure to run FASTQC to verify fastq quality before running any downstream pipeline. In particular, check that read length is consistent, that GC content is similar across files (typically between 40% and 60%), and that files are unique (no duplicated md5sums).
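
The uniqueness check can also be done by hand before FASTQC is run. The following is a minimal sketch, assuming GNU coreutils (md5sum, and uniq with -w and -D) and fastq files ending in .fastq, .fastq.gz or .fq.gz; find_duplicate_fastqs is a hypothetical helper name and is not part of pipeline-suite.

```shell
# Sketch: list fastq files whose content is byte-identical to another file.
# find_duplicate_fastqs is a hypothetical helper, not part of pipeline-suite.
find_duplicate_fastqs() {
    # $1 = directory to search (recursively)
    # md5sum prints "<32-character hash>  <path>"; sorting groups equal hashes
    # together, and uniq -w32 -D prints every line whose hash is repeated.
    find "$1" -type f \( -name '*.fastq' -o -name '*.fastq.gz' -o -name '*.fq.gz' \) \
        -exec md5sum {} + | sort | uniq -w32 -D
}
```

Calling find_duplicate_fastqs /path/to/fastq_directory prints nothing when every file is unique; any lines printed are groups of files sharing the same md5sum.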

  3. Prepare interval files (e.g., for WXS)

For WXS or targeted-sequencing panels, a bed file containing the target regions should be provided (listing at minimum: chromosome, start and end positions). The variant calling pipelines MuTect and Mutect2 add 100bp of padding to each region provided. For consistency, this padding must be added manually before variant calling with the other tools (i.e., Strelka, SomaticSniper, VarDict and VarScan). The script below adds this padding and also creates the bgzipped version of the padded interval file required by Strelka.

module load perl

perl ~/git/pipeline-suite/scripts/format_interval_bed.pl \
-b /path/to/base/intervals.bed \
-r /path/to/reference.fa

Make sure you have write permission on the directory containing the intervals bed file: the script writes its output files to the same directory as the original bed file!
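
To make concrete what 100bp of padding does to each region, here is a minimal awk sketch. It is an illustration only and not how format_interval_bed.pl is implemented: pad_bed is a hypothetical helper, it assumes a tab-separated bed file, and it neither clips region ends to chromosome lengths nor merges regions that overlap after padding.

```shell
# Sketch: add padding to both sides of every region in a bed file.
# pad_bed is a hypothetical helper, not part of pipeline-suite.
pad_bed() {
    # $1 = input bed file, $2 = padding in bp (defaults to 100)
    awk -v pad="${2:-100}" 'BEGIN { FS = OFS = "\t" }
        /^(#|track|browser)/ { print; next }            # pass header lines through
        {
            $2 = ($2 + 0 > pad + 0) ? $2 - pad : 0      # clamp the padded start at 0
            $3 = $3 + pad                               # end is not clipped to chromosome length
            print
        }' "$1"
}
```

For example, pad_bed /path/to/base/intervals.bed turns a region "chr1 1000 1100" into "chr1 900 1200"; any extra columns are kept unchanged.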

  4. Run DNA (or RNA) pipeline
module load perl

perl ~/git/pipeline-suite/pughlab_dnaseq_pipeline.pl \
-t /path/to/dna_pipeline_config.yaml \
-d /path/to/dna_fastq_config.yaml \
--preprocessing \
--variant_calling \
--create_report \
-c slurm \
--dry-run   # optional

This generates the directory structure in the output directory (specified in /path/to/dna_pipeline_config.yaml), including a "logs/run_DNA_pipeline_TIMESTAMP/" directory containing the file "run_DNASeq_pipeline.log", which lists the individual tool commands. These commands can be run separately if "--dry-run" was set, or after a failure at any stage when you do not need to re-run the entire pipeline. (Note: re-running the entire pipeline would not regenerate files that already exist.)
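
After several runs the logs directory holds one "run_DNA_pipeline_TIMESTAMP" directory per run. The sketch below locates the log of the most recent run. It is a hedged example: latest_pipeline_log is a hypothetical helper, "most recent" is judged by directory modification time because the timestamp format is not specified here, and the -nt test is assumed to be available (it is in bash and dash).

```shell
# Sketch: print the path of the log file from the most recent pipeline run.
# latest_pipeline_log is a hypothetical helper, not part of pipeline-suite.
latest_pipeline_log() {
    # $1 = pipeline output directory (the one set in the pipeline config)
    newest=""
    for d in "$1"/logs/run_DNA_pipeline_*/; do
        [ -d "$d" ] || continue                     # pattern matched nothing
        if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
            newest="$d"                             # keep the most recently modified directory
        fi
    done
    # Return non-zero and print nothing when no run directory exists.
    [ -n "$newest" ] && printf '%s\n' "${newest}run_DNASeq_pipeline.log"
}
```

The result can be passed directly to a pager, for example less "$(latest_pipeline_log /path/to/output_directory)", to review the tool commands before running any of them by hand.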
