Skip to content

CNV analysis pipeline for germline whole exome samples

Notifications You must be signed in to change notification settings

berguner/cnv_pipeline

Repository files navigation

Germline CNV Detection Pipeline for Whole Exome data

Introduction

This pipeline is designed to detect CNV in many exome samples prepared with various exome capture kits. The pipeline includes a clustering step which attempts to separate samples into smaller groups which have similar coverage patterns. Given that there are enough number (>12) of samples from each exome panel/kit, the clustering should reduce the batch effects and increase the sensitivity and specificity. The pipeline currently supports GRCh37 genome version only.

Software Requirements

If you can't use Docker/Singularity:

  • Python >3.6
    • hdbscan, scikit-learn, plotly, pandas, sci-py
  • R >3.5
  • AnnotSV for annotation

How to run

  • Clone this repository.
  • Download a copy of Cromwell
  • Run Rscript install_CNV_prerequisites.R to install required R packages.
    or
    pull the Docker container:
    docker pull berguner/bsf_cnv_pipeline:latest
    singularity pull docker://berguner/bsf_cnv_pipeline:latest
  • Collect the aligned and duplicate marked .bam and .bai files in a folder. Symbolic links would also work.
  • Update the test/test.inputs.json file according to your setup.
    • Create a sample annotation sheet (CSV) with a column named Sample Name containing the sample names.
  • Create a backend configuration file for Cromwell. You can modify the one of the provided backends in this repository.
  • cd to the project_folder and run the pipeline:
java -Xmx4g -Dconfig.file=/path/to/backend.conf \
  -jar /path/to/cromwell-59.jar \
  run /path/to/cnv_pipeline/cnv_pipeline.wdl \
  --inputs /path/to/my.inputs.json

Pipeline steps

  1. Count reads on exonic regions.
  2. Clustering samples based on coverage (read count) patterns.
  3. Running CNV calling on each cluster of samples.
  4. Merging results from CODEX2 and ExomeDepth.
  5. Annotation of merged results using AnnotSV.
  6. Aggregation of deleteions found in the cohort.

About

CNV analysis pipeline for germline whole exome samples

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published