This pipeline is designed to detect CNV in many exome samples prepared with various exome capture kits. The pipeline includes a clustering step which attempts to separate samples into smaller groups which have similar coverage patterns. Given that there are enough number (>12) of samples from each exome panel/kit, the clustering should reduce the batch effects and increase the sensitivity and specificity. The pipeline currently supports GRCh37 genome version only.
If you can't use Docker/Singularity:
- Python >3.6
- hdbscan, scikit-learn, plotly, pandas, sci-py
- R >3.5
- AnnotSV for annotation
- Clone this repository.
- Download a copy of Cromwell
- Run
Rscript install_CNV_prerequisites.R
to install required R packages.
or
pull the Docker container:
docker pull berguner/bsf_cnv_pipeline:latest
singularity pull docker://berguner/bsf_cnv_pipeline:latest
- Collect the aligned and duplicate marked .bam and .bai files in a folder. Symbolic links would also work.
- Update the
test/test.inputs.json
file according to your setup.- Create a sample annotation sheet (CSV) with a column named
Sample Name
containing the sample names.
- Create a sample annotation sheet (CSV) with a column named
- Create a backend configuration file for Cromwell. You can modify the one of the provided backends in this repository.
cd
to theproject_folder
and run the pipeline:
java -Xmx4g -Dconfig.file=/path/to/backend.conf \
-jar /path/to/cromwell-59.jar \
run /path/to/cnv_pipeline/cnv_pipeline.wdl \
--inputs /path/to/my.inputs.json
- Count reads on exonic regions.
- Clustering samples based on coverage (read count) patterns.
- Running CNV calling on each cluster of samples.
- Merging results from CODEX2 and ExomeDepth.
- Annotation of merged results using AnnotSV.
- Aggregation of deleteions found in the cohort.