DGVar

[Figure: DGVar workflow chart]

Overview

DGVar identifies high-confidence germline deleterious variants (DGVs) from whole-exome sequencing data. The tool takes BAM or VCF files as input and screens for DGVs through a series of filtering steps (see the workflow chart above).

System requirements

DGVar was developed and tested on Linux. To run DGVar, you need a Linux system with the following software installed:

  • Python v2.7+
  • Java v1.6 & 1.7
  • Samtools v0.1.19
  • Genome Analysis Toolkit v2.5.2
  • snpEff v4.2
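
Before going further, it can help to confirm that the command-line tools are reachable. The following is a minimal sketch, not part of DGVar: `check_tools` is a hypothetical helper, and it assumes `python`, `java` and `samtools` are expected on your `PATH`. GATK and snpEff are usually distributed as `.jar` files, so they are not checked here, and the sketch does not verify version numbers.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DGVar): report which of the
# given commands can be found on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumption: these three tools are invoked by name from the pipeline.
check_tools python java samtools || echo "Install the missing tools before running DGVar."
```

Version checks (for example `samtools` v0.1.19) still need to be done by hand, since each tool reports its version differently.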

Pre-compiled database

The following databases are required:

  • Human GRCh37 reference genome (b37, available at GATK resource bundle)
  • NCBI dbSNP v137 (available at GATK resource bundle)
  • SnpEff database GRCh37.75 (available at SnpEff)
  • NCBI ClinVar v20180805 (slim version available at Box)
  • ExAC v0.3.1 (slim version available at Box)
  • Exome target regions (Agilent HaloPlex bed file available at Box)
  • Onco/TSG annotations (available here)
  • Common variants in EIPM cohort (available here)

Installation

Download the code and the required databases, update the path in the main shell script call_variants.sh, then run a quick test:

  • Make sure the current working directory is where dgvar is installed, and then type cd eg
  • Run a test by typing sh test.sh
  • Check the results at results/test/annotated/filt/test.candidates.filter_common.txt.gz; you should see the variant in the BRCA2 gene (13:32914437 GT --> G).
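
The last check can be scripted. This is a rough sketch that assumes it is run from the eg folder after test.sh has finished, and that the position 32914437 appears as plain text somewhere on the variant's line. The column layout of the result file is not described here, so the search is a simple substring match, and `find_variant` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print lines of the gzipped text file $1 that
# contain the string $2. Exit status is 0 if at least one line matched.
find_variant() {
    gzip -dc "$1" | grep -- "$2"
}

result=results/test/annotated/filt/test.candidates.filter_common.txt.gz

if [ -f "$result" ] && find_variant "$result" 32914437; then
    echo "BRCA2 variant found"
else
    echo "BRCA2 variant not found (or result file missing)"
fi
```

Because this is a substring match, a hit should still be inspected by eye to confirm the chromosome (13) and the alleles (GT --> G).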

Instructions for use

To run DGVar:

  • Update the path in the main shell script call_variants.sh

  • Run the shell script: sh call_variants.sh in.bam sampleID target.bed

    where in.bam is the input BAM file, sampleID is a unique sample ID (no spaces allowed), and target.bed is the BED file listing the genomic regions to check for variants
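
To process several samples, the command above can be wrapped in a loop. The sketch below is only an illustration under stated assumptions: the `bams/` directory, the one-BAM-per-sample naming (`<sampleID>.bam`) and the `valid_sample_id` helper are all hypothetical, and it assumes call_variants.sh and target.bed sit in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the sample ID is non-empty and
# contains no space characters, 1 otherwise.
valid_sample_id() {
    case "$1" in
        ''|*' '*) return 1 ;;
        *) return 0 ;;
    esac
}

# Assumed layout: one BAM per sample under bams/, named <sampleID>.bam
target=target.bed

for bam in bams/*.bam; do
    [ -e "$bam" ] || continue    # the glob matched no files
    id=$(basename "$bam" .bam)   # derive the sample ID from the file name
    if valid_sample_id "$id"; then
        sh call_variants.sh "$bam" "$id" "$target"
    else
        echo "skipping '$bam': sample ID must not contain spaces" >&2
    fi
done
```

Samples run one after another here; with the run time of each sample in mind, larger cohorts may be better served by a job scheduler.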

Run an example using the script in the eg folder.

Expected run time is roughly 10-30 min, depending on input file size and your computing resources.