DGVar is developed for identifying high confidence germline deleterious variants (DGVs) based on whole-exome sequencing data. The tool takes BAM files or VCF files as input, and screens DGVs by going through a series of flitering steps (please see the above workflow chart).
DGVar was developed and tested on linux system. To run DGVar, you'll need a linux system and the following softwares being installed:
- Python v2.7+
- Java v1.6 & 1.7
- Samtools v0.1.19
- Genome Analysis Toolkit v2.5.2
- snpEff v4.2
The following databases are required:
- Human GRCh37 reference genome (b37, available at GATK resource bundle)
- NCBI dbSNP v137 (available at GATK resource bundle)
- SnpEff database GRCh37.75 (available at SnpEff)
- NCBI ClinVar v20180805 (slim version available at Box)
- ExAC v0.3.1 (slim version available at Box)
- Exome target regions (Agilent HaloPlex bed file available at Box)
- Onco/TSG annotations (available at here)
- Common variants in EIPM cohort (available at here)
Download the codes as well as the required database; update the path in the main shell script call_variants.sh
, then run a quick test:
- Make sure the current working directory is where dgvar is installed, and then type
cd eg
- Run a test by typing
sh test.sh
- Check results at
results/test/annotated/filt/test.candidates.filter_common.txt.gz
, you should see the variant in BRCA2 gene (13:32914437 GT --> G).
To run DGVar:
-
Update the path in the main shell script
call_variants.sh
-
Run shell script
sh call_variants.sh in.bam sampleID target.bed
where in.bam is the input bam file, sampleID is an unique id (no space allowed), target.bed is the bed file listing genomic regions to check for variants
Run an example using the script in the eg
folder.
Expected run time may vary (10-30 min) depending on input file size and your computing resources.