This repository has been archived by the owner on Jan 3, 2018. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1 from UCSC-Treehouse/v3-local-fasta
V3 local fasta
- Loading branch information
Showing
12 changed files
with
492 additions
and
198 deletions.
There are no files selected for viewing
File renamed without changes.
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
# GATK RNAseq Variant Calling Analysis | ||
Docker image of GATK best practices approach for variant calling on RNAseq data. | ||
|
||
For full details please visit https://software.broadinstitute.org/gatk/guide/article?id=3891. | ||
|
||
The steps for the workflow are: | ||
* Add read groups, sort, mark duplicates, and create index using PICARD (https://broadinstitute.github.io/picard/) | ||
* Split’N’Trim and reassign mapping qualities | ||
* GATK IndelRealigner | ||
* snpEff (http://snpeff.sourceforge.net/SnpEff.html) 5) GATK HaplotypeCaller | ||
* GATK SelectVariants | ||
|
||
## Usage | ||
docker pull linhvoyo/gatk_rna_variant_v2 | ||
|
||
docker run -it -v /path/to/ref_folder:/data/ref \ -v /path/to/input_folder/:/data/work \ | ||
-e refgenome=input_reference_genome.fa \ | ||
-e input=sample.sorted.bam \ | ||
linhvoyo/gatk_rna_variant_v2 | tee >>/path/to/input_folder/out | ||
|
||
## Details | ||
docker run | ||
Command to initiate docker. | ||
|
||
-v /path/to/ref folder:/data/ref | ||
State the full path of directory containing the reference genome. The -v parameter will mount the listed directory into the docker container. | ||
|
||
-v /path/to/input folder/:.data/work | ||
State the full path of working directory containing the input bam file. | ||
|
||
-e refgenome=input reference genome.fa | ||
Provide the name of reference genome. | ||
|
||
-e input=sample.sorted.bam | ||
Provide the name of the input file. The docker takes in a sorted bam file that was aligned using STAR aligner (https://github.com/alexdobin/STAR). | ||
|
||
## Example run command and the expected output | ||
|
||
### Run command | ||
docker run -it -v /data/ref:/data/ref \ | ||
-v $(pwd):/data/work \ | ||
-e refgenome=GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \ | ||
-e input=test.bam \ | ||
linhvoyo/gatk_rna_variant_v2 | tee >> test.bam.out | ||
|
||
### Output files | ||
All output files will be in the mounted directory containing test.bam. | ||
|
||
test.split.realigned.bam | ||
test.split.realigned.bai | ||
test.split.realigned.HaplotypeCaller.filtered.ann.vcf | ||
test.split.realigned.HaplotypeCaller.filtered.ann.vcf.idx | ||
|
||
Variant calls filtered by the list of canonical variants obtained from Max H. The coordinates were converted from hg19 to hg38 using UCSC liftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver). | ||
|
||
test.split.realigned.HaplotypeCaller.filtered.ann.vl.vcf | ||
test.split.realigned.HaplotypeCaller.filtered.ann.vl.vcf.idx | ||
|
||
Variant calls filtered by cancer genes. The list of cancer genes was downloaded from Cancer Census (https://cancer.sanger.ac.uk). The coordiantes are in hg38. | ||
|
||
test.split.realigned.HaplotypeCaller.filtered.ann.cg.vcf | ||
test.split.realigned.HaplotypeCaller.filtered.ann.cg.vcf.idx | ||
|
||
The generated ”log” shows a brief overview of the variant calling process. The file will show ”Done (linhvoy- o/gatk rna variant v2) - test.bam” once sample has finished processing. | ||
|
||
test.bam.log | ||
|
||
### Example Output | ||
Wed Mar 15 17:06:18 UTC 2017 Add Read Groups - test.bam | ||
Wed Mar 15 17:06:27 UTC 2017 Add Read Groups - test.bam - Done | ||
Wed Mar 15 17:06:27 UTC 2017 SplitNCigarReads - test.bam | ||
Wed Mar 15 17:06:38 UTC 2017 SplitNCigarReads - test.bam - Done | ||
Wed Mar 15 17:06:38 UTC 2017 RealignerTargetCreator - test.bam | ||
Wed Mar 15 17:11:47 UTC 2017 RealignerTargetCreator - test.bam - Done Wed Mar 15 17:11:47 UTC 2017 IndelRealigner - test.bam | ||
Wed Mar 15 17:12:01 UTC 2017 IndelRealigner - test.bam - Done | ||
Wed Mar 15 17:12:01 UTC 2017 HaplotypeCaller - | ||
Wed Mar 15 17:47:56 UTC 2017 HaplotypeCaller - | ||
Wed Mar 15 17:47:56 UTC 2017 VariantFiltration | ||
Wed Mar 15 17:48:00 UTC 2017 VariantFiltration | ||
Wed Mar 15 17:48:00 UTC 2017 snpEff - test.bam | ||
Wed Mar 15 17:49:34 UTC 2017 snpEff - test.bam | ||
Wed Mar 15 17:49:34 UTC 2017 SelectVariants - variant list - test.bam | ||
Wed Mar 15 17:49:38 UTC 2017 SelectVariants - variant list - test.bam - Done Wed Mar 15 17:49:38 UTC 2017 SelectVariants - gene list - test.bam | ||
Wed Mar 15 17:49:42 UTC 2017 SelectVariants - gene list - test.bam - Done Wed Mar 15 17:49:42 UTC 2017 Done (linhvoyo/gatk_rna_variant_v2) - test.bam |
File renamed without changes.
File renamed without changes.
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,140 @@ | ||
#!/bin/bash | ||
#v2 add tempdir option. | ||
|
||
#if [[ $tempdir = *[!\ ]* ]]; then tmpdir="-Djava.io.tmpdir="$tempdir; else tmpdir=" "; fi | ||
set -eu -o pipefail | ||
|
||
|
||
i=$input | ||
r=$refgenome | ||
tmpdir="-Djava.io.tmpdir="/data/work/temp_$input | ||
gl=cc_genelist.bed | ||
vl=hglft_genome_24bb_d53250_variants.bed | ||
|
||
|
||
|
||
|
||
if [ ! -f "${i/.bam}.readgroups.bam.ok" ]; then | ||
echo `date` "Add Read Groups - $input">>/data/work/${i/.bam}.log; | ||
java -jar $tmpdir /root/picard.jar AddOrReplaceReadGroups I=/data/work/$i O=/data/work/${i/.bam}.readgroups.bam SORT_ORDER=coordinate RGLB=${i/.bam} RGPL=illumina RGPU=illumina RGSM=${i/.bam} VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true ; | ||
echo `date` "Add Read Groups - $input - Done">>/data/work/${i/.bam}.log; | ||
>${i/.bam}.readgroups.bam.ok; | ||
fi | ||
|
||
if [ ! -f "${i/.bam}.split.bam.ok" ]; then | ||
echo `date` "SplitNCigarReads - $input">>/data/work/${i/.bam}.log; | ||
java -Xmx50000m -jar $tmpdir /root/GenomeAnalysisTK.jar -T SplitNCigarReads -R /data/ref/$r -I /data/work/${i/.bam}.readgroups.bam -o /data/work/${i/.bam}.split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS; | ||
echo `date` "SplitNCigarReads - $input - Done">>/data/work/${i/.bam}.log; | ||
>${i/.bam}.split.bam.ok; | ||
fi | ||
|
||
if [ ! -f "${i/.bam}.split.intervals.ok" ]; then | ||
echo `date` "RealignerTargetCreator - $input">>/data/work/${i/.bam}.log; | ||
java -Xmx5000m -jar $tmpdir /root/GenomeAnalysisTK.jar \ | ||
-T RealignerTargetCreator \ | ||
-R /data/ref/$r \ | ||
-I /data/work/${i/.bam}.split.bam \ | ||
-o /data/work/${i/.bam}.split.intervals \ | ||
-nt 24 ; | ||
echo `date` "RealignerTargetCreator - $input - Done">>/data/work/${i/.bam}.log; | ||
>${i/.bam}.split.intervals.ok; | ||
fi | ||
|
||
if [ ! -f "${i/.bam}.split.realigned.bam.ok" ]; then | ||
echo `date` "IndelRealigner - $input">>/data/work/${i/.bam}.log; | ||
java -Xmx50000m -jar $tmpdir /root/GenomeAnalysisTK.jar \ | ||
-T IndelRealigner \ | ||
-R /data/ref/$r \ | ||
-I /data/work/${i/.bam}.split.bam \ | ||
-targetIntervals /data/work/${i/.bam}.split.intervals \ | ||
-o /data/work/${i/.bam}.split.realigned.bam; | ||
echo `date` "IndelRealigner - $input - Done">>/data/work/${i/.bam}.log; | ||
>${i/.bam}.split.realigned.bam.ok; | ||
fi | ||
|
||
if [ ! -f "${i/.bam}.split.realigned.HaplotypeCaller.vcf.ok" ]; then | ||
echo `date` "HaplotypeCaller - $input">>/data/work/${i/.bam}.log; | ||
java -Xmx20000m -jar $tmpdir /root/GenomeAnalysisTK.jar \ | ||
-T HaplotypeCaller \ | ||
-R /data/ref/$r \ | ||
-I /data/work/${i/.bam}.split.realigned.bam \ | ||
-dontUseSoftClippedBases \ | ||
-stand_call_conf 20.0 \ | ||
-stand_emit_conf 20.0 \ | ||
-o /data/work/${i/.bam}.split.realigned.HaplotypeCaller.vcf \ | ||
-nct 4 ; | ||
echo `date` "HaplotypeCaller - $input - Done">>/data/work/${i/.bam}.log; | ||
>${i/.bam}.split.realigned.HaplotypeCaller.vcf.ok; | ||
fi | ||
|
||
if [ ! -f "${i/.bam}.split.realigned.HaplotypeCaller.filtered.vcf.ok" ]; then | ||
echo `date` "VariantFiltration - $input">>/data/work/${i/.bam}.log; | ||
java -Xmx20000m -jar $tmpdir /root/GenomeAnalysisTK.jar \ | ||
-T VariantFiltration \ | ||
-R /data/ref/$r \ | ||
-V /data/work/${i/.bam}.split.realigned.HaplotypeCaller.vcf \ | ||
-window 35 \ | ||
-cluster 3 \ | ||
-filterName FS \ | ||
-filter "FS > 30.0" \ | ||
-filterName QD \ | ||
-filter "QD < 2.0" \ | ||
-o /data/work/${i/.bam}.split.realigned.HaplotypeCaller.filtered.vcf; | ||
echo `date` "VariantFiltration - $input - Done">>/data/work/${i/.bam}.log; | ||
>${i/.bam}.split.realigned.HaplotypeCaller.filtered.vcf.ok; | ||
fi | ||
|
||
if [ ! -f "${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vcf.ok" ]; then | ||
echo `date` "snpEff - $input">>/data/work/${i/.bam}.log; | ||
java -Xmx10000m -jar $tmpdir /root/snpEff/snpEff.jar -v -classic GRCh38.86 /data/work/${i/.bam}.split.realigned.HaplotypeCaller.filtered.vcf > /data/work/${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vcf; | ||
rm snpEff_genes.txt; | ||
rm snpEff_summary.html; | ||
echo `date` "snpEff - $input - Done">>/data/work/${i/.bam}.log; | ||
>${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vcf.ok; | ||
fi | ||
|
||
if [ ! -f "${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vl.vcf.ok" ]; then | ||
echo `date` "SelectVariants - variant list - $input">>/data/work/${i/.bam}.log; | ||
java -Xmx20000m -jar $tmpdir /root/GenomeAnalysisTK.jar \ | ||
-T SelectVariants \ | ||
-R /data/ref/$refgenome \ | ||
-V /data/work/${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vcf \ | ||
-o /data/work/${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vl.vcf \ | ||
-L /root/$vl; | ||
echo `date` "SelectVariants - variant list - $input - Done">>/data/work/${i/.bam}.log; | ||
>${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vl.vcf.ok; | ||
fi | ||
|
||
if [ ! -f "${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.cg.vcf.ok" ]; then | ||
echo `date` "SelectVariants - gene list - $input ">>/data/work/${i/.bam}.log; | ||
java -Xmx50000m -jar $tmpdir /root/GenomeAnalysisTK.jar \ | ||
-T SelectVariants \ | ||
-R /data/ref/$refgenome \ | ||
-V /data/work/${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vcf \ | ||
-o /data/work/${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.cg.vcf \ | ||
-L /root/$gl; | ||
echo `date` "SelectVariants - gene list - $input - Done ">>/data/work/${i/.bam}.log; | ||
>${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.cg.vcf.ok; | ||
fi | ||
|
||
#chown output files | ||
finish() { | ||
# Fix ownership of output files | ||
uid=$(stat -c '%u:%g' /data/work) | ||
chown $uid ${i/.bam}.log | ||
chown $uid ${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.cg.vcf | ||
chown $uid ${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.cg.vcf.idx | ||
chown $uid ${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vcf | ||
chown $uid ${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vcf.idx | ||
chown $uid ${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vl.vcf | ||
chown $uid ${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vl.vcf.idx | ||
chown $uid ${i/.bam}.split.realigned.bai | ||
chown $uid ${i/.bam}.split.realigned.bam | ||
} | ||
|
||
trap finish EXIT | ||
|
||
|
||
echo `date` "Done (linhvoyo/gatk_variant_v2) - $input">>/data/work/${i/.bam}.log; | ||
|
||
rm ${i/.bam}.readgroups.bai ${i/.bam}.readgroups.bam ${i/.bam}.split.bai ${i/.bam}.split.bam ${i/.bam}.split.intervals ${i/.bam}.split.realigned.HaplotypeCaller.vcf ${i/.bam}.split.realigned.HaplotypeCaller.vcf.idx ${i/.bam}.readgroups.bam.ok ${i/.bam}.split.bam.ok ${i/.bam}.split.intervals.ok ${i/.bam}.split.realigned.bam.ok ${i/.bam}.split.realigned.HaplotypeCaller.vcf.ok ${i/.bam}.split.realigned.HaplotypeCaller.filtered.vcf.ok ${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vl.vcf.ok ${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.cg.vcf.ok ${i/.bam}.split.realigned.HaplotypeCaller.filtered.ann.vcf.ok ${i/.bam}.split.realigned.HaplotypeCaller.filtered.vcf ${i/.bam}.split.realigned.HaplotypeCaller.filtered.vcf.idx -r $(pwd)/temp_$input |
File renamed without changes.
Oops, something went wrong.