Update README.md
RCollins13 authored Dec 17, 2016
1 parent 6784851 commit 8171e30
Showing 1 changed file with 71 additions and 74 deletions.
145 changes: 71 additions & 74 deletions README.md
@@ -1,99 +1,96 @@
# Holmes
Pipeline for structural variation detection from liWGS libraries, as reported in Collins et al. (in press)

Readme to be updated very soon.
Copyright (c) 2016 Ryan L. Collins and the laboratory of Michael E. Talkowski
Contact: Ryan L. Collins <[email protected]>

Preliminary description:
Holmes (liWGS-SV) Documentation
Contact: Ryan Collins ([email protected])
Update: August 2015
If you use this software, please cite:
Collins RL, et al. Defining the spectrum of large inversions, complex structural variation, and chromothripsis in the morbid genome. In Press (2016)

Execution command:
runHolmes.sh samples.list parameters_info.sh
Code development credits: Ryan L. Collins, Harrison Brand, Matthew R. Stone, Vamsee Pillalamarri, Joseph T. Glessner, Claire Redin, Colby Chiang, Ian Blumenthal, Adrian Heilbut

Readme to be cleaned up very soon!


##INPUT##
samples.list: three tab-delimited columns: 1) sample ID, 2) full path to the sample BAM, 3) expected sex (M=XY, F=XX, O=other, U=unknown; U defers to the sex predicted by sexcheck)
parameters_info.sh: shell script that exports all parameters for the pipeline run
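
The samples.list layout described above can be checked before launching a run. The following is a minimal sketch, not part of Holmes: the function name check_samples_list is hypothetical, and it assumes that "full path" means an absolute path beginning with "/".

```bash
# check_samples_list FILE
# Hypothetical helper (not part of Holmes). Prints one message per malformed
# line and returns non-zero if any line fails a check.
check_samples_list() {
  awk -F'\t' '
    NF != 3          { printf "line %d: expected 3 columns, found %d\n", NR, NF; bad = 1; next }
    $1 == ""         { printf "line %d: empty sample ID\n", NR; bad = 1 }
    $2 !~ /^\//      { printf "line %d: BAM path is not absolute: %s\n", NR, $2; bad = 1 }  # assumption
    $3 !~ /^[MFOU]$/ { printf "line %d: sex must be M, F, O, or U: %s\n", NR, $3; bad = 1 }
    END              { exit bad }
  ' "$1"
}
```

For example, `check_samples_list samples.list && runHolmes.sh samples.list parameters_info.sh` would only start the pipeline when every line passes.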

##PRE-MODULE STEPS##
Symlinks & indexes all bams
Creates working and output directory trees
Loads necessary modules
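
The first two pre-module steps might look roughly like the sketch below. This is not the Holmes code: the function name stage_bams and the per-sample directory layout under the working directory are assumptions; only ${OUTDIR}/QC/cohort is taken from this README. Indexing uses `samtools index` and is skipped when samtools is not installed or the BAM is missing or empty. Module loading is site-specific and is left out.

```bash
# stage_bams SAMPLES_LIST WRKDIR OUTDIR
# Hypothetical helper (not part of Holmes): builds the directory trees,
# symlinks each BAM into the working directory, and indexes it.
stage_bams() {
  samples=$1; wrkdir=$2; outdir=$3
  mkdir -p "$outdir/QC/cohort"
  while IFS="$(printf '\t')" read -r id bam sex; do
    [ -n "$id" ] || continue                  # skip blank lines
    mkdir -p "$wrkdir/$id"                    # assumed per-sample layout
    ln -sf "$bam" "$wrkdir/$id/$id.bam"
    # Index only when samtools is available and the BAM exists and is non-empty.
    if command -v samtools >/dev/null 2>&1 && [ -s "$bam" ]; then
      samtools index "$wrkdir/$id/$id.bam"
    fi
  done < "$samples"
}
```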

##MODULE 1: QC##
Runs the following:
Picard EstimateLibraryComplexity
Picard CollectAlignmentSummaryMetrics
Picard CollectInsertSizeMetrics
Picard CollectWgsMetrics
Samtools flagstat
Bamtools stats
Sex Check
WGS Dosage Bias Check
Checks for nominal QC values, reports errors to ${OUTDIR}/${COHORT_ID}_WARNINGS.txt
Writes master QC table to ${OUTDIR}/QC/cohort/${COHORT_ID}.QC.metrics

##MODULE 2: PHYSICAL DEPTH ANALYSES##
Runs binCov to generate 1kb binned physical depth for each library
BGZips & tabix indexes each coverage file (for classifier)

##MODULE 3: PER-SAMPLE CLUSTERING##
**Rate-limiting step of entire pipeline**
If ${pre_bamstat} is not set to "TRUE", bamstat is run with a minimum cluster size of 3 for each sample
If ${pre_bamstat} is set to "TRUE", bamstat clusters and stats.file are copied from preexisting paths to ${WRKDIR}
Removes *pairs.txt and *pairs.sorted.txt to save space
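
The branching on ${pre_bamstat} and the cleanup step can be sketched as follows. Both function names are hypothetical and no bamstat command line is shown, since its options are not documented here; the sketch only reports which branch would be taken and removes the intermediate pairs files from a directory.

```bash
# cluster_source
# Hypothetical helper (not part of Holmes): reports which branch Module 3 takes.
cluster_source() {
  if [ "${pre_bamstat:-}" = "TRUE" ]; then
    echo "copy"   # reuse preexisting bamstat clusters and stats.file
  else
    echo "run"    # run bamstat with a minimum cluster size of 3
  fi
}

# clean_pairs DIR
# Removes intermediate pairs files. Both globs are needed because
# *pairs.txt does not match names ending in pairs.sorted.txt.
clean_pairs() {
  rm -f "$1"/*pairs.txt "$1"/*pairs.sorted.txt
}
```

Any value other than the exact string "TRUE", including an unset variable, falls through to running bamstat, which matches the wording above.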

##MODULE 4: PHYSICAL DEPTH CNV CALLING##
Runs cnMOPS on autosomes on all samples
Runs cnMOPS on allosomes on samples split by M/F. "Other" sex samples pooled with either M or F depending on ${other_assign}
Merges cnMOPS calls per sample
Runs Serkan's log2R DNAcopy large CNV caller. Allosomes not split by sex; maybe include this functionality in future updates
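
The sex-based pooling for the allosome cnMOPS runs can be illustrated with a short sketch. It is not the Holmes implementation: the function name split_by_sex is hypothetical, and it assumes ${other_assign} holds either "M" or "F". Samples marked U are labeled "unresolved" here because the predicted sex from the QC module is not available in samples.list.

```bash
# split_by_sex SAMPLES_LIST OTHER_ASSIGN
# Hypothetical helper (not part of Holmes). Prints "group<TAB>sample_id",
# pooling "O" samples with the group named by OTHER_ASSIGN (assumed M or F).
split_by_sex() {
  awk -F'\t' -v other="$2" '
    $3 == "M" || $3 == "F" { print $3 "\t" $1; next }
    $3 == "O"              { print other "\t" $1; next }
                           { print "unresolved\t" $1 }   # e.g. U: needs predicted sex
  ' "$1"
}
```

Filtering the output on its first column (for example with `awk '$1 == "F"'`) gives the sample list for each sex-specific run.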

##MODULE 5: JOINT RECLUSTERING & CLASSIFICATION##
Runs classifier
Patches clusters
Reclassifies patched clusters
Applies final classification labels & sets coordinate reporting to be 1st or 3rd quartile of reads, respectively (to avoid overclustering/negative sizes)

##MODULE 6: CONSENSUS CNV CALLING##
Runs in one of two modes: with or without genotyping information
The mode is chosen by the parameter ${min_geno} (set in module6.sh), the minimum number of samples a cohort must contain for genotyping to be used
***NEED TO ADD GENOTYPING***
Consensus Groups with Genotyping:
A [HIGH]: Valid cluster, cnMOPS or genotyping support, <30% blacklist
B [HIGH]: cnMOPS call, ≥50kb, <30% blacklist, genotyping pass, no clustering overlap
C [MED]: cnMOPS call, <50kb, genotyping pass, <30% blacklist
D [MED]: valid cluster, genotyping or cnMOPS support, ≥30% blacklist
E [MED]: cnMOPS call, ≥50kb, genotyping pass, ≥30% blacklist
F [LOW]: cnMOPS call, ≥50kb, no clustering support, no genotyping support
G [LOW]: cnMOPS call, <50kb, genotyping pass, ≥30% blacklist
H [LOW]: valid cluster, <25kb, no cnMOPS or genotyping support
Consensus Groups without Genotyping:
A [HIGH]: Valid cluster, cnMOPS support, <30% blacklist
B [MED]: cnMOPS call, ≥50kb, <30% blacklist, no clustering overlap
C [MED]: valid cluster, cnMOPS support, ≥30% blacklist
D [LOW]: cnMOPS call, ≥50kb, ≥30% blacklist
E [LOW]: valid cluster, <25kb, no cnMOPS support
Returns single merged file each for consensus dels and consensus dups
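
The mode selection and the consensus groups without genotyping can be written as a small decision function. This is a sketch under stated assumptions, not the module6.sh logic: both function names are hypothetical, genotyping is assumed to be used when the cohort size is at least ${min_geno}, sizes are given in bp with 50kb taken as 50000 and 25kb as 25000, the blacklist overlap is an integer percentage, and groups B and D are assumed to apply only to cnMOPS calls without a valid cluster. Combinations not listed above return "none".

```bash
# consensus_mode N_SAMPLES MIN_GENO
# Hypothetical helper: assumes genotyping is used when N_SAMPLES >= MIN_GENO.
consensus_mode() {
  if [ "$1" -ge "$2" ]; then echo "genotyping"; else echo "no_genotyping"; fi
}

# consensus_group_nogeno CLUSTER CNMOPS SIZE_BP BLACKLIST_PCT
# CLUSTER and CNMOPS are 1 (present) or 0 (absent). Hypothetical helper.
consensus_group_nogeno() {
  cluster=$1; cnmops=$2; size=$3; bl=$4
  if [ "$cluster" -eq 1 ] && [ "$cnmops" -eq 1 ]; then
    # Valid cluster with cnMOPS support: A or C depending on blacklist overlap.
    if [ "$bl" -lt 30 ]; then echo "A HIGH"; else echo "C MED"; fi
  elif [ "$cluster" -eq 0 ] && [ "$cnmops" -eq 1 ] && [ "$size" -ge 50000 ]; then
    # Large cnMOPS call without a cluster: B or D depending on blacklist overlap.
    if [ "$bl" -lt 30 ]; then echo "B MED"; else echo "D LOW"; fi
  elif [ "$cluster" -eq 1 ] && [ "$cnmops" -eq 0 ] && [ "$size" -lt 25000 ]; then
    echo "E LOW"
  else
    echo "none"   # combination not covered by the groups listed above
  fi
}
```

For example, `consensus_group_nogeno 1 1 80000 10` prints "A HIGH", while the same call with a blacklist overlap of 45 prints "C MED".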

##MODULE 7: COMPLEX SV CATEGORIZATION##
Runs inversion classification script
Runs translocation classification script
Runs complex linking script
Runs complex parsing script

##MODULE 8: VARIANT CONSOLIDATION & REFORMATTING##
Outputs the following seven variant files:
-Deletion
-Duplication
-Inversion
-Insertion
-Translocation
-Complex
-Unresolved






