-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
6784851
commit 8171e30
Showing
1 changed file
with
71 additions
and
74 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,99 +1,96 @@ | ||
# Holmes | ||
Pipeline for structural variation detection from liWGS libraries, as reported in Collins et al. (in press) | ||
Pipeline for structural variation detection from liWGS libraries | ||
|
||
Readme to be updated very soon. | ||
Copyright (c) 2016 Ryan L. Collins and the laboratory of Michael E. Talkowski | ||
Contact: Ryan L. Collins <[email protected]> | ||
|
||
Preliminary description: | ||
Holmes (liWGS-SV) Documentation | ||
Contact: Ryan Collins ([email protected]) | ||
Update: August 2015 | ||
If you use this software, please cite: | ||
Collins RL, et al. Defining the spectrum of large inversions, complex structural variation, and chromothripsis in the morbid genome. In Press (2016) | ||
|
||
Execution command: | ||
runHolmes.sh samples.list parameters_info.sh | ||
Code development credits: Ryan L. Collins, Harrison Brand, Matthew R. Stone, Vamsee Pillalamarri, Joseph T. Glessner, Claire Redin, Colby Chiang, Ian Blumenthal, Adrian Heilbut | ||
|
||
Readme to be cleaned up very soon! | ||
|
||
Preliminary description: | ||
|
||
Execution command: | ||
runHolmes.sh samples.list parameters_info.sh | ||
|
||
##INPUT## | ||
samples.list: three columns, tab delimited, col 1) sample ID, col 2) full path to sample bam, col 3) expected sex (M=XY, F=XX, O=other, U=unknown (will defer to predicted sex by sexcheck)) | ||
parameters_info.sh: shell script to export all parameters for pipeline run | ||
samples.list: three columns, tab delimited, col 1) sample ID, col 2) full path to sample bam, col 3) expected sex (M=XY, F=XX, O=other, U=unknown (will defer to predicted sex by sexcheck)) | ||
parameters_info.sh: shell script to export all parameters for pipeline run | ||
|
||
##PRE-MODULE STEPS## | ||
Symlinks & indexes all bams | ||
Creates working and output directory trees | ||
Loads necessary modules | ||
Symlinks & indexes all bams | ||
Creates working and output directory trees | ||
Loads necessary modules | ||
|
||
##MODULE 1: QC## | ||
Runs the following: | ||
Picard EstimateLibraryComplexity | ||
Picard CollectAlignmentSummaryMetrics | ||
Picard CollectInsertSizeMetrics | ||
Picard CollectWgsMetrics | ||
Samtools flagstat | ||
Bamtools stats | ||
Sex Check | ||
WGS Dosage Bias Check | ||
Checks for nominal QC values, reports errors to ${OUTDIR}/${COHORT_ID}_WARNINGS.txt | ||
Writes master QC table to ${OUTDIR}/QC/cohort/${COHORT_ID}.QC.metrics | ||
Runs the following: | ||
Picard EstimateLibraryComplexity | ||
Picard CollectAlignmentSummaryMetrics | ||
Picard CollectInsertSizeMetrics | ||
Picard CollectWgsMetrics | ||
Samtools flagstat | ||
Bamtools stats | ||
Sex Check | ||
WGS Dosage Bias Check | ||
Checks for nominal QC values, reports errors to ${OUTDIR}/${COHORT_ID}_WARNINGS.txt | ||
Writes master QC table to ${OUTDIR}/QC/cohort/${COHORT_ID}.QC.metrics | ||
|
||
##MODULE 2: PHYSICAL DEPTH ANALYSES## | ||
Runs binCov to generate 1kb binned physical depth for each library | ||
BGZips & tabix indexes each coverage file (for classifier) | ||
Runs binCov to generate 1kb binned physical depth for each library | ||
BGZips & tabix indexes each coverage file (for classifier) | ||
|
||
##MODULE 3: PER-SAMPLE CLUSTERING## | ||
**Rate-limiting step of entire pipeline** | ||
If ${pre_bamstat} isn't set as "TRUE", bamstat is run at min cluster size = 3 for each sample | ||
If ${pre_bamstat}="TRUE", bamstat clusters and stats.file are copied from preexisting paths to ${WRKDIR} | ||
Removes *pairs.txt and *pairs.sorted.txt to save space | ||
If ${pre_bamstat} isn't set as "TRUE", bamstat is run at min cluster size = 3 for each sample | ||
If ${pre_bamstat}="TRUE", bamstat clusters and stats.file are copied from preexisting paths to ${WRKDIR} | ||
Removes *pairs.txt and *pairs.sorted.txt to save space | ||
|
||
##MODULE 4: PHYSICAL DEPTH CNV CALLING## | ||
Runs cnMOPS on autosomes on all samples | ||
Runs cnMOPS on allosomes on samples split by M/F. "Other" sex samples pooled with either M or F depending on ${other_assign} | ||
Merges cnMOPS calls per sample | ||
Runs Serkan's log2R DNAcopy large CNV caller. Allosomes not split by sex; maybe include this functionality in future updates | ||
Runs cnMOPS on autosomes on all samples | ||
Runs cnMOPS on allosomes on samples split by M/F. "Other" sex samples pooled with either M or F depending on ${other_assign} | ||
Merges cnMOPS calls per sample | ||
|
||
##MODULE 5: JOINT RECLUSTERING & CLASSIFICATION## | ||
Runs classifier | ||
Patches clusters | ||
Reclassifies patched clusters | ||
Applies final classification labels & sets coordinate reporting to be 1st or 3rd quartile of reads, respectively (to avoid overclustering/negative sizes) | ||
Runs classifier | ||
Patches clusters | ||
Reclassifies patched clusters | ||
Applies final classification labels & sets coordinate reporting to be 1st or 3rd quartile of reads, respectively (to avoid overclustering/negative sizes) | ||
|
||
##MODULE 6: CONSENSUS CNV CALLING## | ||
Runs in one of two modes: with or without genotyping information | ||
Mode chosen by parameter ${min_geno}, set in module6.sh, which corresponds to the minimum number of samples in the cohort to use genotyping | ||
***NEED TO ADD GENOTYPING*** | ||
Consensus Groups with Genotyping: | ||
A [HIGH]: Valid cluster, cnMOPS or genotyping support, <30% blacklist | ||
B [HIGH]: cnMOPS call, ≥50kb, <30% blacklist, genotyping pass, no clustering overlap | ||
C [MED]: cnMOPS call, <50kb, genotyping pass, <30% blacklist | ||
D [MED]: valid cluster, genotyping or cnMOPS support, ≥30% blacklist | ||
E [MED]: cnMOPS call, ≥50kb, genotyping pass, ≥30% blacklist | ||
F [LOW]: cnMOPS call, ≥50kb, no clustering support, no genotyping support | ||
G [LOW]: cnMOPS call, <50kb, genotyping pass, ≥30% blacklist | ||
H [LOW]: valid cluster, <25kb, no cnMOPS or genotyping support | ||
Consensus Groups without Genotyping: | ||
A [HIGH]: Valid cluster, cnMOPS support, <30% blacklist | ||
B [MED]: cnMOPS call, ≥50kb, <30% blacklist, no clustering overlap | ||
C [MED]: valid cluster, cnMOPS support, ≥30% blacklist | ||
D [LOW]: cnMOPS call, ≥50kb, ≥30% blacklist | ||
E [LOW]: valid cluster, <25kb, no cnMOPS support | ||
Returns single merged file each for consensus dels and consensus dups | ||
Runs in one of two modes: with or without genotyping information | ||
Mode chosen by parameter ${min_geno}, set in module6.sh, which corresponds to the minimum number of samples in the cohort to use genotyping | ||
Consensus Groups with Genotyping: | ||
A [HIGH]: Valid cluster, cnMOPS or genotyping support, <30% blacklist | ||
B [HIGH]: cnMOPS call, ≥50kb, <30% blacklist, genotyping pass, no clustering overlap | ||
C [MED]: cnMOPS call, <50kb, genotyping pass, <30% blacklist | ||
D [MED]: valid cluster, genotyping or cnMOPS support, ≥30% blacklist | ||
E [MED]: cnMOPS call, ≥50kb, genotyping pass, ≥30% blacklist | ||
F [LOW]: cnMOPS call, ≥50kb, no clustering support, no genotyping support | ||
G [LOW]: cnMOPS call, <50kb, genotyping pass, ≥30% blacklist | ||
H [LOW]: valid cluster, <25kb, no cnMOPS or genotyping support | ||
Consensus Groups without Genotyping: | ||
A [HIGH]: Valid cluster, cnMOPS support, <30% blacklist | ||
B [MED]: cnMOPS call, ≥50kb, <30% blacklist, no clustering overlap | ||
C [MED]: valid cluster, cnMOPS support, ≥30% blacklist | ||
D [LOW]: cnMOPS call, ≥50kb, ≥30% blacklist | ||
E [LOW]: valid cluster, <25kb, no cnMOPS support | ||
Returns single merged file each for consensus dels and consensus dups | ||
|
||
##MODULE 7: COMPLEX SV CATEGORIZATION## | ||
Runs inversion classification script | ||
Runs translocation classification script | ||
Runs complex linking script | ||
Runs complex parsing script | ||
Runs inversion classification script | ||
Runs translocation classification script | ||
Runs complex linking script | ||
Runs complex parsing script | ||
|
||
##MODULE 8: VARIANT CONSOLIDATION & REFORMATTING | ||
Outputs the following seven variant files: | ||
-Deletion | ||
-Duplication | ||
-Inversion | ||
-Insertion | ||
-Translocation | ||
-Complex | ||
-Unresolved | ||
|
||
|
||
|
||
|
||
|
||
|
||
Outputs the following seven variant files: | ||
-Deletion | ||
-Duplication | ||
-Inversion | ||
-Insertion | ||
-Translocation | ||
-Complex | ||
-Unresolved |