Update README.md

talkowski-lab · Dec 17, 2016 · 8171e30
1 parent 6784851
Showing 1 changed file with 71 additions and 74 deletions.
diff --git a/README.md b/README.md
@@ -1,99 +1,96 @@
 # Holmes
-Pipeline for structural variation detection from liWGS libraries, as reported in Collins et al. (in press)
+Pipeline for structural variation detection from liWGS libraries  
 
-Readme to be updated very soon.
+Copyright (c) 2016 Ryan L. Collins and the laboratory of Michael E. Talkowski  
+Contact: Ryan L. Collins <[email protected]>  
 
-Preliminary description:
-Holmes (liWGS-SV) Documentation
-Contact: Ryan Collins ([email protected])
-Update: August 2015
+If you use this software, please cite:
+Collins RL, et al. Defining the spectrum of large inversions, complex structural variation, and chromothripsis in the morbid genome. In Press (2016)  
 
-Execution command:
-runHolmes.sh samples.list parameters_info.sh
+Code development credits: Ryan L. Collins, Harrison Brand, Matthew R. Stone, Vamsee Pillalamarri, Joseph T. Glessner, Claire Redin, Colby Chiang, Ian Blumenthal, Adrian Heilbut
+
+Readme to be cleaned up very soon!  
+
+Preliminary description:  
+
+Execution command:  
+runHolmes.sh samples.list parameters_info.sh  
 
 ##INPUT##
-samples.list: three columns, tab delimited, col 1) sample ID, col 2) full path to sample bam, col 3) expected sex (M=XY, F=XX, O=other, U=unknown (will defer to predicted sex by sexcheck))
-parameters_info.sh: shell script to export all parameters for pipeline run
+samples.list: three columns, tab delimited, col 1) sample ID, col 2) full path to sample bam, col 3) expected sex (M=XY, F=XX, O=other, U=unknown (will defer to predicted sex by sexcheck))  
+parameters_info.sh: shell script to export all parameters for pipeline run  
 
 ##PRE-MODULE STEPS##
-Symlinks & indexes all bams
-Creates working and output directory trees
-Loads necessary modules
+Symlinks & indexes all bams  
+Creates working and output directory trees  
+Loads necessary modules  
 
 ##MODULE 1: QC##
-Runs the following:
-	Picard EstimateLibraryComplexity
-	Picard CollectAlignmentSummaryMetrics
-	Picard CollectInsertSizeMetrics
-	Picard CollectWgsMetrics
-	Samtools flagstat
-	Bamtools stats
-	Sex Check
-	WGS Dosage Bias Check
-Checks for nominal QC values, reports errors to ${OUTDIR}/${COHORT_ID}_WARNINGS.txt
-Writes master QC table to ${OUTDIR}/QC/cohort/${COHORT_ID}.QC.metrics
+Runs the following:  
+	Picard EstimateLibraryComplexity  
+	Picard CollectAlignmentSummaryMetrics  
+	Picard CollectInsertSizeMetrics  
+	Picard CollectWgsMetrics  
+	Samtools flagstat  
+	Bamtools stats  
+	Sex Check  
+	WGS Dosage Bias Check  
+Checks for nominal QC values, reports errors to ${OUTDIR}/${COHORT_ID}_WARNINGS.txt  
+Writes master QC table to ${OUTDIR}/QC/cohort/${COHORT_ID}.QC.metrics  
 
 ##MODULE 2: PHYSICAL DEPTH ANALYSES##
-Runs binCov to generate 1kb binned physical depth for each library
-BGZips & tabix indexes each coverage file (for classifier)
+Runs binCov to generate 1kb binned physical depth for each library  
+BGZips & tabix indexes each coverage file (for classifier)  
 
 ##MODULE 3: PER-SAMPLE CLUSTERING##
 **Rate-limiting step of entire pipeline**
-If ${pre_bamstat} isn't set as "TRUE", bamstat is run at min cluster size = 3 for each sample
-If ${pre_bamstat}="TRUE", bamstat clusters and stats.file are copied from preexisting paths to ${WRKDIR}
-Removes *pairs.txt and *pairs.sorted.txt to save space
+If ${pre_bamstat} isn't set as "TRUE", bamstat is run at min cluster size = 3 for each sample  
+If ${pre_bamstat}="TRUE", bamstat clusters and stats.file are copied from preexisting paths to ${WRKDIR}  
+Removes *pairs.txt and *pairs.sorted.txt to save space  
 
 ##MODULE 4: PHYSICAL DEPTH CNV CALLING##
-Runs cnMOPS on autosomes on all samples
-Runs cnMOPS on allosomes on samples split by M/F. "Other" sex samples pooled with either M or F depending on ${other_assign}
-Merges cnMOPS calls per sample
-Runs Serkan's log2R DNAcopy large CNV caller. Allosomes not split by sex; maybe include this functionality in future updates
+Runs cnMOPS on autosomes on all samples  
+Runs cnMOPS on allosomes on samples split by M/F. "Other" sex samples pooled with either M or F depending on ${other_assign}  
+Merges cnMOPS calls per sample  
 
 ##MODULE 5: JOINT RECLUSTERING & CLASSIFICATION##
-Runs classifier
-Patches clusters
-Reclassifies patched clusters
-Applies final classification labels & sets coordinate reporting to be 1st or 3rd quartile of reads, respectively (to avoid overclustering/negative sizes)
+Runs classifier  
+Patches clusters  
+Reclassifies patched clusters  
+Applies final classification labels & sets coordinate reporting to be 1st or 3rd quartile of reads, respectively (to avoid overclustering/negative sizes)  
 
 ##MODULE 6: CONSENSUS CNV CALLING##
-Runs in one of two modes: with or without genotyping information
-Mode chosen by parameter ${min_geno}, set in module6.sh, which corresponds to the minimum number of samples in the cohort to use genotyping
-***NEED TO ADD GENOTYPING***
-Consensus Groups with Genotyping:
-	A [HIGH]: Valid cluster, cnMOPS or genotyping support, <30% blacklist
-	B [HIGH]: cnMOPS call, ≥50kb, <30% blacklist, genotyping pass, no clustering overlap
-	C [MED]: cnMOPS call, <50kb, genotyping pass, <30% blacklist
-	D [MED]: valid cluster, genotyping or cnMOPS support, ≥30% blacklist
-	E [MED]: cnMOPS call, ≥50kb, genotyping pass, ≥30% blacklist
-	F [LOW]: cnMOPS call, ≥50kb, no clustering support, no genotyping support
-	G [LOW]: cnMOPS call, <50kb, genotyping pass, ≥30% blacklist
-	H [LOW]: valid cluster, <25kb, no cnMOPS or genotyping support
-Consensus Groups without Genotyping:
-    A [HIGH]: Valid cluster, cnMOPS support, <30% blacklist
-    B [MED]: cnMOPS call, ≥50kb, <30% blacklist, no clustering overlap
-    C [MED]: valid cluster, cnMOPS support, ≥30% blacklist
-    D [LOW]: cnMOPS call, ≥50kb, ≥30% blacklist
-    E [LOW]: valid cluster, <25kb, no cnMOPS support
-Returns single merged file each for consensus dels and consensus dups
+Runs in one of two modes: with or without genotyping information  
+Mode chosen by parameter ${min_geno}, set in module6.sh, which corresponds to the minimum number of samples in the cohort to use genotyping  
+Consensus Groups with Genotyping:  
+	A [HIGH]: Valid cluster, cnMOPS or genotyping support, <30% blacklist  
+	B [HIGH]: cnMOPS call, ≥50kb, <30% blacklist, genotyping pass, no clustering overlap  
+	C [MED]: cnMOPS call, <50kb, genotyping pass, <30% blacklist  
+	D [MED]: valid cluster, genotyping or cnMOPS support, ≥30% blacklist  
+	E [MED]: cnMOPS call, ≥50kb, genotyping pass, ≥30% blacklist  
+	F [LOW]: cnMOPS call, ≥50kb, no clustering support, no genotyping support  
+	G [LOW]: cnMOPS call, <50kb, genotyping pass, ≥30% blacklist  
+	H [LOW]: valid cluster, <25kb, no cnMOPS or genotyping support  
+Consensus Groups without Genotyping:  
+    A [HIGH]: Valid cluster, cnMOPS support, <30% blacklist  
+    B [MED]: cnMOPS call, ≥50kb, <30% blacklist, no clustering overlap  
+    C [MED]: valid cluster, cnMOPS support, ≥30% blacklist  
+    D [LOW]: cnMOPS call, ≥50kb, ≥30% blacklist  
+    E [LOW]: valid cluster, <25kb, no cnMOPS support  
+Returns single merged file each for consensus dels and consensus dups  
 
 ##MODULE 7: COMPLEX SV CATEGORIZATION##
-Runs inversion classification script
-Runs translocation classification script
-Runs complex linking script
-Runs complex parsing script
+Runs inversion classification script  
+Runs translocation classification script  
+Runs complex linking script  
+Runs complex parsing script  
 
 ##MODULE 8: VARIANT CONSOLIDATION & REFORMATTING##
-Outputs the following seven variant files:
--Deletion
--Duplication
--Inversion
--Insertion
--Translocation
--Complex
--Unresolved
-
-
-
-
-
-
+Outputs the following seven variant files:  
+-Deletion  
+-Duplication  
+-Inversion  
+-Insertion  
+-Translocation  
+-Complex  
+-Unresolved
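
The INPUT section of the README describes `samples.list` as three tab-delimited columns: sample ID, full path to the sample bam, and expected sex (M, F, O, or U). A minimal pre-flight check of that format might look like the sketch below. The function name `validate_samples_list` is hypothetical and is not part of Holmes. The absolute-path test and the duplicate-ID test are assumptions layered on top of what the README states, not documented pipeline requirements.

```shell
# Hypothetical helper, not shipped with Holmes: checks a samples.list file
# against the three-column layout described in the README's INPUT section.
# Prints one message per problem and returns non-zero if any were found.
validate_samples_list() {
  awk -F'\t' '
    BEGIN { bad = 0 }
    # README: "three columns, tab delimited"
    NF != 3 {
      printf "line %d: expected 3 tab-delimited columns, found %d\n", NR, NF
      bad = 1
      next
    }
    # col 1) sample ID
    $1 == "" {
      printf "line %d: empty sample ID\n", NR
      bad = 1
    }
    # col 2) "full path to sample bam" (assumption: interpreted as an absolute path)
    $2 !~ /^\// {
      printf "line %d: bam path is not absolute: %s\n", NR, $2
      bad = 1
    }
    # col 3) expected sex: M=XY, F=XX, O=other, U=unknown
    $3 !~ /^[MFOU]$/ {
      printf "line %d: sex must be M, F, O or U, found: %s\n", NR, $3
      bad = 1
    }
    # Assumption: sample IDs should be unique within a cohort
    seen[$1]++ {
      printf "line %d: duplicate sample ID: %s\n", NR, $1
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

One way to use it is to gate the documented execution command on the check, for example `validate_samples_list samples.list && runHolmes.sh samples.list parameters_info.sh`. The sketch only inspects the text of the list; it does not confirm that the bam files exist or are readable.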