-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
154 lines (142 loc) · 4.98 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
PREREQUISITES:
The pipeline is configured to run on a SLURM based cluster with modules. The pipeline is built in nextflow (nextflow.io). It has been tested using nextflow/0.30.2. Earlier versions of nextflow will not work.
This pipeline should run on other system architectures, but will require some customization.
Modules / Software versions:
R/3.5.2
bamtools/2.5.1
bedtools/2.27.1
deeptools/3.0.1
kallisto/0.45.0
macs/2.1.2
meme/5.0.1
nextflow/0.30.2
picard/2.17.11
picard/2.9.2
samtools/1.8
samtools/1.9
sratoolkit/2.9.2
ucsc/373
R packages:
corrplot
data.table
dplyr
extrafont
factoextra
ggcorrplot
ggfortify
ggplot2
ggpmisc
ggpubr
ggrepel
grid
gridExtra
leaps
lsr
numform
pROC
plyr
png
preprocessCore
purrr
reshape2
scales
tictoc
ShortRead
SYSTEM REQUIREMENTS:
~1Tb free space; Pipe will generate generate up to 1 Tb of temp files.
Other requirements are encoded in nextflow processes.
REQUIRED FILES:
This folder contains the accessory data required to run the pipeline.
In addition to these files, the pipeline requires aligned BAM files (tested only with BWA 0.7.12 alignment) as follows:
Nomenclature for aligned BAM files:
SS.RN.MMMMMM.anything.bam
SS = stage (LE,ZY,EP,LP,DI)
RN = round (R1, R2)
MMMMMM = histone modification name (must be exactly as below (case sensitive))
* Note: the names should be retained from the fastq.gz files in the GEO record (GSM121760)
Aligned bam files should be located in the following folders:
/data/timeCourse
** These files should be aligned to the mm10 genome with K-MetStat panel sequences included.
** This genome can be obtained using the genomeFiles/getGenomeFiles.sh script
DI.R1.H3K4me3.ChIPSeq.mm10_KmetStat.bam
DI.R1.H3K9Ac.ChIPSeq.mm10_KmetStat.bam
DI.R1.input.ChIPSeq.mm10_KmetStat.bam
DI.R2.H3K4me3.ChIPSeq.mm10_KmetStat.bam
DI.R2.H3K9Ac.ChIPSeq.mm10_KmetStat.bam
DI.R2.abK4me.ChIPSeq.mm10_KmetStat.bam
DI.R2.input.ChIPSeq.mm10_KmetStat.bam
EP.R1.H3K4me3.ChIPSeq.mm10_KmetStat.bam
EP.R1.H3K9Ac.ChIPSeq.mm10_KmetStat.bam
EP.R1.input.ChIPSeq.mm10_KmetStat.bam
EP.R2.H3K4me3.ChIPSeq.mm10_KmetStat.bam
EP.R2.H3K9Ac.ChIPSeq.mm10_KmetStat.bam
EP.R2.abK4me.ChIPSeq.mm10_KmetStat.bam
EP.R2.input.ChIPSeq.mm10_KmetStat.bam
LE.R1.H3K4me3.ChIPSeq.mm10_KmetStat.bam
LE.R1.H3K9Ac.ChIPSeq.mm10_KmetStat.bam
LE.R1.input.ChIPSeq.mm10_KmetStat.bam
LE.R2.H3K4me3.ChIPSeq.mm10_KmetStat.bam
LE.R2.H3K9Ac.ChIPSeq.mm10_KmetStat.bam
LE.R2.abK4me.ChIPSeq.mm10_KmetStat.bam
LE.R2.input.ChIPSeq.mm10_KmetStat.bam
LP.R1.H3K4me3.ChIPSeq.mm10_KmetStat.bam
LP.R1.H3K9Ac.ChIPSeq.mm10_KmetStat.bam
LP.R1.input.ChIPSeq.mm10_KmetStat.bam
LP.R2.H3K4me3.ChIPSeq.mm10_KmetStat.bam
LP.R2.H3K9Ac.ChIPSeq.mm10_KmetStat.bam
LP.R2.abK4me.ChIPSeq.mm10_KmetStat.bam
LP.R2.input.ChIPSeq.mm10_KmetStat.bam
ZY.R1.H3K4me3.ChIPSeq.mm10_KmetStat.bam
ZY.R1.H3K9Ac.ChIPSeq.mm10_KmetStat.bam
ZY.R1.input.ChIPSeq.mm10_KmetStat.bam
ZY.R2.H3K4me3.ChIPSeq.mm10_KmetStat.bam
ZY.R2.H3K9Ac.ChIPSeq.mm10_KmetStat.bam
ZY.R2.abK4me.ChIPSeq.mm10_KmetStat.bam
ZY.R2.input.ChIPSeq.mm10_KmetStat.bam
/data/histmods:
H3K27ac_SCP3pos_H1Tneg.bam
H3K27me1_SCP3pos_H1Tneg.bam
H3K27me3_SCP3pos_H1Tneg.bam
H3K36me3_SCP3pos_H1Tneg.bam
H3K4ac_SCP3pos_H1Tneg.bam
H3K4me1_SCP3pos_H1Tneg.bam
H3K4me2_SCP3pos_H1Tneg.bam
H3K79me1_SCP3pos_H1Tneg.bam
H3K79me3_SCP3pos_H1Tneg.bam
H3K9ac_SCP3pos_H1Tneg.bam
H3K9me2_SCP3pos_H1Tneg.bam
H3K9me3_SCP3pos_H1Tneg.bam
H3_SCP3pos_H1Tneg.bam
H4K12ac_SCP3pos_H1Tneg.bam
H4K20me3_SCP3pos_H1Tneg.bam
H4K8ac_SCP3pos_H1Tneg.bam
H4ac5_SCP3pos_H1Tneg.bam
IgG_SCP3pos_H1Tneg.bam
Input_SCP3pos_H1Tneg.bam
abK4me_SCP3pos_H1Tneg.bam
*NOTE: H3K4me3_SCP3pos_H1Tneg.bam will be built from ZY.R1.H3K4me3 bam file
Genome files:
The pipeline requires the following files in the accessoryFiles/genomeFiles folder:
mm10_genome.fa
mm10_genome.fa.fai
mm10_KmetStat_genome.fa
mm10_KmetStat_genome.fa.fai
These files can be generated using the following script:
accessoryFiles/genomeFiles/getGenomeFiles.sh
----------------------------------------------------------------------------------------------
RUNNING THE PIPELINE:
----------------------------------------------------------------------------------------------
nextflow run -with-timeline
-with-trace
-with-report
-c {accessoryFilesFolder}/nextflowConfig/nextflow.config
{accessoryFilesFolder}/scripts/analyticPipe_LamEtAl_NatComm2019.groovy
--projectdir {project folder <<FULL PATH>>}
--outdir {output folder <<FULL PATH>>}
--timecoursedir {timecourse BAM folder <<FULL PATH>>}
--allHMdir {SCP3pos H1tneg histone modifications BAM folder <<FULL PATH>>}
{accessoryFilesFolder} : downloaded folder with accessory files (location of this README)
{project folder} : parent folder of {accessoryFilesFolder}
{output folder} : location for output files
{timeCourse BAM folder} : location of all aligned BAM files from experiments in 5 MPI populations (see above)
{SCP3pos H1tneg BAM folder} : location of all aligned BAM files from experiments in SCP3pos H1tneg nuclei (see above)