-
Notifications
You must be signed in to change notification settings - Fork 0
/
Multigen_Root_processing.Rmd
executable file
·270 lines (200 loc) · 8.33 KB
/
Multigen_Root_processing.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
---
title: "Multigen_Root"
author: "Abby Sulesky"
date: "2023-02-19"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Input files
#### Multiplexed files copied from raw_sequence into:
/mnt/research/ShadeLab/WorkingSpace/Sulesky/Multigen_seed_analysis/input_data and /figaro
(base) -bash-4.2$ cd /mnt/research/ShadeLab/Sequence/raw_sequence/Bean_heritability/20230130_seed16S
(base) -bash-4.2$ ls
230126_Shade_Seeds_16s_DG_230126.txt Undetermined_S0_L001_R1_001.fastq
Undetermined_S0_L001_I1_001.fastq Undetermined_S0_L001_R2_001.fastq
(base) -bash-4.2$ cp Undetermined_S0_L001_R1_001.fastq Undetermined_S0_L001_R2_001.fastq /mnt/research/ShadeLab/WorkingSpace/Sulesky/Multigen_seed_analysis/figaro
(base) -bash-4.2$ cp Undetermined_S0_L001_I1_001.fastq Undetermined_S0_L001_R1_001.fastq Undetermined_S0_L001_R2_001.fastq 230126_Shade_Seeds_16s_DG_230126.txt /mnt/research/ShadeLab/WorkingSpace/Sulesky/Multigen_seed_analysis/input_data
#### rename files for figaro:
(base) -bash-4.2$ cd /mnt/research/ShadeLab/WorkingSpace/Sulesky/Multigen_seed_analysis
(base) -bash-4.2$ ls
figaro input_data
(base) -bash-4.2$ cd figaro/
(base) -bash-4.2$ ls
Undetermined_S0_L001_R1_001.fastq Undetermined_S0_L001_R2_001.fastq
(base) -bash-4.2$ mv Undetermined_S0_L001_R1_001.fastq seed_16s_R1.fastq
(base) -bash-4.2$ mv Undetermined_S0_L001_R2_001.fastq seed_16s_R2.fastq
(base) -bash-4.2$ ls
seed_16s_R1.fastq seed_16s_R2.fastq
#### Rename files in input data
(base) -bash-4.2$ mv Undetermined_S0_L001_I1_001.fastq barcodes.fastq
(base) -bash-4.2$ mv Undetermined_S0_L001_R1_001.fastq forward.fastq
(base) -bash-4.2$ mv Undetermined_S0_L001_R2_001.fastq reverse.fastq
#### Compress files into zip format for qiime2 artifact (takes ~10 minutes)
gzip *.fastq
#### Check
ls
barcodes.fastq.gz forward.fastq.gz reverse.fastq.gz
#### Create qiime2 artifact and demultiplex with EMP paired end protocol
cd /mnt/research/ShadeLab/WorkingSpace/Sulesky/Multigen_seed_analysis
nano seed_demultiplex.sb
```{bash}
#!/bin/bash -login
#SBATCH --time=03:59:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=64G
#SBATCH --job-name seed_demultiplex
#SBATCH [email protected]
#SBATCH --mail-type=BEGIN,END
######## Job code
conda activate qiime2-2022.8
qiime tools import \
--type EMPPairedEndSequences \
--input-path /mnt/research/ShadeLab/WorkingSpace/Sulesky/Multigen_seed_analysis/input_data \
--output-path /mnt/research/ShadeLab/WorkingSpace/Sulesky/Multigen_seed_analysis/input_data.qza
qiime demux emp-paired \
--m-barcodes-file 230126_Shade_Seeds_16s_DG_230126.txt \
--m-barcodes-column BarcodeSequence \
--p-no-golay-error-correction \
--i-seqs input_data.qza \
--o-per-sample-sequences demultiplexed-seqs.qza \
--o-error-correction-details demux-details.qza
qiime demux summarize --i-data demultiplexed-seqs.qza --o-visualization demux-vis.qzv
conda deactivate
echo -e "\n `sacct -u suleskya -j $SLURM_JOB_ID --format=JobID,JobName,Start,End,Elapsed,NCPUS,ReqMem` \n"
scontrol show job $SLURM_JOB_ID
```
submit job from Multigen_seed_analysis directory:
sbatch seed_demultiplex.sb
## Figaro
use unzipped non-demultiplexed files for figaro, copied them from raw_sequence folder
cp Undetermined_S0_L001_R1_001.fastq /mnt/research/ShadeLab/WorkingSpace/Sulesky/Multigen_Root_analysis/figaro
cp Undetermined_S0_L001_R2_001.fastq /mnt/research/ShadeLab/WorkingSpace/Sulesky/Multigen_Root_analysis/figaro
must be named *_16s_R1.fastq and _16s_R2.fastq and be only fastq files in folder
rename:
(base) -bash-4.2$ mv Undetermined_S0_L001_R1_001.fastq roots_16s_R1.fastq
(base) -bash-4.2$ mv Undetermined_S0_L001_R2_001.fastq roots_16s_R2.fastq
go back to home driectory and figaro install folder to run
(figaro) -bash-4.2$ cd /mnt/home/suleskya/figaro-master/figaro
run figaro (takes 5-10 minutes, don't need a job):
(figaro) -bash-4.2$ conda activate figaro
(figaro) -bash-4.2$ python figaro.py -i /mnt/research/ShadeLab/WorkingSpace/Sulesky/Multigen_Rhizosphere_analysis/figaro -o /mnt/research/ShadeLab/WorkingSpace/Sulesky/Multigen_Rhizosphere_analysis/figaro -f 1 -r 1 -a 253 -F zymo
go check outputs
(figaro) -bash-4.2$ cd /mnt/research/ShadeLab/WorkingSpace/Sulesky/Multigen_Root_analysis/figaro
(figaro) -bash-4.2$ ls
figaro.1676842677032661.log reverseExpectedError.png roots_16s_R2.fastq
forwardExpectedError.png roots_16s_R1.fastq trimParameters.json
look at results for trim parameters
(figaro) -bash-4.2$ less trimParameters.json
[
{
"trimPosition": [
114,
161
],
"maxExpectedError": [
1,
2
],
"readRetentionPercent": 90.91,
"score": 89.9132593719245
},
{
will use these trim parameters for DADA2: forward trim 114, reverse trim 161, will merge 90.91% of reads
# Run DADA2 on qiime2 to denoise, filter chimeras and merge reads to ASVs
view dada2 denoise-paired defaults at https://docs.qiime2.org/2022.8/plugins/available/dada2/denoise-paired/
will run this as a job
job script: nano dada2_merge.sb
```{r}
#!/bin/bash -login
########## SBATCH Lines for Resource Request ##########
#SBATCH --time=03:59:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=64G
#SBATCH --job-name dada2_roots
#SBATCH [email protected]
#SBATCH --mail-type=BEGIN,END
########## Command Lines for Job Running ##########
conda activate qiime2-2022.8
qiime dada2 denoise-paired \
--i-demultiplexed-seqs demultiplexed-seqs.qza \
--p-trunc-len-f 114 \
--p-trunc-len-r 161 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza
conda deactivate
echo -e "\n `sacct -u suleskya -j $SLURM_JOB_ID --format=JobID,JobName,Start,End,Elapsed,NCPUS,ReqMem` \n"
scontrol show job $SLURM_JOB_I
```
submit job:
sbatch dada2_merge.sb
check queue:
sq
visualize the output using the codes below
qiime metadata tabulate \
--m-input-file denoising-stats.qza \
--o-visualization denoising-stats.qzv
qiime feature-table summarize \
--i-table table.qza \
--o-visualization table.qzv \
--m-sample-metadata-file 220928_Shade_Roots_16s_DG_220923.txt
qiime feature-table tabulate-seqs \
--i-data rep-seqs.qza \
--o-visualization rep-seqs.qzv
# Assign taxonomy using Silva
Download reference seqs from qiime2.org:
wget https://data.qiime2.org/2022.8/common/silva-138-99-515-806-nb-classifier.qza
job:
nano classify-silva-taxonomy.sb
```{bash}
#!/bin/bash -login
########## SBATCH Lines for Resource Request ##########
#SBATCH --time=03:59:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=64G
#SBATCH --job-name taxonomy
#SBATCH [email protected]
#SBATCH --mail-type=BEGIN,END
########## Command Lines for Job Running ##########
conda activate qiime2-2022.8
qiime feature-classifier classify-sklearn \
--i-classifier silva-138-99-515-806-nb-classifier.qza \
--i-reads rep-seqs.qza \
--o-classification taxonomy.qza
conda deactivate
echo -e "\n `sacct -u suleskya -j $SLURM_JOB_ID --format=JobID,JobName,Start,End,Elapsed,NCPUS,ReqMem` \n"
scontrol show job $SLURM_JOB_I
```
convert output to visualization file
(qiime2-2022.8) -bash-4.2$
qiime metadata tabulate \
--m-input-file taxonomy.qza \
--o-visualization taxonomy.qzv
Saved Visualization to: taxonomy.qzv
Export OTU table to new directory phyloseq (code will make the new directory)
(qiime2-2022.8) -bash-4.2$ qiime tools export \
> --input-path table.qza \
> --output-path phyloseq
Exported table.qza as BIOMV210DirFmt to directory phyloseq
OTU tables exports as feature-table.biom so convert to .tsv
Change -i and -o paths accordingly
biom convert \
-i phyloseq/feature-table.biom \
-o phyloseq/otu_table.txt \
--to-tsv
download otu table, manually change "#OTU ID" column header to "OTUID"
Export taxonomy table
qiime tools export \
--input-path taxonomy.qza \
--output-path phyloseq
Exported taxonomy.qza as TSVTaxonomyDirectoryFormat to directory phyloseq
download taxonomy file and manually change Feature ID to OTUID in taxonomy.tsv. change taxonomy and OTU tables to csv format.
these files are now ready to export to R and run using phyloseq