The input czi file has the following channels
The corresponding segmentions from Dotdotdot
Download the toolbox directory and assign the paths of input czi
file and toolbox to Matlab variables as shown below. Other inputs are DAPI, LIP, DROP which are the names of DAPI, Lipofuscin and channels to be dropped from processing. If you do not have Lipofuscin channel or if you donot want to drop any channels leave them empty. The toolbox should be added to the Matlab’s current working directory to run the code.
>> filename = '/dcl01/lieber/ajaffe/Maddy/RNAscope/dotdot_vignette/dotdot_vignette/Human2.czi';
>> toolbox = '/dcl01/lieber/ajaffe/Maddy/RNAscope/dotdot_vignette/dotdot_vignette/toolbox';
>> DAPI = 'DAPILp3'; %
>> LIP = 'No103_Lipofuscin_63x'; % if Lipofuscin channel doesnt exists leave it empty like LIP = '';
>> DROP = 'Ch7'; % we want to drop this residual channel which doesnt contain any data, if you dont want to drop any channel leave them empty like DROP = '';
>> addpath(genpath(toolbox)) %adding toolbox path to current working directory
The command rnascope_human takes in the filename, toolbox, DAPI, LIP and DROP as inputs.
>> rnascope_human(filename, toolbox, DAPI, LIP, DROP)
Reading Images: 1 of 70 Frames
Reading Images: 2 of 70 Frames
...
Reading Images: 69 of 70 Frames
Reading Images: 70 of 70 Frames
extracted
'DAPILp3'
'Opal520_Lp20'
'Opal570Lp1_0'
'Opal620_LP10'
'Opal690Lp30'
'No103_Lipofuscin_63x'
Elapsed time is 5.250257 seconds.
segmented DAPILp3: 5
segmented Opal520_Lp20: 155
segmented Opal570Lp1_0: 122
segmented Opal620_LP10: 2048
segmented Opal690Lp30: 954
segmented No103_Lipofuscin_63x: 176
Completed Masking DAPILp3
Completed Masking Opal520_Lp20
Completed Masking Opal570Lp1_0
Completed Masking Opal620_LP10
Completed Masking Opal690Lp30
Completed Masking No103_Lipofuscin_63x
Started Opal520_Lp20
1 cells finished in time 0.57048s
2 cells finished in time 0.80605s
3 cells finished in time 0.89423s
4 cells finished in time 0.98938s
5 cells finished in time 1.0881s
Started Opal570Lp1_0
1 cells finished in time 0.12922s
2 cells finished in time 0.19549s
3 cells finished in time 0.27868s
4 cells finished in time 0.33056s
5 cells finished in time 0.38089s
Started Opal620_LP10
1 cells finished in time 0.8361s
2 cells finished in time 1.6512s
3 cells finished in time 2.4203s
4 cells finished in time 3.0615s
5 cells finished in time 3.7481s
Started Opal690Lp30
1 cells finished in time 0.37236s
2 cells finished in time 0.75768s
3 cells finished in time 1.1484s
4 cells finished in time 1.4843s
5 cells finished in time 1.8443s
To run processing on .tif
files where each channel is saved as individual tif file for example like below
%sample 1
122718_NPY86_4_L_40x_00008_CH1.tif
122718_NPY86_4_L_40x_00008_CH2.tif
122718_NPY86_4_L_40x_00008_CH3.tif
122718_NPY86_4_L_40x_00008_CH4.tif
122718_NPY86_4_L_40x_00008_Overlay.tif
%sample 2
122718_NPY86_4_L_40x_00007_CH1.tif
122718_NPY86_4_L_40x_00007_CH2.tif
122718_NPY86_4_L_40x_00007_CH3.tif
122718_NPY86_4_L_40x_00007_CH4.tif
122718_NPY86_4_L_40x_00007_Overlay.tif
Then the command rnascope_human_tif is used with similar inputs (filename, toolbox, DAPI, LIP and DROP). Except the filename is only the sample name without the channel extensions, like below
>> filename = '/dcl01/lieber/ajaffe/Maddy/RNAscope/dotdot_vignette/tiffFiles/122718_NPY86_4_L_40x_00008';
>> toolbox = '/dcl01/lieber/ajaffe/Maddy/RNAscope/dotdot_vignette/dotdot_vignette/toolbox';
>> DAPI = 'CH4'; %CH4 is the DAPI channel or DAPI image
>> LIP = ''; %there is no Lipofuscin channel
>> DROP = 'Overlay'; % We donot want to process the Overlaid image
The above functions outputs the following four .mat
files that are saved where the input .czi
or .tif
file is located.
These matfiles are matlab structures with each field being a channel from the .czi
or .tif
file.
>> load('~/Human2_img.mat')
>> img
img =
struct with fields:
DAPILp3: [401×401×10 double]
Opal520_Lp20: [401×401×10 double]
Opal570Lp1_0: [401×401×10 double]
Opal620_LP10: [401×401×10 double]
Opal690Lp30: [401×401×10 double]
No103_Lipofuscin_63x: [401×401×10 double]
segmented images ~/Human2_segmentation.mat
This matfile has two matlab structures Segmentations
(before masking) and Segmentation_m
(after masking)
>> load('~/Human2_segmentation.mat')
>> Segmentations
Segmentations =
struct with fields:
DAPILp3: [401×401×10 logical]
Opal520_Lp20: [401×401×10 logical]
Opal570Lp1_0: [401×401×10 logical]
Opal620_LP10: [401×401×10 logical]
Opal690Lp30: [401×401×10 logical]
No103_Lipofuscin_63x: [401×401×10 logical]
>> Segmentations_m
Segmentations_m =
struct with fields:
DAPILp3: [401×401×10 logical]
Opal520_Lp20: [401×401×10 logical]
Opal570Lp1_0: [401×401×10 logical]
Opal620_LP10: [401×401×10 logical]
Opal690Lp30: [401×401×10 logical]
No103_Lipofuscin_63x: [401×401×10 logical]
imshow(max(Segmentations_m.DAPILp3,[],3)) %displays the following image
total segmented dots in each channel ~/Human2_totaldots.mat
This matfile has two matlab structures excel_totaldots
(before masking) and excel_totaldots_mask
(after masking)
>> load('~/Human2_totaldots.mat')
>> excel_totaldots
excel_totaldots =
struct with fields:
DAPILp3: [5×9 table]
Opal520_Lp20: [155×9 table]
Opal570Lp1_0: [122×9 table]
Opal620_LP10: [2048×9 table]
Opal690Lp30: [954×9 table]
No103_Lipofuscin_63x: [36×9 table]
>> excel_totaldots_mask
excel_totaldots_mask =
struct with fields:
DAPILp3: [5×9 table]
Opal520_Lp20: [155×9 table]
Opal570Lp1_0: [122×9 table]
Opal620_LP10: [2052×9 table]
Opal690Lp30: [956×9 table]
No103_Lipofuscin_63x: [0×9 table]
>> head(excel_totaldots_mask.DAPILp3) %display 1st few rows of this object
ans =
5×9 table
Volume Centroid BoundingBox VoxelIdxList VoxelList VoxelValues MeanIntensity MinIntensity MaxIntensity
______ __________________________ ____________ ________________ ________________ ________________ _____________ ____________ ____________
14364 36.124 176.07 3.239 [1x6 double] [14364×1 double] [14364×3 double] [14364×1 double] 9792 299 21394
19069 155.55 105.2 4.0746 [1x6 double] [19069×1 double] [19069×3 double] [19069×1 double] 11305 0 26782
15332 251.43 86.285 4.709 [1x6 double] [15332×1 double] [15332×3 double] [15332×1 double] 10413 276 26705
10391 109.15 144.16 4.5714 [1x6 double] [10391×1 double] [10391×3 double] [10391×1 double] 9469.1 0 18883
7506 304.23 92.893 4.6102 [1x6 double] [ 7506×1 double] [ 7506×3 double] [ 7506×1 double] 10138 81 22527
Dots in nuclei per channel ~/Human2_dots_of_ROI.mat
This matfile has two matlab structures excel_dots_of_ROI
(before masking) and excel_dots_of_ROI_mask
(after masking)
>> load('~/Human2_dots_of_ROI.mat')
>> excel_dots_of_ROI
excel_dots_of_ROI =
struct with fields:
Opal520_Lp20: [5×6 table]
Opal570Lp1_0: [5×6 table]
Opal620_LP10: [5×6 table]
Opal690Lp30: [5×6 table]
>> excel_dots_of_ROI_mask
excel_dots_of_ROI_mask =
struct with fields:
Opal520_Lp20: [5×6 table]
Opal570Lp1_0: [5×6 table]
Opal620_LP10: [5×6 table]
Opal690Lp30: [5×6 table]
>> head(excel_dots_of_ROI_mask.Opal520_Lp20) %display 1st few rows of this object
ans =
5×6 table
ROI dotname count Volume Location Intensity
__________ _____________ _____ _____________ ___________ ___________
'ROI1' [ 18] 1 [ 68] { 1×1 cell} { 1×1 cell}
'ROI2' [ 8×1 double] 8 [ 8×1 double] { 8×1 cell} { 8×1 cell}
'ROI3' [ 2×1 double] 2 [ 2×1 double] { 2×1 cell} { 2×1 cell}
'ROI4' [13×1 double] 13 [13×1 double] {13×1 cell} {13×1 cell}
'ROI5' [12×1 double] 12 [12×1 double] {12×1 cell} {12×1 cell}
The filter size for DAPI smoothening can be adjusted according to the nuclei size in your images. Line 72 in rnascope_human.m channel = medfilt3(eval(channe_i),[19 19 3]
);. Line 47 in rnascope_human_tif.m channel = medfilt3(eval(channe_i),[7 7 3]
);.
Similary the threshold for gene channel can be increased or decreased at Line 75 in rnascope_human.m channel = imhmin(eval(channe_i),1
*std2(eval(channe_i)));Line 50 in rnascope_human_tif.m channel = imhmin(eval(channe_i),2
*std2(eval(channe_i)));
The above matlab tables for all the images are consolidated into excel files (long_data - ROI level, man - Image level) saved in the path path1
, using the final_table matlab function.
path1 = '/dcl01/lieber/ajaffe/Maddy/RNAscope/dotdotdot/dotdotdot_repo/dotdotdot/output';
ext = 'Human*.czi';
toolbox = '/dcl01/lieber/ajaffe/Maddy/RNAscope/dotdotdot/dotdotdot_repo/dotdotdot/toolbox';
channels = {'DAPILp3','Opal520_Lp20','Opal570Lp1_0','Opal620_LP10','Opal690Lp30','No103_Lipofuscin_63x'};
final_table_mouse(path1,ext,toolbox,channels)
/dcl01/lieber/ajaffe/Maddy/RNAscope/dotdotdot/dotdotdot_repo/dotdotdot/output : 3
file 1 completed
file 2 completed
file 3 completed
The output files have the following columns:
man.csv
represents image-level summary statistics for each channel of input data (represented as *
)
- var1: image name,
- PO_* - total gene pixels in the image
- PM_* - total gene pixels in the image after Lipofuscin masking
- DO_* - total gene dots in the image
- DM_* - total gene dots in the image after Lipofuscin masking
- IO_* - Mean Intensity of gene dots in the image
- IM_* - Mean Intensity of gene dots in the image after Lipofuscin masking
- MI_* - Mean Intensity of the gene channel in the image
longdat.csv
represents ROI-level (or nuclei-level) summary statistics for each channel of input data (represented as *
)
- names: image name in that dataset
- RVolume: ROI volume
- P_* : total transcript pixels in that ROI
- MP_* : total transcript pixels in that ROI after Lipofuscin masking
- MI_P_* : Mean intensity of transcript pixels in that ROI
- MI_MP_* : Mean intensity of transcript pixels in that ROI after Lipofuscin masking
- D_* : total transcript dots in that ROI
- MD_* : total transcript dots in that ROI after Lipofuscin masking
- V_* : mean volume/size (in pixels) of transcript dots in that ROI
- MV_* : mean volume/size (in pixels) of transcript dots in that ROI after Lipofuscin masking
- MI_D_* = Mean intensity of transcript dots in that ROI
- MI_MD_* = Mean intensity of transcript dots in that ROI after Lipofuscin masking
These two CSVs can be read into R (or python, or Matlab, or your favorite statistical sofware tool).
The above csv files long_data, man are extracted into R objects like below. RVolume is volume of ROI, P520 is number of ROI pixels covered with gene in channel 520, MP520 is number of ROI pixels covered with gene in channel 520 after lipofuscin masking, MI_P520 is mean intensity of ROI pixels covered with gene, MI_MP520 is mean intensity of ROI pixels covered with gene after lipofuscin masking, PP520 is proportion of ROI covered with 520 pixels.
head(long_Dat)
name RVolume P520 MP520 MI_P520 MI_MP520 P570 MP570
1 Br1350_Strip1_Layer2and3_1 16727 8 8 9682.75 9682.75 2577 2567
2 Br1350_Strip1_Layer2and3_1 17243 138 135 11582.46 11612.07 31 31
3 Br1350_Strip1_Layer2and3_1 4829 13 13 12051.08 12051.08 468 468
4 Br1350_Strip1_Layer2and3_1 35946 135 135 12042.07 12042.07 6788 6776
5 Br1350_Strip1_Layer2and3_1 15994 239 175 11340.56 11220.83 59 57
6 Br1350_Strip1_Layer2and3_1 43408 4968 4963 14307.84 14312.59 60 60
MI_P570 MI_MP570 P620 MP620 MI_P620 MI_MP620 P690 MP690 MI_P690 MI_MP690
1 20413.50 20410.38 1 1 5824.000 5824.000 9744 9734 6110.654 6111.147
2 18010.06 18010.06 1960 1943 9555.410 9544.715 7287 7272 4856.068 4857.804
3 20599.55 20599.55 0 0 0.000 0.000 937 937 3875.629 3875.629
4 21536.88 21540.04 11 11 9725.455 9725.455 11979 11978 4947.050 4947.152
5 15772.75 15915.42 0 0 0.000 0.000 3 3 3012.000 3012.000
6 15815.37 15815.37 6 6 6966.500 6966.500 27 26 3766.519 3794.808
PP520 PP570 PP620 PP690 Layer ROI Rcluster
1 0.0004782687 0.153464459 5.978358e-05 0.5819334011 Layer2and3 ROI1 SLC17A7
2 0.0078292640 0.001797831 1.126834e-01 0.4217363568 Layer2and3 ROI2 GAD1
3 0.0026920688 0.096914475 0.000000e+00 0.1940360323 Layer2and3 ROI3 SLC17A7
4 0.0037556335 0.188504980 3.060146e-04 0.3332220553 Layer2and3 ROI4 SLC17A7
5 0.0109416031 0.003563836 0.000000e+00 0.0001875703 Layer2and3 ROI5 Neg
6 0.1143337634 0.001382234 1.382234e-04 0.0005989679 Layer2and3 ROI6 MBP
Brain
1 Br1350
2 Br1350
3 Br1350
4 Br1350
5 Br1350
6 Br1350
A test and train dataset is created from a couple of randomnly selected and manually annotated images like below.
Of 89 ROIs from above dataset, 60 random ROIs were used as training dataset and the remaining as test dataset for CART modelling.
library(rpart)
library(rpart.plot)
s = sample(1:89, 60)
aa_train = manually_annotated[s,]
aa_test = manually_annotated[-s,]
head(aa_test)
name RVolume PP520 PP570 PP620
530 Br1350_Strip2_LayerVI_1 18242 0.008551694 0.248711764 0.003124657
532 Br1350_Strip2_LayerVI_1 9313 0.312895952 0.043380221 0.000000000
536 Br1350_Strip2_LayerVI_1 26877 0.001227816 0.174796294 0.003646240
537 Br1350_Strip2_LayerVI_1 9021 0.002882164 0.003768983 0.000000000
540 Br1350_Strip2_LayerVI_1 14528 0.003716960 0.000000000 0.000000000
543 Br1350_Strip2_LayerVI_1 931 0.000000000 0.272824919 0.000000000
PP690 cluster Kcluster
530 0.31723495 r SLC17A7
532 0.02544830 g MBP
536 0.36120103 r SLC17A7
537 0.01263718 n Glia
540 0.00000000 n Glia
543 0.68528464 r SLC17A7
head(aa_train)
name RVolume PP520 PP570 PP620
527 Br1350_Strip2_LayerVI_1 13293 0.339501994 0.0382908298 0.0005265929
528 Br1350_Strip2_LayerVI_1 79877 0.040186787 0.0254391126 0.0004381737
529 Br1350_Strip2_LayerVI_1 28124 0.015289433 0.2272080785 0.0016711705
531 Br1350_Strip2_LayerVI_1 19632 0.007436838 0.0005093725 0.0000000000
533 Br1350_Strip2_LayerVI_1 15278 0.058777327 0.0000000000 0.0000000000
534 Br1350_Strip2_LayerVI_1 10640 0.016165414 0.0002819549 0.0000000000
PP690 cluster Kcluster
527 0.02362145 g MBP
528 0.06294678 n Glia
529 0.40044090 n Glia
531 0.00000000 n Glia
533 0.00000000 n Glia
534 0.00000000 n Glia
fit = rpart(Kcluster~PP520+PP570+PP620, method = "class", data = aa_train)
fit
n= 60
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 60 40 SLC17A7 (0.21666667 0.33333333 0.13333333 0.31666667)
2) PP570>=0.04282934 21 2 SLC17A7 (0.04761905 0.90476190 0.00000000 0.04761905) *
3) PP570< 0.04282934 39 21 Glia (0.30769231 0.02564103 0.20512821 0.46153846)
6) PP520>=0.07558398 11 0 MBP (1.00000000 0.00000000 0.00000000 0.00000000) *
7) PP520< 0.07558398 28 10 Glia (0.03571429 0.03571429 0.28571429 0.64285714)
14) PP620>=0.0007935354 9 1 GAD1 (0.00000000 0.00000000 0.88888889 0.11111111) *
15) PP620< 0.0007935354 19 2 Glia (0.05263158 0.05263158 0.00000000 0.89473684) *
aa_test$rcluster = predict(fit, aa_test, type = "class") #cell labels for whole dataset are similary predicted
aa_test #Kcluster(manually annotated) Rcluster(CART predicted)
name RVolume PP520 PP570 PP620
530 Br1350_Strip2_LayerVI_1 18242 8.551694e-03 0.2487117641 3.124657e-03
532 Br1350_Strip2_LayerVI_1 9313 3.128960e-01 0.0433802212 0.000000e+00
536 Br1350_Strip2_LayerVI_1 26877 1.227816e-03 0.1747962942 3.646240e-03
537 Br1350_Strip2_LayerVI_1 9021 2.882164e-03 0.0037689835 0.000000e+00
540 Br1350_Strip2_LayerVI_1 14528 3.716960e-03 0.0000000000 0.000000e+00
543 Br1350_Strip2_LayerVI_1 931 0.000000e+00 0.2728249194 0.000000e+00
546 Br1350_Strip2_LayerVI_1 10737 3.725435e-03 0.0000000000 1.015181e-01
551 Br1350_Strip2_LayerVI_1 1034 0.000000e+00 0.1972920696 0.000000e+00
554 Br1350_Strip2_LayerVI_1 18926 3.011730e-03 0.0000000000 2.604882e-02
555 Br1350_Strip2_LayerVI_1 8863 3.272030e-03 0.0875550039 0.000000e+00
566 Br1350_Strip2_LayerVI_2 13898 5.445388e-01 0.0016549144 0.000000e+00
567 Br1350_Strip2_LayerVI_2 14462 1.329000e-01 0.0005531738 0.000000e+00
572 Br1350_Strip2_LayerVI_2 11646 7.727975e-04 0.0000000000 0.000000e+00
590 Br1350_Strip2_LayerVI_3 18465 1.949634e-03 0.3502301652 2.166260e-03
598 Br1350_Strip2_LayerVI_3 5005 6.993007e-03 0.0003996004 0.000000e+00
600 Br1350_Strip2_LayerVI_3 9525 2.099738e-04 0.0012598425 0.000000e+00
606 Br1350_Strip2_LayerVI_3 9989 6.086695e-02 0.0000000000 3.286615e-01
1038 Br1531_Strip2_LayerVI_5 27500 4.094545e-02 0.2513454545 5.090909e-04
1040 Br1531_Strip2_LayerVI_5 22390 1.514069e-02 0.0213488164 4.466280e-05
1046 Br1531_Strip2_LayerVI_5 12168 6.163708e-03 0.0004109139 0.000000e+00
1047 Br1531_Strip2_LayerVI_5 50302 1.013876e-03 0.1960160630 8.349569e-04
1077 Br1531_Strip2_LayerVI_6 21493 0.000000e+00 0.0000000000 1.543758e-01
1080 Br1531_Strip2_LayerVI_6 25736 6.294684e-03 0.0000000000 3.885608e-05
1081 Br1531_Strip2_LayerVI_6 29854 9.713941e-04 0.0520198298 0.000000e+00
1083 Br1531_Strip2_LayerVI_6 10081 3.852792e-01 0.0000000000 0.000000e+00
1084 Br1531_Strip2_LayerVI_6 37827 5.142094e-01 0.0018240939 0.000000e+00
1089 Br1531_Strip2_LayerVI_6 9213 0.000000e+00 0.1157060675 0.000000e+00
1093 Br1531_Strip2_LayerVI_6 33521 2.983205e-05 0.0792339131 0.000000e+00
1094 Br1531_Strip2_LayerVI_6 16865 7.596205e-01 0.0000000000 0.000000e+00
PP690 cluster Kcluster Rcluster
530 0.3172349523 r SLC17A7 SLC17A7
532 0.0254482981 g MBP SLC17A7
536 0.3612010269 r SLC17A7 SLC17A7
537 0.0126371799 n Glia Glia
540 0.0000000000 n Glia Glia
543 0.6852846402 r SLC17A7 SLC17A7
546 0.1996833380 p GAD1 GAD1
551 0.7823984526 r SLC17A7 SLC17A7
554 0.5700095107 p GAD1 GAD1
555 0.2998984542 r SLC17A7 SLC17A7
566 0.0048208375 g MBP MBP
567 0.0000000000 g MBP MBP
572 0.0000000000 n Glia Glia
590 0.2777145952 r SLC17A7 SLC17A7
598 0.0000000000 n Glia Glia
600 0.0000000000 n Glia Glia
606 0.4500951046 p GAD1 GAD1
1038 0.4776000000 r SLC17A7 SLC17A7
1040 0.0648503796 r SLC17A7 Glia
1046 0.0023011177 n Glia Glia
1047 0.2858136853 r SLC17A7 SLC17A7
1077 0.2703205695 p GAD1 GAD1
1080 0.0001165682 n Glia Glia
1081 0.0751993033 r SLC17A7 SLC17A7
1083 0.0000000000 g MBP MBP
1084 0.0116583393 g MBP MBP
1089 0.3992184956 r SLC17A7 SLC17A7
1093 0.2989767608 r SLC17A7 SLC17A7
1094 0.0000000000 g MBP MBP