defaulting to soft signal extraction without overlaps. Updated readme…

… file.
shinkira · Mar 27, 2017 · 550fefb · 550fefb
1 parent 385ec15
commit 550fefb
Show file tree

Hide file tree

Showing 7 changed files with 244 additions and 40 deletions.
diff --git a/README.md b/README.md
@@ -1,9 +1,6 @@
-UPDATE (Christmas 2016): number of clusters determined automatically, but do specify the "diameter" of an average cell for best results. You can do this with
-either db(iexp).diameter or ops.diameter. 
-
 # Suite2p: fast, accurate and complete two-photon pipeline#
 
-Registration, cell detection, spike extraction and manual GUI. 
+Registration, cell detection, spike extraction and visualization GUI. 
 
 Details in [http://biorxiv.org/content/early/2016/06/30/061507](http://biorxiv.org/content/early/2016/06/30/061507).
 
@@ -13,25 +10,47 @@ This code was written by Marius Pachitariu and members of the lab of Kenneth Har
 
 ### I. Introduction ###
 
-This is a complete automated pipeline for processing two-photon Calcium imaging recordings. It is very simple, very fast and yields a large set of active ROIs. A GUI further provides point-and-click capabilities for refining the results in minutes. The pipeline includes the following steps
+This is a complete, automated pipeline for processing two-photon Calcium imaging recordings. It is simple, fast and yields a large set of active ROIs. A GUI further provides point-and-click capabilities for refining the results in minutes. The pipeline includes the following steps
 
 
 1) X-Y subpixel registration --- using a version of the phase correlation algorithm and subpixel translation in the FFT domain. If a GPU is available, this completes in 20 minutes per 1h of recordings at 30Hz and 512x512 resolution.
 
-2) SVD decomposition --- this provides the basis for a number of pixel-level visualization tools. 
+2) SVD decomposition --- this provides the input to cell detection and accelerates the algorithm. 
+
+3) Cell detection --- using clustering methods in a low-dimensional space. The clustering algorithm provides a positive mask for each ROI identified, and allows for overlaps between masks. 
 
-3) Cell detection --- using clustering methods in a low-dimensional space of the fluorescence activity. The pixel clustering algorithm directly provides a positive mask for each ROI identified and stands in contrast to more "black-box" methods that have been previously proposed. The mask is used to weigh the pixels of the ROI before taking the average over pixel ROIs. 
+4) Signal extraction --- by default, all overlapping pixels are discarded when computing the signal inside each ROI, to avoid using "demixing" approaches, which can be biased. The neuropil signal is also computed independently for each ROI, as a weighted pixel average, pooling from a large area around each ROI, but excluding all pixels assigned to ROIs during cell detection. The neuropil subtraction coefficient is estimated by maximizing the skewness of F - coef*Fneu (F is the ROI signal, Fneu is the neuropil signal). The user is encouraged to also try varying this coefficient, to make sure that any scientific results do not depend crucially on it. 
 
-4) Manual curation --- the output of the cell detection algorithm can be visualized and further refined using the included GUI. The GUI is designed to make cell sorting a fun and enjoyable experience. 
+5) Manual curation --- the output of the cell detection algorithm can be visualized and further refined using the included GUI. The GUI is designed to make cell sorting a fun and enjoyable experience. It also includes an automatic classifier that gradually refines itself based on the manual labelling provided by the user. This allows the automated classifier to adapt for different types of data, acquired under different conditions. 
 
-5) Spike deconvolution --- cell and neuropil traces are further processed to obtain an estimate of spike times and spike "amplitudes". The amplitudes here are understood as proportional to the number of spikes in a burst/bin. Under low SNR conditions, the deconvolution is still useful for temporally-localizing responses, and for providing an estimate of the neuropil contamination coefficient (estimated together, see paper for how this works).
+6) Spike deconvolution --- cell and neuropil traces are further processed to obtain an estimate of spike times and spike "amplitudes". The amplitudes are proportional to the number of spikes in a burst/bin. Even under low SNR conditions, where transients might be hard to identify, the deconvolution is still useful for temporally-localizing responses. The cell traces are corrected using the ICA-derived neuropil contamination coefficients, and baselined using the minimum of the overly-smoothed trace. 
 
 ### II. Getting started ###
 
 The toolbox runs in Matlab (+ one mex file) and currently only supports tiff file inputs. To begin using the toolbox, you will need to make local copies (in a separate folder) of two included files: master_file and make_db. It is important that you make local copies of these files, otherwise updating the repository will overwrite them (and you can lose your files). The make_db file assembles a database of experiments that you would like to be processed in batch. It also adds per-session specific information that the algorithm requires such as the number of imaged planes and channels. The master_file sets general processing options that are applied to all sessions included in make_db, UNLESS the option is over-ridden in the make_db file.  The global and session-specific options are described in detail below. 
 
 For spike deconvolution, you need to run mex -largeArrayDims SpikeDetection/deconvL0.c (or .cpp under Linux/Mac)
 
+Below we describe the outputs of the pipeline first, and then describe the options for setting it up, and customizing it. Importantly, almost all options have pre-specified defaults. Any options specified in master_file in ops0 overrides these defaults. Furthermore, any option specified in the make_db file (experiment specific) overrides both the defaults and the options set in master_file. This allows for flexibility in processing different experiments with different options. The only critical option that you'll need to set is ops0.diameter, or db(N).diameter. This gives the algorithm the scale of the recording, and the size of ROIs you are trying to extract. We recommend as a first run to try the pipeline after setting the diameter option. Depending on the results, you can come back and try changing some of the other options.  
+
+Note: some of the options are not specified in either the example master_file or the example make_db file. These are usually more specialized features.
+
+### III. Outputs. ###
+
+The output is a struct called dat which is saved into a mat file in ResultsSavePath using the same subfolder structure, under a name formatted like F_M150329_MP009_2015-04-29_plane1. It contains all the information collected throughout the processing, and contains the ROI and neuropil traces in Fcell and FcellNeu, and whether each ROI j is a cell or not in stat(j).iscell. stat(j) contains information about each ROI j and can be used to recover the corresponding pixels for each ROI in stat.ipix. The centroid of the ROI is specified in stat as well. Here is a summary of where the important results are:
+
+cell traces are in dat.Fcell
+neuropil traces are in dat.FcellNeu
+manual, GUI overwritten "iscell" labels are in dat.cl.iscell
+
+stat(icell) contains all other information:
+iscell: automated label, based on anatomy
+neuropilCoefficient: neuropil subtraction coefficient, based on maximizing the skewness of the corrected trace (ICA)
+st: are the deconvolved spike times (in frames)
+c:  are the deconvolved amplitudes
+kernel: is the estimated kernel
+
+
 ### III. Input-output file paths ###
 
 RootStorage --- the root location where the raw tiff files are  stored.
@@ -48,19 +67,6 @@ All of these filepaths are completed with separate subfolders per animal and exp
 
 \RootStorage\mouse_name\session\block\*.tif(f)
 
-The output is a struct called dat which is saved into a mat file in ResultsSavePath using the same subfolder structure, under a name formatted like F_M150329_MP009_2015-04-29_plane1_Nk650. It contains all the information collected throughout the processing, and contains the fluorescence traces in dat.F.Fcell and whether a given ROI is a cell or not in dat.F.iscell. dat.stat contains information about each ROI and can be used to recover the corresponding pixels for each ROI in dat.stat.ipix. The centroid of the ROI is specified in dat.stat as well. Here is a summary of where the important results are:
-
-cell traces are in dat.F.Fcell
-neuropil traces are in dat.F.FcellNeu
-neuropil subtraction coefficient is dat.cl.dcell{i}.B(3)
-baseline is dat.cl.dcell{i}.B(2)
-anatomical cell criterion is in dat.cl.isroi
-manual overwritten cell labels are in dat.cl.iscell
-dat.cl.dcell{i}.st are the deconvolved spike times (in frames)
-dat.cl.dcell{i}.c  are the deconvolved amplitudes
-dat.cl.dcell{i}.kernel is the estimated kernel
-
-
 ### IV. Options for registration ###
 
 showTargetRegistration --- whether to show an image of the target frame immediately after it is computed. 
@@ -139,13 +145,6 @@ MaxNpix --- maximum number of pixels per ROI. Automatically set by diameter, if
 
 MinNpix --- minimum number of pixels per ROI. Automatically set by diameter, if provided. 
 
-Compact --- a compactness criterion for how close pixels are to the center of the ROI. 1 is the lowest possible value, achieved by perfect disks. Best to leave this to a high value (i.e. 2) before the manual sorting stage. 
-
-parent --- these are criteria imposed on the parent cluster (before separating connected regions). These are not currently used during the automated step, but are available in the GUI.
-
-parent.minPixRelVar --- significant regions need to have at least >1/10 the mean variance of all regions
-
-parent.MaxRegions --- if there are more non-significant regions than this number, this parent ROI is probably very spread out over many small components and its connected regions are not good cells: it will be discarded. 
 
 ### IX. Example database entry ###
 
@@ -158,3 +157,5 @@ db(i).expts         = [5 6]; % which experiments to process together
 
 Other (hidden) options are described in make_db_example.m, and at the top of run_pipeline.m (set to reasonable defaults), and get_signals_and_neuropil.m (neuropil "surround" option). 
 
+
+
diff --git a/cellDetection/getNeuropilBasis.m b/cellDetection/getNeuropilBasis.m
@@ -1,7 +1,9 @@
 function S = getNeuropilBasis(ops, Ly, Lx, type)
 
+ops.neuropilRange = getOr(ops, 'neuropilRange', 3);
+
 TileFactor = getOr(ops, {'TileFactor'}, 1); % this option can be overwritten by the user
-nTiles = ceil(TileFactor * (Ly+Lx)/2 / (3 * ops.diameter)); % neuropil is modelled as nTiles by nTiles
+nTiles = ceil(TileFactor * (Ly+Lx)/2 / (ops.neuropilRange * ops.diameter)); % neuropil is modelled as nTiles by nTiles
 
 S = zeros(Ly, Lx, nTiles, nTiles, 'single');
 

diff --git a/cellDetection/sourcery.m b/cellDetection/sourcery.m
@@ -134,7 +134,7 @@
     %
     % ADD NEUROPIL INTO REGRESSION HERE    
     LtL     = full(L'*L);
-    codes = ([LtL LtS; LtS' StS]+ 0e-3 * eye(icell+nBasis))\[LtU; StU];
+    codes = ([LtL LtS; LtS' StS]+ 1e-3 * eye(icell+nBasis))\[LtU; StU];
     neu   = codes(icell+1:end,:);    
     codes = codes(1:icell,:);
 %     codes = (LtL+ 1e-3 * eye(icell))\LtU;    

diff --git a/master_file_example.m b/master_file_example.m
@@ -21,6 +21,7 @@
 
 ops0.useGPU                 = 1; % if you can use an Nvidia GPU in matlab this accelerates registration approx 3 times. You only need the Nvidia drivers installed (not CUDA).
 ops0.fig                    = 1; % turn off figure generation with 0
+% ops0.diameter               = 12; % most important parameter. Set here, or individually per experiment in make_db file
 
 % root paths for files and temporary storage (ideally an SSD drive. my SSD is C:/)
 ops0.RootStorage            = '//zserver4/Data/2P'; % Suite2P assumes a folder structure, check out README file
@@ -44,6 +45,7 @@
 ops0.sig                    = 0.5;  % spatial smoothing length in pixels; encourages localized clusters
 ops0.nSVDforROI             = 1000; % how many SVD components for cell clustering
 ops0.NavgFramesSVD          = 5000; % how many (binned) timepoints to do the SVD based on
+ops0.signalExtraction       = 'raw'; % how to extract ROI and neuropil signals: 'raw', 'regression'
 
 % spike deconvolution options
 ops0.imageRate              = 30;   % imaging rate (cumulative over planes!). Approximate, for initialization of deconvolution kernel.
@@ -82,10 +84,10 @@
 % cell traces are in dat.Fcell
 % neuropil traces are in dat.FcellNeu
 % manual, GUI overwritten "iscell" labels are in dat.cl.iscell
-
-% stat(icell) contains all other information
-% autoamted iscell label, based on anatomy
-% neuropil subtraction coefficient 
-% st are the deconvolved spike times (in frames)
-% c  are the deconvolved amplitudes
-% kernel is the estimated kernel
+%  
+% stat(icell) contains all other information:
+% iscell: automated label, based on anatomy
+% neuropilCoefficient: neuropil subtraction coefficient, based on maximizing the skewness of the corrected trace (ICA)
+% st: are the deconvolved spike times (in frames)
+% c:  are the deconvolved amplitudes
+% kernel: is the estimated kernel
diff --git a/run_pipeline.m b/run_pipeline.m
@@ -10,7 +10,7 @@ function  run_pipeline(db, ops0)
 ops0.getROIs                        = getOr(ops0, {'getROIs'}, 1);   % whether to run the optimization
 ops0.getSVDcomps                    = getOr(ops0, {'getSVDcomps'}, 0);   % whether to save SVD components to disk for later processing
 ops0.nSVD                           = getOr(ops0, {'nSVD'}, 1000);   % how many SVD components to save to disk
-
+ops0.signalExtraction               = getOr(ops0, 'signalExtraction', 'raw');
 
 ops                                 = build_ops3(db, ops0);
 if isfield(ops, 'numBlocks') && ~isempty(ops.numBlocks)
@@ -82,7 +82,12 @@ function  run_pipeline(db, ops0)
         [ops, stat, model]           = sourcery(ops,U, model);
 
         % extract dF
-        [ops, stat, Fcell, FcellNeu] = extractSignals(ops, model, stat);
+        switch ops.signalExtraction
+            case 'raw'
+                [ops, stat, Fcell, FcellNeu] = extractSignalsNoOverlaps(ops, model, stat);
+            case 'regression'
+                [ops, stat, Fcell, FcellNeu] = extractSignals(ops, model, stat);
+        end
 
         % apply user-specific clustrules to infer stat.iscell
         stat                         = classifyROI(stat, ops.clustrules);

diff --git a/signalExtraction/extractSignalsNoOverlaps.m b/signalExtraction/extractSignalsNoOverlaps.m
@@ -0,0 +1,182 @@
+function [ops, stat, Fcell, FcellNeu]      = extractSignalsNoOverlaps(ops, m, stat)
+
+ops.saveNeuropil = getOr(ops, 'saveNeuropil', 0);
+
+Nk = numel(stat);
+
+Ly = numel(ops.yrange);
+Lx = numel(ops.xrange);
+
+% make new set of basis functions (larger coverage)
+ops.neuropilRange = 10;
+
+S = getNeuropilBasis(ops, Ly, Lx, 'raisedcosyne'); % 'raisedcosyne', 'Fourier'
+S = normc(S);
+
+% S = m.S;
+
+S = reshape(S, [], size(S, ndims(S)));
+nBasis = size(S,2);
+
+% initialize mask
+maskNeu = ones(size(S,1), 1);
+
+stat = getNonOverlapROIs(stat, Ly, Lx);
+
+for k = 1:Nk
+    ix = stat(k).ipix(~stat(k).isoverlap);
+    maskNeu(stat(k).ipix)= 0;
+    if numel(ix)==0
+        LtS(k,:) = 0;
+    else
+        LtS(k,:) = stat(k).lam(~stat(k).isoverlap)' * S(ix, :);
+    end
+end
+
+% add all pixels within X um 
+if isfield(ops, 'exclFracCell') && ops.exclFracCell>0
+    H       = fspecial('disk', round(ops.diameter * ops.exclFracCell));
+    maskNeu = reshape(maskNeu, Ly, Lx);
+    maskNeu = imfilter(maskNeu, H, 'replicate');
+    maskNeu = single(maskNeu(:) > 1-1e-3);
+end
+%% get signals  
+S    = bsxfun(@times, S, maskNeu(:));
+StS = S' * S;
+StS = StS + 1e-2 * eye(size(StS));
+
+nimgbatch = 2000;
+
+ix = 0;
+fclose all;
+fid = fopen(ops.RegFile, 'r');
+
+tic
+F        = zeros(Nk, sum(ops.Nframes), 'single');
+Fneu    = zeros(Nk, sum(ops.Nframes), 'single');
+if ops.saveNeuropil
+    Ntraces = zeros(nBasis, sum(ops.Nframes), 'single');
+end
+% S    = bsxfun(@times, S, maskNeu(:));
+
+[Ly Lx] = size(ops.mimg1);
+
+mimg1 = ops.mimg1(ops.yrange, ops.xrange);
+% find indices of good clusters 
+while 1
+    data = fread(fid,  Ly*Lx*nimgbatch, '*int16');
+    if isempty(data)
+       break; 
+    end
+    data = reshape(data, Ly, Lx, []);
+    data = data(ops.yrange, ops.xrange, :);
+    data = single(data);
+    NT   = size(data,3);
+
+    % process the data
+    data = bsxfun(@minus, data, mimg1);
+    data = my_conv2(data, ops.sig, [1 2]);
+    data = bsxfun(@rdivide, data, m.sdmov);    
+    data = single(reshape(data, [], NT));
+
+    %
+    Ftemp = zeros(Nk, NT, 'single');
+    for k = 1:Nk
+       ipix = stat(k).ipix(~stat(k).isoverlap)'; 
+       if ~isempty(ipix)
+           Ftemp(k,:) = stat(k).lam(~stat(k).isoverlap)' * data(ipix,:);
+       end
+    end
+    F(:,ix + (1:NT))    = Ftemp;
+
+    Tneu                = StS\(S' * data);
+    Ftemp2              = LtS * Tneu;    
+    Fneu(:,ix + (1:NT)) = Ftemp2;
+
+%     Fneu(:,ix + (1:NT))     = m.LtS * Fdeconv(1+Nk:end, :); % estimated neuropil
+%     F(:,ix + (1:NT))        = Fneu(:,ix + (1:NT)) + Fdeconv(1:Nk, :); % estimated ROI signal
+
+    if ops.saveNeuropil
+        Ntraces(:,ix + (1:NT)) = Tneu;
+    end
+
+    ix = ix + NT;
+    if rem(ix, 3*NT)==0
+        fprintf('Frame %d done in time %2.2f \n', ix, toc)
+    end
+end
+fclose(fid);
+%% add the means back in to both neuropil and total
+data = my_conv2(mimg1, ops.sig, [1 2]);
+data = bsxfun(@rdivide, data, m.sdmov);
+data = single(reshape(data, [], 1));
+
+scalefactors = nan(numel(stat),1);
+Ftemp = zeros(Nk, 1, 'single');
+for k = 1:Nk
+    ipix = stat(k).ipix(~stat(k).isoverlap)'; 
+    if ~isempty(ipix)
+        Ftemp(k,:) = stat(k).lam(~stat(k).isoverlap)' * data(ipix,1);
+        scalefactors(k) = mean(m.sdmov(ipix));
+    end
+end
+
+Tneu                = StS\(S' * data);
+Ftemp2              = LtS * Tneu;
+
+Fneu     = bsxfun(@plus, Fneu, Ftemp2); % estimated neuropil
+F        = bsxfun(@plus, F,    Ftemp);
+
+Fneu     = bsxfun(@times, Fneu, scalefactors); % estimated neuropil
+F        = bsxfun(@times, F,    scalefactors);
+
+%% get activity stats
+indNoNaN    = find(~ops.badframes);
+ix          = cumsum(~ops.badframes) + 1;
+ix          = ix(ops.badframes);
+ix(ix>numel(indNoNaN))  = numel(indNoNaN);
+
+   F(:, ops.badframes)  = F(:,    indNoNaN(ix));
+Fneu(:, ops.badframes)  = Fneu(:, indNoNaN(ix));
+
+% figure out the ICA coefficients here
+ops.fs           = getOr(ops, 'fs', ops.imageRate/ops.nplanes);
+%
+[coefNeu, inomax]   = my_ica(F', Fneu', ops.fs, 0.7);
+dF                  = F - bsxfun(@times, Fneu, coefNeu(:));
+
+% dF          = F - Fneu;
+
+sd           = std(dF, [], 2);
+sdN          = std(Fneu, [], 2);
+
+sk(:, 1) = skewness(dF, [], 2);
+sk(:, 2) = sd./sdN; 
+sk(:, 3) = (max(dF, [], 2)-median(dF, 2))./sd;
+sk(:, 4) = (prctile(dF, 95, 2)-median(dF, 2))./sd;
+
+for j = 1:numel(stat)
+    stat(j).dFstat           = sk(j,:);
+    stat(j).skew             = sk(j,1);
+    stat(j).std              = sk(j,2);
+    stat(j).maxMinusMed      = sk(j,3);
+    stat(j).top5pcMinusMed   = sk(j,4);
+    stat(j).blockstarts      = [0 cumsum(ops.Nframes)];
+    stat(j).iplane                 = ops.iplane;
+    stat(j).neuropilCoefficient    = coefNeu(j); 
+end
+
+%%
+csumNframes = [0 cumsum(ops.Nframes)];
+Fcell       = cell(1, length(ops.Nframes));
+FcellNeu    = cell(1, length(ops.Nframes));
+for i = 1:length(ops.Nframes)
+    Fcell{i}     = F(:, csumNframes(i) + (1:ops.Nframes(i)));
+    FcellNeu{i}  = Fneu(:, csumNframes(i) + (1:ops.Nframes(i)));
+end
+
+if getOr(ops, 'saveNeuropil', 0)
+    S = reshape(S, numel(ops.yrange), numel(ops.xrange), Nbasis);
+    save(sprintf('%s/NEU_%s_%s_plane%d.mat', ops.ResultsSavePath, ...
+        ops.mouse_name, ops.date, ops.iplane),  'ops', 'S', 'Ntraces', '-v7.3')
+end
diff --git a/signalExtraction/getNonOverlapROIs.m b/signalExtraction/getNonOverlapROIs.m
@@ -0,0 +1,12 @@
+function stat = getNonOverlapROIs(stat, Ly, Lx)
+
+
+Mask = zeros(Ly, Lx);
+
+for k = 1:numel(stat)
+   Mask(stat(k).ipix) = Mask(stat(k).ipix) + 1;
+end
+
+for k = 1:numel(stat)
+   stat(k).isoverlap = Mask(stat(k).ipix)>1; 
+end