Neural recordings are accompanied by various other measurements, controlled manipulations, and other meta data. For example, timing of a visual cue, time course of an auditory stimulus, behavioral measurements (button press, eye movement, EMG), time and magnitude of the reward, heart beat, on going EEG/ECoG/LFP measurements, and so on. It is cumbersome to find a representation of these covariates for regression-style analysis, especially when they are events. This is because events usually do not have an instantaneous effect to the response (dependent) variable. This package allows you to expand and transform your experimental variables to a feature space as a design matrix such that a simple linear analysis could yield desired results.
This tutorial will explain how to import your experimental data, and build appropriate features spaces to do fancy regression analysis easily.
We assume the experimental variables are observations over time, and organized into trials. If your data don't have a trial structure, you can put all your data in a single trial. There are 4 types of variables that constitute data: spike train, timing, continuous, and value. This framework uses string labels to address each variable later.
Each spike train is a sequence of spike timings from a single neuron. The spike timings are relative to the beginning of the trial.
A typical trial based experiment may have a cue that indicates the beginning of a trial, cues that indicate waiting period, or presentation of a target at random times. These covariates are best represented as events. However, if two or more timings are perfectly correlated, it would be sufficient to include just one (see Experiment Design for ore information). Note that many behaviors are also recorded as timing: reaction time, button press time, etc.
Continuous data are measured continuously over time. For instance, eye position of the animal may be correlated with the activities of neurons of the study, so the experimentalist could carefully measure it throughout the experiment. Note that the sampling rate should match the bin size of the analysis, otherwise up-sampling, or down-sampling (with appropriate filtering) is necessary.
Each trial can have a single value associated with it. In many cases these are trial specific parameters such as strength of the stimulus, type of cue, or the behavioral category. These values can be used to build a feature space, or to include specific feature in trials only when certain conditions are met.
Each experimental variable must be registered before the data are loaded.
First, create an experiment object using initExperiment
:
expt = buildGLM.initExperiment(unitOfTime, binSize, uniqueID, expParam);
where unitOfTime
is a string for the time unit that's going to be used consistently throughout (e.g., 's' or 'ms'), binSize
is the duration of the time bin to discretize the timings.
uniqueID
is a string to uniquely identify the experiment among other experiments (mostly for the organizational purpose) and automatically generated if omitted using []
.
expParam
can be anything that you want to associate with the experiment structure for easy access later, since it will be carried around throughout the code.
Then, each experimental variable is registered by indicating the type, label, and user friendly name of the variable.
expt = buildGLM.registerContinuous(expt, 'LFP', 'Local Field Potential', 1); % continuous obsevation over time
expt = buildGLM.registerContinuous(expt, 'eyepos', 'Eye Position', 2); % 2 dimensional observation
expt = buildGLM.registerTiming(expt, 'dotson', 'Motion Dots Onset'); % events that happen 0 or more times per trial (sparse)
expt = buildGLM.registerTiming(expt, 'saccade', 'Monkey's Saccade Timing');
expt = buildGLM.registerSpikeTrain(expt, 'sptrain', 'Our Neuron'); % Spike train!!!
expt = buildGLM.registerSpikeTrain(expt, 'sptrain2', 'Neighbor Neuron');
expt = buildGLM.registerValue(expt, 'coh', 'Dots Coherence'); % information on the trial, but not associated with time
expt = buildGLM.registerValue(expt, 'choice', 'Direction of Monkey's Choice');
Note that one can omit the prefix buildGLM.
by importing the name space once via
import buildGLM.*
For each trial, we load each of the possible covariate into the experiment structure.
For each trial, we make a temporary object trial
to load the data:
trial = buildGLM.newTrial(expt, duration);
where duration
is the length of the current trial in unitOfTime
.
trial
is a structure where you can need to add each of your experimental variables you have registered for the experiment as fields. Below are examples with randomly generated dummy data.
trial.dotson = rand() * duration; % timing variable
st = sort(rand(poissrnd(0.1 * duration), 1) * duration); % homogeneous Poisson process
trial.sptrain = st; % spike train variable
trial.choice = round(rand); % value variable
T = expt.binfun(trial.duration); % number of bins for this trial
trial.eyepos = randn(T, 1); % continuous variable
Finally, we add the trial object to the experiment object with an associated trial index kTrial
:
expt = buildGLM.addTrial(expt, trial, kTrial);
Repeat this for all your trials, and your are done loading your data. See tutorial*.m
for examples.
Once you are comfortable with the desired data structure, which is just a structure array of the trial objects, you can avoid calling the newTrial
and addTrial
functions, and directly plug-in your data into the structure via (see tutorial_exampleData.m
):
expt.trial = dataInTrialStruct; % only if you know what you are doing
Once you have your data loaded as an experiment object, you are now ready to specify how your experimental variables will be represented, and hence how your design matrix will be formed.
We start by creating a design specification object.
dspec = buildGLM.initDesignSpec(expt);
You can have multiple such object per experiment to analyze your experiments in different ways and compare models.
The design specification object dspec
contains specification of how each covariate for the analysis is defined, and the information necessary for temporal embedding and/or nonlinear transformation.
For a timing variable, the following syntax adds a delta function at the time of the event:
dspec = buildGLM.addCovariateTiming(dspec, 'fpon', 'fpon', 'Fixation On');
However, this is seldom what you want. You probably want to have temporal basis to represent delayed effects of the covariate to the response variable. Let's make a set of 8 boxcar basis functions to cover 300 ms evenly:
bs = basisFactory.makeSmoothTemporalBasis('boxcar', 300, 8, expt.binfun);
and use this to represent the effect of timing event instead:
dspec = buildGLM.addCovariateTiming(dspec, 'fpon', 'fpon', 'Fixation On', bs);
If you want to use autoregressive point process modeling (often known as GLM in neuroscience) by adding the spike history filter, you can do the following:
dspec = buildGLM.addCovariateSpiketrain(dspec, 'hist', 'sptrain', 'History filter');
This adds spike history filters with default history basis functions. You can do the same to add the coupling filters from other neurons (causal only by default):
dspec = buildGLM.addCovariateSpiketrain(dspec, 'coupling', 'sptrain2', 'Coupling from neuron 2');
You can add continuous covariates with or without basis functions as well:
dspec = buildGLM.addCovariateRaw(dspec, 'eyepos', 'Eye position effect', bs);
If you have two timing variables that represent a duration, and would like to represent it as a boxcar:
%% Stimulus starts at `dotson` and ends at `dotsoff`
dspec = buildGLM.addCovariateBoxcar(dspec, 'dots', 'dotson', 'dotsoff', 'Motion dots stim');
The +basisFactory
package provides functions that generate basis function structures.
Instead of a boxcar, you can use raised cosine basis functions:
bs = basisFactory.makeSmoothTemporalBasis('raised cosine', 200, 10, expt.binfun);
The raised cosine functions are spaced such that they can represent linear functions as well as smoothly varying functions.
The ultimate output is the design matrix:
dm = buildGLM.compileSparseDesignMatrix(dspec, trialIndices);
where trialIndices
are the trials to include in making the design matrix. This function is memory intensive, and could take a few seconds to complete.
dm
is a structure that contains the actual design matrix as dm.X
. You can visualize this matrix using what?
If your design matrix is not very sparse (less than 10% sparse, for example), it's better to conver the design matrix to a full (dense) matrix for speed.
dm.X = full(dm.X);
Coming soon!
Once you have designed your features, and obtained the design matrix, it's finally time to do some analysis!
You need to obtain the response variable of the same length as the number of rows in the design matrix to do regression. For point process regression, where we want to predict the observed spike train from covariates, this would be a finely binned spike train concatenated over the trials of interest:
%% Get the spike trains back to regress against
y = buildGLM.getBinnedSpikeTrain(expt, 'sptrain', dm.trialIndices);
For predicting some continuous observation, such as predicting the LFP, you can do:
y = buildGLM.getResponseVariable(expt, 'LFP', dm.trialIndices);
Make sure your y
is a column vector; getResponseVariable
returns a matrix if the experimental variable is more than 1 dimension.
You can do whatever you want to do the regression. Simple least squares can be done via:
w = dm.X' * dm.X \ dm.X' * y;
Or you can use the glmfit
in MATLAB statistics toolbox to do the Poisson regression.
%% Maximum likelihood estimation using glmfit
[w, dev, stats] = glmfit(dm.X, y, 'poisson', 'link', 'log');
Result of regression is a weight vector (and sometimes additional associated statistics in a vector or matrix) in the feature space. Hence, the weight vector is as long as the number of columns in the design matrix. In order to obtain meaningful temporal weights back corresponding to each covariate, use
ws = buildGLM.combineWeights(dm, w);
It returns a structure that contains a time axis ws.(covLabel).tr
and data ws.(covLabel).data
for each covLabel
in the design specification structure.