Skip to content

open field post sorting (Python)

klaragerlei edited this page Oct 31, 2019 · 9 revisions

Introduction

To facilitate efficient and reproducible analysis of electrophysiological and behavioural data we have developed a framework that uses data frames implemented in Python. Our overall philosophy is to clearly separate pre-processing of data from analyses that directly address experimental questions. Bonsai, OpenEphys, and MountainSort are used to pre-process the experimental data. For example, Mountainsort is first used for spike clustering. The outputs of this pre-processing are used to initialize data frames that are then used for subsequent analyses.

The framework use two data frames, one for analyses at the level of behavioural sessions, and one for analyses at the level of spike clusters. Data from new behavioural sessions, or clusters, are added to the frames as new rows. Outputs of new analyses are added as new columns. Because the data frames store data and analyses, for multiple sessions and clusters respectively, it's straightforward to implement new analyses over many sessions or clusters without writing complicated looping code. The framework is currently implemented for open field and virtual reality sessions (see VR_post_sorting page for description of VR dataframes).

New analysis code should be added in a way that uses the data frames. If analyses require access to raw data, then a processing step should be used to add the required data to the data frames. Results of subsequent analyses should be added into these data frames as new columns. For instance, if we implement calculating the grid score of cells, this should be a new column in the data frame that contains information on clusters.

At the end of the analysis, three data frames are saved: the position data frame, the cluster data frame, and a thrird data frame that contains the rejected 'noisy' clusters. These are saved as pkl files in each recording folder in /DataFrames on the server.

Description of two main data frames

The 'session' data frame contains processed data describing the position and head-direction of the animal. Each row is data from one session. The columns are organized as follows:

synced_spatial_data (this is the name of the df in the main code)

  • synced_time : arrays of time in seconds, synchronized with the ephys data
  • position_x : arrays of x coordinates of position of animal in arena in cm
  • position_y : y coordinate of position of animal in arena in cm
  • position_x_pixels : x coordinate of position of animal in arena in pixels
  • position_y_pixels : y coordinate of position of animal in arena in pixels
  • hd : head-direction of animal in degrees [-180;180]
  • speed : speed of animal

synced_spatial_data.head() image


The 'clusters' data frame contains data for each cluster and their spatial firing. Each row is a cluster. The columns are organized as follows:

spatial_firing (this is the name of the df in the main code)

  • session_id : name of main recording folder (example: M5_2018-03-06_15-34-44_of)
  • cluster_id : id of cluster within session (1 - number of clusters)
  • tetrode : id of tetrode within session (1 - 4)
  • primary_channel : channel where the event was detected on (1 - 4)
  • firing_times : array of all firing event times that belong to cluster from the open field exploration (in sampling points)
  • number_of_spikes : total number of spikes in session excluding opto tagging part
  • mean_firing_rate : total number of spikes / total time exclding opto tagging data [Hz]
  • firing_times_opto : array of firing events from the opto tagging part of the recording (in sampling points)
  • position_x : x coordinate of position of animal in arena in cm corresponding to each firing event from the exploration
  • position_y : y coordinate of position of animal in arena in cm corresponding to each firing event from the exploration
  • position_x_pixels : x coordinate of position of animal in arena in pixels corresponding to each firing event from the exploration
  • position_y_pixels : y coordinate of position of animal in arena in pixels corresponding to each firing event from the exploration
  • hd : head-direction of animal in degrees corresponding to each firing event from the exploration [-180;180]
  • speed_score : The speed score is a measure of the correlation between the firing rate of the neuron and the running speed of the animal. The firing times of the neuron are binned at the same sampling rate as the position data (speed). The resulting temporal firing histogram is then smoothed with a Gaussian (standard deviation ~250ms). Speed and temporal firing rate are correlated (Pearson correlation) to obtain the speed score. Based on: Gois & Tort, 2018, Cell Reports 25, 1872–1884
  • speed_score_p_values : Pearson correlation p values corresponding to speed scores above
  • firing_maps : binned data array for each cluster with firing rate maps. The firing rate map is calculated by summing the number of spikes in each location and dividing that by the time the animal spent there and then smoothing the surface with a Gaussian centered on each location bin (Leutgeb et al. 2007). m5_2018-03-06_15-34-44_of_rate_map_6 0
  • hd_spike_histogram : polar histogram of HD when the cell fired. For each degree the number of events are counted and then smoothing is done on this data by adding the values up in a (23 degree) rolling window. For each degree between 0 and 360 the number of events between n-window and n+window is added up. This histogram is then divided by the histogram obtained from all the HD data from the session divided by the sampling rate. spike_histogram = spike_histogram_hd/(hd_histogram_session/ephys_sampling_rate)

This is then normalized on the plot hd_hist*(max(hd_hist_cluster))/max(hd_hist)) is plotted.

m5_2018-03-06_15-34-44_of_hd_polar_5

  • rate_map_autocorrelogram : autocorrelogram of rate map. This is obtained by shifting the rate map along x and y axes into every possible overlapping position. Correlation is calculated for all these overlaps. The autocorrelogam is needed for grid cells analyses. image

  • firing_fields : lists of indices that belong to an individual firing field detected. One cluster can have multiple lists. (Indices correspond to the firing rate map.) For example on this image, the yellow circles represent the local maximum that the algorithm found and then all the blue parts around them were taken for that particular firing field. This cluster will have four lists added to its firing fields.

  • firing_fields_hd_session : head-direction histograms that correspond to firing fields (each cluster has a list) - so this data is only from when the animal was in the given field

  • firing_fields_hd_cluster : head-direction histograms that correspond to firing fields when the cell fired - this data is from when the animal was in the field AND the cell fired

  • field_hd_p : Kuiper p values corresponding to the head-direction histograms of each field

  • field_stat : Kuiper raw statistic corresponding to the head-direction histograms of each field

  • field_hd_max_rate : maximum firing rate in field

  • field_preferred_hd : preferred head-direction in field

  • field_hd_score : hd score in field (see hd score definition above)

  • field_max_firing_rate : max firing rate in given field among rate bins

  • number_of_spikes_in_fields : for each cluster where fields were detected, this contains the list of number of spikes for each field

  • time_spent_in_fields_sampling_points : number of sampling points the animal spent in a firing field, this corresponds to number_of_spikes_in_fields

  • spike_times_in_fields : list of firing times of spikes for each field of each cluster

  • times_in_session_fields : time stamps when the mouse was in the field for each field

image

  • max_firing_rate : the highest among the firing rates of the bins of the rate map (Hz)

  • max_firing_rate_hd : the highest among the firing rates of angles of the polar histogram (Hz)

  • preferred_HD : the head-direction angle where the cell fires the most (highest rate), degrees

  • hd_score : score between 0 and 1. The higher the score, the more head-direction specific the cell is.

dy = np.sin(angles_rad) dx = np.cos(angles_rad)

    `totx = sum(dx * hd_hist)/sum(hd_hist)`
    `toty = sum(dy * hd_hist)/sum(hd_hist)`
    `r = np.sqrt(totx*totx + toty*toty)`
    `hd_scores.append(r)`
  • hd_p : result of two-sample Kuiper test on the distribution of hd from the whole session and the distribution of hd when the cell fired. The probability of obtaining two samples this different from the same distribution.

  • hd_stat : the raw test statistic from the Kuiper test described above

  • rayleigh_score : This test is used to identify a non-uniform distribution, i.e. it is designed for detecting an unimodal deviation from uniformity. More precisely, it assumes the following hypotheses: - H0 (null hypothesis): The population is distributed uniformly around the circle. - H1 (alternative hypothesis): The population is not distributed uniformly around the circle. Small p-values suggest to reject the null hypothesis.

This is an alternative to using the population mean vector as a head-directions score.

https://docs.astropy.org/en/stable/_modules/astropy/stats/circstats.html#rayleightest

  • hd_correlation_first_vs_second_half : Pearson correlation coefficient calculated from correlating normalized histograms of head-directions from the first and second half of the session

  • hd_correlation_first_vs_second_half_p : corresponding p value for correlation

Circular statistics are done in R using the circular.r package

(documentation: https://cran.r-project.org/web/packages/circular/circular.pdf)

  • watson_test_hd - stats restuls of two sample Watson test comparing distribution of HD from the whole session to the HD when the cell fired. p value ranged can be inferred from stats https://github.com/MattNolanLab/in_vivo_ephys_openephys/blob/add_python_post_clustering/PostSorting/process_fields.r

  • kuiper_cluster - one sample Kuiper test stats for HD when the cell fired

  • kuiper_session - one sample Kuiper test stats for HD from the whole session

  • watson_cluster - one sample Watson test stats for HD when the cell fired

  • watson_session - one sample Watson test stats for HD for the whole session

spike_data_spatial.head()

image


post_process_recording

The following section will detail what happens in each line of the part of the code that calls analyses that initialize and fill the data frames described above.

In the PostSorting main, parameters (such as pixel ratio and sampling rate) are initialized using the parameters class:

initialize_parameters(recording_to_process)

The spatial data frame is initialized and filled by extracting data from the Bonsai output file:

spatial_data = process_position_data(recording_to_process, session_type, prm)

Light pulses are loaded into arrays. It is also determined at what time point the opto stimulation started.

opto_on, opto_off, is_found = process_light_stimulation(recording_to_process, prm)

The Bonsai data is trimmed to start at the same time as the electrophysiology data. Data from before the first simultaneously recorded sync pulses is discarded. (See more detail on how it is synchronized and why here.)

synced_spatial_data = sync_data(recording_to_process, prm, spatial_data)

Firing times of clusters are loaded into the cluster data frame, where each cluster is one row.

spike_data = PostSorting.load_firing_data.create_firing_data_frame(recording_to_process, session_type, prm)

Spatial data is added to the spike data frame. For each firing time the corresponding location and head-direction is added.

spike_data_spatial = PostSorting.open_field_spatial_firing.process_spatial_firing(spike_data, synced_spatial_data)

Firing rate map arrays are calculated and added to the spike data frame.

position_heat_map, spatial_firing = PostSorting.open_field_firing_maps.make_firing_field_maps(synced_spatial_data, spike_data_spatial, prm)

Example function to calculate speed

Calculate array 'elapsed_time' from the time column of the position data frame:

elapsed_time = position_data['time_seconds'].diff()

Calculate distance traveled based on x and y position coordinates using Pythagorean theorem:

distance_travelled = np.sqrt(position_data['position_x'].diff().pow(2) + position_data['position_y'].diff().pow(2))

Calculate speed and add it to 'position_data' data frame by dividing distance traveled by elapsed time for each data point: position_data['speed'] = distance_travelled / elapsed_time