
Commit

Merge branch 'main' of github.com:SpikeInterface/spikeinterface into split_merge_clean
samuelgarcia committed Sep 29, 2023
2 parents cf2e58c + 225d9d1 commit 1a01e49
Showing 89 changed files with 2,908 additions and 2,307 deletions.
5 changes: 3 additions & 2 deletions doc/conf.py
@@ -118,14 +118,15 @@
'examples_dirs': ['../examples/modules_gallery'],
'gallery_dirs': ['modules_gallery', ], # path where to save gallery generated examples
'subsection_order': ExplicitOrder([
'../examples/modules_gallery/core/',
'../examples/modules_gallery/extractors/',
'../examples/modules_gallery/core',
'../examples/modules_gallery/extractors',
'../examples/modules_gallery/qualitymetrics',
'../examples/modules_gallery/comparison',
'../examples/modules_gallery/widgets',
]),
'within_subsection_order': FileNameSortKey,
'ignore_pattern': '/generate_',
'nested_sections': False,
}

intersphinx_mapping = {
1 change: 1 addition & 0 deletions doc/how_to/index.rst
@@ -7,3 +7,4 @@ How to guides
get_started
analyse_neuropixels
handle_drift
load_matlab_data
101 changes: 101 additions & 0 deletions doc/how_to/load_matlab_data.rst
@@ -0,0 +1,101 @@
Exporting MATLAB Data to Binary & Loading in SpikeInterface
===========================================================

In this tutorial, we will walk through the process of exporting data from MATLAB in a binary format and subsequently loading it using SpikeInterface in Python.

Exporting Data from MATLAB
--------------------------

Begin by ensuring your data structure is correct. Organize your data matrix so that the first dimension corresponds to samples/time and the second to channels.
As an illustration, the following MATLAB code creates a random dataset and writes it to a binary file.

.. code-block:: matlab
% Define the size of your data
numSamples = 1000;
numChannels = 384;
% Generate random data as an example
data = rand(numSamples, numChannels);
% Write the data to a binary file
fileID = fopen('your_data_as_a_binary.bin', 'wb');
fwrite(fileID, data, 'double');
fclose(fileID);
.. note::

In your own script, replace the random data generation with your actual dataset.

Loading Data in SpikeInterface
------------------------------

After executing the above MATLAB code, a binary file named :code:`your_data_as_a_binary.bin` will be created in your MATLAB directory. To load this file in Python, you'll need its full path.

Use the following Python script to load the binary data into SpikeInterface:

.. code-block:: python
import spikeinterface as si
from pathlib import Path
# Define file path
# For Linux or macOS:
file_path = Path("/The/Path/To/Your/Data/your_data_as_a_binary.bin")
# For Windows:
# file_path = Path(r"c:\path\to\your\data\your_data_as_a_binary.bin")
# Confirm file existence
assert file_path.is_file(), f"Error: {file_path} is not a valid file. Please check the path."
# Define recording parameters
sampling_frequency = 30_000.0 # Adjust according to your MATLAB dataset
num_channels = 384 # Adjust according to your MATLAB dataset
dtype = "float64" # MATLAB's double corresponds to Python's float64
# Load data using SpikeInterface
recording = si.read_binary(file_path, sampling_frequency=sampling_frequency,
num_channels=num_channels, dtype=dtype)
# Confirm that the data was loaded correctly by checking that the number of samples and channels match the MATLAB data
print(recording.get_num_frames(), recording.get_num_channels())
Follow the steps above to seamlessly import your MATLAB data into SpikeInterface. Once loaded, you can harness the full power of SpikeInterface for data processing, including filtering, spike sorting, and more.
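
As a quick illustration of these next steps, here is a minimal preprocessing sketch. It assumes the :code:`recording` object created above and the :code:`spikeinterface.preprocessing` module; the filter cutoffs are example values, not recommendations.

.. code-block:: python
import spikeinterface.preprocessing as spre
# Band-pass filter to keep the spiking band (example cutoffs)
recording_filtered = spre.bandpass_filter(recording, freq_min=300., freq_max=6000.)
# Common median reference across channels
recording_cmr = spre.common_reference(recording_filtered, reference="global", operator="median")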

Common Pitfalls & Tips
----------------------

1. **Data Shape**: Make sure your MATLAB data matrix's first dimension is samples/time and the second is channels. If your time is in the second dimension, use :code:`time_axis=1` in :code:`si.read_binary()` (see the sketch after this list).
2. **File Path**: Always double-check the Python file path.
3. **Data Type Consistency**: Ensure data types between MATLAB and Python are consistent. MATLAB's :code:`double` is equivalent to NumPy's :code:`float64`.
4. **Sampling Frequency**: Set the appropriate sampling frequency in Hz for SpikeInterface.
5. **Transition to Python**: Moving from MATLAB to Python can be challenging. For newcomers to Python, consider reviewing NumPy's `NumPy for MATLAB Users <https://numpy.org/doc/stable/user/numpy-for-matlab-users.html>`_ guide.
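
For pitfall 1, a minimal sketch of loading a file whose time runs along the second dimension (channels x samples) could look like this, reusing the :code:`file_path`, :code:`sampling_frequency`, :code:`num_channels`, and :code:`dtype` variables defined above:

.. code-block:: python
# Hypothetical case: the binary file stores data as (channels, samples)
recording = si.read_binary(file_path, sampling_frequency=sampling_frequency,
num_channels=num_channels, dtype=dtype,
time_axis=1)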

Using gains and offsets for integer data
----------------------------------------

Raw data formats often store data as integer values for memory efficiency. To give these integers meaningful physical units, you can apply a gain and an offset.
In SpikeInterface, you can use the :code:`gain_to_uV` and :code:`offset_to_uV` parameters, since traces are handled in microvolts (uV). Both parameters can be passed directly to the :code:`read_binary` function.
If your data in MATLAB is stored as :code:`int16`, and you know the gain and offset, you can use the following code to load the data:

.. code-block:: python
sampling_frequency = 30_000.0 # Adjust according to your MATLAB dataset
num_channels = 384 # Adjust according to your MATLAB dataset
dtype_int = 'int16' # Adjust according to your MATLAB dataset
gain_to_uV = 0.195 # Adjust according to your MATLAB dataset
offset_to_uV = 0 # Adjust according to your MATLAB dataset
recording = si.read_binary(file_path, sampling_frequency=sampling_frequency,
num_channels=num_channels, dtype=dtype_int,
gain_to_uV=gain_to_uV, offset_to_uV=offset_to_uV)
recording.get_traces() # Return traces in original units [type: int]
recording.get_traces(return_scaled=True) # Return traces in micro volts (uV) [type: float]
This equips the recording object to convert the traces to float values in uV using the :code:`get_traces()` method with the :code:`return_scaled` parameter set to :code:`True`.

.. note::

The gain and offset parameters are usually format-dependent, so you will need to find the correct values for your data format. You can load your data without gain and offset, but the traces will then remain integer values rather than uV.
100 changes: 61 additions & 39 deletions doc/modules/comparison.rst
@@ -248,21 +248,19 @@ An **over-merged** unit has a relatively high agreement (>= 0.2 by default) for
We also have a high-level class to compare many sorters against ground truth:
:py:class:`~spikeinterface.comparison.GroundTruthStudy`

A study is a systematic performance comparison of several ground truth recordings with several sorters.
A study is a systematic performance comparison of several ground truth recordings with several sorters or several cases,
such as different parameter sets.

The study class proposes high-level tool functions to run many ground truth comparisons with many sorters
The study class provides high-level tool functions to run many ground truth comparisons with many "cases"
on many recordings and then collect and aggregate results in an easy way.

The whole mechanism is based on an intrinsic organization into a "study_folder" with several subfolders:

* raw_files : contain a copy of recordings in binary format
* sorter_folders : contains outputs of sorters
* ground_truth : contains a copy of sorting ground truth in npz format
* sortings: contains light copy of all sorting in npz format
* tables: some tables in csv format

In order to run and rerun the computation all gt_sorting and recordings are copied to a fast and universal format:
binary (for recordings) and npz (for sortings).
* datasets: contains ground truth datasets
* sorters: contains outputs of sorters
* sortings: contains a light copy of all sortings
* metrics: contains metrics
* ...


.. code-block:: python
@@ -274,28 +272,51 @@
import spikeinterface.widgets as sw
from spikeinterface.comparison import GroundTruthStudy
# Setup study folder
rec0, gt_sorting0 = se.toy_example(num_channels=4, duration=10, seed=10, num_segments=1)
rec1, gt_sorting1 = se.toy_example(num_channels=4, duration=10, seed=0, num_segments=1)
gt_dict = {
'rec0': (rec0, gt_sorting0),
'rec1': (rec1, gt_sorting1),
# generate 2 simulated datasets (could also be mearec files)
rec0, gt_sorting0 = generate_ground_truth_recording(num_channels=4, durations=[30.], seed=42)
rec1, gt_sorting1 = generate_ground_truth_recording(num_channels=4, durations=[30.], seed=91)
datasets = {
"toy0": (rec0, gt_sorting0),
"toy1": (rec1, gt_sorting1),
}
study_folder = 'a_study_folder'
study = GroundTruthStudy.create(study_folder, gt_dict)
# all sorters for all recordings in one function.
sorter_list = ['herdingspikes', 'tridesclous', ]
study.run_sorters(sorter_list, mode_if_folder_exists="keep")
# define some "cases" here we want to tests tridesclous2 on 2 datasets and spykingcircus on one dataset
# so it is a two level study (sorter_name, dataset)
# this could be more complicated like (sorter_name, dataset, params)
cases = {
("tdc2", "toy0"): {
"label": "tridesclous2 on tetrode0",
"dataset": "toy0",
"run_sorter_params": {
"sorter_name": "tridesclous2",
},
},
("tdc2", "toy1"): {
"label": "tridesclous2 on tetrode1",
"dataset": "toy1",
"run_sorter_params": {
"sorter_name": "tridesclous2",
},
},
("sc", "toy0"): {
"label": "spykingcircus2 on tetrode0",
"dataset": "toy0",
"run_sorter_params": {
"sorter_name": "spykingcircus",
"docker_image": True
},
},
}
# this initializes the study folder
study = GroundTruthStudy.create(study_folder, datasets=datasets, cases=cases,
levels=["sorter_name", "dataset"])
# You can re-run **run_study_sorters** as many times as you want.
# By default **mode='keep'** so only uncomputed sorters are re-run.
# For instance, just remove the "sorter_folders/rec1/herdingspikes" to re-run
# only one sorter on one recording.
#
# Then we copy the spike sorting outputs into a separate subfolder.
# This allow us to remove the "large" sorter_folders.
study.copy_sortings()
# all cases in one function
study.run_sorters()
# Collect comparisons
#  
@@ -306,11 +327,11 @@
# Note: use exhaustive_gt=True when you know exactly how many
# units in ground truth (for synthetic datasets)
# run all comparisons and loop over the results
study.run_comparisons(exhaustive_gt=True)
for (rec_name, sorter_name), comp in study.comparisons.items():
for key, comp in study.comparisons.items():
    print('*' * 10)
    print(rec_name, sorter_name)
    print(key)
    # raw counting of tp/fp/...
    print(comp.count_score)
    # summary
@@ -323,26 +344,27 @@
# Collect synthetic dataframes and display
# As shown previously, the performance is returned as a pandas dataframe.
# The :py:func:`~spikeinterface.comparison.aggregate_performances_table()` function,
# The :py:func:`~spikeinterface.comparison.get_performance_by_unit()` function,
# gathers all the outputs in the study folder and merges them in a single dataframe.
# Same idea for :py:func:`~spikeinterface.comparison.get_count_units()`
dataframes = study.aggregate_dataframes()
# this is a dataframe
perfs = study.get_performance_by_unit()
# Pandas dataframes can be nicely displayed as tables in the notebook.
print(dataframes.keys())
# this is a dataframe
unit_counts = study.get_count_units()
# we can also access run times
print(dataframes['run_times'])
run_times = study.get_run_times()
print(run_times)
# Easy plot with seaborn
run_times = dataframes['run_times']
fig1, ax1 = plt.subplots()
sns.barplot(data=run_times, x='rec_name', y='run_time', hue='sorter_name', ax=ax1)
ax1.set_title('Run times')
##############################################################################
perfs = dataframes['perf_by_unit']
fig2, ax2 = plt.subplots()
sns.swarmplot(data=perfs, x='sorter_name', y='recall', hue='rec_name', ax=ax2)
ax2.set_title('Recall')
3 changes: 1 addition & 2 deletions doc/modules/core.rst
@@ -547,8 +547,7 @@ workflow.
In order to do this, one can use the :code:`Numpy*` classes, :py:class:`~spikeinterface.core.NumpyRecording`,
:py:class:`~spikeinterface.core.NumpySorting`, :py:class:`~spikeinterface.core.NumpyEvent`, and
:py:class:`~spikeinterface.core.NumpySnippets`. These objects behave exactly like normal SpikeInterface objects,
but they are not bound to a file. This makes these objects *not dumpable*, so parallel processing is not supported.
In order to make them *dumpable*, one can simply :code:`save()` them (see :ref:`save_load`).
but they are not bound to a file.
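
As a minimal sketch (the array shape and sampling frequency below are arbitrary example values), an in-memory recording can be built directly from a NumPy array:

.. code-block:: python
import numpy as np
from spikeinterface.core import NumpyRecording
# traces must be shaped (num_samples, num_channels); one array per segment in traces_list
traces = np.random.randn(30_000, 4).astype("float32")
recording = NumpyRecording(traces_list=[traces], sampling_frequency=30_000.)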

Also note the class :py:class:`~spikeinterface.core.SharedMemorySorting` which is very similar to
:py:class:`~spikeinterface.core.NumpySorting` but with an underlying SharedMemory, which is useful for
2 changes: 2 additions & 0 deletions doc/modules/qualitymetrics.rst
@@ -25,9 +25,11 @@ For more details about each metric and its availability and use within SpikeInt
:glob:

qualitymetrics/amplitude_cutoff
qualitymetrics/amplitude_cv
qualitymetrics/amplitude_median
qualitymetrics/d_prime
qualitymetrics/drift
qualitymetrics/firing_range
qualitymetrics/firing_rate
qualitymetrics/isi_violations
qualitymetrics/isolation_distance
6 changes: 3 additions & 3 deletions doc/modules/qualitymetrics/amplitude_cutoff.rst
@@ -21,12 +21,12 @@ Example code

.. code-block:: python
import spikeinterface.qualitymetrics as qm
import spikeinterface.qualitymetrics as sqm
# It is also recommended to run `compute_spike_amplitudes(wvf_extractor)`
# in order to use amplitudes from all spikes
fraction_missing = qm.compute_amplitude_cutoffs(wvf_extractor, peak_sign="neg")
# fraction_missing is a dict containing the units' IDs as keys,
fraction_missing = sqm.compute_amplitude_cutoffs(wvf_extractor, peak_sign="neg")
# fraction_missing is a dict containing the unit IDs as keys,
# and their estimated fraction of missing spikes as values.
Reference
55 changes: 55 additions & 0 deletions doc/modules/qualitymetrics/amplitude_cv.rst
@@ -0,0 +1,55 @@
Amplitude CV (:code:`amplitude_cv_median`, :code:`amplitude_cv_range`)
======================================================================


Calculation
-----------

The amplitude CV (coefficient of variation) is a measure of the amplitude variability.
It is computed as the ratio between the standard deviation and the amplitude mean.
To obtain a better estimate of this measure, it is first computed separately for several temporal bins.
Out of these values, the median and the range (percentile distance, by default between the
5th and 95th percentiles) are computed.

The computation requires either spike amplitudes (see :py:func:`~spikeinterface.postprocessing.compute_spike_amplitudes()`)
or amplitude scalings (see :py:func:`~spikeinterface.postprocessing.compute_amplitude_scalings()`) to be pre-computed.
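
As a rough NumPy sketch of the calculation (the bin size and percentiles are example values, and :code:`amps` / :code:`spike_times` stand for the pre-computed amplitudes and spike times of a single unit):

.. code-block:: python
import numpy as np
# compute the CV in 5-second temporal bins, then summarize across bins
bin_edges = np.arange(0., spike_times.max() + 5., 5.)
cvs = []
for start, stop in zip(bin_edges[:-1], bin_edges[1:]):
    in_bin = (spike_times >= start) & (spike_times < stop)
    if in_bin.sum() > 1:
        cvs.append(np.std(amps[in_bin]) / np.mean(amps[in_bin]))
amplitude_cv_median = np.median(cvs)
amplitude_cv_range = np.percentile(cvs, 95) - np.percentile(cvs, 5)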


Expectation and use
-------------------

The amplitude CV median is expected to be relatively low for well-isolated units, indicating a "stereotypical" spike shape.

The amplitude CV range can be high in the presence of noise contamination, due to amplitude outliers like in
the example below.

.. image:: amplitudes.png
:width: 600


Example code
------------

.. code-block:: python
import spikeinterface.qualitymetrics as sqm
# Make recording, sorting and wvf_extractor object for your data.
# It is required to run `compute_spike_amplitudes(wvf_extractor)` or
# `compute_amplitude_scalings(wvf_extractor)` (if missing, values will be NaN)
amplitude_cv_median, amplitude_cv_range = sqm.compute_amplitude_cv_metrics(wvf_extractor)
# amplitude_cv_median and amplitude_cv_range are dicts containing the unit ids as keys,
# and their amplitude_cv metrics as values.
References
----------

.. autofunction:: spikeinterface.qualitymetrics.misc_metrics.compute_amplitude_cv_metrics


Literature
----------

Designed by Simon Musall and adapted to SpikeInterface by Alessio Buccino.
6 changes: 3 additions & 3 deletions doc/modules/qualitymetrics/amplitude_median.rst
@@ -20,12 +20,12 @@ Example code

.. code-block:: python
import spikeinterface.qualitymetrics as qm
import spikeinterface.qualitymetrics as sqm
# It is also recommended to run `compute_spike_amplitudes(wvf_extractor)`
# in order to use amplitude values from all spikes.
amplitude_medians = qm.compute_amplitude_medians(wvf_extractor)
# amplitude_medians is a dict containing the units' IDs as keys,
amplitude_medians = sqm.compute_amplitude_medians(wvf_extractor)
# amplitude_medians is a dict containing the unit IDs as keys,
# and their estimated amplitude medians as values.
Reference
Binary file added doc/modules/qualitymetrics/amplitudes.png
4 changes: 2 additions & 2 deletions doc/modules/qualitymetrics/d_prime.rst
@@ -32,9 +32,9 @@ Example code

.. code-block:: python
import spikeinterface.qualitymetrics as qm
import spikeinterface.qualitymetrics as sqm
d_prime = qm.lda_metrics(all_pcs, all_labels, 0)
d_prime = sqm.lda_metrics(all_pcs, all_labels, 0)
Reference