Merge branch 'main' into curation_debug

SpikeInterface · Jan 14, 2025 · 23095cf
2 parents cead752 + 142cff8
commit 23095cf
Showing 31 changed files with 2,697 additions and 8 deletions.
diff --git a/doc/api.rst b/doc/api.rst
@@ -346,6 +346,9 @@ spikeinterface.curation
     .. autofunction:: remove_redundant_units
     .. autofunction:: remove_duplicated_spikes
     .. autofunction:: remove_excess_spikes
+    .. autofunction:: load_model
+    .. autofunction:: auto_label_units
+    .. autofunction:: train_model
 
 Deprecated
 ~~~~~~~~~~

diff --git a/doc/conf.py b/doc/conf.py
@@ -125,6 +125,7 @@
     'subsection_order': ExplicitOrder([
                                        '../examples/tutorials/core',
                                        '../examples/tutorials/extractors',
+                                       '../examples/tutorials/curation',
                                        '../examples/tutorials/qualitymetrics',
                                        '../examples/tutorials/comparison',
                                        '../examples/tutorials/widgets',

diff --git a/doc/how_to/auto_curation_prediction.rst b/doc/how_to/auto_curation_prediction.rst
@@ -0,0 +1,43 @@
+How to use a trained model to predict the curation labels
+=========================================================
+
+For a more detailed guide to using trained models, `read our tutorial here
+<https://spikeinterface.readthedocs.io/en/latest/tutorials/curation/plot_1_automated_curation.html>`_.
+
+There is a collection of models for automated curation available on the
+`SpikeInterface HuggingFace page <https://huggingface.co/SpikeInterface>`_.
+
+We'll apply the model ``toy_tetrode_model`` from ``SpikeInterface`` on a SortingAnalyzer
+called ``sorting_analyzer``. We assume that the quality and template metrics have
+already been computed.
+
+We need to pass the ``sorting_analyzer`` and the ``repo_id`` (which is just the part of the
+repo's URL after huggingface.co/), and confirm that we trust the model with ``trust_model``.
+
+.. code::
+
+    from spikeinterface.curation import auto_label_units
+
+    labels_and_probabilities = auto_label_units(
+        sorting_analyzer = sorting_analyzer,
+        repo_id = "SpikeInterface/toy_tetrode_model",
+        trust_model = True
+    )
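+
+If you only have the address of a model page, the ``repo_id`` can be extracted with the
+Python standard library. This is a minimal sketch, not part of the SpikeInterface API:
+the helper name ``repo_id_from_url`` is hypothetical.
+
+.. code::
+
+    from urllib.parse import urlparse
+
+    def repo_id_from_url(url):
+        """Return the "<owner>/<model_name>" part of a HuggingFace model URL."""
+        parts = urlparse(url).path.strip("/").split("/")
+        # A repo_id is made of the first two path segments after the host
+        if len(parts) < 2 or not all(parts[:2]):
+            raise ValueError(f"Cannot find a repo_id in {url!r}")
+        return "/".join(parts[:2])
+
+    repo_id = repo_id_from_url("https://huggingface.co/SpikeInterface/toy_tetrode_model")
+    # repo_id is now "SpikeInterface/toy_tetrode_model"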
+
+If you have a local directory containing the model in a ``skops`` file you can use this to
+create the labels:
+
+.. code::
+
+    labels_and_probabilities = auto_label_units(
+        sorting_analyzer = sorting_analyzer,
+        model_folder = "my_folder_with_a_model_in_it",
+    )
+
+The returned labels are a dictionary of the model's predictions and its confidence. These
+are also saved as properties of your ``sorting_analyzer`` and can be accessed like so:
+
+.. code::
+
+    labels = sorting_analyzer.sorting.get_property("classifier_label")
+    probabilities = sorting_analyzer.sorting.get_property("classifier_probability")
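+
+One possible next step is to keep only the units that the model labels confidently.
+The sketch below is plain Python and uses toy values in place of
+``sorting_analyzer.unit_ids`` and the two properties above. The helper name
+``select_confident_units``, the label ``"good"`` and the threshold ``0.9`` are
+assumptions for illustration; the labels you get depend on the model you use.
+
+.. code::
+
+    def select_confident_units(unit_ids, labels, probabilities, wanted_label, min_probability):
+        """Return the unit ids predicted as `wanted_label` with at least `min_probability`."""
+        return [
+            unit_id
+            for unit_id, label, probability in zip(unit_ids, labels, probabilities)
+            if label == wanted_label and probability >= min_probability
+        ]
+
+    # Toy values standing in for the unit ids, labels and probabilities
+    toy_unit_ids = [0, 1, 2, 3]
+    toy_labels = ["good", "noise", "good", "good"]
+    toy_probabilities = [0.95, 0.80, 0.55, 0.90]
+
+    confident_good = select_confident_units(
+        toy_unit_ids, toy_labels, toy_probabilities, wanted_label="good", min_probability=0.9
+    )
+    # confident_good is [0, 3]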
diff --git a/doc/how_to/auto_curation_training.rst b/doc/how_to/auto_curation_training.rst
@@ -0,0 +1,58 @@
+How to train a model to predict curation labels
+===============================================
+
+A full tutorial for model-based curation can be found `here <https://spikeinterface.readthedocs.io/en/latest/tutorials/curation/plot_2_train_a_model.html>`_.
+
+Here, we assume that you have:
+
+* Two SortingAnalyzers called ``analyzer_1`` and ``analyzer_2``, with some template and
+  quality metrics already calculated for both
+* Manually curated labels for the units in each analyzer, in lists called
+  ``analyzer_1_labels`` and ``analyzer_2_labels``. If you have used phy, the lists can
+  be accessed using ``curated_labels = analyzer.sorting.get_property("quality")``.
+
+With these objects calculated, you can train a model as follows:
+
+.. code::
+
+    from spikeinterface.curation import train_model
+
+    analyzer_list = [analyzer_1, analyzer_2]
+    labels_list = [analyzer_1_labels, analyzer_2_labels]
+    output_folder = "/path/to/output_folder"
+
+    trainer = train_model(
+        mode="analyzers",
+        labels=labels_list,
+        analyzers=analyzer_list,
+        output_folder=output_folder,
+        metric_names=None, # Set if you want to use a subset of metrics, defaults to all calculated quality and template metrics
+        imputation_strategies=None, # Default is all available imputation strategies
+        scaling_techniques=None, # Default is all available scaling techniques
+        classifiers=None, # Defaults to Random Forest classifier only - we usually find this gives the best results, but a range of classifiers is available
+        seed=None, # Set a seed for reproducibility
+    )
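+
+Before training, it can help to check that the labels line up with the units. The
+sketch below assumes that each list of labels should contain exactly one label per unit
+of the matching analyzer. It is plain Python, and the helper name ``check_labels`` is
+hypothetical rather than part of SpikeInterface.
+
+.. code::
+
+    def check_labels(unit_ids_list, labels_list):
+        """Raise a ValueError unless every analyzer has exactly one label per unit."""
+        if len(unit_ids_list) != len(labels_list):
+            raise ValueError("Need one list of labels per analyzer")
+        for i, (unit_ids, labels) in enumerate(zip(unit_ids_list, labels_list)):
+            if len(unit_ids) != len(labels):
+                raise ValueError(f"Analyzer {i}: {len(unit_ids)} units but {len(labels)} labels")
+
+    # Toy values standing in for [analyzer.unit_ids for analyzer in analyzer_list]
+    toy_unit_ids_list = [[0, 1, 2], [0, 1]]
+    toy_labels_list = [["good", "noise", "good"], ["noise", "good"]]
+    check_labels(toy_unit_ids_list, toy_labels_list)  # passes silently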
+
+
+The trainer tries several models and chooses the most accurate one. This model and
+some metadata are stored in the ``output_folder``, which can later be loaded using the
+``load_model`` function (`more details <https://spikeinterface.readthedocs.io/en/latest/tutorials/curation/plot_1_automated_curation.html#download-a-pretrained-model>`_).
+We can also access the model, which is an sklearn ``Pipeline``, from the trainer object:
+
+.. code::
+
+    best_model = trainer.best_pipeline
+
+
+The training function can also be run in ``"csv"`` mode, if you prefer to
+store metrics as .csv files. If the target labels are stored as a column in
+the file, you can point to these with the ``target_label`` parameter:
+
+.. code::
+
+    trainer = train_model(
+        mode="csv",
+        metrics_paths = ["/path/to/csv_file_1", "/path/to/csv_file_2"],
+        target_label = "my_label",
+        output_folder=output_folder,
+    )
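+
+As a rough illustration of such a file, the sketch below writes a small metrics table
+with the Python standard library. The layout is an assumption (one row per unit, unit ids
+in the first column, one column per metric, plus the ``my_label`` column), and the
+metric values are made up; check the tutorial for the layout ``train_model`` expects.
+
+.. code::
+
+    import csv
+    import tempfile
+    from pathlib import Path
+
+    # Made-up metric values for two units, with the manual label in "my_label"
+    rows = [
+        {"unit_id": 0, "snr": 12.3, "firing_rate": 4.1, "my_label": "good"},
+        {"unit_id": 1, "snr": 2.7, "firing_rate": 0.6, "my_label": "noise"},
+    ]
+
+    # Written to a temporary folder here; use your own path in practice
+    csv_path = Path(tempfile.mkdtemp()) / "csv_file_1.csv"
+    with csv_path.open("w", newline="") as f:
+        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
+        writer.writeheader()
+        writer.writerows(rows)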
diff --git a/doc/how_to/index.rst b/doc/how_to/index.rst
@@ -15,3 +15,5 @@ Guides on how to solve specific, short problems in SpikeInterface. Learn how to.
     load_your_data_into_sorting
     benchmark_with_hybrid_recordings
     drift_with_lfp
+    auto_curation_training
+    auto_curation_prediction
diff --git a/doc/images/files_screen.png b/doc/images/files_screen.png
diff --git a/doc/images/hf-logo.svg b/doc/images/hf-logo.svg
diff --git a/doc/images/initial_model_screen.png b/doc/images/initial_model_screen.png
diff --git a/doc/tutorials_custom_index.rst b/doc/tutorials_custom_index.rst
@@ -119,8 +119,8 @@ The :code:`spikeinterface.qualitymetrics` module allows users to compute various
 
    .. grid-item-card:: Quality Metrics
       :link-type: ref
-      :link: sphx_glr_tutorials_qualitymetrics_plot_3_quality_mertics.py
-      :img-top: /tutorials/qualitymetrics/images/thumb/sphx_glr_plot_3_quality_mertics_thumb.png
+      :link: sphx_glr_tutorials_qualitymetrics_plot_3_quality_metrics.py
+      :img-top: /tutorials/qualitymetrics/images/thumb/sphx_glr_plot_3_quality_metrics_thumb.png
       :img-alt: Quality Metrics
       :class-card: gallery-card
       :text-align: center
@@ -133,6 +133,39 @@ The :code:`spikeinterface.qualitymetrics` module allows users to compute various
       :class-card: gallery-card
       :text-align: center
 
+Automated curation tutorials
+----------------------------
+
+Learn how to curate your units using a trained machine learning model, or how to create
+and share your own model.
+
+.. grid:: 1 2 2 3
+   :gutter: 2
+
+   .. grid-item-card:: Model-based curation
+      :link-type: ref
+      :link: sphx_glr_tutorials_curation_plot_1_automated_curation.py
+      :img-top: /tutorials/curation/images/sphx_glr_plot_1_automated_curation_002.png
+      :img-alt: Model-based curation
+      :class-card: gallery-card
+      :text-align: center
+
+   .. grid-item-card:: Train your own model
+      :link-type: ref
+      :link: sphx_glr_tutorials_curation_plot_2_train_a_model.py
+      :img-top: /tutorials/curation/images/thumb/sphx_glr_plot_2_train_a_model_thumb.png
+      :img-alt: Train your own model
+      :class-card: gallery-card
+      :text-align: center
+
+   .. grid-item-card:: Upload your model to HuggingFaceHub
+      :link-type: ref
+      :link: sphx_glr_tutorials_curation_plot_3_upload_a_model.py
+      :img-top: /images/hf-logo.svg
+      :img-alt: Upload your model
+      :class-card: gallery-card
+      :text-align: center
+
 Comparison tutorial
 -------------------
 

diff --git a/examples/tutorials/curation/README.rst b/examples/tutorials/curation/README.rst
@@ -0,0 +1,5 @@
+Curation tutorials
+------------------
+
+Learn how to use models to automatically curate your sorted data, or generate models
+based on your own curation.