Develop (#703)
* change behaviour QC to truncate trials down to a minimum of 400 and check if performance passes
* add test for sleepless decorator
* Add get_trials_tasks function
* data release update
* change number of parallel workflows
* Issue #701
---------
Co-authored-by: juhuntenburg <[email protected]>
Co-authored-by: Florian Rau <[email protected]>
Co-authored-by: Gaelle <[email protected]>
Co-authored-by: GaelleChapuis <[email protected]>
Co-authored-by: Mayo Faulkner <[email protected]>
k1o0 authored Jan 4, 2024
1 parent 0e78b96 commit 9bda491
Showing 14 changed files with 421 additions and 69 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ibllib_ci.yml
@@ -15,7 +15,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false # Whether to stop execution of other instances
max-parallel: 4
max-parallel: 2
matrix:
os: ["windows-latest", "ubuntu-latest"]
python-version: ["3.8", "3.11"]
9 changes: 6 additions & 3 deletions examples/data_release/data_release_brainwidemap.ipynb
@@ -11,14 +11,14 @@
"source": [
"# Data Release - Brain Wide Map\n",
"\n",
"IBL aims to understand the neural basis of decision-making in the mouse by gathering a whole-brain activity map composed of electrophysiological recordings pooled from multiple laboratories. We have systematically recorded from nearly all major brain areas with Neuropixels probes, using a grid system for unbiased sampling and replicating each recording site in at least two laboratories. These data have been used to construct a brain-wide map of activity at single-spike cellular resolution during a [decision-making task]((https://elifesciences.org/articles/63711)). In addition to the map, this data set contains other information gathered during the task: sensory stimuli presented to the mouse; mouse decisions and response times; and mouse pose information from video recordings and DeepLabCut analysis. Please read our accompanying [technical paper](https://doi.org/10.6084/m9.figshare.21400815) for details on the experiment and data processing pipelines. To explore the data, visit [our vizualisation website](https://viz.internationalbrainlab.org/)."
"IBL aims to understand the neural basis of decision-making in the mouse by gathering a whole-brain activity map composed of electrophysiological recordings pooled from multiple laboratories. We have systematically recorded from nearly all major brain areas with Neuropixels probes, using a grid system for unbiased sampling and replicating each recording site in at least two laboratories. These data have been used to construct a brain-wide map of activity at single-spike cellular resolution during a [decision-making task]((https://elifesciences.org/articles/63711)). Please read the associated article [(IBL et al. 2023)](https://www.biorxiv.org/content/10.1101/2023.07.04.547681v2). In addition to the map, this data set contains other information gathered during the task: sensory stimuli presented to the mouse; mouse decisions and response times; and mouse pose information from video recordings and DeepLabCut analysis. Please read our accompanying [technical paper](https://doi.org/10.6084/m9.figshare.21400815) for details on the experiment and data processing pipelines. To explore the data, visit [our vizualisation website](https://viz.internationalbrainlab.org/)."
]
},
{
"cell_type": "markdown",
"source": [
"## Overview of the Data\n",
"We have released data from 354 Neuropixel recording sessions, which encompass 547 probe insertions, obtained in 115 subjects performing the IBL task across 11 different laboratories. As output of spike-sorting, there are 295501 units; of which 32766 are considered to be of good quality. These units were recorded in overall 194 different brain regions.\n",
"We have released data from 459 Neuropixel recording sessions, which encompass 699 probe insertions, obtained in 139 subjects performing the IBL task across 12 different laboratories. As output of spike-sorting, there are 376730 units; of which 45085 are considered to be of good quality. In total, 138 brain regions were recorded in sufficient numbers for inclusion in IBL’s analyses [(IBL et al. 2023)](https://www.biorxiv.org/content/10.1101/2023.07.04.547681v2).\n",
"\n",
"## Data structure and download\n",
"The organisation of the data follows the standard IBL data structure.\n",
@@ -31,7 +31,10 @@
"\n",
"Note:\n",
"\n",
"* The tag associated to this release is `2022_Q4_IBL_et_al_BWM`"
"* The tag associated to this release is `Brainwidemap`\n",
"\n",
"## Receive updates on the data\n",
"To receive a notification that we released new datasets, please fill up [this form](https://forms.gle/9ex2vL1JwV4QXnf98)\n"
],
"metadata": {
"collapsed": false
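For readers who want to act on the notebook changes above, here is a minimal, hypothetical sketch of querying the released sessions by their tag with the ONE API. It assumes the public OpenAlyx instance and that the `tag` filter shown is accepted by the Alyx `sessions` REST endpoint; the notebook itself remains the authoritative download guide.

```python
# Hypothetical sketch (not part of this commit): list sessions carrying the
# release tag and load one trials object. Assumes public OpenAlyx credentials
# and that the Alyx 'sessions' endpoint accepts a 'tag' filter.
from one.api import ONE

one = ONE(base_url='https://openalyx.internationalbrainlab.org',
          password='international', silent=True)
sessions = one.alyx.rest('sessions', 'list', tag='Brainwidemap')  # assumed filter name
eid = sessions[0]['id']
trials = one.load_object(eid, 'trials')  # standard ALF trials object
print(len(sessions), list(trials.keys()))
```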
2 changes: 1 addition & 1 deletion ibllib/__init__.py
@@ -2,7 +2,7 @@
import logging
import warnings

__version__ = '2.27.1'
__version__ = '2.28'
warnings.filterwarnings('always', category=DeprecationWarning, module='ibllib')

# if this becomes a full-blown library we should let the logging configuration to the discretion of the dev
161 changes: 127 additions & 34 deletions ibllib/io/extractors/base.py
@@ -1,4 +1,5 @@
"""Base Extractor classes.
A module for the base Extractor classes. The Extractor, given a session path, will extract the
processed data from raw hardware files and optionally save them.
"""
@@ -10,7 +11,6 @@

import numpy as np
import pandas as pd
from one.alf.files import get_session_path
from ibllib.io import raw_data_loaders as raw
from ibllib.io.raw_data_loaders import load_settings, _logger

@@ -162,7 +162,8 @@ def extract(self, bpod_trials=None, settings=None, **kwargs):

def run_extractor_classes(classes, session_path=None, **kwargs):
"""
Run a set of extractors with the same inputs
Run a set of extractors with the same inputs.
:param classes: list of Extractor class
:param save: True/False
:param path_out: (defaults to alf path)
@@ -195,12 +196,30 @@ def run_extractor_classes(classes, session_path=None, **kwargs):


def _get_task_types_json_config():
"""
Return the extractor types map.
This function is only used for legacy sessions, i.e. those without an experiment description
file and will be removed in favor of :func:`_get_task_extractor_map`, which directly returns
the Bpod extractor class name. The experiment description file cuts out the need for pipeline
name identifiers.
Returns
-------
Dict[str, str]
A map of task protocol to task extractor identifier, e.g. 'ephys', 'habituation', etc.
See Also
--------
_get_task_extractor_map - returns a map of task protocol to Bpod trials extractor class name.
"""
with open(Path(__file__).parent.joinpath('extractor_types.json')) as fp:
task_types = json.load(fp)
try:
# look if there are custom extractor types in the personal projects repo
import projects.base
custom_extractors = Path(projects.base.__file__).parent.joinpath('extractor_types.json')
_logger.debug('Loading extractor types from %s', custom_extractors)
with open(custom_extractors) as fp:
custom_task_types = json.load(fp)
task_types.update(custom_task_types)
@@ -210,8 +229,28 @@ def _get_task_types_json_config():


def get_task_protocol(session_path, task_collection='raw_behavior_data'):
"""
Return the task protocol name from task settings.
If the session path and/or task collection do not exist, the settings file is missing or
otherwise can not be parsed, or if the 'PYBPOD_PROTOCOL' key is absent, None is returned.
A warning is logged if the session path or settings file doesn't exist. An error is logged if
the settings file can not be parsed.
Parameters
----------
session_path : str, pathlib.Path
The absolute session path.
task_collection : str
The session path directory containing the task settings file.
Returns
-------
str or None
The Pybpod task protocol name or None if not found.
"""
try:
settings = load_settings(get_session_path(session_path), task_collection=task_collection)
settings = load_settings(session_path, task_collection=task_collection)
except json.decoder.JSONDecodeError:
_logger.error(f'Can\'t read settings for {session_path}')
return
@@ -223,11 +262,26 @@ def get_task_protocol(session_path, task_collection='raw_behavior_data'):

def get_task_extractor_type(task_name):
"""
Returns the task type string from the full pybpod task name:
_iblrig_tasks_biasedChoiceWorld3.7.0 returns "biased"
_iblrig_tasks_trainingChoiceWorld3.6.0 returns "training'
:param task_name:
:return: one of ['biased', 'habituation', 'training', 'ephys', 'mock_ephys', 'sync_ephys']
Returns the task type string from the full pybpod task name.
Parameters
----------
task_name : str
The complete task protocol name from the PYBPOD_PROTOCOL field of the task settings.
Returns
-------
str
The extractor type identifier. Examples include 'biased', 'habituation', 'training',
'ephys', 'mock_ephys' and 'sync_ephys'.
Examples
--------
>>> get_task_extractor_type('_iblrig_tasks_biasedChoiceWorld3.7.0')
'biased'
>>> get_task_extractor_type('_iblrig_tasks_trainingChoiceWorld3.6.0')
'training'
"""
if isinstance(task_name, Path):
task_name = get_task_protocol(task_name)
@@ -245,16 +299,30 @@ def get_task_extractor_type(task_name):

def get_session_extractor_type(session_path, task_collection='raw_behavior_data'):
"""
From a session path, loads the settings file, finds the task and checks if extractors exist
task names examples:
:param session_path:
:return: bool
Infer trials extractor type from task settings.
From a session path, loads the settings file, finds the task and checks if extractors exist.
Examples include 'biased', 'habituation', 'training', 'ephys', 'mock_ephys', and 'sync_ephys'.
Note this should only be used for legacy sessions, i.e. those without an experiment description
file.
Parameters
----------
session_path : str, pathlib.Path
The session path for which to determine the pipeline.
task_collection : str
The session path directory containing the raw task data.
Returns
-------
str or False
The task extractor type, e.g. 'biased', 'habituation', 'ephys', or False if unknown.
"""
settings = load_settings(session_path, task_collection=task_collection)
if settings is None:
_logger.error(f'ABORT: No data found in "{task_collection}" folder {session_path}')
task_protocol = get_task_protocol(session_path, task_collection=task_collection)
if task_protocol is None:
_logger.error(f'ABORT: No task protocol found in "{task_collection}" folder {session_path}')
return False
extractor_type = get_task_extractor_type(settings['PYBPOD_PROTOCOL'])
extractor_type = get_task_extractor_type(task_protocol)
if extractor_type:
return extractor_type
else:
@@ -263,28 +331,52 @@ def get_session_extractor_type(session_path, task_collection='raw_behavior_data'

def get_pipeline(session_path, task_collection='raw_behavior_data'):
"""
Get the pre-processing pipeline name from a session path
:param session_path:
:return:
Get the pre-processing pipeline name from a session path.
Note this is only suitable for legacy sessions, i.e. those without an experiment description
file. This function will be removed in the future.
Parameters
----------
session_path : str, pathlib.Path
The session path for which to determine the pipeline.
task_collection : str
The session path directory containing the raw task data.
Returns
-------
str
The pipeline name inferred from the extractor type, e.g. 'ephys', 'training', 'widefield'.
"""
stype = get_session_extractor_type(session_path, task_collection=task_collection)
return _get_pipeline_from_task_type(stype)


def _get_pipeline_from_task_type(stype):
"""
Returns the pipeline from the task type. Some tasks types directly define the pipeline
:param stype: session_type or task extractor type
:return:
Return the pipeline from the task type.
Some task types directly define the pipeline. Note this is only suitable for legacy sessions,
i.e. those without an experiment description file. This function will be removed in the future.
Parameters
----------
stype : str
The session type or task extractor type, e.g. 'habituation', 'ephys', etc.
Returns
-------
str
A task pipeline identifier.
"""
if stype in ['ephys_biased_opto', 'ephys', 'ephys_training', 'mock_ephys', 'sync_ephys']:
return 'ephys'
elif stype in ['habituation', 'training', 'biased', 'biased_opto']:
return 'training'
elif 'widefield' in stype:
elif isinstance(stype, str) and 'widefield' in stype:
return 'widefield'
else:
return stype
return stype or ''


def _get_task_extractor_map():
@@ -293,7 +385,7 @@ def _get_task_extractor_map():
Returns
-------
dict(str, str)
Dict[str, str]
A map of task protocol to Bpod trials extractor class.
"""
FILENAME = 'task_extractor_map.json'
@@ -315,34 +407,35 @@ def get_bpod_extractor_class(session_path, task_collection='raw_behavior_data'):
"""
Get the Bpod trials extractor class associated with a given Bpod session.
Note that unlike :func:`get_session_extractor_type`, this function maps directly to the Bpod
trials extractor class name. This is hardware invariant and is purely to determine the Bpod-only
trials extractor.
Parameters
----------
session_path : str, pathlib.Path
The session path containing Bpod behaviour data.
task_collection : str
The session_path subfolder containing the Bpod settings file.
The session_path sub-folder containing the Bpod settings file.
Returns
-------
str
The extractor class name.
"""
# Attempt to load settings files
settings = load_settings(session_path, task_collection=task_collection)
if settings is None:
raise ValueError(f'No data found in "{task_collection}" folder {session_path}')
# Attempt to get task protocol
protocol = settings.get('PYBPOD_PROTOCOL')
# Attempt to get protocol name from settings file
protocol = get_task_protocol(session_path, task_collection=task_collection)
if not protocol:
raise ValueError(f'No task protocol found in {session_path/task_collection}')
raise ValueError(f'No task protocol found in {Path(session_path) / task_collection}')
return protocol2extractor(protocol)


def protocol2extractor(protocol):
"""
Get the Bpod trials extractor class associated with a given Bpod task protocol.
The Bpod task protocol can be found in the 'PYBPOD_PROTOCOL' field of _iblrig_taskSettings.raw.json.
The Bpod task protocol can be found in the 'PYBPOD_PROTOCOL' field of the
_iblrig_taskSettings.raw.json file.
Parameters
----------
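As a quick, hedged illustration of the legacy protocol-mapping helpers documented in the diff above: the protocol string below is only an example, and the printed identifiers depend on the `extractor_types.json` and `task_extractor_map.json` files bundled with ibllib.

```python
# Illustrative use of the legacy protocol-mapping helpers; output values are
# examples and depend on the JSON maps shipped with ibllib.
from ibllib.io.extractors.base import get_task_extractor_type, protocol2extractor

protocol = '_iblrig_tasks_ephysChoiceWorld6.4.2'  # example protocol name
print(get_task_extractor_type(protocol))  # legacy extractor type, e.g. 'ephys'
print(protocol2extractor(protocol))       # Bpod trials extractor class name, e.g. 'EphysTrials'
```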
10 changes: 5 additions & 5 deletions ibllib/io/extractors/camera.py
@@ -1,4 +1,5 @@
""" Camera extractor functions.
This module handles extraction of camera timestamps for both Bpod and DAQ.
"""
import logging
@@ -29,7 +30,7 @@

def extract_camera_sync(sync, chmap=None):
"""
Extract camera timestamps from the sync matrix
Extract camera timestamps from the sync matrix.
:param sync: dictionary 'times', 'polarities' of fronts detected on sync trace
:param chmap: dictionary containing channel indices. Default to constant.
@@ -45,7 +46,8 @@ def get_video_length(video_path):

def get_video_length(video_path):
"""
Returns video length
Returns video length.
:param video_path: A path to the video
:return:
"""
@@ -58,9 +60,7 @@ def get_video_length(video_path):


class CameraTimestampsFPGA(BaseExtractor):
"""
Extractor for videos using DAQ sync and channel map.
"""
"""Extractor for videos using DAQ sync and channel map."""

def __init__(self, label, session_path=None):
super().__init__(session_path)
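For context on the `get_video_length` docstring touched above, the sketch below shows one common way to read a frame count with OpenCV; it is an assumption about the general approach, not a copy of ibllib's implementation.

```python
# Hedged sketch of a frame-count helper; ibllib's get_video_length may differ.
import cv2

def video_length(video_path):
    """Return the number of frames in a video file."""
    cap = cv2.VideoCapture(str(video_path))
    try:
        return int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    finally:
        cap.release()
```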
2 changes: 1 addition & 1 deletion ibllib/io/extractors/ephys_fpga.py
@@ -1483,7 +1483,7 @@ def extract_all(session_path, sync_collection='raw_ephys_data', save=True, save_
# Sync Bpod trials to FPGA
sync, chmap = get_sync_and_chn_map(session_path, sync_collection)
# sync, chmap = get_main_probe_sync(session_path, bin_exists=bin_exists)
trials = FpgaTrials(session_path, bpod_trials=bpod_trials | bpod_wheel)
trials = FpgaTrials(session_path, bpod_trials={**bpod_trials, **bpod_wheel}) # py3.9 -> |
outputs, files = trials.extract(
save=save, sync=sync, chmap=chmap, path_out=save_path,
task_collection=task_collection, protocol_number=protocol_number, **kwargs)
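The `# py3.9 -> |` comment in the change above refers to the dict union operator, which is only available from Python 3.9; because the CI matrix above still includes Python 3.8, the unpacking form is used instead. A minimal, standalone illustration:

```python
# {**a, **b} is the Python 3.8-compatible equivalent of a | b (Python >= 3.9);
# in both forms, keys from b take precedence on collision.
a = {'choice': 1, 'feedback_times': 2}
b = {'wheel_timestamps': 3}

merged = {**a, **b}       # works on Python 3.8+
# merged = a | b          # requires Python 3.9+
assert merged == {'choice': 1, 'feedback_times': 2, 'wheel_timestamps': 3}
```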
15 changes: 12 additions & 3 deletions ibllib/pipes/behavior_tasks.py
@@ -374,17 +374,26 @@ def signature(self):
}
return signature

def _behaviour_criterion(self, update=True):
def _behaviour_criterion(self, update=True, truncate_to_pass=True):
"""
Computes and updates the behaviour criterion on Alyx.
"""
from brainbox.behavior import training

trials = alfio.load_object(self.session_path.joinpath(self.output_collection), 'trials')
trials = alfio.load_object(self.session_path.joinpath(self.output_collection), 'trials').to_df()
good_enough = training.criterion_delay(
n_trials=trials["intervals"].shape[0],
n_trials=trials.shape[0],
perf_easy=training.compute_performance_easy(trials),
)
if truncate_to_pass and not good_enough:
n_trials = trials.shape[0]
while not good_enough and n_trials > 400:
n_trials -= 1
good_enough = training.criterion_delay(
n_trials=n_trials,
perf_easy=training.compute_performance_easy(trials[:n_trials]),
)

if update:
eid = self.one.path2eid(self.session_path, query_type='remote')
self.one.alyx.json_field_update(
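To make the new `truncate_to_pass` behaviour concrete, here is a standalone sketch of the truncation loop using a dummy criterion; the real task calls `training.criterion_delay` and `training.compute_performance_easy` from brainbox, and the thresholds below are assumptions for illustration only.

```python
# Standalone sketch of the truncation logic above with a dummy criterion
# (assumed thresholds); performance is re-checked while trimming trials from
# the end of the session, never going below 400 trials.
import numpy as np

def passes(n_trials, perf_easy, min_trials=400, min_perf=0.9):
    # stand-in for brainbox.behavior.training.criterion_delay
    return n_trials >= min_trials and perf_easy >= min_perf

correct = np.r_[np.ones(500), np.zeros(100)]  # performance drops late in the session
n_trials = correct.size
good_enough = passes(n_trials, correct.mean())
while not good_enough and n_trials > 400:
    n_trials -= 1
    good_enough = passes(n_trials, correct[:n_trials].mean())
print(n_trials, good_enough)  # e.g. 555 True once the late errors are trimmed
```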
