Commit c030943: Nicholas Watters committed Dec 18, 2023 (1 parent: 3557448)
Showing 18 changed files with 1,135 additions and 669 deletions.
```diff
@@ -1,5 +1,6 @@
-neuroconv==0.4.4
-spikeinterface==0.98.2
-nwbwidgets
-nwbinspector
-pre-commit
+neuroconv==0.4.6
+spikeinterface==0.99.1
+nwbwidgets==0.11.3
+nwbinspector==0.4.31
+pre-commit==3.6.0
+ndx-events==0.2.0
```
@@ -0,0 +1,56 @@
# Watters data conversion pipeline
NWB conversion scripts for Watters data to the [Neurodata Without Borders](https://nwb-overview.readthedocs.io/) data format.
## Usage
To run a specific conversion, you may first need to install some conversion-specific dependencies, which are located in each conversion directory:
```
pip install -r src/jazayeri_lab_to_nwb/watters/watters_requirements.txt
```

You can run a specific conversion with the following command:
```
python src/jazayeri_lab_to_nwb/watters/main_convert_session.py $SUBJECT $SESSION
```
### Watters working memory task data
The conversion function for this experiment, `session_to_nwb`, is found in `src/jazayeri_lab_to_nwb/watters/main_convert_session.py`. The function takes the following arguments:
* `subject` subject name, either `'Perle'` or `'Elgar'`.
* `session` session date, in the format `'YYYY-MM-DD'`.
* `stub_test` whether to save only a small portion of the data (mainly used by us for testing purposes).
* `overwrite` whether to overwrite existing nwb output files.
* `dandiset_id` optional dandiset ID.
The function can be imported and run from a separate script, or you can run the file directly and specify the arguments in the `if __name__ == "__main__"` block at the bottom.
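For example, a minimal driver script might look like the sketch below; the session date is hypothetical, and the `stub_test`/`overwrite` values are only illustrative:

```python
# Hypothetical driver script: import the conversion function and run it
# for a single subject and session.
from jazayeri_lab_to_nwb.watters.main_convert_session import session_to_nwb

session_to_nwb(
    subject="Perle",       # or "Elgar"
    session="2022-06-01",  # hypothetical date in 'YYYY-MM-DD' format
    stub_test=True,        # save only a small portion of the data
    overwrite=True,        # overwrite any existing nwb output files
)
```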
The function expects the raw data in `data_dir_path` to follow this structure:

```
data_dir_path/
├── data_open_source
│   ├── behavior
│   │   └── eye.h.times.npy, etc.
│   ├── task
│   │   └── trials.start_times.json, etc.
│   └── probes.metadata.json
├── raw_data
│   ├── spikeglx
│   │   └── */*/*.ap.bin, */*/*.lf.bin, etc.
│   ├── v_probe_0
│   │   └── raw_data.dat
│   └── v_probe_{n}
│       └── raw_data.dat
├── spike_sorting_raw
│   ├── np
│   ├── vp_0
│   └── vp_{n}
├── sync_pulses
│   ├── mworks
│   ├── open_ephys
│   └── spikeglx
...
```
The conversion will try to automatically fetch metadata from the provided data directory. However, some information, such as the subject's name and age, must be specified by the user in the file `src/jazayeri_lab_to_nwb/watters/metadata.yaml`. If any of the automatically fetched metadata is incorrect, it can also be overridden from this file.
The converted data will be saved in two files: `{session_id}_raw.nwb`, which contains the raw electrophysiology data from the Neuropixels and V-Probes, and `{session_id}_processed.nwb`, which contains behavioral data, trial info, and sorted unit spiking.
If you run into memory issues when writing the `{session_id}_raw.nwb` files, you may want to set `buffer_gb` to a value smaller than 1 (its default) in the `conversion_options` dicts for the recording interfaces, i.e. [here](https://github.com/catalystneuro/jazayeri-lab-to-nwb/blob/vprobe_dev/src/jazayeri_lab_to_nwb/watters/main_convert_session.py#L189).
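As a rough sketch, the override might look like the following; the interface key `'recording_vp_0'` and the `iterator_opts` nesting are assumptions for illustration, not a confirmed piece of the script:

```python
# Hypothetical: shrink the write buffer for a recording interface to
# reduce peak memory usage during conversion.
conversion_options = {
    "recording_vp_0": dict(  # hypothetical interface name
        stub_test=False,
        iterator_opts=dict(buffer_gb=0.5),  # default buffer_gb is 1
    ),
}
```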
```diff
@@ -1,4 +1,4 @@
-from .wattersbehaviorinterface import WattersEyePositionInterface, WattersPupilSizeInterface
-from .watterstrialsinterface import WattersTrialsInterface
-from .wattersrecordinginterface import WattersDatRecordingInterface
-from .wattersnwbconverter import WattersNWBConverter
+from .behavior_interface import EyePositionInterface, PupilSizeInterface
+from .trials_interface import TrialsInterface
+from .recording_interface import DatRecordingInterface
+from .nwb_converter import NWBConverter
```
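After this rename, downstream code would import the interfaces under their new names. A sketch, assuming this file is the `watters` package `__init__.py`:

```python
# Hypothetical import of the renamed interfaces after this commit.
from jazayeri_lab_to_nwb.watters import (
    DatRecordingInterface,
    EyePositionInterface,
    NWBConverter,
    PupilSizeInterface,
    TrialsInterface,
)
```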
@@ -0,0 +1,98 @@
"""Class for converting data about display frames.""" | ||
|
||
import itertools | ||
import json | ||
from pathlib import Path | ||
from typing import Optional | ||
|
||
import numpy as np | ||
import pandas as pd | ||
from neuroconv.datainterfaces.text.timeintervalsinterface import TimeIntervalsInterface | ||
from neuroconv.utils import DeepDict, FilePathType, FolderPathType | ||
from pynwb import NWBFile | ||
|
||
|
||
class DisplayInterface(TimeIntervalsInterface): | ||
"""Class for converting data about display frames. | ||
All events that occur exactly once per display update are contained in this | ||
interface. | ||
""" | ||
|
||
KEY_MAP = { | ||
'frame_object_positions': 'object_positions', | ||
'frame_fixation_cross_scale': 'fixation_cross_scale', | ||
'frame_closed_loop_gaze_position': 'closed_loop_eye_position', | ||
'frame_task_phase': 'task_phase', | ||
'frame_display_times': 'start_time', | ||
} | ||
|
||
def __init__(self, folder_path: FolderPathType, verbose: bool = True): | ||
super().__init__(file_path=folder_path, verbose=verbose) | ||
|
||
def get_metadata(self) -> dict: | ||
metadata = super().get_metadata() | ||
metadata['TimeIntervals'] = dict( | ||
display=dict( | ||
table_name='display', | ||
table_description='data about each displayed frame', | ||
) | ||
) | ||
return metadata | ||
|
||
def get_timestamps(self) -> np.ndarray: | ||
return super(DisplayInterface, self).get_timestamps(column='start_time') | ||
|
||
def set_aligned_starting_time(self, aligned_starting_time: float) -> None: | ||
self.dataframe.start_time += aligned_starting_time | ||
|
||
def _read_file(self, file_path: FolderPathType): | ||
# Create dataframe with data for each frame | ||
trials = json.load(open(Path(file_path) / 'trials.json', 'r')) | ||
frames = { | ||
k_mapped: list(itertools.chain(*[d[k] for d in trials])) | ||
for k, k_mapped in DisplayInterface.KEY_MAP.items() | ||
} | ||
|
||
# Serialize object_positions data for hdf5 conversion to work | ||
frames['object_positions'] = [ | ||
json.dumps(x) for x in frames['object_positions'] | ||
] | ||
|
||
return pd.DataFrame(frames) | ||
|
||
def add_to_nwbfile(self, | ||
nwbfile: NWBFile, | ||
metadata: Optional[dict] = None, | ||
tag: str = 'display'): | ||
return super(DisplayInterface, self).add_to_nwbfile( | ||
nwbfile=nwbfile, | ||
metadata=metadata, | ||
tag=tag, | ||
column_descriptions=self.column_descriptions, | ||
) | ||
|
||
@property | ||
def column_descriptions(self): | ||
column_descriptions = { | ||
'object_positions': ( | ||
'For each frame, a serialized list with one element for each ' | ||
'object. Each element is an (x, y) position of the ' | ||
'corresponding object, in coordinates of arena width.' | ||
), | ||
'fixation_cross_scale': ( | ||
'For each frame, the scale of the central fixation cross. ' | ||
'Fixation cross scale grows as the eye position deviates from ' | ||
'the center of the fixation cross, to provide a cue to ' | ||
'maintain good fixation.' | ||
), | ||
'closed_loop_eye_position': ( | ||
'For each frame, the eye position in the close-loop task ' | ||
'engine. This was used to for real-time eye position ' | ||
'computations, such as saccade detection and reward delivery.' | ||
), | ||
'task_phase': 'The phase of the task for each frame.', | ||
'start_time': 'Time of display update for each frame.', | ||
} | ||
|
||
return column_descriptions |
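A usage sketch for this interface follows; the folder path and session details are hypothetical, and the folder only needs to contain the `trials.json` file that `_read_file` expects:

```python
# Hypothetical usage: read display-frame data and add it to an in-memory
# NWBFile as a 'display' TimeIntervals table.
from datetime import datetime, timezone

from pynwb import NWBFile

interface = DisplayInterface(folder_path="path/to/task_behavior_data")
nwbfile = NWBFile(
    session_description="Watters working memory task session",
    identifier="example-session",
    session_start_time=datetime(2022, 6, 1, tzinfo=timezone.utc),
)
interface.add_to_nwbfile(nwbfile=nwbfile, metadata=interface.get_metadata())
```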
@@ -0,0 +1,131 @@
"""Function for getting paths to data on openmind.""" | ||
|
||
import collections | ||
import pathlib | ||
|
||
SUBJECT_NAME_TO_ID = { | ||
'Perle': 'monkey0', | ||
'Elgar': 'monkey1', | ||
} | ||
|
||
SessionPaths = collections.namedtuple( | ||
'SessionPaths', | ||
[ | ||
'output', | ||
'raw_data', | ||
'data_open_source', | ||
'task_behavior_data', | ||
'sync_pulses', | ||
'spike_sorting_raw', | ||
], | ||
) | ||
|
||
|
||
def _get_session_paths_openmind(subject, session, stub_test=False): | ||
"""Get paths to all components of the data on openmind.""" | ||
subject_id = SUBJECT_NAME_TO_ID[subject] | ||
|
||
# Path to write output nwb files to | ||
output_path = ( | ||
f'/om/user/nwatters/nwb_data_multi_prediction/{subject}/{session}' | ||
) | ||
if stub_test: | ||
output_path = f'{output_path}/stub' | ||
|
||
# Path to the raw data. This is used for reading raw physiology data. | ||
raw_data_path = ( | ||
f'/om4/group/jazlab/nwatters/multi_prediction/phys_data/{subject}/' | ||
f'{session}/raw_data' | ||
) | ||
|
||
# Path to task and behavior data. | ||
task_behavior_data_path = ( | ||
'/om4/group/jazlab/nwatters/multi_prediction/datasets/data_nwb_trials/' | ||
f'{subject}/{session}' | ||
) | ||
|
||
# Path to open-source data. This is used for reading behavior and task data. | ||
data_open_source_path = ( | ||
'/om4/group/jazlab/nwatters/multi_prediction/datasets/data_open_source/' | ||
f'Subjects/{subject_id}/{session}/001' | ||
) | ||
|
||
# Path to sync pulses. This is used for reading timescale transformations | ||
# between physiology and mworks data streams. | ||
sync_pulses_path = ( | ||
'/om4/group/jazlab/nwatters/multi_prediction/data_processed/' | ||
f'{subject}/{session}/sync_pulses' | ||
) | ||
|
||
# Path to spike sorting. This is used for reading spike sorted data. | ||
spike_sorting_raw_path = ( | ||
f'/om4/group/jazlab/nwatters/multi_prediction/phys_data/{subject}/' | ||
f'{session}/spike_sorting' | ||
) | ||
|
||
session_paths = SessionPaths( | ||
output=pathlib.Path(output_path), | ||
raw_data=pathlib.Path(raw_data_path), | ||
data_open_source=pathlib.Path(data_open_source_path), | ||
task_behavior_data=pathlib.Path(task_behavior_data_path), | ||
sync_pulses=pathlib.Path(sync_pulses_path), | ||
spike_sorting_raw=pathlib.Path(spike_sorting_raw_path), | ||
) | ||
|
||
return session_paths | ||
|
||
|
||
def _get_session_paths_globus(subject, session, stub_test=False): | ||
"""Get paths to all components of the data in the globus repo.""" | ||
subject_id = SUBJECT_NAME_TO_ID[subject] | ||
base_data_dir = f'/shared/catalystneuro/JazLab/{subject_id}/{session}/' | ||
|
||
# Path to write output nwb files to | ||
output_path = ( | ||
f'~/conversion_nwb/jazayeri-lab-to-nwb/{subject}/{session}' | ||
) | ||
if stub_test: | ||
output_path = f'{output_path}/stub' | ||
|
||
# Path to the raw data. This is used for reading raw physiology data. | ||
raw_data_path = f'{base_data_dir}/raw_data' | ||
|
||
# Path to task and behavior data. | ||
task_behavior_data_path = f'{base_data_dir}/processed_task_data' | ||
|
||
# Path to open-source data. This is used for reading behavior and task data. | ||
data_open_source_path = f'{base_data_dir}/data_open_source' | ||
|
||
# Path to sync pulses. This is used for reading timescale transformations | ||
# between physiology and mworks data streams. | ||
sync_pulses_path = f'{base_data_dir}/sync_pulses' | ||
|
||
# Path to spike sorting. This is used for reading spike sorted data. | ||
spike_sorting_raw_path = f'{base_data_dir}/spike_sorting' | ||
|
||
session_paths = SessionPaths( | ||
output=pathlib.Path(output_path), | ||
raw_data=pathlib.Path(raw_data_path), | ||
data_open_source=pathlib.Path(data_open_source_path), | ||
task_behavior_data=pathlib.Path(task_behavior_data_path), | ||
sync_pulses=pathlib.Path(sync_pulses_path), | ||
spike_sorting_raw=pathlib.Path(spike_sorting_raw_path), | ||
) | ||
|
||
return session_paths | ||
|
||
|
||
def get_session_paths(subject, session, stub_test=False, repo='openmind'): | ||
"""Get paths to all components of the data. | ||
Returns: | ||
SessionPaths namedtuple. | ||
""" | ||
if repo == 'openmind': | ||
return _get_session_paths_openmind( | ||
subject=subject, session=session, stub_test=stub_test) | ||
elif repo == 'globus': | ||
return _get_session_paths_globus( | ||
subject=subject, session=session, stub_test=stub_test) | ||
else: | ||
raise ValueError(f'Invalid repo {repo}') |
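A usage sketch (the session date is hypothetical):

```python
# Hypothetical usage: resolve all data locations for one session on openmind.
paths = get_session_paths(subject="Perle", session="2022-06-01", repo="openmind")
print(paths.raw_data)  # .../multi_prediction/phys_data/Perle/2022-06-01/raw_data
print(paths.output)    # directory where the output nwb files will be written
```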