Merge pull request #4 from catalystneuro/vprobe_dev
Add interfaces for V-probe data
CodyCBakerPhD authored Sep 20, 2023
2 parents c6700bd + e8a2d8e commit ad6eb70
Showing 6 changed files with 338 additions and 54 deletions.
80 changes: 52 additions & 28 deletions README.md
@@ -3,23 +3,6 @@ NWB conversion scripts for Jazayeri lab data to the [Neurodata Without Borders](


## Installation
## Basic installation

You can install the latest release of the package with pip:

```
pip install jazayeri-lab-to-nwb
```

We recommend that you install the package inside a [virtual environment](https://docs.python.org/3/tutorial/venv.html). A simple way of doing this is to use a [conda environment](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html) from the `conda` package manager ([installation instructions](https://docs.conda.io/en/latest/miniconda.html)). Detailed instructions on how to use conda environments can be found in their [documentation](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).

### Running a specific conversion
Once you have installed the package with pip, you can run any of the conversion scripts in a notebook or a Python file:

https://github.com/catalystneuro/jazayeri-lab-to-nwb//tree/main/src/watters/watters_conversion_script.py




## Installation from GitHub
Another option is to install the package directly from GitHub. This option has the advantage that the source code can be modified if you need to amend some of the code we originally provided to adapt to future experimental differences. To install the conversion from GitHub you will need to use `git` ([installation instructions](https://github.com/git-guides/install-git)). We also recommend the installation of `conda` ([installation instructions](https://docs.conda.io/en/latest/miniconda.html)) as it contains all the required machinery in a single and simple install
@@ -46,17 +29,6 @@ pip install -e .
Note: both of the methods above install the repository in [editable mode](https://pip.pypa.io/en/stable/cli/pip_install/#editable-installs).

### Running a specific conversion
To run a specific conversion, you might first need to install some conversion-specific dependencies, which are located in each conversion directory:
```
pip install -r src/jazayeri_lab_to_nwb/watters/watters_requirements.txt
```

You can run a specific conversion with the following command:
```
python src/jazayeri_lab_to_nwb/watters/watters_conversion_script.py
```

## Repository structure
Each conversion is organized in a directory of its own in the `src` directory:

@@ -93,3 +65,55 @@ Each conversion is organized in a directory of its own in the `src` directory:
* `watters_notes.md`: notes and comments concerning this specific conversion.

The directory might contain other files that are necessary for the conversion, but those are the central ones.


## Running a specific conversion
To run a specific conversion, you might first need to install some conversion-specific dependencies, which are located in each conversion directory:
```
pip install -r src/jazayeri_lab_to_nwb/watters/watters_requirements.txt
```

You can run a specific conversion with the following command:
```
python src/jazayeri_lab_to_nwb/watters/watters_conversion_script.py
```

### Watters working memory task data
The conversion function for this experiment, `session_to_nwb`, is found in `src/watters/watters_conversion_script.py`. The function takes three arguments:
* `data_dir_path` points to the root directory for the data for a given session.
* `output_dir_path` points to where the converted data should be saved.
* `stub_test` indicates whether only a small portion of the data should be saved (mainly used by us for testing purposes).

The function can be imported into a separate script and run, or you can run the file directly and specify the arguments in the `if __name__ == "__main__"` block at the bottom.
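
A minimal sketch of calling it from your own code is shown below. The import path assumes the script location referenced above, and the data paths are placeholders; adjust both to your checkout and data layout:

```
# Minimal sketch -- placeholder paths; adjust the import to the actual
# script/module name in your checkout.
from pathlib import Path

from jazayeri_lab_to_nwb.watters.watters_conversion_script import session_to_nwb

session_to_nwb(
    data_dir_path=Path("/path/to/monkey0/2022-06-01/"),
    output_dir_path=Path("~/conversion_nwb/").expanduser(),
    stub_test=True,  # set False to convert the full session
)
```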

The function expects the raw data in `data_dir_path` to follow this structure:

data_dir_path/
├── data_open_source
│   ├── behavior
│   │   └── eye.h.times.npy, etc.
│   ├── task
│   │   └── trials.start_times.json, etc.
│   └── probes.metadata.json
├── raw_data
│   ├── spikeglx
│   │   └── */*/*.ap.bin, */*/*.lf.bin, etc.
│   ├── v_probe_0
│   │   └── raw_data.dat
│   └── v_probe_{n}
│       └── raw_data.dat
├── spike_sorting_raw
│   ├── np
│   ├── vp_0
│   └── vp_{n}
└── sync_pulses
    ├── mworks
    ├── open_ephys
    └── spikeglx
...

The conversion will try to automatically fetch metadata from the provided data directory. However, some information, such as the subject's name and age, must be specified by the user in the file `src/jazayeri_lab_to_nwb/watters/watters_metadata.yaml`. If any of the automatically fetched metadata is incorrect, it can also be overridden from this file.

The converted data will be saved in two files, one called `{session_id}_raw.nwb`, which contains the raw electrophysiology data from the Neuropixels and V-Probes, and one called `{session_id}_processed.nwb` with behavioral data, trial info, and sorted unit spiking.

If you run into memory issues when writing the `{session_id}_raw.nwb` files, you may want to set `buffer_gb` to a value smaller than 1 (its default) in the `conversion_options` dicts for the recording interfaces, i.e. [here](https://github.com/catalystneuro/jazayeri-lab-to-nwb/blob/vprobe_dev/src/jazayeri_lab_to_nwb/watters/watters_convert_session.py#L49) and [here](https://github.com/catalystneuro/jazayeri-lab-to-nwb/blob/vprobe_dev/src/jazayeri_lab_to_nwb/watters/watters_convert_session.py#L71).
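
For instance, a hypothetical edit to one of those dicts in the conversion script could look like the sketch below. The option name and placement follow the note above; depending on your neuroconv version it may instead need to go inside an `iterator_opts` dict, so verify against the interface you are using:

```
# Hypothetical sketch: reduce the write buffer for one recording interface.
# "RecordingVP0" and buffer_gb=0.5 are illustrative values only.
raw_conversion_options.update(
    {"RecordingVP0": dict(stub_test=stub_test, buffer_gb=0.5)}
)
```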
3 changes: 2 additions & 1 deletion requirements.txt
@@ -1,4 +1,5 @@
neuroconv
neuroconv==0.4.3
spikeinterface==0.98.2
nwbwidgets
nwbinspector
pre-commit
1 change: 1 addition & 0 deletions src/jazayeri_lab_to_nwb/watters/__init__.py
@@ -1,3 +1,4 @@
from .wattersbehaviorinterface import WattersEyePositionInterface, WattersPupilSizeInterface
from .watterstrialsinterface import WattersTrialsInterface
from .wattersrecordinginterface import WattersDatRecordingInterface
from .wattersnwbconverter import WattersNWBConverter
98 changes: 77 additions & 21 deletions src/jazayeri_lab_to_nwb/watters/watters_convert_session.py
@@ -19,53 +19,101 @@ def session_to_nwb(data_dir_path: Union[str, Path], output_dir_path: Union[str,
output_dir_path = output_dir_path / "nwb_stub"
output_dir_path.mkdir(parents=True, exist_ok=True)

session_id = "20220601-combined"
nwbfile_path = output_dir_path / f"{session_id}.nwb"
session_id = f"ses-{data_dir_path.name}"
raw_nwbfile_path = output_dir_path / f"{session_id}_raw.nwb"
processed_nwbfile_path = output_dir_path / f"{session_id}_processed.nwb"

raw_source_data = dict()
raw_conversion_options = dict()
processed_source_data = dict()
processed_conversion_options = dict()

for probe_num in range(2):
# Add V-Probe Recording
if not (data_dir_path / "raw_data" / f"v_probe_{probe_num}").exists():
continue
recording_files = list(glob.glob(str(data_dir_path / "raw_data" / f"v_probe_{probe_num}" / "*.dat")))
assert len(recording_files) > 0, f"No .dat files found in {data_dir_path}"
assert len(recording_files) == 1, f"Multiple .dat files found in {data_dir_path}"
recording_source_data = {
f"RecordingVP{probe_num}": dict(
file_path=str(recording_files[0]),
probe_metadata_file=str(data_dir_path / "data_open_source" / "probes.metadata.json"),
probe_key=f"probe{(probe_num+1):02d}",
probe_name=f"vprobe{probe_num}",
es_key=f"ElectricalSeriesVP{probe_num}",
)
}
raw_source_data.update(recording_source_data)
processed_source_data.update(recording_source_data)
raw_conversion_options.update({f"RecordingVP{probe_num}": dict(stub_test=stub_test)})
processed_conversion_options.update(
{f"RecordingVP{probe_num}": dict(stub_test=stub_test, write_electrical_series=False)}
)

source_data = dict()
conversion_options = dict()
# Add V-Probe Sorting
processed_source_data.update(
{
f"SortingVP{probe_num}": dict(
folder_path=str(data_dir_path / "spike_sorting_raw" / f"v_probe_{probe_num}"),
keep_good_only=False,
)
}
)
processed_conversion_options.update({f"SortingVP{probe_num}": dict(stub_test=stub_test, write_as="processing")})

# Add Recording
recording_files = list(glob.glob(str(data_dir_path / "raw_data" / "spikeglx" / "*" / "*" / "*.ap.bin")))
assert len(recording_files) > 0, f"No .ap.bin files found in {data_dir_path}"
assert len(recording_files) == 1, f"Multiple .ap.bin files found in {data_dir_path}"
source_data.update(dict(RecordingNP=dict(file_path=str(recording_files[0]))))
conversion_options.update(dict(RecordingNP=dict(stub_test=stub_test)))
raw_source_data.update(dict(RecordingNP=dict(file_path=str(recording_files[0]))))
processed_source_data.update(dict(RecordingNP=dict(file_path=str(recording_files[0]))))
raw_conversion_options.update(dict(RecordingNP=dict(stub_test=stub_test)))
processed_conversion_options.update(dict(RecordingNP=dict(stub_test=stub_test, write_electrical_series=False)))

# Add LFP
lfp_files = list(glob.glob(str(data_dir_path / "raw_data" / "spikeglx" / "*" / "*" / "*.lf.bin")))
assert len(lfp_files) > 0, f"No .lf.bin files found in {data_dir_path}"
assert len(lfp_files) == 1, f"Multiple .lf.bin files found in {data_dir_path}"
source_data.update(dict(LFP=dict(file_path=str(lfp_files[0]), es_key="ElectricalSeriesLF")))
conversion_options.update(dict(LFP=dict(write_as="lfp", stub_test=stub_test)))
raw_source_data.update(dict(LF=dict(file_path=str(lfp_files[0]))))
processed_source_data.update(dict(LF=dict(file_path=str(lfp_files[0]))))
raw_conversion_options.update(dict(LF=dict(stub_test=stub_test)))
processed_conversion_options.update(dict(LF=dict(stub_test=stub_test, write_electrical_series=False)))

# Add Sorting
source_data.update(
processed_source_data.update(
dict(
SortingNP=dict(
folder_path=str(data_dir_path / "spike_sorting_raw" / "np"),
keep_good_only=True,
keep_good_only=False,
)
)
)
conversion_options.update(dict(SortingNP=dict(stub_test=stub_test, write_as="processing")))
processed_conversion_options.update(dict(SortingNP=dict(stub_test=stub_test, write_as="processing")))

# Add Behavior
source_data.update(dict(EyePosition=dict(folder_path=str(data_dir_path / "data_open_source" / "behavior"))))
conversion_options.update(dict(EyePosition=dict()))
processed_source_data.update(
dict(EyePosition=dict(folder_path=str(data_dir_path / "data_open_source" / "behavior")))
)
processed_conversion_options.update(dict(EyePosition=dict()))

source_data.update(dict(PupilSize=dict(folder_path=str(data_dir_path / "data_open_source" / "behavior"))))
conversion_options.update(dict(PupilSize=dict()))
processed_source_data.update(dict(PupilSize=dict(folder_path=str(data_dir_path / "data_open_source" / "behavior"))))
processed_conversion_options.update(dict(PupilSize=dict()))

# Add Trials
source_data.update(dict(Trials=dict(folder_path=str(data_dir_path / "data_open_source"))))
conversion_options.update(dict(Trials=dict()))
processed_source_data.update(dict(Trials=dict(folder_path=str(data_dir_path / "data_open_source"))))
processed_conversion_options.update(dict(Trials=dict()))

converter = WattersNWBConverter(source_data=source_data, sync_dir=str(data_dir_path / "sync_pulses"))
processed_converter = WattersNWBConverter(
source_data=processed_source_data, sync_dir=str(data_dir_path / "sync_pulses")
)

# Add datetime to conversion
metadata = converter.get_metadata()
date = datetime.datetime(year=2022, month=6, day=1, tzinfo=ZoneInfo("US/Eastern"))
metadata = processed_converter.get_metadata() # use processed b/c it has everything
try:
date = datetime.datetime.strptime(data_dir_path.name, "%Y-%m-%d").replace(tzinfo=ZoneInfo("US/Eastern"))
except:
date = datetime.datetime(year=2022, month=6, day=1, tzinfo=ZoneInfo("US/Eastern"))
metadata["NWBFile"]["session_start_time"] = date
metadata["NWBFile"]["session_id"] = session_id

@@ -95,13 +143,21 @@ def session_to_nwb(data_dir_path: Union[str, Path], output_dir_path: Union[str,
metadata = dict_deep_update(metadata, editable_metadata)

# Run conversion
converter.run_conversion(metadata=metadata, nwbfile_path=nwbfile_path, conversion_options=conversion_options)
processed_converter.run_conversion(
metadata=metadata, nwbfile_path=processed_nwbfile_path, conversion_options=processed_conversion_options
)

raw_converter = WattersNWBConverter(source_data=raw_source_data, sync_dir=str(data_dir_path / "sync_pulses"))
raw_converter.run_conversion(
metadata=metadata, nwbfile_path=raw_nwbfile_path, conversion_options=raw_conversion_options
)


if __name__ == "__main__":

# Parameters for conversion
data_dir_path = Path("/shared/catalystneuro/JazLab/monkey0/2022-06-01/")
# data_dir_path = Path("/shared/catalystneuro/JazLab/monkey1/2022-06-05/")
output_dir_path = Path("~/conversion_nwb/jazayeri-lab-to-nwb/watters_perle_combined/").expanduser()
stub_test = True

64 changes: 60 additions & 4 deletions src/jazayeri_lab_to_nwb/watters/wattersnwbconverter.py
@@ -15,7 +15,11 @@
from neuroconv.basetemporalalignmentinterface import BaseTemporalAlignmentInterface
from neuroconv.datainterfaces.text.timeintervalsinterface import TimeIntervalsInterface

from spikeinterface.core.waveform_tools import has_exceeding_spikes
from spikeinterface.curation import remove_excess_spikes

from jazayeri_lab_to_nwb.watters import (
WattersDatRecordingInterface,
WattersEyePositionInterface,
WattersPupilSizeInterface,
WattersTrialsInterface,
@@ -26,8 +30,12 @@ class WattersNWBConverter(NWBConverter):
"""Primary conversion class for my extracellular electrophysiology dataset."""

data_interface_classes = dict(
RecordingVP0=WattersDatRecordingInterface,
SortingVP0=KiloSortSortingInterface,
RecordingVP1=WattersDatRecordingInterface,
SortingVP1=KiloSortSortingInterface,
RecordingNP=SpikeGLXRecordingInterface,
LFP=SpikeGLXRecordingInterface,
LF=SpikeGLXRecordingInterface,
SortingNP=KiloSortSortingInterface,
EyePosition=WattersEyePositionInterface,
PupilSize=WattersPupilSizeInterface,
@@ -44,6 +52,15 @@ def __init__(
super().__init__(source_data=source_data, verbose=verbose)
self.sync_dir = sync_dir

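# Assign globally unique unit names across sorting interfaces by offsetting
# each interface's integer unit ids before conversion.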
unit_name_start = 0
for name, data_interface in self.data_interface_objects.items():
if isinstance(data_interface, BaseSortingExtractorInterface):
unit_ids = np.array(data_interface.sorting_extractor.unit_ids)
data_interface.sorting_extractor.set_property(
key="unit_name", values=(unit_ids + unit_name_start).astype(str)
)
unit_name_start += np.max(unit_ids) + 1

def temporally_align_data_interfaces(self):
if self.sync_dir is None:
return
@@ -53,18 +70,57 @@ def temporally_align_data_interfaces(self):
with open(sync_dir / "mworks" / "open_source_minus_processed", "r") as f:
bias = float(f.read().strip())

# openephys alignment
with open(sync_dir / "open_ephys" / "recording_start_time") as f:
start_time = float(f.read().strip())
with open(sync_dir / "open_ephys" / "transform", "r") as f:
transform = json.load(f)
for i in [0, 1]:
if f"RecordingVP{i}" in self.data_interface_objects:
orig_timestamps = self.data_interface_objects[f"RecordingVP{i}"].get_timestamps()
aligned_timestamps = bias + transform["intercept"] + transform["coef"] * (start_time + orig_timestamps)
self.data_interface_objects[f"RecordingVP{i}"].set_aligned_timestamps(aligned_timestamps)
# openephys sorting alignment
if f"SortingVP{i}" in self.data_interface_objects:
if has_exceeding_spikes(
recording=self.data_interface_objects[f"RecordingVP{i}"].recording_extractor,
sorting=self.data_interface_objects[f"SortingVP{i}"].sorting_extractor,
):
print(
f"Spikes exceeding recording found in SortingVP{i}! Removing with `spikeinterface.curation.remove_excess_spikes()`"
)
self.data_interface_objects[f"SortingVP{i}"].sorting_extractor = remove_excess_spikes(
recording=self.data_interface_objects[f"RecordingVP{i}"].recording_extractor,
sorting=self.data_interface_objects[f"SortingVP{i}"].sorting_extractor,
)
self.data_interface_objects[f"SortingVP{i}"].register_recording(
self.data_interface_objects[f"RecordingVP{i}"]
)

# neuropixel alignment
orig_timestamps = self.data_interface_objects["RecordingNP"].get_timestamps()
with open(sync_dir / "spikeglx" / "transform", "r") as f:
transform = json.load(f)
aligned_timestamps = bias + transform["intercept"] + transform["coef"] * orig_timestamps
self.data_interface_objects["RecordingNP"].set_aligned_timestamps(aligned_timestamps)
# neuropixel LFP alignment
orig_timestamps = self.data_interface_objects["LFP"].get_timestamps()
orig_timestamps = self.data_interface_objects["LF"].get_timestamps()
aligned_timestamps = bias + transform["intercept"] + transform["coef"] * orig_timestamps
self.data_interface_objects["LFP"].set_aligned_timestamps(aligned_timestamps)
self.data_interface_objects["LF"].set_aligned_timestamps(aligned_timestamps)
# neuropixel sorting alignment
self.data_interface_objects["SortingNP"].register_recording(self.data_interface_objects["RecordingNP"])
if "SortingNP" in self.data_interface_objects:
if has_exceeding_spikes(
recording=self.data_interface_objects[f"RecordingNP"].recording_extractor,
sorting=self.data_interface_objects[f"SortingNP"].sorting_extractor,
):
print(
"Spikes exceeding recording found in SortingNP! Removing with `spikeinterface.curation.remove_excess_spikes()`"
)
self.data_interface_objects[f"SortingNP"].sorting_extractor = remove_excess_spikes(
recording=self.data_interface_objects[f"RecordingNP"].recording_extractor,
sorting=self.data_interface_objects[f"SortingNP"].sorting_extractor,
)
self.data_interface_objects[f"SortingNP"].register_recording(self.data_interface_objects[f"RecordingNP"])

# align recording start to 0
aligned_start_times = []