Merge pull request #4 from catalystneuro/vprobe_dev
Add interfaces for V-probe data
CodyCBakerPhD authored Sep 20, 2023
2 parents c6700bd + e8a2d8e commit ad6eb70
Showing 6 changed files with 338 additions and 54 deletions.
80 changes: 52 additions & 28 deletions README.md
@@ -3,23 +3,6 @@ NWB conversion scripts for Jazayeri lab data to the [Neurodata Without Borders](


## Installation
## Basic installation

You can install the latest release of the package with pip:

```
pip install jazayeri-lab-to-nwb
```

We recommend that you install the package inside a [virtual environment](https://docs.python.org/3/tutorial/venv.html). A simple way of doing this is to use a [conda environment](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html) from the `conda` package manager ([installation instructions](https://docs.conda.io/en/latest/miniconda.html)). Detailed instructions on how to use conda environments can be found in their [documentation](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).

### Running a specific conversion
Once you have installed the package with pip, you can run any of the conversion scripts in a notebook or a Python file:

https://github.com/catalystneuro/jazayeri-lab-to-nwb//tree/main/src/watters/watters_conversion_script.py




## Installation from GitHub
Another option is to install the package directly from GitHub. This option has the advantage that the source code can be modified if you need to amend some of the code we originally provided to adapt to future experimental differences. To install the conversion from GitHub you will need to use `git` ([installation instructions](https://github.com/git-guides/install-git)). We also recommend the installation of `conda` ([installation instructions](https://docs.conda.io/en/latest/miniconda.html)) as it contains all the required machinery in a single and simple install
@@ -46,17 +29,6 @@ pip install -e .
Note: both of the methods above install the repository in [editable mode](https://pip.pypa.io/en/stable/cli/pip_install/#editable-installs).

### Running a specific conversion
To run a specific conversion, you might first need to install some conversion-specific dependencies, which are located in each conversion directory:
```
pip install -r src/jazayeri_lab_to_nwb/watters/watters_requirements.txt
```

You can run a specific conversion with the following command:
```
python src/jazayeri_lab_to_nwb/watters/watters_conversion_script.py
```

## Repository structure
Each conversion is organized in a directory of its own in the `src` directory:

@@ -93,3 +65,55 @@ Each conversion is organized in a directory of its own in the `src` directory:
* `watters_notes.md`: notes and comments concerning this specific conversion.

The directory might contain other files that are necessary for the conversion, but those are the central ones.


## Running a specific conversion
To run a specific conversion, you might first need to install some conversion-specific dependencies, which are located in each conversion directory:
```
pip install -r src/jazayeri_lab_to_nwb/watters/watters_requirements.txt
```

You can run a specific conversion with the following command:
```
python src/jazayeri_lab_to_nwb/watters/watters_conversion_script.py
```

### Watters working memory task data
The conversion function for this experiment, `session_to_nwb`, is found in `src/watters/watters_conversion_script.py`. The function takes three arguments:
* `data_dir_path` points to the root directory for the data for a given session.
* `output_dir_path` points to where the converted data should be saved.
* `stub_test` indicates whether only a small portion of the data should be saved (mainly used by us for testing purposes).

The function can be imported into a separate script and run, or you can run the file directly and specify the arguments in the `if __name__ == "__main__"` block at the bottom.
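
A minimal sketch of calling it from your own code is shown below. The import path assumes the script location referenced above, and the data paths are placeholders; adjust both to your checkout and data layout:

```
# Minimal sketch -- placeholder paths; adjust the import to the actual
# script/module name in your checkout.
from pathlib import Path

from jazayeri_lab_to_nwb.watters.watters_conversion_script import session_to_nwb

session_to_nwb(
    data_dir_path=Path("/path/to/monkey0/2022-06-01/"),
    output_dir_path=Path("~/conversion_nwb/").expanduser(),
    stub_test=True,  # set False to convert the full session
)
```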

The function expects the raw data in `data_dir_path` to follow this structure:

data_dir_path/
├── data_open_source
│   ├── behavior
│   │   └── eye.h.times.npy, etc.
│   ├── task
│   │   └── trials.start_times.json, etc.
│   └── probes.metadata.json
├── raw_data
│   ├── spikeglx
│   │   └── */*/*.ap.bin, */*/*.lf.bin, etc.
│   ├── v_probe_0
│   │   └── raw_data.dat
│   └── v_probe_{n}
│       └── raw_data.dat
├── spike_sorting_raw
│   ├── np
│   ├── vp_0
│   └── vp_{n}
└── sync_pulses
    ├── mworks
    ├── open_ephys
    └── spikeglx
...

The conversion will try to automatically fetch metadata from the provided data directory. However, some information, such as the subject's name and age, must be specified by the user in the file `src/jazayeri_lab_to_nwb/watters/watters_metadata.yaml`. If any of the automatically fetched metadata is incorrect, it can also be overridden from this file.

The converted data will be saved in two files, one called `{session_id}_raw.nwb`, which contains the raw electrophysiology data from the Neuropixels and V-Probes, and one called `{session_id}_processed.nwb` with behavioral data, trial info, and sorted unit spiking.

If you run into memory issues when writing the `{session_id}_raw.nwb` files, you may want to set `buffer_gb` to a value smaller than 1 (its default) in the `conversion_options` dicts for the recording interfaces, i.e. [here](https://github.com/catalystneuro/jazayeri-lab-to-nwb/blob/vprobe_dev/src/jazayeri_lab_to_nwb/watters/watters_convert_session.py#L49) and [here](https://github.com/catalystneuro/jazayeri-lab-to-nwb/blob/vprobe_dev/src/jazayeri_lab_to_nwb/watters/watters_convert_session.py#L71).
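
For instance, a hypothetical edit to one of those dicts in the conversion script could look like the sketch below. The option name and placement follow the note above; depending on your neuroconv version it may instead need to go inside an `iterator_opts` dict, so verify against the interface you are using:

```
# Hypothetical sketch: reduce the write buffer for one recording interface.
# "RecordingVP0" and buffer_gb=0.5 are illustrative values only.
raw_conversion_options.update(
    {"RecordingVP0": dict(stub_test=stub_test, buffer_gb=0.5)}
)
```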
3 changes: 2 additions & 1 deletion requirements.txt
@@ -1,4 +1,5 @@
neuroconv
neuroconv==0.4.3
spikeinterface==0.98.2
nwbwidgets
nwbinspector
pre-commit
1 change: 1 addition & 0 deletions src/jazayeri_lab_to_nwb/watters/__init__.py
@@ -1,3 +1,4 @@
from .wattersbehaviorinterface import WattersEyePositionInterface, WattersPupilSizeInterface
from .watterstrialsinterface import WattersTrialsInterface
from .wattersrecordinginterface import WattersDatRecordingInterface
from .wattersnwbconverter import WattersNWBConverter
98 changes: 77 additions & 21 deletions src/jazayeri_lab_to_nwb/watters/watters_convert_session.py
@@ -19,53 +19,101 @@ def session_to_nwb(data_dir_path: Union[str, Path], output_dir_path: Union[str,
output_dir_path = output_dir_path / "nwb_stub"
output_dir_path.mkdir(parents=True, exist_ok=True)

session_id = "20220601-combined"
nwbfile_path = output_dir_path / f"{session_id}.nwb"
session_id = f"ses-{data_dir_path.name}"
raw_nwbfile_path = output_dir_path / f"{session_id}_raw.nwb"
processed_nwbfile_path = output_dir_path / f"{session_id}_processed.nwb"

raw_source_data = dict()
raw_conversion_options = dict()
processed_source_data = dict()
processed_conversion_options = dict()

for probe_num in range(2):
# Add V-Probe Recording
if not (data_dir_path / "raw_data" / f"v_probe_{probe_num}").exists():
continue
recording_files = list(glob.glob(str(data_dir_path / "raw_data" / f"v_probe_{probe_num}" / "*.dat")))
assert len(recording_files) > 0, f"No .dat files found in {data_dir_path}"
assert len(recording_files) == 1, f"Multiple .dat files found in {data_dir_path}"
recording_source_data = {
f"RecordingVP{probe_num}": dict(
file_path=str(recording_files[0]),
probe_metadata_file=str(data_dir_path / "data_open_source" / "probes.metadata.json"),
probe_key=f"probe{(probe_num+1):02d}",
probe_name=f"vprobe{probe_num}",
es_key=f"ElectricalSeriesVP{probe_num}",
)
}
raw_source_data.update(recording_source_data)
processed_source_data.update(recording_source_data)
raw_conversion_options.update({f"RecordingVP{probe_num}": dict(stub_test=stub_test)})
processed_conversion_options.update(
{f"RecordingVP{probe_num}": dict(stub_test=stub_test, write_electrical_series=False)}
)

source_data = dict()
conversion_options = dict()
# Add V-Probe Sorting
processed_source_data.update(
{
f"SortingVP{probe_num}": dict(
folder_path=str(data_dir_path / "spike_sorting_raw" / f"v_probe_{probe_num}"),
keep_good_only=False,
)
}
)
processed_conversion_options.update({f"SortingVP{probe_num}": dict(stub_test=stub_test, write_as="processing")})

# Add Recording
recording_files = list(glob.glob(str(data_dir_path / "raw_data" / "spikeglx" / "*" / "*" / "*.ap.bin")))
assert len(recording_files) > 0, f"No .ap.bin files found in {data_dir_path}"
assert len(recording_files) == 1, f"Multiple .ap.bin files found in {data_dir_path}"
source_data.update(dict(RecordingNP=dict(file_path=str(recording_files[0]))))
conversion_options.update(dict(RecordingNP=dict(stub_test=stub_test)))
raw_source_data.update(dict(RecordingNP=dict(file_path=str(recording_files[0]))))
processed_source_data.update(dict(RecordingNP=dict(file_path=str(recording_files[0]))))
raw_conversion_options.update(dict(RecordingNP=dict(stub_test=stub_test)))
processed_conversion_options.update(dict(RecordingNP=dict(stub_test=stub_test, write_electrical_series=False)))

# Add LFP
lfp_files = list(glob.glob(str(data_dir_path / "raw_data" / "spikeglx" / "*" / "*" / "*.lf.bin")))
assert len(lfp_files) > 0, f"No .lf.bin files found in {data_dir_path}"
assert len(lfp_files) == 1, f"Multiple .lf.bin files found in {data_dir_path}"
source_data.update(dict(LFP=dict(file_path=str(lfp_files[0]), es_key="ElectricalSeriesLF")))
conversion_options.update(dict(LFP=dict(write_as="lfp", stub_test=stub_test)))
raw_source_data.update(dict(LF=dict(file_path=str(lfp_files[0]))))
processed_source_data.update(dict(LF=dict(file_path=str(lfp_files[0]))))
raw_conversion_options.update(dict(LF=dict(stub_test=stub_test)))
processed_conversion_options.update(dict(LF=dict(stub_test=stub_test, write_electrical_series=False)))

# Add Sorting
source_data.update(
processed_source_data.update(
dict(
SortingNP=dict(
folder_path=str(data_dir_path / "spike_sorting_raw" / "np"),
keep_good_only=True,
keep_good_only=False,
)
)
)
conversion_options.update(dict(SortingNP=dict(stub_test=stub_test, write_as="processing")))
processed_conversion_options.update(dict(SortingNP=dict(stub_test=stub_test, write_as="processing")))

# Add Behavior
source_data.update(dict(EyePosition=dict(folder_path=str(data_dir_path / "data_open_source" / "behavior"))))
conversion_options.update(dict(EyePosition=dict()))
processed_source_data.update(
dict(EyePosition=dict(folder_path=str(data_dir_path / "data_open_source" / "behavior")))
)
processed_conversion_options.update(dict(EyePosition=dict()))

source_data.update(dict(PupilSize=dict(folder_path=str(data_dir_path / "data_open_source" / "behavior"))))
conversion_options.update(dict(PupilSize=dict()))
processed_source_data.update(dict(PupilSize=dict(folder_path=str(data_dir_path / "data_open_source" / "behavior"))))
processed_conversion_options.update(dict(PupilSize=dict()))

# Add Trials
source_data.update(dict(Trials=dict(folder_path=str(data_dir_path / "data_open_source"))))
conversion_options.update(dict(Trials=dict()))
processed_source_data.update(dict(Trials=dict(folder_path=str(data_dir_path / "data_open_source"))))
processed_conversion_options.update(dict(Trials=dict()))

converter = WattersNWBConverter(source_data=source_data, sync_dir=str(data_dir_path / "sync_pulses"))
processed_converter = WattersNWBConverter(
source_data=processed_source_data, sync_dir=str(data_dir_path / "sync_pulses")
)

# Add datetime to conversion
metadata = converter.get_metadata()
date = datetime.datetime(year=2022, month=6, day=1, tzinfo=ZoneInfo("US/Eastern"))
metadata = processed_converter.get_metadata() # use processed b/c it has everything
try:
date = datetime.datetime.strptime(data_dir_path.name, "%Y-%m-%d").replace(tzinfo=ZoneInfo("US/Eastern"))
except:
date = datetime.datetime(year=2022, month=6, day=1, tzinfo=ZoneInfo("US/Eastern"))
metadata["NWBFile"]["session_start_time"] = date
metadata["NWBFile"]["session_id"] = session_id

@@ -95,13 +143,21 @@ def session_to_nwb(data_dir_path: Union[str, Path], output_dir_path: Union[str,
metadata = dict_deep_update(metadata, editable_metadata)

# Run conversion
converter.run_conversion(metadata=metadata, nwbfile_path=nwbfile_path, conversion_options=conversion_options)
processed_converter.run_conversion(
metadata=metadata, nwbfile_path=processed_nwbfile_path, conversion_options=processed_conversion_options
)

raw_converter = WattersNWBConverter(source_data=raw_source_data, sync_dir=str(data_dir_path / "sync_pulses"))
raw_converter.run_conversion(
metadata=metadata, nwbfile_path=raw_nwbfile_path, conversion_options=raw_conversion_options
)


if __name__ == "__main__":

# Parameters for conversion
data_dir_path = Path("/shared/catalystneuro/JazLab/monkey0/2022-06-01/")
# data_dir_path = Path("/shared/catalystneuro/JazLab/monkey1/2022-06-05/")
output_dir_path = Path("~/conversion_nwb/jazayeri-lab-to-nwb/watters_perle_combined/").expanduser()
stub_test = True

64 changes: 60 additions & 4 deletions src/jazayeri_lab_to_nwb/watters/wattersnwbconverter.py
@@ -15,7 +15,11 @@
from neuroconv.basetemporalalignmentinterface import BaseTemporalAlignmentInterface
from neuroconv.datainterfaces.text.timeintervalsinterface import TimeIntervalsInterface

from spikeinterface.core.waveform_tools import has_exceeding_spikes
from spikeinterface.curation import remove_excess_spikes

from jazayeri_lab_to_nwb.watters import (
WattersDatRecordingInterface,
WattersEyePositionInterface,
WattersPupilSizeInterface,
WattersTrialsInterface,
@@ -26,8 +30,12 @@ class WattersNWBConverter(NWBConverter):
"""Primary conversion class for my extracellular electrophysiology dataset."""

data_interface_classes = dict(
RecordingVP0=WattersDatRecordingInterface,
SortingVP0=KiloSortSortingInterface,
RecordingVP1=WattersDatRecordingInterface,
SortingVP1=KiloSortSortingInterface,
RecordingNP=SpikeGLXRecordingInterface,
LFP=SpikeGLXRecordingInterface,
LF=SpikeGLXRecordingInterface,
SortingNP=KiloSortSortingInterface,
EyePosition=WattersEyePositionInterface,
PupilSize=WattersPupilSizeInterface,
@@ -44,6 +52,15 @@ def __init__(
super().__init__(source_data=source_data, verbose=verbose)
self.sync_dir = sync_dir

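# Assign globally unique unit names across sorting interfaces by offsetting
# each interface's integer unit ids before conversion.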
unit_name_start = 0
for name, data_interface in self.data_interface_objects.items():
if isinstance(data_interface, BaseSortingExtractorInterface):
unit_ids = np.array(data_interface.sorting_extractor.unit_ids)
data_interface.sorting_extractor.set_property(
key="unit_name", values=(unit_ids + unit_name_start).astype(str)
)
unit_name_start += np.max(unit_ids) + 1

def temporally_align_data_interfaces(self):
if self.sync_dir is None:
return
@@ -53,18 +70,57 @@ def temporally_align_data_interfaces(self):
with open(sync_dir / "mworks" / "open_source_minus_processed", "r") as f:
bias = float(f.read().strip())

# openephys alignment
with open(sync_dir / "open_ephys" / "recording_start_time") as f:
start_time = float(f.read().strip())
with open(sync_dir / "open_ephys" / "transform", "r") as f:
transform = json.load(f)
for i in [0, 1]:
if f"RecordingVP{i}" in self.data_interface_objects:
orig_timestamps = self.data_interface_objects[f"RecordingVP{i}"].get_timestamps()
aligned_timestamps = bias + transform["intercept"] + transform["coef"] * (start_time + orig_timestamps)
self.data_interface_objects[f"RecordingVP{i}"].set_aligned_timestamps(aligned_timestamps)
# openephys sorting alignment
if f"SortingVP{i}" in self.data_interface_objects:
if has_exceeding_spikes(
recording=self.data_interface_objects[f"RecordingVP{i}"].recording_extractor,
sorting=self.data_interface_objects[f"SortingVP{i}"].sorting_extractor,
):
print(
f"Spikes exceeding recording found in SortingVP{i}! Removing with `spikeinterface.curation.remove_excess_spikes()`"
)
self.data_interface_objects[f"SortingVP{i}"].sorting_extractor = remove_excess_spikes(
recording=self.data_interface_objects[f"RecordingVP{i}"].recording_extractor,
sorting=self.data_interface_objects[f"SortingVP{i}"].sorting_extractor,
)
self.data_interface_objects[f"SortingVP{i}"].register_recording(
self.data_interface_objects[f"RecordingVP{i}"]
)

# neuropixel alignment
orig_timestamps = self.data_interface_objects["RecordingNP"].get_timestamps()
with open(sync_dir / "spikeglx" / "transform", "r") as f:
transform = json.load(f)
aligned_timestamps = bias + transform["intercept"] + transform["coef"] * orig_timestamps
self.data_interface_objects["RecordingNP"].set_aligned_timestamps(aligned_timestamps)
# neuropixel LFP alignment
orig_timestamps = self.data_interface_objects["LFP"].get_timestamps()
orig_timestamps = self.data_interface_objects["LF"].get_timestamps()
aligned_timestamps = bias + transform["intercept"] + transform["coef"] * orig_timestamps
self.data_interface_objects["LFP"].set_aligned_timestamps(aligned_timestamps)
self.data_interface_objects["LF"].set_aligned_timestamps(aligned_timestamps)
# neuropixel sorting alignment
self.data_interface_objects["SortingNP"].register_recording(self.data_interface_objects["RecordingNP"])
if "SortingNP" in self.data_interface_objects:
if has_exceeding_spikes(
recording=self.data_interface_objects[f"RecordingNP"].recording_extractor,
sorting=self.data_interface_objects[f"SortingNP"].sorting_extractor,
):
print(
"Spikes exceeding recording found in SortingNP! Removing with `spikeinterface.curation.remove_excess_spikes()`"
)
self.data_interface_objects[f"SortingNP"].sorting_extractor = remove_excess_spikes(
recording=self.data_interface_objects[f"RecordingNP"].recording_extractor,
sorting=self.data_interface_objects[f"SortingNP"].sorting_extractor,
)
self.data_interface_objects[f"SortingNP"].register_recording(self.data_interface_objects[f"RecordingNP"])

# align recording start to 0
aligned_start_times = []