Add hdf5 backend support for`NwbRecordingExtractor` #2298

h-mayorquin · 2023-12-06T09:30:47Z

Depends on #2294. This makes the NWB extraction faster as it avoids the nwb validation.

@magland If you have some time to sparse it would be great to get your pair of eyes.

magland · 2023-12-06T13:23:54Z

@h-mayorquin Overall this looks very good.

I am a bit concerned about the electrical_series_name / electrical_series_location. I understand you are trying to maximize the chance that it will find the correct electrical series object, and provide helpful output when it is not found. But I think it's best to be explicit about the procedure that is used to determine this. Perhaps you could simplify wherever possible? And more importantly spell out the procedure in the docs for the class?

h-mayorquin · 2023-12-06T17:38:39Z

@magland Thanks for taking a look.

I added further explanation in the docstring concerning electrical_series_location. I will iterate to improve even more.

I am a bit concerned about the electrical_series_name / electrical_series_location. I understand you are trying to maximize the chance that it will find the correct electrical series object, and provide helpful output when it is not found. But I think it's best to be explicit about the procedure that is used to determine this.

Yes, so to describe it here. The intention is for It to iterate over all the groups in the hdf5 file and find the one which name matches with electrical_series_name. As you said, it is a convenience function whose main goal is that users can keep accessing their data by specifying the electrical_series_name as they are doing with the current extractor in the main branch. Do you think it would be better to restrict users to specify the location when they are using hdf5-fast-mode instead? That certaintly would be easier for us to mantain. @alejoe91 what do you think?

h-mayorquin · 2023-12-06T17:40:25Z

I did some changes in the last commit as changing all the behavior is too drastic. I instead moved to a factory pattern where the old behavior is called when a flag "fast_mode" is set to False. That is, if fast_mode=True the class _NWBHDF5RecordingExtractor is used to build the recording using the h5py, otherwise, the class _NwbPureRecordingExtractor (the original recording) is used instead and in turn uses pynwb to build the recording.

The advantages of the factory pattern are:

Keeps the old behavior as a fallback.
Keeps the code organized without having to use if all over the place within the same __init__ constructor.
Paves the way for a third option in the future to add support to reading zarr nwb files in fast mode.
Will be easier to profile between modes.
Will be easier to detect inconsistencies by changing mode.

The disavantages are that is a tad more complex and it introduces more code to mantain. There is some code-duplication now between the _NWBHDF5RecordingExtractor and _NwbPureRecordingExtractor but I think that's fine for the moment. I will reduce this in a Future PR to keep things on this one from blowing up more than it already is.

I added tests for the two modes now. Also I feel unsatisfied with the name fast_mode for the flag, I welcome any suggestion for a better flag name.

h-mayorquin · 2023-12-07T09:06:26Z

OK, I changed fast_mode to io_backend which is literal that can be "pynwb" or "hdf5". I think this is more precise and will be useful when we start getting zarr files. It has the downside that is less clear about the purpose of using the hdf5 backend so I have indicated this in the docstring.

src/spikeinterface/extractors/nwbextractors.py

Co-authored-by: Alessio Buccino <[email protected]>

h-mayorquin · 2023-12-13T11:57:08Z

remfile and hdf5 with electrical series name: 1 minute 15 seconds
remfile and hdf5 with electrical series location: 10.9 seconds
remfile and pynwb with electrical series name: 1 minute 23 seconds

fsspec just hangs for all the cases with this files. 50 minutes and still was not done.

src/spikeinterface/extractors/nwbextractors.py

h-mayorquin · 2023-12-14T09:10:26Z

After a discussion with @alejoe91 we decided to change the name to "use_pynwb" and the default should be False. When we add support for Zarr we should identify on the fly if the file is Zarr and use either h5py or the zarr library to extract the data and modify the documentation when use_pynwb=False. Meanwhile, the reading of zarr files will require use_pynwb=True.

alejoe91

Thansk for this amazing work @h-mayorquin

I have a couple of very minor final requests!

src/spikeinterface/extractors/nwbextractors.py

alejoe91 · 2023-12-14T10:52:17Z

src/spikeinterface/extractors/nwbextractors.py

+            sampling_frequency = electrical_series["starting_time"].attrs["rate"]
+        elif "timestamps" in electrical_series.keys():
+            t_start = electrical_series["timestamps"][0]
+            sampling_frequency = 1 / np.median(np.diff(electrical_series["timestamps"][:1000]))


Suggested change

sampling_frequency = 1 / np.median(np.diff(electrical_series["timestamps"][:1000]))

sampling_frequency = 1 / np.median(np.diff(electrical_series["timestamps"][:samples_for_rate_estimation]))

Let's change this.

I also think that the samples_for_rate_estimation default is too large. I think tha 1000 samples should be OK.

agreed. Can you change to 1000?

Should be done.

src/spikeinterface/extractors/nwbextractors.py

alejoe91 · 2023-12-14T10:55:46Z

src/spikeinterface/extractors/tests/test_nwbextractors.py

+@pytest.mark.parametrize("electrical_series_name", ["ElectricalSeries1", "ElectricalSeries2"])
+def test_that_hdf5_and_pynwb_extractors_return_the_same_data(path_to_nwbfile, electrical_series_name):
+    recording_extractor_hdf5 = NwbRecordingExtractor(
+        path_to_nwbfile,
+        electrical_series_name=electrical_series_name,
+        use_pynwb=False,
+    )
+
+    recording_extractor_pynwb = NwbRecordingExtractor(
+        path_to_nwbfile,
+        electrical_series_name=electrical_series_name,
+        use_pynwb=True,
+    )
+
+    check_recordings_equal(recording_extractor_hdf5, recording_extractor_pynwb)


Co-authored-by: Alessio Buccino <[email protected]>

magland · 2023-12-15T16:33:14Z

Thanks @h-mayorquin and @alejoe91 !

h-mayorquin self-assigned this Dec 6, 2023

fast recorder

d70f524

h-mayorquin marked this pull request as ready for review December 6, 2023 11:27

Merge branch 'main' into fast_mode_hdf5_recording

329d83f

zm711 added the extractors Related to extractors module label Dec 6, 2023

factory pattern

6b99160

h-mayorquin added 2 commits December 6, 2023 18:40

docstring

7649972

change fast_mode to bakcned and improve docstring

be135ee

This was referenced Dec 7, 2023

[Backend Configuration IIIa] Configure all datasets in-memory to be written in specific way to disk catalystneuro/neuroconv#571

Merged

Refactor NWBSortingSegment #2313

Merged

Add hdf5 backend support for NwbSortingExtractor #2314

Closed

h-mayorquin changed the title ~~Faster NWBExtractor NwbRecordingExtractor~~ Add hdf5 backend support forNwbRecordingExtractor Dec 8, 2023

h-mayorquin added 2 commits December 13, 2023 10:13

reverse extractor

85d138f

more helfpul messages

96b07c1

alejoe91 reviewed Dec 13, 2023

View reviewed changes

src/spikeinterface/extractors/nwbextractors.py Outdated Show resolved Hide resolved

Update src/spikeinterface/extractors/nwbextractors.py

c58547d

Co-authored-by: Alessio Buccino <[email protected]>

h-mayorquin commented Dec 13, 2023

View reviewed changes

src/spikeinterface/extractors/nwbextractors.py Outdated Show resolved Hide resolved

h-mayorquin added 2 commits December 13, 2023 17:17

modify name add extra test

52740ed

Merge branch 'main' into fast_mode_hdf5_recording

edef081

alejoe91 requested changes Dec 14, 2023

View reviewed changes

h-mayorquin and others added 4 commits December 14, 2023 12:01

Update src/spikeinterface/extractors/nwbextractors.py

d0af78b

Co-authored-by: Alessio Buccino <[email protected]>

alessio suggestions

bb801c3

Merge branch 'main' into fast_mode_hdf5_recording

bbb3ae2

stuff

a474c4c

lowe rates for sample estimation

ef248db

alejoe91 approved these changes Dec 14, 2023

View reviewed changes

h-mayorquin merged commit 0a0570b into SpikeInterface:main Dec 14, 2023
10 of 11 checks passed

h-mayorquin deleted the fast_mode_hdf5_recording branch December 14, 2023 11:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add hdf5 backend support for`NwbRecordingExtractor` #2298

Add hdf5 backend support for`NwbRecordingExtractor` #2298

h-mayorquin commented Dec 6, 2023 •

edited

Loading

magland commented Dec 6, 2023

h-mayorquin commented Dec 6, 2023 •

edited

Loading

h-mayorquin commented Dec 6, 2023

h-mayorquin commented Dec 7, 2023

h-mayorquin commented Dec 13, 2023 •

edited

Loading

h-mayorquin commented Dec 14, 2023

alejoe91 left a comment

alejoe91 Dec 14, 2023

h-mayorquin Dec 14, 2023

h-mayorquin Dec 14, 2023

alejoe91 Dec 14, 2023

h-mayorquin Dec 14, 2023

alejoe91 Dec 14, 2023

magland commented Dec 15, 2023

	sampling_frequency = 1 / np.median(np.diff(electrical_series["timestamps"][:1000]))
	sampling_frequency = 1 / np.median(np.diff(electrical_series["timestamps"][:samples_for_rate_estimation]))

Add hdf5 backend support forNwbRecordingExtractor #2298

Add hdf5 backend support forNwbRecordingExtractor #2298

Conversation

h-mayorquin commented Dec 6, 2023 • edited Loading

magland commented Dec 6, 2023

h-mayorquin commented Dec 6, 2023 • edited Loading

h-mayorquin commented Dec 6, 2023

h-mayorquin commented Dec 7, 2023

h-mayorquin commented Dec 13, 2023 • edited Loading

h-mayorquin commented Dec 14, 2023

alejoe91 left a comment

Choose a reason for hiding this comment

alejoe91 Dec 14, 2023

Choose a reason for hiding this comment

h-mayorquin Dec 14, 2023

Choose a reason for hiding this comment

h-mayorquin Dec 14, 2023

Choose a reason for hiding this comment

alejoe91 Dec 14, 2023

Choose a reason for hiding this comment

h-mayorquin Dec 14, 2023

Choose a reason for hiding this comment

alejoe91 Dec 14, 2023

Choose a reason for hiding this comment

magland commented Dec 15, 2023

Add hdf5 backend support for`NwbRecordingExtractor` #2298

Add hdf5 backend support for`NwbRecordingExtractor` #2298

h-mayorquin commented Dec 6, 2023 •

edited

Loading

h-mayorquin commented Dec 6, 2023 •

edited

Loading

h-mayorquin commented Dec 13, 2023 •

edited

Loading