Zarr: extract time vector once and for all! #2828

Merged 2 commits into SpikeInterface:main on May 10, 2024

Conversation

alejoe91 (Member)

If the zarr folder has timestamps, they were re-extracted and decompressed every time get_times() was called.

This PR loads the zarr time vector as a numpy array at instantiation to speed things up.

alejoe91 added the bug (Something isn't working) and core (Changes to core module) labels on May 10, 2024
alejoe91 requested a review from zm711 on May 10, 2024 at 09:10
alejoe91 added a commit to alejoe91/spikeinterface that referenced this pull request on May 10, 2024
```diff
@@ -72,7 +72,7 @@ def __init__(self, folder_path: Path | str, storage_options: dict | None = None)
             time_kwargs = {}
             time_vector = self._root.get(f"times_seg{segment_index}", None)
             if time_vector is not None:
-                time_kwargs["time_vector"] = time_vector
+                time_kwargs["time_vector"] = time_vector[:]
```
Collaborator:
Maybe I'm misunderstanding the compression-decompression, but how does switching to a view change this?

alejoe91 (Member, Author):

Without the [:], time_vector is set to a zarr.Array object.

This object performs a retrieval from file (and, if the data are compressed, a decompression) upon slicing, so if I call get_times() 100 times, the data will be read from the file (and decompressed) 100 times.

This simple change slices the array, so that time_kwargs["time_vector"] is now a numpy array and no longer a zarr object.
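For illustration, a minimal sketch of the difference, assuming the zarr v2 API and an in-memory group (the dataset name times_seg0 mirrors the reader's naming):

```python
import numpy as np
import zarr

# Toy store with a compressed time vector, mimicking a "times_seg0" dataset.
root = zarr.group()  # in-memory group
root.create_dataset("times_seg0", data=np.arange(1_000_000) / 30_000.0, chunks=(10_000,))

time_vector = root.get("times_seg0", None)
print(type(time_vector))      # zarr Array: every slice reads from the store and decompresses

time_vector = time_vector[:]  # materialize once
print(type(time_vector))      # numpy.ndarray: later accesses are plain in-memory reads
```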

Hope this is clear :)

Collaborator:

Oh yeah that makes sense. Cool.

alejoe91 merged commit cd55f91 into SpikeInterface:main on May 10, 2024
11 checks passed
samuelgarcia (Member)

As discussed with Alessio this morning: reading big vectors in the __init__ is not a very good idea.
We should have a system that delays and caches the time vector reading. In short, read the time vector only on demand.
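A minimal sketch of that idea, with hypothetical class and attribute names: keep a handle to the zarr dataset and slice it only on the first get_times() call, caching the resulting numpy array.

```python
class LazyTimeVector:
    """Sketch: defer reading the zarr time vector until it is actually requested."""

    def __init__(self, zarr_group, segment_index):
        self._zarr_group = zarr_group
        self._segment_index = segment_index
        self._cached_times = None  # filled on first access

    def get_times(self):
        if self._cached_times is None:
            dataset = self._zarr_group.get(f"times_seg{self._segment_index}", None)
            if dataset is not None:
                # read from the store (and decompress) exactly once
                self._cached_times = dataset[:]
        return self._cached_times
```

This keeps instantiation cheap while still paying the read/decompression cost at most once.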
