Zarr: extract time vector once and for all! #2828

Merged 2 commits into SpikeInterface:main on May 10, 2024

Conversation

alejoe91 (Member)

If the zarr folder has timestamps, they were re-extracted and decompressed every time get_times() was called.

This PR loads the zarr time vector as a numpy array at instantiation to speed things up.

alejoe91 added the bug (Something isn't working) and core (Changes to core module) labels on May 10, 2024
alejoe91 requested a review from zm711 on May 10, 2024 at 09:10
alejoe91 added a commit to alejoe91/spikeinterface that referenced this pull request on May 10, 2024
```diff
@@ -72,7 +72,7 @@ def __init__(self, folder_path: Path | str, storage_options: dict | None = None)
             time_kwargs = {}
             time_vector = self._root.get(f"times_seg{segment_index}", None)
             if time_vector is not None:
-                time_kwargs["time_vector"] = time_vector
+                time_kwargs["time_vector"] = time_vector[:]
```
Collaborator:
Maybe I'm misunderstanding the compression-decompression, but how does switching to a view change this?

alejoe91 (Member, Author):

Without the [:], time_vector is set to a zarr.Array object.

This object performs a retrieval from file (and, if the data are compressed, a decompression) upon slicing, so if I call get_times() 100 times, the data will be read from the file (and decompressed) 100 times.

This simple change slices the array, so that time_kwargs["time_vector"] is now a numpy array and no longer a zarr object.
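For illustration, a minimal sketch of the difference, assuming the zarr v2 API and an in-memory group (the dataset name times_seg0 mirrors the reader's naming):

```python
import numpy as np
import zarr

# Toy store with a compressed time vector, mimicking a "times_seg0" dataset.
root = zarr.group()  # in-memory group
root.create_dataset("times_seg0", data=np.arange(1_000_000) / 30_000.0, chunks=(10_000,))

time_vector = root.get("times_seg0", None)
print(type(time_vector))      # zarr Array: every slice reads from the store and decompresses

time_vector = time_vector[:]  # materialize once
print(type(time_vector))      # numpy.ndarray: later accesses are plain in-memory reads
```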

Hope this is clear :)

Collaborator:

Oh yeah that makes sense. Cool.

alejoe91 merged commit cd55f91 into SpikeInterface:main on May 10, 2024
11 checks passed
samuelgarcia (Member)

As discussed with Alessio this morning: reading big vectors in the __init__ is not a very good idea.
We should have a system that delays and caches the time vector reading. In short, read the time vector only on demand.
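A minimal sketch of that idea, with hypothetical class and attribute names: keep a handle to the zarr dataset and slice it only on the first get_times() call, caching the resulting numpy array.

```python
class LazyTimeVector:
    """Sketch: defer reading the zarr time vector until it is actually requested."""

    def __init__(self, zarr_group, segment_index):
        self._zarr_group = zarr_group
        self._segment_index = segment_index
        self._cached_times = None  # filled on first access

    def get_times(self):
        if self._cached_times is None:
            dataset = self._zarr_group.get(f"times_seg{self._segment_index}", None)
            if dataset is not None:
                # read from the store (and decompress) exactly once
                self._cached_times = dataset[:]
        return self._cached_times
```

This keeps instantiation cheap while still paying the read/decompression cost at most once.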
