Lazy loading of zarr timestamps #3318

Merged
merged 3 commits into from
Aug 21, 2024

alejoe91
Member

This was previously discussed here: #2828

To sum up, loading and decompressing zarr timestamps for very long recordings can be quite time-consuming, so we want to avoid doing that at init. When the timestamps are fetched, though, if they are not a numpy array they are cast and cached as np.arrays, to avoid re-reading and re-decompressing at every call.
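
A minimal sketch of the load-lazily-then-cache idea described above, not the actual SpikeInterface code. `CountingLazyArray` and `LazySegment` are hypothetical names; the first is only a stand-in for a compressed on-disk array such as a zarr array, and counts how often it is read:

```python
import numpy as np


class CountingLazyArray:
    """Hypothetical stand-in for a compressed on-disk array (e.g. a zarr array)."""

    def __init__(self, data):
        self._data = np.asarray(data, dtype="float64")
        self.num_reads = 0  # how many times the data were "decompressed"

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when converting the object to an ndarray.
        self.num_reads += 1
        return np.array(self._data, dtype=dtype)


class LazySegment:
    """Hypothetical segment that keeps timestamps lazy until first requested."""

    def __init__(self, time_vector):
        self.time_vector = time_vector  # nothing is read or decompressed here

    def get_times(self) -> np.ndarray:
        if not isinstance(self.time_vector, np.ndarray):
            # First access: read once, then cache the in-memory array.
            self.time_vector = np.array(self.time_vector)
        return self.time_vector


lazy = CountingLazyArray([0.0, 0.5, 1.0])
segment = LazySegment(lazy)
reads_after_init = lazy.num_reads

first = segment.get_times()
reads_after_first = lazy.num_reads

second = segment.get_times()
reads_after_second = lazy.num_reads
```

In this sketch, constructing the segment triggers no read, and the second `get_times()` call returns the cached array without touching the lazy source again.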

@alejoe91 alejoe91 added the core Changes to core module label Aug 20, 2024
@alejoe91 alejoe91 requested a review from h-mayorquin August 20, 2024 13:25
Collaborator

@h-mayorquin h-mayorquin left a comment

Yes, I think it is better. Zarr is not the format where we want the data to be read when called.

Can you add the typing to get_times() -> np.ndarray? This will make it clear that it is returning an in-memory object. We should also add a docstring describing this behavior.

return self.time_vector
else:
return np.array(self.time_vector)
if not isinstance(self.time_vector, np.ndarray):
Collaborator

I think you can just always call np.asarray(), which by default will just pass the data along if it is already an np.ndarray but will create a copy if it is hdf5, zarr or a memmap.
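
A small illustration of that `np.asarray()` behavior, under the assumption that the lazy object exposes NumPy's `__array__` hook. `ListBackedArray` is a hypothetical stand-in for a zarr or HDF5 dataset, not a real library class:

```python
import numpy as np


class ListBackedArray:
    """Hypothetical array-like that is not an ndarray (stand-in for zarr/HDF5)."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # Build a fresh in-memory array whenever NumPy asks for one.
        return np.array(self._values, dtype=dtype)


# Already an ndarray: asarray hands back the very same object, no copy.
in_memory = np.array([0.0, 0.1, 0.2])
passthrough = np.asarray(in_memory)

# Not an ndarray: asarray materializes the data as a new in-memory array.
array_like = ListBackedArray([0.0, 0.1, 0.2])
materialized = np.asarray(array_like)
```

Here `passthrough` is the same object as `in_memory`, while `materialized` is a new `np.ndarray`. One caveat worth checking against the NumPy documentation: `np.memmap` is itself an `ndarray` subclass, so `np.asarray` may return a view on the mapped file rather than an in-memory copy.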

Member Author

thanks! Great suggestion!

Member Author

Just to clarify, you suggest doing this?

def get_times(self) -> np.ndarray:
    if self.time_vector is not None:
        self.time_vector = np.asarray(self.time_vector)
        return self.time_vector
    else:
        time_vector = np.arange(self.get_num_samples(), dtype="float64")
        time_vector /= self.sampling_frequency
        if self.t_start is not None:
            time_vector += self.t_start
        return time_vector

Collaborator

Yes!

Member Author

Then done in last commit :)

@alejoe91 alejoe91 merged commit a2f157c into SpikeInterface:main Aug 21, 2024
15 checks passed