Lazy loading of zarr timestamps #3318
Yes, I think it is better. Zarr is not a format where we want the data to be read every time it is called.
Can you add the return type annotation to get_times(), i.e.
-> np.ndarray? This will make it clear that it returns an in-memory object. We should also add a docstring describing this behavior.
return self.time_vector
else:
return np.array(self.time_vector)
if not isinstance(self.time_vector, np.ndarray):
I think you can just always call np.asarray(), which by default will just pass the data along if it is already an np.ndarray but will create a copy if it is hdf5, zarr or a memmap.
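A minimal sketch of the behavior described in this comment. `LazyArray` is a hypothetical stand-in for a lazily loaded, array-like object (something that exposes `__array__`, as zarr arrays do); it is not part of zarr or of this project. The read counter exists only to make the load visible.

```python
import numpy as np


class LazyArray:
    """Hypothetical stand-in for a lazily loaded array-like object."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0  # counts how many times the data is materialized

    def __array__(self, dtype=None, copy=None):
        # Called by NumPy when the object is converted to an ndarray.
        self.reads += 1
        return np.array(self._data, dtype=dtype)


# An existing ndarray is passed through without copying.
in_memory = np.arange(5, dtype="float64")
same = np.asarray(in_memory)

# An array-like object is materialized into a new in-memory ndarray.
lazy = LazyArray([0.0, 0.1, 0.2])
loaded = np.asarray(lazy)
```

Because the pass-through case returns the very same object, calling `np.asarray()` unconditionally removes the need for an explicit `isinstance` check.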
thanks! Great suggestion!
Just to clarify, you suggest doing this?
def get_times(self) -> np.ndarray:
    if self.time_vector is not None:
        self.time_vector = np.asarray(self.time_vector)
        return self.time_vector
    else:
        time_vector = np.arange(self.get_num_samples(), dtype="float64")
        time_vector /= self.sampling_frequency
        if self.t_start is not None:
            time_vector += self.t_start
        return time_vector
Yes!
Then done in last commit :)
This was previously discussed here: #2828
To sum up, loading and decompressing zarr timestamps for very long recordings can be quite time-consuming, so we want to avoid doing that at init. When the timestamps are fetched, though, if they are not a numpy array they are cast to np.ndarray and cached, to avoid re-reading and re-decompressing at every call.
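The load-on-first-access pattern summarized above can be sketched roughly as follows. `SegmentSketch` and `LazyTimestamps` are hypothetical names used only for illustration; the `get_times()` body follows the version discussed in this thread, with a plain `num_samples` attribute assumed in place of `get_num_samples()`.

```python
import numpy as np


class LazyTimestamps:
    """Hypothetical stand-in for timestamps stored in a compressed, lazily read array."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0  # counts simulated read/decompress operations

    def __array__(self, dtype=None, copy=None):
        self.reads += 1
        return np.array(self._values, dtype=dtype)


class SegmentSketch:
    """Hypothetical segment that holds either a time vector or a sampling rate."""

    def __init__(self, num_samples, sampling_frequency=None, t_start=None, time_vector=None):
        self.num_samples = num_samples
        self.sampling_frequency = sampling_frequency
        self.t_start = t_start
        # Stored as given: nothing is read or decompressed at init.
        self.time_vector = time_vector

    def get_times(self) -> np.ndarray:
        """Return the timestamps as an in-memory array, caching the first conversion."""
        if self.time_vector is not None:
            # First call converts and caches; later calls pass the ndarray through.
            self.time_vector = np.asarray(self.time_vector)
            return self.time_vector
        time_vector = np.arange(self.num_samples, dtype="float64")
        time_vector /= self.sampling_frequency
        if self.t_start is not None:
            time_vector += self.t_start
        return time_vector


lazy = LazyTimestamps([0.0, 0.5, 1.0])
segment = SegmentSketch(num_samples=3, time_vector=lazy)
reads_at_init = lazy.reads

first = segment.get_times()
reads_after_first = lazy.reads

second = segment.get_times()
reads_after_second = lazy.reads

# Without a stored time vector, times are derived from the sampling frequency.
regular = SegmentSketch(num_samples=4, sampling_frequency=2.0, t_start=1.0).get_times()
```

In this sketch the trade-off is memory for speed: after the first call the full timestamp array stays in memory on the segment, so repeated calls do not touch the underlying store again.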