[Bug]: reading unit table arrays via string keys much slower than reading as properties #141
Comments
Thanks for the helpful issue. I had to modify your script slightly to add anonymous access. These calls are actually taking different code paths. If you print the type of the objects that are being sliced into via:
you should see:
I.e., Selection via slicing (e.g., As a result, the output from these operations is also different. When slicing into the I'll need to do some more in-depth profiling, but my guess is that the difference in timing between these operations is then likely due to:
The latter part could potentially be improved by reading all the data into memory first and then segmenting the array into its parts, rather than reading the array for each unit separately.
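To illustrate the "read once, then segment" idea, here is a minimal sketch in plain NumPy. It assumes the ragged column is stored as one flat array of values plus an array of cumulative end offsets (one per unit). The function name `split_ragged` and the sample values are hypothetical and are not part of any library API.

```python
import numpy as np


def split_ragged(flat_values, end_offsets):
    """Split a flat array into per-row segments given cumulative end offsets.

    Hypothetical helper: flat_values stands in for the full column after a
    single bulk read, end_offsets[i] is the end position of row i.
    """
    flat_values = np.asarray(flat_values)
    end_offsets = np.asarray(end_offsets, dtype=int)
    if end_offsets.size == 0:
        return []  # no rows, nothing to segment
    # np.split expects the interior cut points; the final offset is the
    # end of the array, so it is dropped.
    return np.split(flat_values, end_offsets[:-1])


# Made-up example: three units with 3, 2, and 1 spikes respectively.
spike_times = np.array([0.1, 0.5, 0.9, 0.2, 0.4, 0.3])
index = np.array([3, 5, 6])
per_unit = split_ragged(spike_times, index)
```

With this layout, the remote store is touched once for the values and once for the offsets, and the per-unit segmentation happens in memory instead of issuing one read per unit.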
Thanks @oruebel. This makes sense but it's pretty surprising behavior given the shape of the API. If there are opportunities to improve and standardize performance that would be great.
@dyf Just FYI, I've created a separate issue for this here: NeurodataWithoutBorders/nwb_benchmarks#13. We are working on setting up an nwb_benchmarks suite for performance tests, with an initial focus on cloud read. This is still early stage, but I just wanted to let you know that we are working on evaluating performance.
What happened?
I am trying to read information from the units table in this file:
s3://aind-open-data/ecephys_625098_2022-08-15_08-51-36_nwb_2023-05-16_16-28-23/ecephys_625098_2022-08-15_08-51-36_nwb/ecephys_625098_2022-08-15_08-51-36_experiment1_recording1.nwb.zarr
I notice a large difference in read times when accessing units.spike_times vs units['spike_times'].
This is surprising behavior.
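For anyone who wants to compare the two access patterns, a generic timing helper along these lines may be useful. This is only a sketch: `best_time` is a hypothetical name, and the list used below is a stand-in so the snippet runs without the file. With an open units table, the callables would be something like `lambda: units.spike_times[:]` and `lambda: units['spike_times'][:]`.

```python
import time


def best_time(fn, repeats=3):
    """Call fn() `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in data so the sketch is runnable without remote access.
data = list(range(1000))
elapsed = best_time(lambda: data[:])
print(f"best of 3: {elapsed:.6f} s")
```

Note that for remote stores, repeated reads may be served from a cache, so the first call can be much slower than later ones; timing a single cold read separately may be more representative.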
Steps to Reproduce
Traceback
Operating System
Linux
Python Executable
Conda
Python Version
3.9
Package Versions
I am using @alejoe91's branch for reading from S3:
https://github.com/alejoe91/hdmf-zarr/tree/fix-linking-in-remote-zarr