Inefficient reading of slices of a Dataset #6
I was looking into pyfive to read cloud-hosted data (we have Python file objects for S3, GCS, Azure) and was sad to learn that slicing doesn't happen cleverly.
I believe the indexing code used in zarr could be adapted for use in pyfive to provide more efficient slicing.
bnlawrence pushed a commit to NCAS-CMS/pyfive that referenced this issue on Feb 22, 2024:
eventually, hopefully, address both the needs of pyactivestorage (which needs access to the b-tree chunk index) and jjhelmus#6
When reading data from a Dataset, pyfive currently loads all chunks into memory before slicing out the requested data. This behavior is inefficient when only a small region of the data is required, since that region could often be extracted from a small number of chunks, or even a single chunk. The code used for slicing dask arrays may be helpful for determining which chunks need to be read for a given slice.
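The chunk-selection idea can be sketched as follows. This is a hypothetical helper, not part of pyfive's, zarr's, or dask's API: given the chunk shape of the stored array, it computes which chunk indices a slice overlaps, so only those chunks would need to be fetched and decompressed.

```python
from itertools import product


def chunks_for_slice(slices, chunk_shape):
    """Return the N-dimensional chunk indices overlapped by `slices`.

    Assumes each slice has a non-negative start, an explicit stop,
    and step 1 (a sketch, not a full fancy-indexing implementation).
    """
    ranges = []
    for sl, csize in zip(slices, chunk_shape):
        start = sl.start or 0
        stop = sl.stop
        first = start // csize          # first chunk touched on this axis
        last = (stop - 1) // csize      # last chunk touched on this axis
        ranges.append(range(first, last + 1))
    # Cartesian product over axes gives the set of chunk coordinates.
    return list(product(*ranges))


# Example: a 100x100 array stored in 10x10 chunks. Reading rows 5..15
# of columns 0..10 touches only 2 of the 100 chunks.
print(chunks_for_slice((slice(5, 15), slice(0, 10)), (10, 10)))
# → [(0, 0), (1, 0)]
```

A reader implementing this would then seek to each selected chunk's offset (from the B-tree chunk index), decompress it, and copy the overlapping region into the output array, rather than materializing every chunk.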