Using hidefix to determine byte ranges in HDF files? #38
Hi,

That is definitely possible to do. hidefix has multiple implementations of readers: some cached, some direct (the fastest, so no need to use the cached one), some async. My plan was to just add another reader based on S3 (#26). That would allow you to use the slicing code in hidefix and get some help from the traits. But you can also use the slicing code directly, using only the index. Using the faster HDF5 code for building the index makes a big difference in indexing speed; otherwise it might be nice to serialize the index.

Regards, Gaute
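For comparison with the index-building route discussed above, here is a minimal sketch of extracting per-chunk byte ranges via h5py's low-level chunk-query API instead (the alternative mentioned later in this thread). The function name `chunk_byte_ranges` is illustrative, not part of h5py or hidefix:

```python
# Sketch: list per-chunk byte ranges of an HDF5/netCDF4 variable using
# h5py's low-level DatasetID chunk queries (requires h5py >= 3.0 and a
# chunked dataset). `chunk_byte_ranges` is a hypothetical helper name.
import h5py

def chunk_byte_ranges(path, variable):
    """Return (chunk_grid_offset, byte_offset, nbytes) for every stored chunk."""
    with h5py.File(path, "r") as f:
        dsid = f[variable].id  # low-level h5py.h5d.DatasetID handle
        return [
            (info.chunk_offset, info.byte_offset, info.size)
            for info in (
                dsid.get_chunk_info(i) for i in range(dsid.get_num_chunks())
            )
        ]
```

Each returned tuple is one entry of the kind a chunk manifest needs: where the chunk sits in the chunk grid, and its byte offset and length within the file.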
Thanks for the quick reply! Okay, that's exciting. I'll have to try it out and see if I can get just the byte ranges from the index.
I'm building VirtualiZarr, an evolution of kerchunk, which allows you to determine the byte ranges of chunks in netCDF files and then concatenate the virtual representation of those chunks using xarray's API.

This works by creating a `ChunkManifest` object in memory (one per netCDF Variable per file initially), then defining ways to merge those manifests.

What I'm wondering is whether hidefix's `Index` class could be useful to me as a way to generate the `ChunkManifest` for a netCDF file without using `kerchunk`/`fsspec` (see this issue). In other words, I would use hidefix only to determine the byte ranges, not for actually reading the data. (I plan to actually read the bytes later using the Rust `object-store` crate, see zarr-developers/zarr-python#1661.)

Q's:
- Does `hidefix.Index` contain the byte range information? I'm assuming it does.
- … `h5py` directly?

cc @norlandrhagen
xref pydata/xarray#7446
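The manifest-merging idea described above can be sketched in a few lines. This is a stdlib-only illustration of the concept, not VirtualiZarr's actual `ChunkManifest` API; the class layout and `concat` method are hypothetical:

```python
# Minimal sketch of the ChunkManifest idea: a mapping from chunk-grid
# indices to (file path, byte offset, length), plus a concatenation that
# shifts the other manifest's chunk indices along one axis. Names are
# illustrative; VirtualiZarr's real ChunkManifest differs.
from dataclasses import dataclass, field

@dataclass
class ChunkManifest:
    # keys: chunk-grid indices; values: (path, byte_offset, nbytes)
    entries: dict[tuple[int, ...], tuple[str, int, int]] = field(default_factory=dict)

    def concat(self, other: "ChunkManifest", axis: int, shift: int) -> "ChunkManifest":
        """Merge two manifests, shifting `other`'s chunk indices by `shift` on `axis`."""
        merged = dict(self.entries)
        for key, ref in other.entries.items():
            new_key = tuple(k + shift if i == axis else k for i, k in enumerate(key))
            merged[new_key] = ref
        return ChunkManifest(merged)
```

Concatenating two single-file manifests along an axis then only rewrites chunk keys; no bytes are ever read, which is the point of the virtual representation.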