Specify zarr codec when creating a dataset #35
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##             main      #35      +/-   ##
==========================================
- Coverage   82.33%   81.53%   -0.80%
==========================================
  Files          25       25
  Lines        1715     1755      +40
==========================================
+ Hits         1412     1431      +19
- Misses        303      324      +21
This looks reasonable to me, but how would this be used? It seems like this would be used for writing outside of PyNWB, which is fine, but I am not totally seeing the use case.
@rly I am thinking specifically about three codecs I would like to use. It's true that this is not possible via pynwb at this point (the datasets would need to be created manually), but maybe at some point in the future pynwb could support it. The three codecs are:
tag @bendichter
With the
note: made a plan with ryan...
After chatting with @rly, we decided not to introduce a new special function, but to allow the compression parameter (which is supported by h5py) to be either a string or a numcodecs Codec. In the case of a Codec, it is passed through to zarr as the compressor parameter. In the case of a string, the following logic is used:

    elif isinstance(compression, str):
        if compression == 'gzip':
            if compression_opts is None:
                level = 4  # default for h5py
            elif isinstance(compression_opts, int):
                level = compression_opts
            else:
                raise Exception(f'Unexpected type for compression_opts: {type(compression_opts)}')
            _zarr_compressor = numcodecs.GZip(level=level)
        else:
            raise Exception(f'Compression {compression} is not supported')

Then _zarr_compressor is passed through to zarr.
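For readers who want to try the string-or-Codec dispatch in isolation, here is a minimal standalone sketch. It is not the code from this PR: the helper name `resolve_compressor_config` is hypothetical, the `None` branch is an assumption (the excerpt above starts at an `elif`, so the earlier branches are not shown), and to avoid requiring numcodecs it returns a codec config dict rather than a codec object. The dict shape matches what `numcodecs.GZip(level=4).get_config()` returns, so with numcodecs installed `numcodecs.get_codec(config)` should rebuild the codec.

```python
from typing import Any, Optional


def resolve_compressor_config(
    compression: Any = None,
    compression_opts: Optional[int] = None,
) -> Optional[dict]:
    """Map an h5py-style `compression` argument to a numcodecs-style config dict.

    Hypothetical helper for illustration only; the name, the dict return
    type, and the handling of None are assumptions, not the PR's code.
    """
    if compression is None:
        # Assumption: no compression requested means no compressor.
        return None

    if hasattr(compression, "get_config"):
        # Codec-like object (numcodecs codecs expose get_config()):
        # pass it through unchanged, here in its config-dict form.
        return compression.get_config()

    if isinstance(compression, str):
        if compression == "gzip":
            if compression_opts is None:
                level = 4  # h5py's default gzip level
            elif isinstance(compression_opts, int):
                level = compression_opts
            else:
                raise TypeError(
                    f"Unexpected type for compression_opts: {type(compression_opts)}"
                )
            return {"id": "gzip", "level": level}
        raise ValueError(f"Compression {compression} is not supported")

    raise TypeError(f"Unexpected type for compression: {type(compression)}")
```

The sketch raises `TypeError` and `ValueError` where the snippet above raises a bare `Exception`; that is a stylistic choice so callers can tell a bad argument type from an unsupported compression name.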
Looks good. We can add support for other h5py compression filters, including those from hdf5plugin, later as needed.
The goal here is to allow writing datasets with Zarr-supported codecs that are not available with HDF5 (e.g., zstd and custom codecs). The h5py API does not have such a mechanism.
I considered adding a zarr_compressor parameter to create_dataset(), but I thought it best not to make the interface of that function differ between LindiH5pyFile and h5py.File. So instead I made a new function called create_dataset_with_zarr_compressor(..., compressor=codec)