-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
H5 context management #24
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the general direction.
track_order
may need special handling. If it is set, perhaps just distribute it to the File, Group, and Dataset as needed.
Consider allowing path to be empty or None
. In the case of empty, returning h5f
may make sense. None
is used to create anonymous datasets.
Anonymous datasets allow for temporary storage within the file, the create of spacers, and has been used by HDF5-EOS for storing some Zarr compat metadata.
"allow_unknown_filter", | ||
) | ||
|
||
H5_GROUP_KWDS = ("name", "track_order") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since track_order
is a common keyword to Datasets, Group's, and Files perhaps this parameter should be passed to all objects when they are created.
for key in H5_DATASET_KWDS: | ||
if key in file_kwargs: | ||
for key in kwargs: | ||
if key in H5_DATASET_KWDS: | ||
dataset_kwargs[key] = file_kwargs.pop(key) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider keeping track_order
as a File keyword. Also return Group keywords that may contain track_order
.
dataset_kwargs[key] = file_kwargs.pop(key) | ||
|
||
return file_kwargs, dataset_kwargs | ||
|
||
|
||
def access_h5( | ||
store: Pathlike, path: Pathlike, mode: str, **kwargs | ||
store: Union[h5py.File, Pathlike], path: Pathlike, **kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
store: Union[h5py.File, Pathlike], path: Pathlike, **kwargs | |
store: Union[h5py.File, Pathlike], path: Union[Pathlike, NoneType], **kwargs |
None
is a valid dataset name and is used to create anonymous datasets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NoneType
returned in 3.10 https://docs.python.org/3/library/types.html?highlight=nonetype#types.NoneType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting, how is that different from naming a dataset with the empty string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For one, providing an empty string results in a TypeError
in h5py:
In [22]: with h5py.File("test.hdf5", "w") as h5f:
...: h5f.create_dataset(None, data = np.zeros((5,5)))
...: print(list(h5f.keys()))
...:
...:
[]
In [23]: with h5py.File("test.hdf5", "w") as h5f:
...: h5f.create_dataset("", data = np.zeros((5,5)))
...: print(list(h5f.keys()))
...:
...:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-23-5230edf15b52> in <module>
1 with h5py.File("test.hdf5", "w") as h5f:
----> 2 h5f.create_dataset("", data = np.zeros((5,5)))
3 print(list(h5f.keys()))
4
5
~\.julia\conda\3\lib\site-packages\h5py\_hl\group.py in create_dataset(self, name, shape, dtype, data, **kwds)
147 group = self.require_group(parent_path)
148
--> 149 dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
150 dset = dataset.Dataset(dsid)
151 return dset
~\.julia\conda\3\lib\site-packages\h5py\_hl\dataset.py in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, allow_unknown_filter)
140
141
--> 142 dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl)
143
144 if (data is not None) and (not isinstance(data, Empty)):
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
h5py\h5d.pyx in h5py.h5d.create()
TypeError: expected bytes, str found
In [24]: with h5py.File("test.hdf5", "w") as h5f:
...: h5f.create_dataset(b"", data = np.zeros((5,5)))
...: print(list(h5f.keys()))
...:
...:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-24-7f1869f34ebe> in <module>
1 with h5py.File("test.hdf5", "w") as h5f:
----> 2 h5f.create_dataset(b"", data = np.zeros((5,5)))
3 print(list(h5f.keys()))
4
5
~\.julia\conda\3\lib\site-packages\h5py\_hl\group.py in create_dataset(self, name, shape, dtype, data, **kwds)
147 group = self.require_group(parent_path)
148
--> 149 dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
150 dset = dataset.Dataset(dsid)
151 return dset
~\.julia\conda\3\lib\site-packages\h5py\_hl\dataset.py in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, allow_unknown_filter)
140
141
--> 142 dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl)
143
144 if (data is not None) and (not isinstance(data, Empty)):
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
h5py\h5d.pyx in h5py.h5d.create()
ValueError: Unable to create dataset (no name given)
if "name" in dataset_kwargs: | ||
warnings.warn( | ||
'"Name" was provided to this function as a keyword argument. This value will be replaced with the second argument to this function.' | ||
) | ||
dataset_kwargs["name"] = path | ||
result = h5f.create_dataset(**dataset_kwargs) | ||
else: | ||
result = h5f.require_group(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allow track_order
to be passed to the group as well.
else: | ||
h5f = h5py.File(store, **file_kwargs) | ||
|
||
if mode in ("r", "r+", "a"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps if path is empty, just return the File, h5f
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are in r+
or a
mode, perhaps the path does not exist because we are trying to create it. In those cases, perhaps you should catch the error.
result.attrs.update(**attrs) | ||
|
||
return result | ||
if isinstance(result, h5py.Group): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A File
is also a Group
. In the case of a File
, you do not need a new context manager.
**dataset_kwargs, **file_kwargs | ||
) | ||
assert file_kwargs == file_kwargs_out | ||
assert dataset_kwargs == dataset_kwargs_out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test for common keyword arguments like track_order
.
Here is the reference for the use of anonymous datasets to store Zarr chunk information: |
h5py.Dataset
andh5py.Group
to have__enter__
and__exit__
methods, where__exit__
closes the file.access_h5
can optionally take anh5py.File
instance as the first argument