[Bug]: ZarrIO cannot resolve Builders #113
Recap: HDF5 (how HDMF works). I commented through read (which calls read_builder). Ultimately, io.read() returns the file in container form, constructed from the file builder. The open question is whether the objects within it have already been resolved to containers, or whether they remain in builder form until accessed.
Questions:
HDMF_Zarr
This is me thinking out loud; I will continue to dig.
Within construct for
Once you have the mapper, we call construct (also a method on obj_mapper). It seems that construct on obj_mapper returns the actual new container.
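To make the two steps above concrete (look up a mapper, then call construct on it), here is a minimal, self-contained sketch. It does not use the HDMF API: ToyBuilder, ToyContainer, ToyObjectMapper and ToyTypeMap are hypothetical stand-ins, and the "ElectrodeGroup" type and its attributes are example values only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBuilder:
    """Hypothetical stand-in for a builder: a plain description of what is in the file."""
    name: str
    data_type: str
    attributes: dict = field(default_factory=dict)


@dataclass
class ToyContainer:
    """Hypothetical stand-in for the user-facing container object."""
    name: str
    attributes: dict


class ToyObjectMapper:
    """Hypothetical mapper: turns one kind of builder into a container."""

    def construct(self, builder: ToyBuilder) -> ToyContainer:
        # Copy the attributes so the container does not share state with the builder.
        return ToyContainer(name=builder.name, attributes=dict(builder.attributes))


class ToyTypeMap:
    """Hypothetical registry that picks a mapper based on the builder's data type."""

    def __init__(self):
        self._mappers = {}

    def register(self, data_type: str, mapper: ToyObjectMapper) -> None:
        self._mappers[data_type] = mapper

    def construct(self, builder: ToyBuilder) -> ToyContainer:
        mapper = self._mappers[builder.data_type]  # step 1: get the mapper
        return mapper.construct(builder)           # step 2: the mapper returns a new container


type_map = ToyTypeMap()
type_map.register("ElectrodeGroup", ToyObjectMapper())
container = type_map.construct(
    ToyBuilder("shank0", "ElectrodeGroup", {"location": "CA1"})
)
```

The point of the sketch is only that the object coming back from construct is a new container, not the builder that went in.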
Thanks for investigating, @mavaylon1. For datasets, loading of object_references is done here: Which in turn calls: hdmf-zarr/src/hdmf_zarr/backend.py Line 1175 in 6c13e14
to resolve the references. At the end of that function the references are being resolved by calling hdmf-zarr/src/hdmf_zarr/backend.py Lines 1206 to 1209 in 6c13e14
When resolving references, the corresponding builder will often have already been read from file, in which case hdmf-zarr/src/hdmf_zarr/backend.py Lines 1106 to 1109 in 6c13e14
hdmf-zarr/src/hdmf_zarr/backend.py Lines 1052 to 1055 in 6c13e14
Either way, this means we now have a dataset of Builders. After I.e., the Containers are being constructed for the Builders since they are being used elsewhere. However, the dataset of object_references is not being resolved, i.e., for datasets with references we still have a dataset of Builders instead of a dataset of Containers. As I mentioned above:
However, while this may be the simplest approach, the disadvantage is that this still requires reading all datasets with object references. To avoid reading the data, it would be useful to:
However, this is more involved to implement, so it may be easiest to do 1. first and then implement 2.
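The read path described in this comment can be sketched roughly as follows. This is a toy model, not the code in hdmf_zarr/backend.py: ToyReader, its cache, the resolve_references helper and the example paths are all hypothetical. It shows a reference being resolved to a builder (reusing one that was already read when possible), which leaves a list of builders, followed by the simple fix of converting every builder to a container up front.

```python
class ToyBuilder:
    """Hypothetical builder identified by its path in the file."""

    def __init__(self, path):
        self.path = path


class ToyContainer:
    """Hypothetical container constructed from a builder."""

    def __init__(self, builder):
        self.name = builder.path.rsplit("/", 1)[-1]


class ToyReader:
    """Hypothetical reader that caches builders by path."""

    def __init__(self):
        self._built = {}  # path -> builder that was already read
        self.reads = 0    # counts simulated reads from the file

    def get_builder(self, path):
        builder = self._built.get(path)
        if builder is None:
            # Not read yet: simulate reading it from the file and cache it.
            self.reads += 1
            builder = ToyBuilder(path)
            self._built[path] = builder
        return builder

    def resolve_references(self, ref_paths):
        # Either way (cached or freshly read), the result is a list of builders.
        return [self.get_builder(path) for path in ref_paths]


reader = ToyReader()
already_read = reader.get_builder("/general/devices/probe0")

refs = ["/general/devices/probe0", "/general/devices/probe1"]
builders = reader.resolve_references(refs)

# Simple (eager) fix: turn every builder into a container right away.
# The cost is that every dataset of references has to be read in full.
containers = [ToyContainer(b) for b in builders]
```

The eager conversion on the last line is the part that forces all referenced data to be touched at read time, which is the drawback a lazy approach would avoid.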
High Level
The idea is to bring what HDF5IO does into ZarrIO. Start by looking at read_dataset in h5tools.
What happened?
Within HDMF (using HDF5) we wrap a dataset of references. When using NWBHDF5IO (which inherits from HDMFIO) to read a file, the read method calls read_builder. read_builder takes the wrapped dataset of references and returns builders for each reference. These builders are resolved to containers; however, these containers are not "mapped/pointed" to the object/dataset/attribute within the file until a user makes a call, i.e., if a user calls nwbfile.electrodes[0], it then makes the connection from index 0 of electrodes to the correct container.
Within HDMF-Zarr, using ZarrIO, a different approach was taken. On read, we have a dataset of references (unwrapped). ZarrIO takes these references and makes a dataset of builders. These builders are not resolved to containers (this is not intended).
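As an illustration of the "wrapped dataset" idea described for HDF5, here is a minimal sketch of a wrapper that holds builders and only constructs a container when an element is accessed. LazyContainerDataset and the construct callback are hypothetical names chosen for this example; they are not the classes HDMF or HDMF-Zarr use.

```python
class LazyContainerDataset:
    """Hypothetical wrapper: stores builders, hands out containers on access."""

    def __init__(self, builders, construct):
        self._builders = list(builders)
        self._construct = construct  # callable: builder -> container
        self._resolved = {}          # index -> container that was already constructed

    def __len__(self):
        return len(self._builders)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        n = len(self)
        if index < 0:
            index += n
        if not 0 <= index < n:
            raise IndexError("dataset index out of range")
        if index not in self._resolved:
            # Resolve the builder only now, when the user asks for this element.
            self._resolved[index] = self._construct(self._builders[index])
        return self._resolved[index]


constructed = []  # records which builders were actually resolved


def construct(builder):
    constructed.append(builder["name"])
    return ("container", builder["name"])


builders = [{"name": "e0"}, {"name": "e1"}, {"name": "e2"}]
electrodes = LazyContainerDataset(builders, construct)

first = electrodes[0]        # resolves only the first builder
first_again = electrodes[0]  # served from the wrapper's cache
```

With a wrapper like this, indexing returns a container rather than a builder, and nothing is resolved for elements that are never accessed.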
Steps to reproduce:
This will return an array of builders instead of the actual data.
(Notes on connected issue #105)
This was discovered when investigating a fix for #105 in which
when converting from zarr to hdf5.
This is due to the cache being cleared within export(). Why?
We are trying to match builders, which are objects, to a string ==> TypeError
Either way (1 or 2), we get a Zarr file that does not resolve the builders to containers. The idea is that once that is fixed, the issue from #105 (resulting from the cache) should (possibly) be fixed as well.
Steps to Reproduce
Traceback
Operating System
Windows
Python Executable
Conda
Python Version
3.7
Package Versions
No response