read multiscale groups as datatrees #45

Merged
merged 16 commits into from
May 4, 2023

d-v-b
Contributor

@d-v-b d-v-b commented Apr 27, 2023

This PR brings the ability to use read_xarray on zarr / n5 groups to create an instance of datatree.DataTree, provided all the arrays in that group can be resolved to xarray.DataArrays.
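As a rough illustration of the "all arrays must resolve" condition, here is a minimal sketch that uses plain Python stand-ins rather than zarr, xarray, or datatree. `FakeArray`, `resolve`, and `group_to_tree` are hypothetical names invented for this example, and the rule that an array resolves only when it carries axis names is an assumption, not the PR's actual criterion.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class FakeArray:
    """Hypothetical stand-in for an array stored in a zarr / n5 group."""
    shape: Tuple[int, ...]
    axes: Optional[Tuple[str, ...]]


def resolve(array: FakeArray):
    """Assumed rule: an array resolves only if it has one axis name per dimension."""
    if array.axes is None or len(array.axes) != len(array.shape):
        raise ValueError("array cannot be resolved to a labelled array")
    return (array.axes, array.shape)


def group_to_tree(group: dict, resolve_fn: Callable) -> dict:
    """Walk a nested group; any array that fails to resolve aborts the whole read."""
    tree = {}
    for key, node in group.items():
        if isinstance(node, dict):
            # Sub-groups become nested levels of the tree.
            tree[key] = group_to_tree(node, resolve_fn)
        else:
            tree[key] = resolve_fn(node)
    return tree


multiscale = {
    "s0": FakeArray((4, 4), ("y", "x")),
    "s1": FakeArray((2, 2), ("y", "x")),
}
tree = group_to_tree(multiscale, resolve)
```

In this sketch a single unresolvable array raises and no partial tree is returned; whether the real implementation behaves that way is not stated here.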

Additionally, this PR fixes an error in cosem-flavored metadata handling.

@tlambert03 Once it's done, I'm going to bring this new read_xarray functionality to bear on #42

@tlambert03
Contributor

cool, looks great! and will be handy in #42 for sure.
I'm just recovering from being away for a while, but look forward to digging back into #42 soon. And will be nice to have this to work with.

@d-v-b
Contributor Author

d-v-b commented May 4, 2023

changes of note:

  • File-format specific code now lives in modules located at io/$format.py, e.g. io/zarr.py.
  • read_xarray(foo.suffix), defined in io/core.py, dispatches on the suffix of its argument to a to_xarray function defined in the module for the format associated with that suffix. The same logic applies to io/core.py::to_dask and access. There is no longer a dict of daskifiers or accessors.
  • The handling of the attrs and name properties when reading dataarrays has been changed to conform to the behavior of xarray.DataArray construction. xarray.DataArray(foo) checks for attrs and name properties on foo and, if they exist, uses those values. I copied this behavior in create_dataarray. Complications: MRCFile instances don't have .name or .attrs properties, so I use the filename and a dictified version of the mrc header instead. Also, the name property of a zarr array / group is its full path in its store object, but DataTree converts "/"-separated strings into levels of hierarchy, which we don't want. So instead of using zarr.Array.name as the default name for DataArrays / DataTrees, I use zarr.Array.basename, which is the final component of the path to the array / group.
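The suffix-based dispatch described in the list above can be sketched roughly as follows. This is not the code in io/core.py: `zarr_io` and `mrc_io` are hypothetical stand-ins for the per-format modules, their `to_xarray` functions return placeholder tuples, and taking the suffix with `pathlib` is a simplification that ignores paths pointing inside a container.

```python
from pathlib import Path
from types import SimpleNamespace

# Hypothetical stand-ins for format modules such as io/zarr.py and io/mrc.py.
# Each one exposes a function with the same name, here to_xarray.
zarr_io = SimpleNamespace(to_xarray=lambda path: ("zarr", path))
mrc_io = SimpleNamespace(to_xarray=lambda path: ("mrc", path))


def read_xarray(path: str):
    """Choose the format module from the suffix, then delegate to its to_xarray."""
    suffix = Path(path).suffix
    if suffix == ".zarr":
        return zarr_io.to_xarray(path)
    if suffix == ".mrc":
        return mrc_io.to_xarray(path)
    # Formats without an implementation fail loudly instead of guessing.
    raise ValueError(f"no to_xarray implementation for suffix {suffix!r}")


result = read_xarray("volume.zarr")
```

Because every format module offers the same function names, a to_dask or access entry point could follow the same branch structure without a separate lookup table.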

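The name / attrs defaulting described above might look something like the sketch below. `resolve_name_attrs` is a hypothetical helper, not the actual create_dataarray, and `zarr_like` is a plain-Python stand-in that only mimics the `name`, `basename`, and `attrs` properties of a zarr array.

```python
from types import SimpleNamespace


def resolve_name_attrs(source, name=None, attrs=None):
    """Explicit name / attrs win; otherwise fall back to what the source offers."""
    if name is None:
        # Prefer basename so that a path like "/recon/s0" does not turn into
        # nested hierarchy levels; fall back to name, then to None.
        name = getattr(source, "basename", None)
        if name is None:
            name = getattr(source, "name", None)
    if attrs is None:
        # Copy into a plain dict; sources without attrs yield an empty dict.
        attrs = dict(getattr(source, "attrs", None) or {})
    return name, attrs


# Stand-in for a zarr array whose name is its full path in the store.
zarr_like = SimpleNamespace(name="/recon/s0", basename="s0", attrs={"units": "nm"})
print(resolve_name_attrs(zarr_like))  # ('s0', {'units': 'nm'})
```

For a source with neither property, such as an MRC file in the description above, the caller would pass the filename and a header dict explicitly, e.g. `resolve_name_attrs(obj, name="vol.mrc", attrs={"nx": 64})`.
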
@d-v-b
Contributor Author

d-v-b commented May 4, 2023

the core API changes are done, but the extension of that API to formats like tif and .dat is either incomplete or untested. that's acceptable, and we can fix it later. going ahead with a merge.

@d-v-b d-v-b merged commit d41790e into main May 4, 2023
@d-v-b d-v-b deleted the xarray_groups branch May 4, 2023 18:28