[Feature]: write `xarray`-compatible Zarr files #176

alejoe91 · 2024-03-21T08:21:34Z

What would you like to see added to HDMF-ZARR?

Xarray supports the Zarr backend, but requires the _ARRAY_DIMENSIONS attribute to be set with a list of names for the array dimensions (e.g. [samples, channels]) - see https://docs.xarray.dev/en/stable/internals/zarr-encoding-spec.html#zarr-encoding

It would be great to add these attributes as default for known data types (e.g. ElectricalSeries)
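
For illustration, here is a minimal sketch of what the attribute looks like on disk. It assumes a Zarr v2 directory store, where an array's attributes are kept in a `.zattrs` JSON file next to its `.zarray` metadata. The helper name `write_array_dimensions` and the store path are hypothetical, and the dimension names are only examples; this is not how HDMF-ZARR writes attributes.

```python
import json
import tempfile
from pathlib import Path


def write_array_dimensions(array_dir, dim_names):
    """Merge `_ARRAY_DIMENSIONS` into the `.zattrs` file of a Zarr v2 array directory.

    Hypothetical helper for illustration only.
    """
    zattrs_path = Path(array_dir) / ".zattrs"
    # Keep any attributes that are already stored for the array.
    attrs = json.loads(zattrs_path.read_text()) if zattrs_path.exists() else {}
    attrs["_ARRAY_DIMENSIONS"] = list(dim_names)
    zattrs_path.write_text(json.dumps(attrs, indent=4))
    return attrs


# Hypothetical array location inside a store. Only the attribute file is
# created here; a real array also needs `.zarray` metadata and chunk data.
with tempfile.TemporaryDirectory() as tmp:
    array_dir = Path(tmp) / "acquisition" / "ElectricalSeries" / "data"
    array_dir.mkdir(parents=True)
    written = write_array_dimensions(array_dir, ["num_times", "num_channels"])
    on_disk = json.loads((array_dir / ".zattrs").read_text())
```

With one name per array axis stored this way, xarray can map each axis to a named dimension when it opens the store.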

@jsiegle

Is your feature request related to a problem?

NWB-Zarr files cannot be opened by xarray.open_zarr

What solution would you like?

Adding the _ARRAY_DIMENSIONS attributes to all "known" neurodata_types

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you checked the Contributing document?
Have you ensured this change was not already requested?

bendichter · 2024-03-21T16:41:28Z

from xarray import DataArray
import numpy as np
from pynwb.testing.mock.ecephys import mock_ElectricalSeries
from h5py import Dataset
from pynwb import get_type_map
import json

dset_types = (np.ndarray, Dataset)  # etc.


def get_dimension_labels(cls, ndims, dset_name):
    """Return the dimension labels the schema defines for a dataset with ndims axes."""
    spec = get_type_map().namespace_catalog.get_spec(cls.namespace, cls.neurodata_type)
    data = next(x for x in spec["datasets"] if x["name"] == dset_name)
    dims = data["dims"]

    if isinstance(dims[0], str):  # only one shape spec
        return dims

    # several allowed shapes: pick the one whose rank matches the data
    for i_dims in dims:
        if len(i_dims) == ndims:
            return i_dims
        
def load_dset_as_xarray(obj, dset_name):
    dset = obj.fields[dset_name]
    cls = obj.__class__
    ndims = len(dset.shape)

    dim_labels = get_dimension_labels(cls, ndims, dset_name)

    # label the channel axis with the electrode indices of this object
    coords = dict(num_channels=obj.electrodes.data)
    if obj.timestamps is not None:
        coords.update(num_times=obj.timestamps)

    # every field that is not itself a dataset becomes an xarray attribute
    attrs = {k: v for k, v in obj.fields.items() if not isinstance(v, dset_types)}
    return DataArray(dset, dims=dim_labels, coords=coords, attrs=attrs)
    
        

electrical_series = mock_ElectricalSeries(timestamps=np.arange(10), rate=None)

load_dset_as_xarray(electrical_series, "data")

bendichter · 2024-03-21T16:43:18Z

^ Code that solves a related problem and might be helpful

oruebel · 2024-03-21T17:40:42Z

> ^ Code that solves a related problem and might be helpful

I think this could be useful to have in some form available in PyNWB. Maybe as a utility method and/or as a method on TimeSeries, since a common use-case for this is probably representing TimeSeries.data as xarray.

oruebel · 2024-03-21T18:01:12Z

> It would be great to add these attributes as default for known data types (e.g. ElectricalSeries)

Adding the _ARRAY_DIMENSIONS attribute for cases where we know the dimensions seems like a good idea 👍

oruebel · 2024-03-21T18:03:43Z

In terms of implementation, I think this will require changes in HDMF as well. Here is a rough plan of how this could be implemented:

1. Add dimension_labels as an attribute on the DatasetBuilder (which may be None if the labels are unknown) https://github.com/hdmf-dev/hdmf/blob/5c8506216995f995b891da1e6b596ee42b7dd948/src/hdmf/build/builders.py#L321
2. Enhance BuildManager.build to set the dimension_labels for DatasetBuilders https://github.com/hdmf-dev/hdmf/blob/5c8506216995f995b891da1e6b596ee42b7dd948/src/hdmf/build/manager.py#L148
3. Update ZarrIO.write_dataset to add the _ARRAY_DIMENSIONS to the attributes of the dataset if builder.dimension_labels are present (hdmf-zarr/src/hdmf_zarr/backend.py, line 954 in 6e946da: `attributes = builder.attributes`).

@rly does that plan sound reasonable, or would this also require changes in the ObjectMapper to determine the dimensions there instead of in the BuildManager?
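
As a rough illustration of the write-side step of this plan, the logic could look like the sketch below. `SketchDatasetBuilder` and `attributes_for_zarr` are hypothetical stand-ins written for this example; they are not the HDMF `DatasetBuilder` or any HDMF-ZARR function, and the real change would live inside `ZarrIO.write_dataset`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class SketchDatasetBuilder:
    """Hypothetical stand-in for hdmf's DatasetBuilder, not the real class."""

    name: str
    data: Any
    attributes: Dict[str, Any] = field(default_factory=dict)
    # None means the dimension labels are unknown.
    dimension_labels: Optional[Tuple[str, ...]] = None


def attributes_for_zarr(builder):
    """Return the attributes to write, adding _ARRAY_DIMENSIONS when labels are known."""
    attributes = dict(builder.attributes)  # copy so the builder is left untouched
    if builder.dimension_labels is not None:
        attributes["_ARRAY_DIMENSIONS"] = list(builder.dimension_labels)
    return attributes


labeled = SketchDatasetBuilder(
    "data", [[0, 1], [2, 3]], {"unit": "volts"}, ("num_times", "num_channels")
)
unlabeled = SketchDatasetBuilder("data", [0, 1, 2], {"unit": "volts"})

labeled_attrs = attributes_for_zarr(labeled)
unlabeled_attrs = attributes_for_zarr(unlabeled)
```

Datasets without known labels are written exactly as before, so the extra attribute only appears where the schema provides dimension names.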

rly · 2024-03-23T05:38:59Z

> @rly does that plan sound reasonable, or would this also require changes in the ObjectMapper to determine the dimensions there instead of in the BuildManager?

That sounds reasonable, except that all the building / creation of DatasetBuilder objects happens in the ObjectMapper. We'd probably do it in __add_attributes or the constructor.
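
To make the build-side step concrete, here is a minimal sketch of how labels might be resolved at the point where a builder is created. The function `resolve_dimension_labels` is hypothetical and is not part of HDMF's ObjectMapper; the `dims` value is a hand-written example modeled on a spec that allows several shapes, and matching on the number of dimensions alone is a simplifying assumption.

```python
def resolve_dimension_labels(spec_dims, data_shape):
    """Pick the dimension labels whose rank matches the data, or None if unknown.

    Hypothetical helper: spec_dims is either a flat list of names (one allowed
    shape) or a list of such lists (several allowed shapes).
    """
    if not spec_dims:
        return None  # the spec does not name any dimensions
    if isinstance(spec_dims[0], str):
        candidates = [spec_dims]  # only one shape spec
    else:
        candidates = spec_dims
    for dims in candidates:
        if len(dims) == len(data_shape):
            return tuple(dims)
    return None  # no allowed shape has the same rank as the data


# Hand-written example of a dataset spec with three allowed shapes.
example_dims = [
    ["num_times"],
    ["num_times", "num_channels"],
    ["num_times", "num_channels", "num_samples"],
]

two_d = resolve_dimension_labels(example_dims, (10, 4))
one_d = resolve_dimension_labels(["num_times"], (10,))
no_match = resolve_dimension_labels(example_dims, (1, 2, 3, 4))
no_spec = resolve_dimension_labels(None, (3,))
```

Returning None when nothing matches keeps the "labels unknown" case explicit, so a backend can simply skip the attribute.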

oruebel · 2024-03-24T20:07:10Z

@mavaylon1 could you take a look at this?

rly · 2024-03-24T20:12:34Z

I'll work on the HDMF side

mavaylon1 · 2024-03-24T21:42:43Z

Can do

rly mentioned this issue Mar 21, 2024: Document special attributes and mapping NeurodataWithoutBorders/lindi#13 (Open)

oruebel assigned mavaylon1 Mar 24, 2024

rly mentioned this issue Mar 24, 2024: [Feature]: Store dimension labels in DatasetBuilder for backends to write hdmf-dev/hdmf#1077 (Closed)

mavaylon1 added this to the Future milestone Apr 6, 2024

rly modified the milestones: Future, Next Release Apr 11, 2024

mavaylon1 mentioned this issue Jul 24, 2024: Xarray Support #207 (Merged, 6 tasks)

mavaylon1 closed this as completed in #207 Aug 12, 2024
