Error while reading a dandiset using NWBHDF5IO that has ImagingVolume #126

Open

craterkamath opened this issue Jan 17, 2024 · 5 comments
Labels: bug (Something isn't working)

craterkamath commented Jan 17, 2024

Bug description

I'm trying to stream/download and read dandisets from dandihub, and the ones that contain ImagingVolume objects, for example DANDI:000776, throw the error below:

Traceback (most recent call last):
File "/home/vinayaka/anaconda3/envs/dandi/lib/python3.8/site-packages/hdmf/build/objectmapper.py", line 1258, in construct
obj = self.new_container(cls, builder.source, parent, builder.attributes.get(self.__spec.id_key()),
File "/home/vinayaka/anaconda3/envs/dandi/lib/python3.8/site-packages/hdmf/build/objectmapper.py", line 1271, in new_container
obj.__init__(**kwargs)
File "/home/vinayaka/anaconda3/envs/dandi/lib/python3.8/site-packages/hdmf/utils.py", line 664, in func_call
return func(args[0], **pargs)
File "/home/vinayaka/anaconda3/envs/dandi/lib/python3.8/site-packages/hdmf/build/classgenerator.py", line 339, in init
setattr(self, f, arg_val)
File "/home/vinayaka/anaconda3/envs/dandi/lib/python3.8/site-packages/hdmf/container.py", line 528, in container_setter
ret[idx2](self, val) # call the previous setter
File "/home/vinayaka/anaconda3/envs/dandi/lib/python3.8/site-packages/hdmf/container.py", line 518, in container_setter
raise ValueError(msg)
ValueError: Field 'order_optical_channels' on ImagingVolume must be named 'order_optical_channels'.

The error does not occur when I read other dandisets that do not contain ImagingVolume objects.

How to reproduce

I'm using the code below to read the dandiset:

from dandi.dandiapi import DandiAPIClient
import pynwb
import h5py
from pynwb import NWBHDF5IO
import remfile

dandi_id = '000776'
with DandiAPIClient() as client:
    dandiset = client.get_dandiset(dandi_id, 'draft')
    for asset in dandiset.get_assets():
        s3_url = asset.get_content_url(follow_redirects=1, strip_query=True)
        file = remfile.File(s3_url)

        with h5py.File(file, 'r') as f:
            with NWBHDF5IO(file=f, mode='r', load_namespaces=True) as io:
                read_nwb = io.read()
                identifier = read_nwb.identifier
                seg = read_nwb.processing['NeuroPAL']['NeuroPALSegmentation']['NeuroPALNeurons'].voxel_mask[:]
                labels = read_nwb.processing['NeuroPAL']['NeuroPALSegmentation']['NeuroPALNeurons']['ID_labels'][:]
                channels = read_nwb.acquisition['NeuroPALImageRaw'].RGBW_channels[:]  # get which channels of the image correspond to which RGBW pseudocolors
                image = read_nwb.acquisition['NeuroPALImageRaw'].data[:]
                scale = read_nwb.imaging_planes['NeuroPALImVol'].grid_spacing[:]  # get the voxel grid spacing (spatial scale) of the imaging volume
                imvol = read_nwb.imaging_planes['NeuroPALImVol']
                print(imvol)
        print(identifier)
        break

Your personal setup

OS:

  • Ubuntu 20.04.1

My package versions are below:

dandi==0.59.0
dandischema==0.8.4
pynwb==2.5.0
hdmf==3.11.0

Python environment to reproduce:

aiohttp==3.9.1
aiosignal==1.3.1
appdirs==1.4.4
arrow==1.3.0
asciitree==0.3.3
async-timeout==4.0.3
attrs==23.2.0
bidsschematools==0.7.2
blessed==1.20.0
boto3==1.34.20
botocore==1.34.20
certifi @ file:///croot/certifi_1700501669400/work/certifi
cffi==1.16.0
charset-normalizer==3.3.2
ci-info==0.3.0
click==8.1.7
click-didyoumean==0.3.0
cryptography==41.0.7
dandi==0.59.0
dandischema==0.8.4
dnspython==2.4.2
email-validator==2.1.0.post1
etelemetry==0.3.1
fasteners==0.19
fqdn==1.5.1
frozenlist==1.4.1
fscacher==0.4.0
fsspec==2023.12.2
h5py==3.10.0
hdmf==3.11.0
humanize==4.9.0
idna==3.6
importlib-metadata==7.0.1
importlib-resources==6.1.1
interleave==0.2.1
isodate==0.6.1
isoduration==20.11.0
jaraco.classes==3.3.0
jeepney==0.8.0
jmespath==1.0.1
joblib==1.3.2
jsonpointer==2.4
jsonschema==4.21.0
jsonschema-specifications==2023.12.1
keyring==24.3.0
keyrings.alt==5.0.0
more-itertools==10.2.0
multidict==6.0.4
natsort==8.4.0
numcodecs==0.12.1
numpy==1.24.4
nwbinspector==0.4.31
packaging==23.2
pandas==2.0.3
pkgutil_resolve_name==1.3.10
platformdirs==4.1.0
pycparser==2.21
pycryptodomex==3.20.0
pydantic==1.10.13
pynwb==2.5.0
pyout==0.7.3
python-dateutil==2.8.2
pytz==2023.3.post1
PyYAML==6.0.1
referencing==0.32.1
remfile==0.1.10
requests==2.31.0
rfc3339-validator==0.1.4
rfc3987==1.3.8
rpds-py==0.17.1
ruamel.yaml==0.18.5
ruamel.yaml.clib==0.2.8
s3fs==0.4.2
s3transfer==0.10.0
scipy==1.10.1
SecretStorage==3.3.3
semantic-version==2.10.0
six==1.16.0
tenacity==8.2.3
tqdm==4.66.1
types-python-dateutil==2.8.19.20240106
typing_extensions==4.9.0
tzdata==2023.4
uri-template==1.3.0
urllib3==1.26.18
wcwidth==0.2.13
webcolors==1.13
yarl==1.9.4
zarr==2.16.1
zarr-checksum==0.2.12
zipp==3.17.0
craterkamath added the bug (Something isn't working) label Jan 17, 2024
CodyCBakerPhD commented

attn: @rly

Perhaps an extension issue?

rly commented Jan 19, 2024

Hi @craterkamath, the issue appears to be with the NWB file.

The spec for ImagingVolume says:

{
      "neurodata_type_def": "ImagingVolume",
      "neurodata_type_inc": "ImagingPlane",
      "doc": "An Imaging Volume and its Metadata",
      "groups": [
        {
          "doc": "An optical channel used to record from an imaging volume",
          "quantity": "*",
          "neurodata_type_inc": "OpticalChannelPlus"
        },
        {
          "doc": "Ordered list of names of the optical channels in the data",
          "name": "order_optical_channels",
          "neurodata_type_inc": "OpticalChannelReferences"
        }
      ]
    },

which specifies that ImagingVolume has a subgroup of neurodata type OpticalChannelReferences with the required name "order_optical_channels". However, in these files, the ImagingVolume objects instead contain a link to a group of neurodata type OpticalChannelReferences named "OpticalChannelRefs" (it lives at the path /processing/NeuroPAL/OpticalChannelRefs), so the files do not conform to the extension spec. Unfortunately, PyNWB most likely should not have allowed the files to be created this way, and there is a bug in the validator that keeps it from catching this. I will create tickets on PyNWB for these issues.
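
For reference, one way to confirm the mismatch is to inspect the HDF5 structure directly. This is a minimal sketch (not part of the original report), assuming the ImagingVolume lives at /general/optophysiology/NeuroPALImVol and that s3_url was obtained from the DANDI API as in the reproduction code above:

import h5py
import remfile

file = remfile.File(s3_url)  # s3_url from the DANDI API, as in the reproduction script
with h5py.File(file, "r") as f:
    imvol = f["/general/optophysiology/NeuroPALImVol"]  # assumed ImagingVolume location
    link = imvol.get("order_optical_channels", getlink=True)
    print(type(link))
    if isinstance(link, h5py.SoftLink):
        # a soft link here (rather than a real subgroup) is what triggers the error
        print("order_optical_channels links to:", link.path)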

@dysprague Did you create these files? I think the ones with ImagingVolume objects will have to be fixed to be valid NWB files and readable by the NWB APIs. Sorry. You can do that by either:

  1. adjusting the script used to generate these files and re-generating the files from scratch, and/or
  2. performing "data surgery" to move and rename the group into the right place in these files, without re-generating them from scratch.

I know these are big files, so doing both may be best (option 1 to fix this in future runs of the script and option 2 to fix the existing files quickly). I can help with either option.

attn @oruebel @bendichter in case you see this issue elsewhere

dysprague commented

Hi @rly, thanks for the help on this. What you're saying mostly makes sense to me, but I had a few questions.

I am able to open these files completely fine on my own laptop. There is also dandiset 000692, created by Kotaro Kimura using the same spec, which I am also able to open fine. The other thing that is confusing to me is that in the spec, the MultiChannelVolume object also has a subgroup 'order_optical_channels' which is defined and set in exactly the same way as it is for the ImagingVolume, so I'm not sure why the error is only being thrown for the ImagingVolume object.

When creating the 'ImagingVolume' object, how would I add the OpticalChannelReferences object as a subgroup rather than as a link?

I can definitely update the code used to generate these files, but as you said, these files are large, so it might be better to perform targeted data updates rather than fully regenerating the files. I would appreciate some help figuring out how to do that.

Thanks,
Daniel

rly commented Feb 2, 2024

To follow up, @dysprague and I connected over Slack. @dysprague adjusted the script and ndx-multichannel-volume extension used to generate the files. I wrote a script to do the following data surgery steps for existing files:

  1. Replace the "order_optical_channels" link in all ImagingVolume objects with a subgroup that is the link target (at /processing/NeuroPAL/OpticalChannelRefs)
  2. Add the "ndx-multichannel-volume" version 0.1.12 schema
  3. Remove the "ndx-multichannel-volume" version 0.1.9 schema
  4. Remove the "order_optical_channels" group from all "MultiChannelVolume" objects
  5. Remove the "OpticalChannelRefs" group within "/processing"
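
For anyone who needs to do something similar, here is a rough h5py outline of steps 1, 4, and 5. This is not the actual surgery script; the object locations (NeuroPALImVol, NeuroPALImageRaw, /processing/NeuroPAL/OpticalChannelRefs) are assumptions taken from the discussion above, and steps 2 and 3 (swapping the cached extension schema) are omitted.

import h5py

def fix_file(path):
    """Rough outline of surgery steps 1, 4, and 5 (not the actual script)."""
    with h5py.File(path, "r+") as f:
        refs_path = "/processing/NeuroPAL/OpticalChannelRefs"  # assumed link target

        # Step 1: replace the "order_optical_channels" link in the ImagingVolume
        # with a real subgroup copied from the link target.
        imvol = f["/general/optophysiology/NeuroPALImVol"]  # assumed ImagingVolume location
        if isinstance(imvol.get("order_optical_channels", getlink=True), h5py.SoftLink):
            del imvol["order_optical_channels"]
            f.copy(refs_path, imvol, name="order_optical_channels")

        # Steps 2-3 (replacing the cached ndx-multichannel-volume 0.1.9 schema with
        # 0.1.12 under /specifications) are easier with PyNWB/HDMF and are omitted here.

        # Step 4: remove the "order_optical_channels" group from the MultiChannelVolume.
        mcv = f["/acquisition/NeuroPALImageRaw"]  # assumed MultiChannelVolume location
        if "order_optical_channels" in mcv:
            del mcv["order_optical_channels"]

        # Step 5: remove the now-unused OpticalChannelRefs group under /processing.
        if refs_path in f:
            del f[refs_path]

A real script would also need to loop over every ImagingVolume and MultiChannelVolume in the file rather than assuming fixed paths, so treat this only as an outline.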

The next step is to run a script that checks each NWB file in dandiset 000776 for this issue and, for each affected file, downloads it, runs the above script, and re-uploads it.

We will also want to adjust the NWB files in dandisets 000715, 000565, 000541, 000472, 000714, and possibly 000692.

This HDMF PR, hdmf-dev/hdmf#1050, will catch these errors during validation in the future. I opened an HDMF issue, hdmf-dev/hdmf#1051, noting that these name-mismatch issues, or more generally all validation issues, should raise an error on write.
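
For anyone checking their own files, a minimal validation sketch using PyNWB is shown below (not part of the original thread; the filename is hypothetical, and depending on the PyNWB version additional arguments may be needed to validate against extension namespaces):

from pynwb import NWBHDF5IO, validate

# "sub-01.nwb" is a hypothetical local copy of one of the files
with NWBHDF5IO("sub-01.nwb", mode="r", load_namespaces=True) as io:
    errors = validate(io)  # returns a list of validation errors for the open file
    for err in errors:
        print(err)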

craterkamath (Author) commented

Thanks @rly and @dysprague. Hoping to see the updated dataset on the DANDI Archive soon!
