Error while reading a dandiset using NWBHDF5IO that has ImagingVolume #126

Open

craterkamath opened this issue Jan 17, 2024 · 5 comments
Labels: bug (Something isn't working)

craterkamath commented Jan 17, 2024

Bug description

I'm trying to stream/download and read dandisets from dandihub, and the ones that contain ImagingVolume objects, for example DANDI:000776, throw the error below:

Traceback (most recent call last):
File "/home/vinayaka/anaconda3/envs/dandi/lib/python3.8/site-packages/hdmf/build/objectmapper.py", line 1258, in construct
obj = self.new_container(cls, builder.source, parent, builder.attributes.get(self.__spec.id_key()),
File "/home/vinayaka/anaconda3/envs/dandi/lib/python3.8/site-packages/hdmf/build/objectmapper.py", line 1271, in new_container
obj.__init__(**kwargs)
File "/home/vinayaka/anaconda3/envs/dandi/lib/python3.8/site-packages/hdmf/utils.py", line 664, in func_call
return func(args[0], **pargs)
File "/home/vinayaka/anaconda3/envs/dandi/lib/python3.8/site-packages/hdmf/build/classgenerator.py", line 339, in init
setattr(self, f, arg_val)
File "/home/vinayaka/anaconda3/envs/dandi/lib/python3.8/site-packages/hdmf/container.py", line 528, in container_setter
ret[idx2](self, val) # call the previous setter
File "/home/vinayaka/anaconda3/envs/dandi/lib/python3.8/site-packages/hdmf/container.py", line 518, in container_setter
raise ValueError(msg)
ValueError: Field 'order_optical_channels' on ImagingVolume must be named 'order_optical_channels'.

The error does not occur when I read other dandisets that do not contain ImagingVolume objects.

How to reproduce

I'm using the code below to read the dandiset:

from dandi.dandiapi import DandiAPIClient
import pynwb
import h5py
from pynwb import NWBHDF5IO
import remfile

dandi_id = '000776'
with DandiAPIClient() as client:
    dandiset = client.get_dandiset(dandi_id, 'draft')
    for asset in dandiset.get_assets():
        s3_url = asset.get_content_url(follow_redirects=1, strip_query=True)
        file = remfile.File(s3_url)

        with h5py.File(file, 'r') as f:
            with NWBHDF5IO(file=f, mode='r', load_namespaces=True) as io:
                read_nwb = io.read()
                identifier = read_nwb.identifier
                seg = read_nwb.processing['NeuroPAL']['NeuroPALSegmentation']['NeuroPALNeurons'].voxel_mask[:]
                labels = read_nwb.processing['NeuroPAL']['NeuroPALSegmentation']['NeuroPALNeurons']['ID_labels'][:]
                channels = read_nwb.acquisition['NeuroPALImageRaw'].RGBW_channels[:]  # get which channels of the image correspond to which RGBW pseudocolors
                image = read_nwb.acquisition['NeuroPALImageRaw'].data[:]
                scale = read_nwb.imaging_planes['NeuroPALImVol'].grid_spacing[:]  # get the voxel grid spacing (spatial scale) of the imaging volume
                imvol = read_nwb.imaging_planes['NeuroPALImVol']
                print(imvol)
        print(identifier)
        break

Your personal setup

OS:

  • Ubuntu 20.04.1

My package versions are below:

dandi==0.59.0
dandischema==0.8.4
pynwb==2.5.0
hdmf==3.11.0

Python environment to reproduce:

aiohttp==3.9.1
aiosignal==1.3.1
appdirs==1.4.4
arrow==1.3.0
asciitree==0.3.3
async-timeout==4.0.3
attrs==23.2.0
bidsschematools==0.7.2
blessed==1.20.0
boto3==1.34.20
botocore==1.34.20
certifi @ file:///croot/certifi_1700501669400/work/certifi
cffi==1.16.0
charset-normalizer==3.3.2
ci-info==0.3.0
click==8.1.7
click-didyoumean==0.3.0
cryptography==41.0.7
dandi==0.59.0
dandischema==0.8.4
dnspython==2.4.2
email-validator==2.1.0.post1
etelemetry==0.3.1
fasteners==0.19
fqdn==1.5.1
frozenlist==1.4.1
fscacher==0.4.0
fsspec==2023.12.2
h5py==3.10.0
hdmf==3.11.0
humanize==4.9.0
idna==3.6
importlib-metadata==7.0.1
importlib-resources==6.1.1
interleave==0.2.1
isodate==0.6.1
isoduration==20.11.0
jaraco.classes==3.3.0
jeepney==0.8.0
jmespath==1.0.1
joblib==1.3.2
jsonpointer==2.4
jsonschema==4.21.0
jsonschema-specifications==2023.12.1
keyring==24.3.0
keyrings.alt==5.0.0
more-itertools==10.2.0
multidict==6.0.4
natsort==8.4.0
numcodecs==0.12.1
numpy==1.24.4
nwbinspector==0.4.31
packaging==23.2
pandas==2.0.3
pkgutil_resolve_name==1.3.10
platformdirs==4.1.0
pycparser==2.21
pycryptodomex==3.20.0
pydantic==1.10.13
pynwb==2.5.0
pyout==0.7.3
python-dateutil==2.8.2
pytz==2023.3.post1
PyYAML==6.0.1
referencing==0.32.1
remfile==0.1.10
requests==2.31.0
rfc3339-validator==0.1.4
rfc3987==1.3.8
rpds-py==0.17.1
ruamel.yaml==0.18.5
ruamel.yaml.clib==0.2.8
s3fs==0.4.2
s3transfer==0.10.0
scipy==1.10.1
SecretStorage==3.3.3
semantic-version==2.10.0
six==1.16.0
tenacity==8.2.3
tqdm==4.66.1
types-python-dateutil==2.8.19.20240106
typing_extensions==4.9.0
tzdata==2023.4
uri-template==1.3.0
urllib3==1.26.18
wcwidth==0.2.13
webcolors==1.13
yarl==1.9.4
zarr==2.16.1
zarr-checksum==0.2.12
zipp==3.17.0
craterkamath added the bug (Something isn't working) label Jan 17, 2024
CodyCBakerPhD commented

attn: @rly

Perhaps an extension issue?

rly commented Jan 19, 2024

Hi @craterkamath, the issue appears to be with the NWB file.

The spec for ImagingVolume says:

{
      "neurodata_type_def": "ImagingVolume",
      "neurodata_type_inc": "ImagingPlane",
      "doc": "An Imaging Volume and its Metadata",
      "groups": [
        {
          "doc": "An optical channel used to record from an imaging volume",
          "quantity": "*",
          "neurodata_type_inc": "OpticalChannelPlus"
        },
        {
          "doc": "Ordered list of names of the optical channels in the data",
          "name": "order_optical_channels",
          "neurodata_type_inc": "OpticalChannelReferences"
        }
      ]
    },

which specifies that ImagingVolume has a subgroup of neurodata type OpticalChannelReferences with the required name "order_optical_channels". However, in these files, the ImagingVolume objects instead contain a link to a group of neurodata type OpticalChannelReferences named "OpticalChannelRefs" (it lives at the path /processing/NeuroPAL/OpticalChannelRefs), so the files do not conform to the extension spec. Unfortunately, PyNWB most likely should not have allowed the files to be created this way, and there is a bug in the validator that keeps it from catching this. I will create tickets on PyNWB for these issues.
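
For reference, one way to confirm the mismatch is to inspect the HDF5 structure directly. This is a minimal sketch (not part of the original report), assuming the ImagingVolume lives at /general/optophysiology/NeuroPALImVol and that s3_url was obtained from the DANDI API as in the reproduction code above:

import h5py
import remfile

file = remfile.File(s3_url)  # s3_url from the DANDI API, as in the reproduction script
with h5py.File(file, "r") as f:
    imvol = f["/general/optophysiology/NeuroPALImVol"]  # assumed ImagingVolume location
    link = imvol.get("order_optical_channels", getlink=True)
    print(type(link))
    if isinstance(link, h5py.SoftLink):
        # a soft link here (rather than a real subgroup) is what triggers the error
        print("order_optical_channels links to:", link.path)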

@dysprague Did you create these files? I think the ones with ImagingVolume objects will have to be fixed to be valid NWB files and readable by the NWB APIs. Sorry. You can do that by either:

  1. adjusting the script used to generate these files and re-generating the files from scratch, and/or
  2. performing "data surgery" to move and rename the group into the right place in these files, without re-generating them from scratch.

I know these are big files, so doing both may be best (option 1 to fix this in future runs of the script and option 2 to fix the existing files quickly). I can help with either option.

attn @oruebel @bendichter in case you see this issue elsewhere

dysprague commented

Hi @rly, thanks for the help on this. What you're saying mostly makes sense to me, but I had a few questions.

I am able to open these files completely fine on my own laptop. There is also dandiset 000692, created by Kotaro Kimura using the same spec, which I am also able to open fine. The other thing that is confusing to me is that in the spec, the MultiChannelVolume object also has a subgroup 'order_optical_channels' which is defined and set in exactly the same way as it is for the ImagingVolume, so I'm not sure why the error is only being thrown for the ImagingVolume object.

When creating the 'ImagingVolume' object, how would I add the OpticalChannelReferences object as a subgroup rather than as a link?

I can definitely update the code used to generate these files, but as you said, these files are large, so it might be better to perform targeted data updates rather than fully regenerating the files. I would appreciate some help figuring out how to do that.

Thanks,
Daniel

rly commented Feb 2, 2024

To follow up, @dysprague and I connected over Slack. @dysprague adjusted the script and ndx-multichannel-volume extension used to generate the files. I wrote a script to do the following data surgery steps for existing files:

  1. Replace the "order_optical_channels" link in all ImagingVolume objects with a subgroup that is the link target (at /processing/NeuroPAL/OpticalChannelRefs)
  2. Add the "ndx-multichannel-volume" version 0.1.12 schema
  3. Remove the "ndx-multichannel-volume" version 0.1.9 schema
  4. Remove the "order_optical_channels" group from all "MultiChannelVolume" objects
  5. Remove the "OpticalChannelRefs" group within "/processing"
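
For anyone who needs to do something similar, here is a rough h5py outline of steps 1, 4, and 5. This is not the actual surgery script; the object locations (NeuroPALImVol, NeuroPALImageRaw, /processing/NeuroPAL/OpticalChannelRefs) are assumptions taken from the discussion above, and steps 2 and 3 (swapping the cached extension schema) are omitted.

import h5py

def fix_file(path):
    """Rough outline of surgery steps 1, 4, and 5 (not the actual script)."""
    with h5py.File(path, "r+") as f:
        refs_path = "/processing/NeuroPAL/OpticalChannelRefs"  # assumed link target

        # Step 1: replace the "order_optical_channels" link in the ImagingVolume
        # with a real subgroup copied from the link target.
        imvol = f["/general/optophysiology/NeuroPALImVol"]  # assumed ImagingVolume location
        if isinstance(imvol.get("order_optical_channels", getlink=True), h5py.SoftLink):
            del imvol["order_optical_channels"]
            f.copy(refs_path, imvol, name="order_optical_channels")

        # Steps 2-3 (replacing the cached ndx-multichannel-volume 0.1.9 schema with
        # 0.1.12 under /specifications) are easier with PyNWB/HDMF and are omitted here.

        # Step 4: remove the "order_optical_channels" group from the MultiChannelVolume.
        mcv = f["/acquisition/NeuroPALImageRaw"]  # assumed MultiChannelVolume location
        if "order_optical_channels" in mcv:
            del mcv["order_optical_channels"]

        # Step 5: remove the now-unused OpticalChannelRefs group under /processing.
        if refs_path in f:
            del f[refs_path]

A real script would also need to loop over every ImagingVolume and MultiChannelVolume in the file rather than assuming fixed paths, so treat this only as an outline.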

The next step is to run a script that checks each NWB file in dandiset 000776 for this issue and, for each affected file, downloads it, runs the above script, and re-uploads it.

We will also want to adjust the NWB files in dandisets 000715, 000565, 000541, 000472, 000714, and possibly 000692.

This HDMF PR, hdmf-dev/hdmf#1050, will catch these errors during validation in the future. I opened an HDMF issue, hdmf-dev/hdmf#1051, noting that these name-mismatch issues, or more generally all validation issues, should raise an error on write.
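
For anyone checking their own files, a minimal validation sketch using PyNWB is shown below (not part of the original thread; the filename is hypothetical, and depending on the PyNWB version additional arguments may be needed to validate against extension namespaces):

from pynwb import NWBHDF5IO, validate

# "sub-01.nwb" is a hypothetical local copy of one of the files
with NWBHDF5IO("sub-01.nwb", mode="r", load_namespaces=True) as io:
    errors = validate(io)  # returns a list of validation errors for the open file
    for err in errors:
        print(err)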

craterkamath (Author) commented

Thanks @rly and @dysprague. Hoping to see the updated dataset on the DANDI Archive soon!
