[Bug]: [0.7.0, 0.8.0] Fails to open file with consolidated metadata from S3 #205

What happened?

#185 highlights the need for a test for this, so maybe you're already aware of this issue.

The problem stems from `self.is_remote()` incorrectly returning False. Adding `ConsolidatedMetadataStore` to the type check here is enough to get it working again:

hdmf-zarr/src/hdmf_zarr/backend.py, line 185 in afaab5f

Operating System: Windows
Python Executable: Python
Python Version: 3.11
Package Versions: No response
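(For illustration, a minimal sketch of the proposed fix, assuming the current `is_remote()` tests only for `zarr.storage.FSStore`; the method body below is reconstructed from the bug description, not copied from backend.py:)

```python
import zarr

def is_remote(self):
    """Return True if the file is stored remotely, e.g. on S3."""
    # A ConsolidatedMetadataStore wraps the underlying FSStore, so a
    # remote file opened through consolidated metadata is missed unless
    # it is included in the isinstance check alongside FSStore.
    return isinstance(
        self.file.store,
        (zarr.storage.FSStore, zarr.storage.ConsolidatedMetadataStore),
    )
```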
Comments

I am able to open a file with consolidated metadata using S3. I'll look into this.
Open works but read does not for a certain file, though the error is not the same. I do not believe it is related.

@bjhardcastle I'm not able to reproduce the same error. Is it possible for you to create a public example (a zarr file I can use)?

We have passing tests that read consolidated-metadata files, so I am not sure what you are facing.

@bjhardcastle I think this is due to some […]. On our end, all export functions have now been updated to force […].
@mavaylon1 Are you talking about these tests? They aren't sufficient.

Here's an MRE with reading added to your test:

```python
# python -m venv .venv-hdmf-test
# .venv-hdmf-test/scripts/activate    (Windows)
# source .venv-hdmf-test/bin/activate (Linux)
# python -m ensurepip
# python -m pip install hdmf-zarr fsspec s3fs
import hdmf_zarr
import zarr

s3_path = "https://dandiarchive.s3.amazonaws.com/zarr/ccefbc9f-30e7-4a4c-b044-5b59d300040b/"

with hdmf_zarr.NWBZarrIO(s3_path, mode='r') as read_io:
    read_io.open()
    assert isinstance(read_io.file.store, zarr.storage.ConsolidatedMetadataStore)
    try:
        # this fails:
        nwb = read_io.read()
    except Exception as exc:
        print(repr(exc))
        # ValueError: No data_type found for builder root

with hdmf_zarr.NWBZarrIO(s3_path, mode='r-') as read_io:
    read_io.open()
    assert isinstance(read_io.file.store, zarr.storage.FSStore)
    try:
        # this fails:
        nwb = read_io.read()
    except Exception as exc:
        print(repr(exc))
        # hdmf.backends.errors.UnsupportedOperation: Cannot build data. There are no values.

# the zarr file is empty:
z = zarr.open(s3_path, mode='r')
assert not tuple(z.keys())
```
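(For context, a minimal local round-trip of zarr's consolidated metadata, independent of hdmf-zarr; this assumes zarr v2, and the file name is illustrative:)

```python
import zarr

# Write a small group, then consolidate its metadata into a single
# .zmetadata key so readers can avoid listing every key - the feature
# that makes consolidated metadata attractive for high-latency stores
# such as S3.
store = zarr.DirectoryStore("example.zarr")
root = zarr.group(store=store, overwrite=True)
root.create_dataset("data", data=[1, 2, 3])
zarr.consolidate_metadata(store)

# open_consolidated() wraps the store in a ConsolidatedMetadataStore,
# the type at the center of this issue.
z = zarr.open_consolidated(store, mode="r")
assert isinstance(z.store, zarr.storage.ConsolidatedMetadataStore)
```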
@bjhardcastle I am talking about our roundtrip tests, such as https://github.com/hdmf-dev/hdmf-zarr/blob/8ca578733db5078d1ff6d2dfb47407a680c58caf/tests/unit/base_tests_zarrio.py#L331C9-L331C40. In hdmf-zarr our default is to consolidate metadata, and that test passes when reading a file with a `zarr.storage.ConsolidatedMetadataStore`. Thanks for the code. I will look into this.
Is that writing to an S3 bucket though?
@bjhardcastle No, but my point is that ConsolidatedMetadata works locally. We don't have a read test, and I couldn't reproduce the error without an S3 file. But you gave one, so this should be enough for me to reproduce the error and fix it. :)
Yes, it works locally because `is_remote()` returning False is correct there. When the file is remote, `is_remote()` still returns False, because `ConsolidatedMetadataStore` is missing from the type check. The file I gave you is the one from your tests - but it's empty.
I have to reproduce the error to comment on your findings regarding `is_remote()`. I will see what I can do to help on the DANDI side.
@bjhardcastle At any point did you export?

Export what?

Did you export a file to get to your current file?

Sorry, I don't know what you mean. Could you be more specific?
Can't you see that the code doesn't make sense? The type of the zarr object is not an indicator of whether the data is remote or not. As a result, lines like this build paths that are completely wrong:

hdmf-zarr/src/hdmf_zarr/backend.py, lines 719 to 721 in 8ca5787
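(To illustrate the failure mode being described - a hypothetical reconstruction, not the exact backend.py code: when `is_remote()` wrongly returns False, local-filesystem path handling gets applied to a URL.)

```python
import os

s3_path = "https://dandiarchive.s3.amazonaws.com/zarr/ccefbc9f-30e7-4a4c-b044-5b59d300040b/"

# Treating the URL as a local path: normpath collapses the "//" in
# "https://", producing a path no filesystem or object store can resolve.
broken = os.path.normpath(os.path.join(s3_path, "acquisition"))
print(broken)  # https:/dandiarchive.s3.amazonaws.com/... (on POSIX)
```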
That could be the case. I have not taken a deep look at what the issue is. I usually like to reproduce the error so that any changes I make to address it can be tested. In any case, I will find some time next week to look at this, as I have higher-priority issues I've already committed to. Let me know when you figure out uploading your file; I will look into this as well. Within NWB you can edit files and export them into a new file. If you did not make the file and are unsure, that is fine. It could be helpful to know, and I would suggest finding out if possible. I invite you to also review our Code of Conduct.
@bjhardcastle @mavaylon1 @alejoe91 thank you for the discussion to help us better understand the issue. We'll review the issue in our team meeting next week to make a plan to develop a fix and get back to you. I appreciate everyone's patience and hope you have a great weekend.
@mavaylon1 if it helps, I got the same error on an NWB file produced by a double export (first adding electrical series, second adding units), and when opening the file locally. Here's a link so you can download the file: https://drive.google.com/drive/folders/1f8_92cXrEvOdJWJvghNVNR_sDjpejmEO?usp=sharing
Another update: switching the units export to an append mechanism seems to produce NWB files without the issue. To summarize, we currently produce NWB files in multiple steps: […]

Here are the NWB zarr files produced: […]

So it seems that the issue is with adding the Units table with export. I hope this helps! (A sketch of the two write paths being compared follows this comment.)
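(A rough sketch of the export and append write paths. The file names and Units data are made up, and `NWBZarrIO.export` is assumed to mirror `pynwb.NWBHDF5IO.export(src_io=..., nwbfile=...)`:)

```python
from hdmf_zarr import NWBZarrIO

# Export path (the one producing the broken files): read the source,
# add a Units table, and write a brand-new file via export().
with NWBZarrIO("source.nwb.zarr", mode="r") as src_io:
    nwbfile = src_io.read()
    nwbfile.add_unit(spike_times=[0.1, 0.2])
    with NWBZarrIO("with_units.nwb.zarr", mode="w") as export_io:
        export_io.export(src_io=src_io, nwbfile=nwbfile)

# Append path (the workaround): reopen the same file in "a" mode,
# add the Units table, and write the changes in place.
with NWBZarrIO("source.nwb.zarr", mode="a") as io:
    nwbfile = io.read()
    nwbfile.add_unit(spike_times=[0.1, 0.2])
    io.write(nwbfile)
```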
Thank you @alejoe91 for debugging this issue further and providing these additional details. We'll take a look.
@bjhardcastle we are actively investigating adjacent issues that may fix your ticket. Our goal is to either have a solution or a clear path to one by the end of next week. |
@mavaylon1 any lead on this? |
@alejoe91 @bjhardcastle The issue that is stalling this is our implementation of export. I would find out who generated/exported the files, as the files in general might not be valid.