-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update backend.py #105
Update backend.py #105
Conversation
Update I/O for Zarr to support the linkage of ExternalResources.
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## dev #105 +/- ##
==========================================
- Coverage 81.97% 81.78% -0.19%
==========================================
Files 11 11
Lines 2663 2663
==========================================
- Hits 2183 2178 -5
- Misses 480 485 +5
☔ View full report in Codecov by Sentry. |
Fix #106 |
@oruebel After the updates, the gallery is failing with
Should we make the fix for that separate on its own ticket? I also believe this is linked to the external links workflow failing. |
That looks like another error due to hdmf-dev/hdmf#848 triggering the rebuild of Containers such that we have to determine the data type rather than remembering the existing type. I suspect that this will most likely require a fix in the |
Right that makes sense, but I was suggesting we merge this and handle it separately before a release. @oruebel |
Let me check if I can figure this one out tomorrow so we can at least get an estimate of how hard it will be to fix this bug. |
@oruebel thoughts? |
When running |
Now upon initial inspection of the traceback
When you add a breakpoint in line 1467 of h5tools.py, you are confronted with an np.array of group builders. These builders are for the electrode group ADunit_32. The fact that it's a np.array is not correct (via the advice of Ryan) instead of a dataset of references. Update/Edit: Looking at the same set of builders, it is a np.array of Builders either way and so I don't think the format is wrong. You can verify by the original breakpoint at 1467 on h5tools and then by removing the cache and setting a breakpoint at 1209 on h5tools and looking at 'data'. Both are np.arrays of Builders. What's even funnier is the _dtype for dset is "dtype('O')" (type n a few times), which to me is weird since that's what were trying to avoid (hence Oliver's change). Now 1467 is within "list_fill" which is called by "write_dataset" (still within h5tools). The part of write_dataset you'd think we'd be in would be line 1265 to write an array of object references, BUT we are in line 1301. The options['dtype'] returned is utf8, which remind me of Oliver's addition of
This options['dtype'] shouldn't be utf8 from my understanding. Now what about of we removed the clear cache. What would the options['dtype'] be? It is object. (Add a breakpoint at 1208 and see) So they are different and we need to figure out what is setting the dtype. |
Update: |
I compared the two log files (with and without the cache) and there differences (one makes a new root while the non-cache reuses). I will take a look more tomorrow, but any insight would be great. @oruebel |
On export when the cache is being cleared, it means that HDMF is now constructing a new set of builders. On the "original" builders that were read from file (e.g., HDF5) the dtype is already set and so when we export the builders we don't need to determine the dtype. However, when the cache is being cleared and the builders are being rebuilt, then that means that we now need to determine the dtype from the array (e.g, the
Correct, I believe the issue is that when export added to clear the cache it now means we are hitting corner cases in the logic to determine dtypes in the ObjectMapping that we did not encounter before. If I understand it correctly, you are seeing that the dtype is set to the wrong type by the code I added to handle the case where we have an object-type numpy array, i.e., it seems like we either: 1) need some logic here to be able to distinguish between an numpy array of references vs. a numpy array of strings or 2) need to figure out how to keep information about the original dtypes that are being used around so that we don't have to guess the dtype (e.g., either be removing the clear cache or having an option to rebuild builders but keeping dtype information from the previous builders). |
@mavaylon1 Now that removing python 3.7 testing was done in a separate PR, is this PR still necessary? I merged dev into here and the changes do not look relevant anymore. |
How to test the behavior?
Checklist
ruff
from the source directory.