Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address warning during export: Could not determine source_object_id #245

Open
rly opened this issue Nov 27, 2024 · 3 comments
Open

Address warning during export: Could not determine source_object_id #245

rly opened this issue Nov 27, 2024 · 3 comments
Labels
category: enhancement improvements of code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s)
Milestone

Comments

@rly
Copy link
Contributor

rly commented Nov 27, 2024

Looking through the test suite, four tests give this warning:

tests/unit/test_zarrio.py::BaseTestExportZarrToZarr::test_external_link_group
tests/unit/test_zarrio.py::TestExportZarrToZarrDefaultStore::test_external_link_group
tests/unit/test_zarrio.py::TestExportZarrToZarrDirectoryStore::test_external_link_group
tests/unit/test_zarrio.py::TestExportZarrToZarrNestedDirectoryStore::test_external_link_group
  /home/runner/work/hdmf-zarr/hdmf-zarr/.tox/py313/lib/python3.13/site-packages/hdmf_zarr/backend.py:811: UserWarning: Could not determine source_object_id for builder with path: /buckets/bucket1/foo_holder/foo1
    warnings.warn(warn_msg)

From this line:

curr = builder
while curr is not None and curr.name != ROOT_NAME:
curr = curr.parent
if curr:
source_object_id = curr.get('object_id', None)
# We did not find ROOT_NAME as a parent. This should only happen if we have an invalid
# file as a source, e.g., if during testing we use an arbitrary builder. We check this
# anyways to avoid potential errors just in case
else:
source_object_id = None
warn_msg = "Could not determine source_object_id for builder with path: %s" % path
warnings.warn(warn_msg)

The tests involve a File 2 that has a link to a Foo object in File 1. Then File 2 is exported to File 3. The Foo object in File 1 has a builder but that builder has no parent, because File 1 was not built; just Foo in File 1 was built. As a result, the warning is raised. The Foo builder has a source and the method has a rel_link_source. So the source for the created ZarrReference is correct, but the source_object_id is None. Is the object ID of the source file (the root of File 1) necessary for creating a ZarrReference? Should it be necessary to build the root of File 1 (which means all of File 1 gets built) in order to write a link to an object in File 1?

Of all the uses of export, the above use case is probably quite common. It is confusing and a bit alarming to have this warning appear on export.

@rly rly added category: enhancement improvements of code or code behavior priority: medium non-critical problem and/or affecting only a small set of users labels Nov 27, 2024
@rly rly added this to the 1.0.0 milestone Nov 27, 2024
@oruebel
Copy link
Contributor

oruebel commented Nov 27, 2024

Is the object ID of the source file (the root of File 1) necessary for creating a ZarrReference?

The source_object_id and object_id fields are not necessary for resolving links. These were added to support validation of links, i.e., to make sure a link actually points to where it should point to. However, I don't believe they are used in the resolution of links.

Should it be necessary to build the root of File 1 (which means all of File 1 gets built) in order to write a link to an object in File 1?

It looks like the code is searching the parent hierarchy to find the root. Since you are looking for the object_id of the file, I think this could be short-cut by just getting or opening the file and looking up the object_id directly.

Of all the uses of export, the above use case is probably quite common. It is confusing and a bit alarming to have this warning appear on export.

The comment seems to indicate that this case may be specific to testing, where some file is not setup completely. If this is true, should the tests be updated or is this an actual problem in practice where we have real files on disk?

@rly
Copy link
Contributor Author

rly commented Nov 27, 2024

I think this could be short-cut by just getting or opening the file and looking up the object_id directly.

That makes sense.

The comment seems to indicate that this case may be specific to testing, where some file is not setup completely. If this is true, should the tests be updated or is this an actual problem in practice where we have real files on disk?

I take back what I said. I do not think it is that common to export a file that contains a link to another file. External links are generally uncommon in my experience. I think we should still address this use case in a better manner, but it is not urgent.

@rly rly modified the milestones: 1.0.0, Future Nov 27, 2024
@rly rly added priority: low alternative solution already working and/or relevant to only specific user(s) and removed priority: medium non-critical problem and/or affecting only a small set of users labels Nov 27, 2024
@oruebel
Copy link
Contributor

oruebel commented Nov 27, 2024

I do not think it is that common to export a file that contains a link to another file. External links are generally uncommon in my experience.

I think the Allen Institute for Neural Dynamics uses export with external links

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: enhancement improvements of code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s)
Projects
None yet
Development

No branches or pull requests

2 participants