[Feature]: Reusing Keys and Entities should be automatic #961

mavaylon1 · 2023-10-05T19:10:12Z

What would you like to see added to HDMF?

When we first built HERD, it was not designed for "bulk" adding. You can call add_ref in a loop; however, when reusing keys and entities you have to change the arguments to add_ref. To reuse a key, you need to call get_key and pass the resulting key object to add_ref. To reuse an entity, you need to drop the URI parameter (entity_uri).

Say we wanted to take a DANDI set of NWB files and add references for subject and experimenter. It is not user-friendly to need a try/except around each call based on whether the key or entity already exists.

from pynwb.resources import HERD
from pynwb import NWBHDF5IO, NWBFile
from glob import glob
from tqdm import tqdm

# Path to all the files
path = '/Users/mavaylon/Research/NWB/000015/sub*'

# Create HERD
herd = HERD()

# populate iteratively
folders = glob(path)
for folder in folders:
    for file in tqdm(glob(folder+'/*')):
        io = NWBHDF5IO(file, mode='r')
        read_file = io.read()
        #Add HERD for Subject
        try:
            entity = herd.get_entity(entity_id='NCBI_TAXON:10090')
            if entity is not None:
                raise ValueError()
            else:
                herd.add_ref(file=read_file,
                             container=read_file.subject,
                             key=read_file.subject.species,
                             entity_id = 'NCBI_TAXON:10090', # this assumes the same species for each file
                             entity_uri = 'https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=NCBI_TAXON:10090'
                             )
        except ValueError: # after the first use of an entity_id and key, you are required to reuse them
            herd.add_ref(file=read_file,
                         container=read_file.subject,
                         key=read_file.subject.species,
                         entity_id = 'NCBI_TAXON:10090'
                         )


        # Add HERD for Experimenter
        try:
            if len(read_file.experimenter)>1:
                breakpoint()
            entity = herd.get_entity(entity_id='0000-0001-6782-3819')
            if entity is not None:
                raise ValueError()
            else:
                herd.add_ref(file=read_file,
                             container=read_file,
                             attribute="experimenter",
                             key=read_file.experimenter[0], # this assumes the experimenter is the same for each file
                             entity_id = '0000-0001-6782-3819',
                             entity_uri = 'https://orcid.org/0000-0001-6782-3819'
                             )

        except ValueError:
            herd.add_ref(file=read_file,
                          container=read_file,
                          attribute="experimenter",
                          key=read_file.experimenter[0],
                          entity_id = '0000-0001-6782-3819'
                          )
        # close the file on every iteration, not only when the entity is reused
        io.close()

As of now, our "bulk" method is to use the TermSetWrapper, but we haven't actually tested it with duplicate data that would need to reuse a key object. That case will fail when adding to HERD.

We need to:

1. Have a way for add_ref to resolve the right key when it needs to be reused, rather than relying on a manual call to get_key from the user.
2. Even though entity_id somewhat resolves on its own, an error is still raised telling the user to remove the URI when reusing the entity_id. This also hinders bulk adding (the URI has to be removed manually). Should we make the strong assumption that the URI is always ignored when an ID is reused? I think this is fine.

Without some form of 1 and 2, we don't support seamless bulk adding of references.
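
Until something like this lands, the try/except pattern in the loop above could be folded into a small user-side helper. This is only a sketch: add_ref_reusing_entity is a hypothetical name, not part of HDMF, and it assumes (as the loop above does) that get_entity returns None for an entity_id HERD has not seen yet. It only covers item 2; key reuse (item 1) is not handled.

```python
def add_ref_reusing_entity(herd, *, entity_id, entity_uri, **add_ref_kwargs):
    """Call herd.add_ref, passing entity_uri only the first time entity_id is used.

    Hypothetical helper, not part of HDMF. `herd` is expected to provide
    get_entity(entity_id=...) and add_ref(...) as used in the loop above.
    """
    # Assumption: get_entity returns None when the entity is not in HERD yet.
    if herd.get_entity(entity_id=entity_id) is None:
        add_ref_kwargs["entity_uri"] = entity_uri
    # On reuse the URI is left out, which is what add_ref currently requires.
    return herd.add_ref(entity_id=entity_id, **add_ref_kwargs)
```

With this, the subject block in the loop would shrink to a single call such as add_ref_reusing_entity(herd, file=read_file, container=read_file.subject, key=read_file.subject.species, entity_id='NCBI_TAXON:10090', entity_uri=...), with the URI from the loop above passed every time.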

Is your feature request related to a problem?

No response

What solution would you like?

Read Above.

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you checked the Contributing document?
Have you ensured this change was not already requested?


oruebel · 2023-10-05T21:19:25Z

1. Have a way to modify add_ref to support resolving the right key if it needs to be reused and not rely on a manual call to get_key from the user.

Sounds reasonable. There may need to be some logic to specify the reuse behavior, e.g.: 1) reuse any matching key, 2) reuse a key only if the neurodata_type and relative path match, 3) reuse a key only if the object_id matches.
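
To make the three options concrete, here is a rough sketch of how such a policy could be expressed. Everything in it is hypothetical (KeyReuse, KeyRecord, find_reusable_key); it works on a plain in-memory list and does not reflect how HERD actually stores keys.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class KeyReuse(Enum):
    """Hypothetical reuse policies, mirroring options 1) to 3) above."""
    ANY = "any"                      # 1) reuse any key with the same name
    TYPE_AND_PATH = "type_and_path"  # 2) also require neurodata_type and relative path to match
    OBJECT_ID = "object_id"          # 3) also require the object_id to match


@dataclass(frozen=True)
class KeyRecord:
    """Toy stand-in for a key plus the object it was registered for."""
    key: str
    object_id: str
    neurodata_type: str
    relative_path: str


def find_reusable_key(records: Iterable[KeyRecord],
                      candidate: KeyRecord,
                      policy: KeyReuse) -> Optional[KeyRecord]:
    """Return the first existing record that may be reused, or None."""
    for rec in records:
        if rec.key != candidate.key:
            continue
        if policy is KeyReuse.ANY:
            return rec
        if policy is KeyReuse.TYPE_AND_PATH and (
            (rec.neurodata_type, rec.relative_path)
            == (candidate.neurodata_type, candidate.relative_path)
        ):
            return rec
        if policy is KeyReuse.OBJECT_ID and rec.object_id == candidate.object_id:
            return rec
    return None
```

For the DANDI example, each file has its own Subject object, so the same species key would be reused under options 1) and 2) but not under 3).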

2. Even though entity_id somewhat resolves on its own, an error will still be raised to remove the "uri" if reusing the entity_id. Now this would also hinder bulk adding (having to manually remove). Should we make a strong assumption that when reusing an "id" to always ignore the URI. I think this is fine.

I think here we should raise a warning if the URI is different from what is already in HERD.
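
Roughly what I have in mind, as a toy sketch rather than the actual HERD internals. register_entity and the plain dict are hypothetical stand-ins for however HERD tracks entities:

```python
import warnings


def register_entity(entities, entity_id, entity_uri=None):
    """Record entity_id -> entity_uri, warning if a reused ID comes with a different URI.

    `entities` is a plain dict used as a stand-in for the entity table.
    Returns the URI that is kept for this entity_id.
    """
    if entity_id not in entities:
        # First use: store whatever URI was given.
        entities[entity_id] = entity_uri
        return entity_uri

    known_uri = entities[entity_id]
    if entity_uri is not None and entity_uri != known_uri:
        # Reuse with a conflicting URI: keep the stored one, but tell the user.
        warnings.warn(
            f"entity_id {entity_id!r} already exists with URI {known_uri!r}; "
            f"ignoring {entity_uri!r}",
            UserWarning,
            stacklevel=2,
        )
    return known_uri
```

Passing the same URI again, or no URI at all, stays silent, so bulk loops would not need to special-case the first call.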

mavaylon1 self-assigned this Oct 5, 2023

mavaylon1 mentioned this issue Oct 15, 2023

[Feature]: HERD/TermSet Expansion Tracker #966

Closed

11 tasks

mavaylon1 mentioned this issue Oct 31, 2023

HERD Updates #968

Merged

13 tasks

rly added the labels "category: enhancement" (improvements of code or code behavior) and "priority: low" (alternative solution already working and/or relevant to only specific user(s)) Jan 24, 2024

mavaylon1 closed this as completed Feb 13, 2024
