EbrainsPublicDatasetConnector cannot handle files with identical names #508

Open
ch-schiffer opened this issue Nov 14, 2023 · 0 comments

I just tried to use EbrainsPublicDatasetConnector to retrieve some data from the KG, using the following code:

import siibra

# Collect the datasets referenced by the cell density profiles.
datasets = {profile.datasets[1] for profile in siibra.features.cellular.CellDensityProfile.get_instances()}
for dataset in datasets:
    repo = siibra.retrieval.repositories.EbrainsPublicDatasetConnector(dataset.id)
    # List all .png files in the dataset, including those in subfolders.
    files = repo.search_files(suffix='.png', recursive=True)
    image_files = [fname for fname in files if "image" in fname]
    images = [repo.get_loader(image).data for image in image_files]

Here, I would expect to get every .png file in the selected dataset whose name contains "image".
Each dataset contains multiple files named image.png. However, EbrainsPublicDatasetConnector stores the files in a dictionary keyed by file name, so it cannot handle multiple files with the same name:

# https://github.com/FZJ-INM1-BDA/siibra-python/blob/main/siibra/retrieval/repositories.py#L571C1-L578C22
@property
def _files(self):
    if self.use_version in self.versions:
        return {
            f["name"]: f["url"] for f in self.versions[self.use_version]["files"]
        }
    else:
        return {}

Hence, it returns only one image.png per dataset, because each entry with a repeated name overwrites the previous one in the dictionary. I think the keys should include the full path, so that multiple files with the same name can be handled properly.
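
The collision can be reproduced without siibra. The following is a minimal sketch, assuming file records with "name" and "url" fields as in the snippet above. The example URLs and the helper names files_by_name and group_urls_by_name are made up for illustration and are not part of siibra:

```python
from collections import defaultdict

# Hypothetical file records, shaped like the entries of
# self.versions[self.use_version]["files"] in the snippet above.
# The URLs are placeholders, not real dataset locations.
records = [
    {"name": "image.png", "url": "https://example.org/data/sample_a/image.png"},
    {"name": "image.png", "url": "https://example.org/data/sample_b/image.png"},
    {"name": "mask.png", "url": "https://example.org/data/sample_a/mask.png"},
]


def files_by_name(records):
    """Same comprehension as _files: a repeated name overwrites the earlier entry."""
    return {f["name"]: f["url"] for f in records}


def group_urls_by_name(records):
    """One possible alternative: keep every URL that shares a name."""
    grouped = defaultdict(list)
    for f in records:
        grouped[f["name"]].append(f["url"])
    return dict(grouped)


collapsed = files_by_name(records)
grouped = group_urls_by_name(records)

# Three records go in, but only two keys survive in the plain dictionary.
print(len(records), len(collapsed))
# The grouped variant keeps both image.png entries.
print(len(grouped["image.png"]))
```

Grouping by name is only one way to avoid the overwrite. Keying by full path, as suggested above, would additionally require that the file records expose that path, which this sketch does not assume.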
