Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] attempt to fetch volumes with multiple sources can occassionally result in undeterministic behaviour #318

Closed
xgui3783 opened this issue Mar 30, 2023 · 15 comments
Labels
bug Something isn't working important

Comments

@xgui3783
Copy link
Member

xgui3783 commented Mar 30, 2023

as of 0.4a35

When volume with multiple providers (e.g. nii + neuroglancer/precomputed) calls fetch, network conditions will determine which provider is used.

e.g. for mni152 jb29 labelled where both nii and precomputed are provided, nii will be attempted first (per https://github.com/FZJ-INM1-BDA/siibra-python/blob/01d3d09/siibra/volumes/volume.py#L40-L44 ). If it fails (src is unreachable, in maintenance, being rate limited), neuroglancer precomputed source will be used.

precomputed provider fetch seems to default to lowest resolution.

As a result, voxel count will likely be a factor of 8 less.

code to reproduce:

caveat:

  • need to install test dep requests-mock
  • needs to run siibra.cache.clear() between each run
import siibra
import numpy as np
import requests_mock

siibra.cache.clear()

url = "https://neuroglancer.humanbrainproject.eu/precomputed/data-repo-ng-bot/20210714-julichbrain_v290_correctaffine/hbp-d000007_julichbrain-cytoatlas_dev/MPMs/29.0/GapMapPublicMPMAtlas_l_N10_nlin2StdICBM152asym2009c_29_publicDOI_b1bcc0b5a127f5a917f0d8b0e869be5d.nii.gz"

def main():
    region = siibra.get_region("2.9", "area 1 (postcg) left")
    rmap = region.fetch_regional_map("mni152")
    print(np.count_nonzero(rmap.get_fdata()))

def mocked():
    with requests_mock.Mocker(real_http=True) as mocker:
        mocker.register_uri(url=url, method="get", status_code=404, text="oh no")
        main()

main() # prints 6203

clear siibra cache

import siibra
import numpy as np
import requests_mock

siibra.cache.clear()

url = "https://neuroglancer.humanbrainproject.eu/precomputed/data-repo-ng-bot/20210714-julichbrain_v290_correctaffine/hbp-d000007_julichbrain-cytoatlas_dev/MPMs/29.0/GapMapPublicMPMAtlas_l_N10_nlin2StdICBM152asym2009c_29_publicDOI_b1bcc0b5a127f5a917f0d8b0e869be5d.nii.gz"

def main():
    region = siibra.get_region("2.9", "area 1 (postcg) left")
    rmap = region.fetch_regional_map("mni152")
    print(np.count_nonzero(rmap.get_fdata()))

def mocked():
    with requests_mock.Mocker(real_http=True) as mocker:
        mocker.register_uri(url=url, method="get", status_code=404, text="oh no")
        main()

mocked() # prints 757
@AhmetNSimsek
Copy link
Collaborator

Using df51a05, I cannot reproduce #318. I always get 6203.

@xgui3783
Copy link
Member Author

I can still reproduce the bug from df51a05

the key is, one need to clear the cache between each run, otherwise, the cached map skew the test result.

I have clarified that in the original issue.

@AhmetNSimsek
Copy link
Collaborator

AhmetNSimsek commented May 11, 2023

the key is, one need to clear the cache between each run, otherwise, the cached map skew the test result.

Yes, I cleared the cache between each run and tested 10 times 😅

@xgui3783
Copy link
Member Author

xgui3783 commented May 11, 2023

hmm did you restart your interpreter between each run?

yea, I don't know what to say. The error is still occuring here.

could it be OS specific?

@AhmetNSimsek
Copy link
Collaborator

Yes, I have. I tried again now and always gives me 6203. I tried with the current main (051e7ec) and it then sometimes prints 757 but for both.

It could be OS-specific indeed.

@xgui3783
Copy link
Member Author

Yes, I have. I tried again now and always gives me 6203. I tried with the current main (051e7ec) and it then sometimes prints 757 but for both.

It could be OS-specific indeed.

I suspect this is because the cache was not cleared.

I was not super clear on how to reproduce the issue when I first posted it. I have since updated it.

Can you try the following code:

import siibra
import numpy as np
import requests_mock

siibra.cache.clear()

url = "https://neuroglancer.humanbrainproject.eu/precomputed/data-repo-ng-bot/20210714-julichbrain_v290_correctaffine/hbp-d000007_julichbrain-cytoatlas_dev/MPMs/29.0/GapMapPublicMPMAtlas_l_N10_nlin2StdICBM152asym2009c_29_publicDOI_b1bcc0b5a127f5a917f0d8b0e869be5d.nii.gz"

def main():
    region = siibra.get_region("2.9", "area 1 (postcg) left")
    rmap = region.fetch_regional_map("mni152")
    print(np.count_nonzero(rmap.get_fdata()))

def mocked():
    with requests_mock.Mocker(real_http=True) as mocker:
        mocker.register_uri(url=url, method="get", status_code=404, text="oh no")
        main()

mocked() # prints 757

@AhmetNSimsek
Copy link
Collaborator

AhmetNSimsek commented May 11, 2023

Can you try the following code:

This produces 757

@AhmetNSimsek
Copy link
Collaborator

How about adding a base_resolution parameter in json (under provider section) for each precomputed neuroglancer image? If there is nothing, then it will still do default behaviour (go to the lowest). But if there is a value, then it will use it.
Since we have to compute the neuroglancer volumes anyway, this could be outputted during the pipeline.

What do you think @xgui3783 and @dickscheid?

@xgui3783
Copy link
Member Author

I do not like that we are expanding the base schema so much.

in no time we will not remember why we added the attributes.

I proposed the following to resolve the issue:

rather than trying the volume sources one by one, if no spec is provided try and retry the first in the order of preference, only.

we do not introduce more schema we need to maintain, and improve the reproducibility.

@AhmetNSimsek
Copy link
Collaborator

I see what you mean. Do you think the retry window would be enough for a network issue?

@xgui3783
Copy link
Member Author

I see what you mean. Do you think the retry window would be enough for a network issue?

I argue that thrown error is better than silent breakdown of reproducibility

@AhmetNSimsek
Copy link
Collaborator

Ahhh you mean never actually try other sources and throw an error?

@xgui3783
Copy link
Member Author

Ahhh you mean never actually try other sources and throw an error?

Correct. Unless the user specifically requests format=neuroglancer/precomputed

@AhmetNSimsek
Copy link
Collaborator

Okay now I get what you mean. Makes sense. I'll make the change

@AhmetNSimsek
Copy link
Collaborator

AhmetNSimsek commented Jun 1, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working important
Projects
None yet
Development

No branches or pull requests

2 participants