Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request for complex query support #48

Open
dkp opened this issue Jul 8, 2021 · 4 comments
Open

Feature request for complex query support #48

dkp opened this issue Jul 8, 2021 · 4 comments

Comments

@dkp
Copy link

dkp commented Jul 8, 2021

I was hoping to pull data from the database with specific qualities (TR, MB, manufacturer etc.). I tried the following URL: https://mriqc.nimh.nih.gov/api/v1/bold?max_results=10&where=bids_meta.MultibandAccelerationFactor%3C4&RepetitionTime=2.0&Manufacturer=Siemens&MagneticFieldStrength=3

Although this runs, the results are not filtered by the BIDS meta data. Am I missing something about how to do this? Is this a planned feature? It would be very useful.

Thank you.

@oesteban
Copy link
Member

oesteban commented Jul 8, 2021

pinging @rwblair - should this work already? I'm guessing so right?

@rwblair
Copy link
Contributor

rwblair commented Jul 8, 2021

@dkp is there an object id that multiband acceleration factor is 4 or greater that is returned by the query you posted? Playing with just that part of the query and switching it from a less than to a greater than seems to return appropriate results:

curl 'https://mriqc.nimh.nih.gov/api/v1/bold?max_results=10&where=bids_meta.MultibandAccelerationFactor%3E4'

Please let me know if I'm misunderstanding the issue.

@dkp
Copy link
Author

dkp commented Jul 9, 2021

@rwblair Thanks so much for the prompt reply!

Is the URL syntax documented anywhere?

I'm afraid I am new to navigating your undocumented web API and I'm finding it very confusing. For example, your query string does work for me but uses where=, which I have not seen in any other of the few example URLs that I have been able to find. I have now found several other query strings which work without where=. For example:

curl -s 'https://mriqc.nimh.nih.gov/api/v1/bold?max_results=59&RepetitionTime=2.5' -H "accept: application/json"

Also note that this query does not require a bids_meta. prefix on RepetitionTime, whereas your query seems to require it on MultibandAccelerationFactor or it fails. So when and where do I need to use bids_meta.?

When and why do I need to use where=? AND, if I use the where=, then how can I ask for a MultibandAccelerationFactor of exactly N (e.g. 3, or 4, or 8)? If I use another equal sign, this causes a server 400 error (which makes sense since it violates REST convention by saying where=key=value). The following does not work, for example:

curl -s 'https://mriqc.nimh.nih.gov/api/v1/bold?max_results=10&where=bids_meta.MultibandAccelerationFactor%3D4' -H "Accept: application/json"

I also note that there is apparently an undocumented limit of 50 records we can retrieve ; (

 curl -s 'https://mriqc.nimh.nih.gov/api/v1/bold?max_results=33' -H "accept: application/json" | json_pp | grep '"_id" :' | wc -l
      33

curl -s 'https://mriqc.nimh.nih.gov/api/v1/bold?max_results=51' -H "accept: application/json" | json_pp | grep '"_id" :' | wc -l
      50

@rwblair
Copy link
Contributor

rwblair commented Jul 9, 2021

So we use software called Eve to automatically generate the API from a schema. More information on how it handles these query parameters can be found under the filtering section on page 16 of this PDF:
https://readthedocs.org/projects/eve/downloads/pdf/latest/

The 'where' comes from mongodb style queries. I can't find good information on how exactly parameters outside of the where clause get converted into queries against the database, but I was not able to get querying on a nested field to work. With regards to "where=key=value" there are two ways that a query can assert equality, one is the mongodb style which is verbose and I'll omit for not, and the other is pythonic in nature where it expects a double equals sign to assert equality like ?where=key==value

Here is where the schema we use is defined if you want to get an idea of its structure:
https://github.com/poldracklab/mriqcwebapi/blob/master/dockereve-master/eve-app/settings.py

This shows which entries can be inside the nested bids_meta structure (but none are required):
https://github.com/poldracklab/mriqcwebapi/blob/8d4660591ec7e35fcb3edf417a8027f0d8b3ca8a/dockereve-master/eve-app/settings.py#L4

This is schema is the root object for the bold endpoints:
https://github.com/poldracklab/mriqcwebapi/blob/8d4660591ec7e35fcb3edf417a8027f0d8b3ca8a/dockereve-master/eve-app/settings.py#L107

My suspicion is that the RepetitionTime part of the queries is being ignored without the bids_meta prefix in a where clause.

Returning to your original query if we do it in their "pythonic" style we get this where clause before url encoding:

?where=bids_meta.MultibandAccelerationFactor<4 and bids_meta.RepetitionTime==2.0 and bids_meta.Manufacturer=="Siemens" and bids_meta.MagneticFieldStrength

with encoding:

curl -g https://mriqc.nimh.nih.gov/api/v1/bold?where=bids_meta.MultibandAccelerationFactor%3C4%20and%20bids_meta.RepetitionTime%3D%3D2.0%20and%20bids_meta.Manufacturer%3D%3D%22Siemens%22%20and%20bids_meta.MagneticFieldStrength%3D%3D3

Which seems to work coming back with 1024 results, much fewer than querying on any one of the terms.

Finally, apologies on the 50 limit not being documented, not sure why we set it that way or if it was a default. Are you able to paginate over the results ok?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants