-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OPTIMADE] Deep pagination of NOMAD OPTIMADE API by clients #45
Comments
You do not need to cheat it like this. You can use value-based pagination in elasticsearch with unlimited depth. The OPTIMADE standard offers We could put this into our OPTIMADE implementation (I guess also into the ES impl of OPT). Currently only offset-based pagination with the described limitations are implemented. To implement this clean and easy, the OPT function |
Maybe another remark. The 10000 entries limit has a reason. To do offsets, ES internally always has to produce the full result set. This is why its capped. The larger your offset, the more ES has to scrape through. Therefore, your "fix" would also be quite stressful for our ES. |
If that would work then lets do it! We only have number-based pagination via
As far as I understand it my fix is not offset-based? I am just paging through what I can, then generating a new filter with offset 0 and starting again. It would be stressful in that every set of 10000 results will need a new query, but that doesn't seem too bad to me? (In fact I think my fix maps pretty much onto the |
Well, we need to update our OPT version anyways, when you have this result-pydandic-validation-optional-switch in it. Does this exist now? If you could squeeze
Yes, but under the hood it is translated into offset-based pagination in ES (this is the only thing implemented). In this discussion offset and page based are synonyms. |
I'll see what I can do -- I can at least patch in the query params for now and release as you suggest. I never ended up making a PR for disabling validation as I couldn't generate a test case where there was any signficant perf. improvement, but in the case of allowing "slightly" wrong data through then it is useful ;) |
Hi devs, as I've discussed in the past with @markus1978, we'd like to be able to do deep pagination of the NOMAD OPTIMADE API. Currently pagination is limited to the first 10000 results of a query, until it hits an ES limitation. This can be circumvented with some work on the ES backend (https://www.elastic.co/guide/en/elasticsearch//reference/current/paginate-search-results.html#search-after) but I think this might a common problem so instead would like to resolve it in my client code at optimade-python-tools.
The "fix" would be to paginate into a query until failure, then use the last page of results to generate a new query that limits results to the remaining pages. This requires a field to sort over, and ideally I would sort over
id
/immutable_id
or_nmd_upload_time
. Unfortunately these sorts are not available in the current NOMAD OPTIMADE API, though I think the default sort is_nmd_upload_time
.Of course, this workaround can lead to inconsistent data between paginated queries, but I think basically every other OPTMADE API is already affected by this...
So, I have a few questions:
search_after
fix within optimade-python-tools?The text was updated successfully, but these errors were encountered: