[Feature Request] Don't scroll if max docs < scroll size (update by query/delete by query) #13704
Comments
@luisfavila If you bring in a fix, please make sure you are not copying from or looking at non-APLv2 code.
@luisfavila Will this help speed up those queries, or just bring down the scroll count?
@harshavamsi I'm not entirely sure whether it speeds up the queries. It helps immensely if you're running a large number of queries that each affect a small number of documents (<10k), as you won't hit the 500 max within the defined scroll window. I'm also under the impression that scrolls aren't cleared when the query finishes, only when they expire, but I'm not entirely sure (I would have to check the actual code).
Discussed this briefly in our weekly search meetup (https://www.meetup.com/opensearch/events/300759986/). I'm not sure if we have a great way of preventing the scroll from being created in the first place, but if the scrolls aren't being cleaned up once we hit the end (especially for a
Why can't we prevent the scroll from being created in the first place?
As far as I know, the flow should be roughly:
If we fetch page 1 and it's not the last page and then create a scroll, it might not match the same reader state as the page that we fetched. That could lead to bizarre outcomes (docs skipped or visited twice). AFAIK, that's why we always create the scroll first.
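The reader-state concern can be illustrated with a toy model. This is only a sketch, not OpenSearch code: the lists stand in for point-in-time reader states, and the `page` helper and document IDs are hypothetical.

```python
# Toy model: a "reader state" is a fixed, sorted list of doc IDs.
def page(snapshot, offset, size):
    """Return up to `size` doc IDs starting at `offset` from one snapshot."""
    return snapshot[offset:offset + size]

state_a = ["d1", "d2", "d3", "d4"]        # state when page 1 is fetched
state_b = ["d2", "d3", "d4"]              # later state: d1 was deleted
state_c = ["d0", "d1", "d2", "d3", "d4"]  # later state: d0 was added

# Both pages read from the same state, which is what a scroll pins.
consistent = page(state_a, 0, 2) + page(state_a, 2, 2)

# Page 1 from state A, page 2 from a state opened afterwards.
skipped = page(state_a, 0, 2) + page(state_b, 2, 2)     # d3 is never visited
duplicated = page(state_a, 0, 2) + page(state_c, 2, 2)  # d2 is visited twice

print(consistent)   # ['d1', 'd2', 'd3', 'd4']
print(skipped)      # ['d1', 'd2', 'd4']
print(duplicated)   # ['d1', 'd2', 'd2', 'd3']
```

Creating the scroll before fetching page 1 corresponds to the `consistent` case, where every page reads from `state_a`.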
Would it also be useful to determine whether it is technically feasible to know upfront that only a small number of documents is involved? If max docs is less than the scroll size, we could skip the scroll altogether and avoid creating a scroll context that counts toward max_open_scroll_context. Otherwise, we fall back to aggressively cleaning up the context, as Froh suggested above.
Oh... I was thinking about the internals of how a search works anyway. For every search, we create a Maybe the solution is to fix the scroll logic to let the |
@msfroh Would it still add to the scroll context limit that way? It'd be nice if it didn't unless it needs to, so there's no actual limit on how many UpdateByQuery / DeleteByQuery you can run concurrently, given they don't need to scroll. Separately, is this still planned to be implemented?
Is your feature request related to a problem? Please describe
Using update_by_query and delete_by_query over a small number of docs, where max docs is less than the scroll size, still opens a scroll and counts toward the max_open_scroll_context limit.
Describe the solution you'd like
Don't scroll unless necessary. This has been patched in ES as of 7.17.0
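
A minimal sketch of the requested check, assuming a hypothetical helper `needs_scroll` (not an existing OpenSearch or Elasticsearch function). It also treats `max_docs == scroll_size` as a single-batch case, which is an assumption that goes slightly beyond the "less than" wording in the title:

```python
from typing import Optional


def needs_scroll(max_docs: Optional[int], scroll_size: int) -> bool:
    """Return True when more than one batch may be needed.

    Hypothetical helper for illustration only.
    max_docs is None when the request sets no cap on processed docs.
    """
    if scroll_size <= 0:
        raise ValueError("scroll_size must be positive")
    if max_docs is None:
        # No cap: every match must be visited, so paging may be required.
        return True
    # Assumption: if the cap fits in one batch, a single search is enough.
    return max_docs > scroll_size


print(needs_scroll(max_docs=100, scroll_size=1000))   # False
print(needs_scroll(max_docs=5000, scroll_size=1000))  # True
print(needs_scroll(max_docs=None, scroll_size=1000))  # True
```

With a check like this, a request that returns False could run as a plain search and never count toward max_open_scroll_context.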
Related component
Search:Performance
Describe alternatives you've considered
I'm aware of #12923, which should also eventually solve this problem, but it may take a while to get there. It would be great to have this patched before then.
Additional context
No response