[BUG] Search request with PIT + searchAfter and 2 sort fields loses some hits #15869

Closed
Anastasiia186 opened this issue Sep 9, 2024 · 3 comments
Labels
bug Something isn't working Search:Query Capabilities

Comments


Anastasiia186 commented Sep 9, 2024

What is the bug?

Executing a search request with a PIT + searchAfter and 2 sort fields (the first is not unique, the second is unique) returns fewer hits in total than track_total_hits reports. The smaller the page size, the more hits are lost. With scrolling there is no problem. If the sort fields are changed (only 1 unique field, or the first unique and the second not unique), the problem also disappears.

How can one reproduce the bug?

Create a search request with a PIT and 2 sort fields, like:

PointInTimeBuilder pointInTimeBuilder = new PointInTimeBuilder(pitId);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
        .pointInTimeBuilder(pointInTimeBuilder);
searchSourceBuilder.query(<query>);
searchSourceBuilder.sort(SortBuilders
        .fieldSort(<non-unique field>)
        .order(SortOrder.ASC));
searchSourceBuilder.sort(SortBuilders
        .fieldSort(<unique field>)
        .order(SortOrder.ASC));
searchSourceBuilder.size(10_000);

Run this query several times, passing the sort values of the last hit of the previous page to searchAfter, like:

// Continue after the last hit of the previous page
SearchHit[] prevHits = prevSearch.getHits().getHits();
searchSourceBuilder.searchAfter(prevHits[prevHits.length - 1].getSortValues());
SearchRequest searchRequest = new SearchRequest();
searchRequest.source(searchSourceBuilder);
restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

Notice that the total number of hits collected across all pages is lower than the value reported by track_total_hits:
total using PIT = 1999953
"total": {
"value": 2000000,
"relation": "eq"
}

What is the expected behavior?

The pages together should return the same number of hits as reported by track_total_hits.

What is your host/environment?

SpringBoot application

@Anastasiia186 Anastasiia186 added bug Something isn't working untriaged labels Sep 9, 2024
@Xtansia
Contributor

Xtansia commented Sep 9, 2024

@opensearch-project/triage Please transfer this to the core repo as this pertains to behaviour of OpenSearch itself (and example is using RHLC).

@msfroh
Collaborator

msfroh commented Sep 11, 2024

[Search Triage] This one needs some investigation. With the unique second sort field acting as a tie-breaker, there should be a total order on the documents, so the search_after should continue exactly after the previous doc. One possibility is that the second field may not be in all documents, such that "missing" is a duplicated value.

@Anastasiia186 -- can you please check if the second field is present in all documents? Also, are you able to produce a minimal test case that reproduces the issue? Thanks!

@msfroh msfroh removed the untriaged label Sep 11, 2024
@Anastasiia186
Author

@msfroh yes, we rechecked the second field and found some duplicates. We removed them and now it works fine.
Thank you!
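
For readers who hit the same symptom, the effect of duplicate tie-breaker values can be illustrated with a small in-memory model. This is only a sketch, not OpenSearch code: it assumes that search_after returns only documents whose sort values come strictly after those of the last hit, and the names `SearchAfterDemo`, `Doc`, `group`, `tieBreaker` and `countPagedHits` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** In-memory model of search_after paging (assumption: ties with the cursor are skipped). */
final class SearchAfterDemo {

    /** A document reduced to its two sort values. */
    static final class Doc {
        final int group;       // first sort field, not unique
        final int tieBreaker;  // second sort field, expected to be unique

        Doc(int group, int tieBreaker) {
            this.group = group;
            this.tieBreaker = tieBreaker;
        }
    }

    /** Sort order used by the request: group ASC, then tieBreaker ASC. */
    static final Comparator<Doc> SORT =
            Comparator.comparingInt((Doc d) -> d.group).thenComparingInt(d -> d.tieBreaker);

    /** Pages through all docs and returns how many hits were collected in total. */
    static int countPagedHits(List<Doc> docs, int pageSize) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(SORT);
        int seen = 0;
        Doc cursor = null; // sort values of the last hit of the previous page
        while (true) {
            final Doc after = cursor;
            // Keep only docs that sort strictly after the cursor, then cut to one page.
            List<Doc> page = sorted.stream()
                    .filter(d -> after == null || SORT.compare(d, after) > 0)
                    .limit(pageSize)
                    .collect(Collectors.toList());
            if (page.isEmpty()) {
                return seen;
            }
            seen += page.size();
            cursor = page.get(page.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<Doc> unique = Arrays.asList(
                new Doc(1, 1), new Doc(1, 2), new Doc(1, 3), new Doc(1, 4));
        List<Doc> duplicated = Arrays.asList(
                new Doc(1, 1), new Doc(1, 2), new Doc(1, 2), new Doc(1, 4));

        System.out.println(countPagedHits(unique, 2));     // prints 4
        System.out.println(countPagedHits(duplicated, 2)); // prints 3
        System.out.println(countPagedHits(duplicated, 4)); // prints 4
    }
}
```

In this model a hit is lost only when documents with identical sort values straddle a page boundary, which would be consistent with the observation that smaller pages lose more hits.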

@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Sep 12, 2024