[BUG] IndexOutOfBoundsException in Hybrid search for some queries only #497
Comments
@tiagoshin can you share the query you are using? |
I can see the exception is coming from this: https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/query/HybridQueryScorer.java#L64 @tiagoshin please share your query skeleton so that we can better debug the issue. |
Thank you @navneet1v, I shared the query skeleton with David Fowler from AWS customer support, did you receive the query? |
@tiagoshin Looking at the logs that were shared, I can see that HybridQueryPhaseSearcher, which is responsible for running the query, is not invoked. This leads me to believe that either the hybrid query clause was not the top-level clause, or there are some nested fields in the index which lead to the hybrid query clause being wrapped in other query clauses (this is OpenSearch default behavior). We are already working on a fix for nested query clauses as part of this GitHub issue: #466. |
Hi @navneet1v, I see the HybridQueryPhaseSearcher invoked in the following line, isn't it?
|
@tiagoshin if you look at the code: https://github.com/opensearch-project/neural-search/blob/2.11/src/main/java/org/opensearch/neuralsearch/search/query/HybridQueryPhaseSearcher.java#L66 Line 66 is hit when the top-level query is not a hybrid query. |
That makes sense, thank you @navneet1v! |
We have pushed a code change that should fix this issue, please check details in this issue comment: #466 (comment) |
I'm getting a similar, but different exception, on OS 2.11.1 (6b1986e964d440be9137eba1413015c31c5a7752):
Full exception: aioobe.txt
Unfortunately I'm not familiar enough with the subject matter to know if this is the same exception or if it has been patched. I get this error more reproducibly on my single-node cluster with only 8800 documents and the following search pipeline and query:
{
"description": "Post processor for hybrid search",
"phase_results_processors": [
{
"normalization-processor": {
"normalization": {
"technique": "min_max"
},
"combination": {
"technique": "arithmetic_mean",
"parameters": {
"weights": [
0.6,
0.3,
0.1
]
}
}
}
}
]
}
Query:
{
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"_source": {
"exclude": [
"text_embedding"
]
},
"query": {
"hybrid": {
"queries": [
{
"match_phrase": {
"text": {
"query": "foo"
}
}
},
{
"match": {
"text": {
"query": "foo"
}
}
},
{
"neural": {
"text_embedding": {
"query_text": "foo",
"model_id": "--------",
"k": 5
}
}
}
]
}
}
}
I have narrowed down the issue to occurring when one or more of the sub-queries return effectively 0 results after normalization. That is, the scores are so low after normalization that they are completely discarded. If I remove two of the sub-queries and disable the search pipeline, the query works. Or if I make a more specific query where the sub-queries return a similar number of results, the query also works. I'm happy to provide more information if needed, or make a new issue if it's not the same one as this/#466. I'm running in Docker, so not quite sure how to test the RC build from that thread. Edit: also tried on 2.12.0, still happening. Is this new issue material? |
@Lemmmy so you are saying that you tried the tar provided in this comment: #466 (comment) and it is still not working? cc: @martin-gaievski |
@Lemmmy the OpenSearch CI publishes builds every day to the OpenSearch staging repo on Docker Hub: https://hub.docker.com/r/opensearchstaging/opensearch/tags You can use this: |
@Lemmmy I did a deeper dive and I am able to reproduce the issue. I also tested with different queries where one query clause doesn't yield any results; that use case works perfectly. But I was able to figure out the root cause of the exception you are getting. Here are the steps to reproduce:
Setup
Output of Search
Stacktrace
Root Cause
If we look at the queries provided in the hybrid clause, I have deliberately made my 2 text search queries exactly the same.
We create a map of Query to index (the key being the query object) here and use that map here to create the PQ and to assign the scorers created for each query. Because both text queries are the same, the map is created with size 2 instead of size 3 (as we have 3 queries), which leads to the exception. In production I don't expect users to provide two exactly identical queries, but this is a bug. Please let me know if removing the duplicate queries solves your issue.
Proposed Solution
We should throw an exception with a clear message telling the user that the queries defined contain duplicates. @Lemmmy please let me know your thoughts on this. cc: @martin-gaievski |
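To make the root cause above concrete, here is a minimal sketch of how a map keyed by the query collapses equal entries. It is not the plugin's code: `QueryIndexMapDemo`, `indexByQuery`, and the string keys are hypothetical stand-ins, with equal strings modelling two sub-queries that compare equal.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class QueryIndexMapDemo {
    // Maps each sub-query to its position, using the query itself as the key.
    // Strings stand in for query objects here; equal strings model equal queries.
    static Map<String, Integer> indexByQuery(List<String> subQueries) {
        Map<String, Integer> queryToIndex = new HashMap<>();
        for (int i = 0; i < subQueries.size(); i++) {
            // put() overwrites the entry of an equal key, so duplicates collapse.
            queryToIndex.put(subQueries.get(i), i);
        }
        return queryToIndex;
    }

    public static void main(String[] args) {
        // Two identical text queries plus one neural query (hypothetical labels).
        List<String> subQueries = List.of("text:foo", "text:foo", "text_embedding~foo");
        Map<String, Integer> queryToIndex = indexByQuery(subQueries);
        System.out.println("sub-queries: " + subQueries.size());
        System.out.println("map entries: " + queryToIndex.size());
    }
}
```

With three sub-queries of which two are equal, the map ends up with two entries, so anything sized from the map is one slot short.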
@tiagoshin I did a deep dive here: #497 (comment). Can you check on your side whether this was also the issue for you? If not, can you provide the query skeleton so that I can make sure that all bugs reported in this issue are resolved? I understand that your query contained nested fields, which we have already fixed for 2.12. But if there is any other issue that you are facing, please comment so that it can be fixed in 2.12. |
Thanks for the quick investigation! To clarify, am I supposed to avoid combining
"query": {
"hybrid": {
"queries": [
{
"match": {
"text": {
"query": "Hi world"
}
}
},
{
"neural": {
"passage_embedding": {
"query_text": "Hi world",
"model_id": "aVeif4oB5Vm0Tdw8zYO2",
"k": 5
}
}
}
]
}
}
Or is it just because of my use of both
When changing this line: neural-search/src/main/java/org/opensearch/neuralsearch/query/HybridQueryScorer.java Line 140 in 5daddfd
To:
-DisiPriorityQueue subScorersPQ = new DisiPriorityQueue(queryToIndex.size());
+DisiPriorityQueue subScorersPQ = new DisiPriorityQueue(subScorers.size());
The query I provided in #497 (comment) no longer errors and the results look roughly as I'd expect. |
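As a rough illustration of why the capacity in the one-line change above matters, the sketch below uses a plain fixed-size array as a simplified stand-in for a bounded priority queue. `ScorerSlotsDemo` and `fill` are hypothetical names, and the real queue is not reproduced here.

```java
import java.util.List;

final class ScorerSlotsDemo {
    // Places one entry per sub-scorer into a fixed-capacity array,
    // a simplified stand-in for a bounded priority queue.
    static int fill(int capacity, List<String> subScorers) {
        String[] slots = new String[capacity];
        int size = 0;
        for (String scorer : subScorers) {
            // Throws ArrayIndexOutOfBoundsException once size reaches capacity.
            slots[size++] = scorer;
        }
        return size;
    }

    public static void main(String[] args) {
        List<String> subScorers = List.of("match_phrase", "match", "neural");
        // Capacity taken from the scorer list: every scorer fits.
        System.out.println("sized by scorers: " + fill(subScorers.size(), subScorers));
        try {
            // Capacity 2 models a map in which two equal queries collapsed.
            fill(2, subScorers);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("sized by map: out of bounds");
        }
    }
}
```

Sizing by the number of scorers keeps capacity and contents in step even when two sub-queries are equal.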
"query": { This is okay.. But in your case:
The match_phrase and match queries actually boil down to the same query, and hence the issue was happening. |
Ah, that makes a lot more sense, I will fix that then. Thanks for all your help. |
Sure, I am planning to add a check that throws an exception if we find that queries are the same, and to throw it from here: https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/query/HybridQueryBuilder.java#L297 like this:
This will ensure that the queries are not run, because if we do this change
it has some other side effects in the code. |
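The snippet from the comment above did not survive, so the following is only a guess at the general shape of such a check, not the code that was proposed. `DuplicateSubQueryCheck`, `requireDistinct`, and the error message are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DuplicateSubQueryCheck {
    // Rejects a list of sub-queries if any two compare equal.
    // Relies on equals()/hashCode() of the query type Q.
    static <Q> void requireDistinct(List<Q> subQueries) {
        Set<Q> seen = new HashSet<>();
        for (Q query : subQueries) {
            // add() returns false when an equal element is already present.
            if (!seen.add(query)) {
                throw new IllegalArgumentException(
                    "hybrid query contains a duplicate sub-query: " + query);
            }
        }
    }

    public static void main(String[] args) {
        requireDistinct(List.of("text:foo", "text_embedding~foo")); // passes
        try {
            requireDistinct(List.of("text:foo", "text:foo", "text_embedding~foo"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing early at query-building time gives the user a readable message instead of an index error from deep inside scoring.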
Hi @navneet1v, thank you very much for your attention.
So I increased
Here are the logs:
|
@tiagoshin can you share the query skeleton with me so that I can reproduce the issue. BTW are you setting |
@navneet1v I shared the query and artifacts with David Fowler. Could you please get them from him? |
@navneet1v I got the same issue that I reported before about the IndexOutOfBoundsException on version 2.12.0 when increasing the
On the logs I see:
However, if I decrease the |
The fix for the IndexOutOfBoundsException is not in 2.12; 2.12 contains only the fix for nested queries. If you look at my RCA here: #497 (comment), it explains that the issue happens when 2 of your queries are the same. So check your array of hybrid queries and see if there are duplicates. If there are, remove them; this can be a short-term fix on your side. Meanwhile we will decide how to handle duplicate queries. |
Thanks for the response. I am working on that issue. I am doing some more validation before I post a root cause and the fix for the issue. |
So I was able to get to the root cause of the issue mentioned here (#497 (comment)):
} So the first issue where we are seeing neural-search/src/main/java/org/opensearch/neuralsearch/search/HitsThresholdChecker.java Lines 27 to 29 in 63fe67f
This case happen when we are adding For the second issue where was
|
|
Actually you are using 1 shard; the other shard is a replica of the first shard. But thanks for this information. The code path which is resulting in this issue that you are getting when you set
Just to resolve the issue for now, can you try with more than 1 primary shard and see if you still face the issue when |
@navneet1v It worked when increasing shards to 2, thank you very much! |
Replicas will have no impact. You can keep that setting at whatever you want. Just to say it one more time, I am still going to do a deep dive to fix the issue with 1 shard too. But for now I'm happy to know you are unblocked. |
* Allow multiple identical sub-queries in hybrid query, removed validation for total hits Signed-off-by: Martin Gaievski <[email protected]>
…-project#524) * Allow multiple identical sub-queries in hybrid query, removed validation for total hits Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit 585fbbe)
Hi @navneet1v!
I shared with David Fowler the artifacts for reproduction of the issue |
Hi @tiagoshin, for issue 2 I think what's happening is that some results have exactly the same score after normalization, and when combined some of them may be pushed out of the final result list. As this depends on the order of execution of individual sub-queries, the result of such re-arrangement may look different every time. For issue 3, a doc may receive a higher combined score if it appears in the results of, say, 2 sub-queries rather than in only one sub-query, even if it ranks high in that one result. For example, let's say we have sub-queries A and B, and each returns the following results: A = [doc1: 0.7, doc2: 0.6, doc3: 0.5] and B = [doc4: 0.7, doc5: 0.6, doc3: 0.5]. If we use |
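The sentence above is cut off, so the combination technique in the following sketch is an assumption: an unweighted arithmetic mean over both sub-queries in which a document missing from a sub-query contributes 0. The plugin's actual handling of missing scores and weights may differ; `CombinationDemo` and `combine` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CombinationDemo {
    // Unweighted arithmetic mean across all sub-queries.
    // Assumption: a document absent from a sub-query contributes a score of 0.
    static Map<String, Double> combine(List<Map<String, Double>> perQueryScores) {
        Map<String, Double> combined = new LinkedHashMap<>();
        for (Map<String, Double> scores : perQueryScores) {
            for (Map.Entry<String, Double> e : scores.entrySet()) {
                combined.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        // Divide each sum by the number of sub-queries, not by the number of hits.
        combined.replaceAll((doc, sum) -> sum / perQueryScores.size());
        return combined;
    }

    public static void main(String[] args) {
        Map<String, Double> a = Map.of("doc1", 0.7, "doc2", 0.6, "doc3", 0.5);
        Map<String, Double> b = Map.of("doc4", 0.7, "doc5", 0.6, "doc3", 0.5);
        Map<String, Double> combined = combine(List.of(a, b));
        System.out.println("doc1: " + combined.get("doc1"));
        System.out.println("doc3: " + combined.get("doc3"));
    }
}
```

Under that assumption doc3, which appears in both lists, keeps a combined score of 0.5, while doc1, the top hit of A alone, drops to about 0.35 and ranks below it.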
@martin-gaievski Here are the logs: |
@martin-gaievski 1st hybrid search run
2nd hybrid search run
3rd hybrid search run
In conclusion:
I'll send you all the queries and results privately, so you can check yourself. |
Workaround for a hybrid query bug in OpenSearch - opensearch-project/neural-search#497
@martin-gaievski can we close this issue, as the bug is resolved? |
yes, code wise we took care of the problem in #524 |
What is the bug?
I'm using hybrid search in OpenSearch version 2.11, and I'm getting the following error in some queries:
I get these logs:
How can one reproduce the bug?
Honestly, it's very hard to reproduce the bug. As I'm using my company's data, I cannot share it publicly. However, we can work on enabling this privately.
What is the expected behavior?
The expected behavior is to not get the error for hybrid search.
What is your host/environment?
MacOS Ventura 13.3.1, I'm running on Docker compose.
Do you have any additional context?
When I search on the exact same index for semantic search or lexical search, it works properly. It only happens for Hybrid search.
I observe a pattern that queries with more than one word tend to be more likely to have this error than simple queries. Queries that failed are like "horror movies", "teen mom", "news radio".
However, I observed that when I changed the combination technique, some queries started working, and other queries started failing.
I also observed that when I changed the index data, some queries started working, and other queries started failing.
However, for the same data and same settings, results are consistent from run to run.