Allowing execution of hybrid query on index alias with filters #670
Codecov Report
Attention: Patch coverage is
@@             Coverage Diff              @@
##               main     #670      +/-   ##
============================================
+ Coverage     84.04%   84.42%   +0.37%
- Complexity      744      750       +6
============================================
  Files             59       59
  Lines           2313     2324      +11
  Branches         374      375       +1
============================================
+ Hits            1944     1962      +18
+ Misses           214      213       -1
+ Partials         155      149       -6
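As a sanity check on the coverage report, its percentages can be recomputed from the hit, miss and partial counts. The sketch below assumes Codecov's ratio is hits / (hits + misses + partials), truncated to two decimals (its default `precision: 2`, `round: down` settings); `CoverageCheck` and its methods are illustrative names only.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative recomputation of the Codecov figures for main and #670.
// Assumption: coverage = hits / (hits + misses + partials), shown
// truncated (rounded down) to two decimal places.
class CoverageCheck {

    // Unrounded coverage percentage, kept at 10 decimals for delta math.
    static BigDecimal rawCoverage(int hits, int misses, int partials) {
        int lines = hits + misses + partials;
        return BigDecimal.valueOf(hits)
            .multiply(BigDecimal.valueOf(100))
            .divide(BigDecimal.valueOf(lines), 10, RoundingMode.DOWN);
    }

    // Percentage as displayed in the report: truncated to two decimals.
    static BigDecimal displayed(BigDecimal raw) {
        return raw.setScale(2, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        BigDecimal base = rawCoverage(1944, 214, 155); // main: 2313 lines
        BigDecimal head = rawCoverage(1962, 213, 149); // #670: 2324 lines
        System.out.println("main:  " + displayed(base) + "%");
        System.out.println("#670:  " + displayed(head) + "%");
        // The delta is computed on unrounded values, then truncated.
        System.out.println("delta: +" + displayed(head.subtract(base)) + "%");
    }
}
```

The delta only comes out as +0.37% when it is taken on the unrounded values; subtracting the two displayed percentages (84.42 - 84.04) would give 0.38.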
BWC tests will keep failing until all dependent repos are switched to the 2.14 snapshot version, in particular knn, ml-commons and common-utils. PR #653 also needs to be merged in the neural-search repo.
Signed-off-by: Martin Gaievski <[email protected]>
src/test/java/org/opensearch/neuralsearch/query/HybridQueryIT.java (outdated, resolved)
        && clause.getQuery() instanceof FieldExistsQuery
        && SeqNoFieldMapper.PRIMARY_TERM_NAME.equals(((FieldExistsQuery) clause.getQuery()).getField())
    );
} else if (hasAliasFilter(query, searchContext)) {
Rather than if/else, I think we should have separate if cases for the different conditions. What will happen when the query has both nested fields and an alias filter?
I didn't test this scenario; let me check how core constructs the query. I hope it isn't double-wrapped and that both filters are added to a single parent bool query.
You were right @navneet1v, we're not handling the scenario with both nested fields and an index alias filter properly. The nested-fields check is the first one that catches the query, so it kicks in and the query then fails. I'm reworking that logic now. Whether we have nested fields, an alias filter, or both, we'll handle it the same way: get all filters from the parent bool query and apply all of them to every sub-query of the hybrid query.
Example: if both nested fields and an alias filter are present, the system sends us this query:
bool : {
must: [
hybrid_query
],
filter: {
alias_filter1,
alias_filter2,
field_exists (added for nested field)
}
}
We'll rewrite it into the following form:
hybrid: {
queries: [
bool : {
must: [
<sub_query_1>
],
filter: {
alias_filter1,
alias_filter2,
field_exists (added for nested field)
}
},
bool : {
must: [
<sub_query_2>
],
filter: {
alias_filter1,
alias_filter2,
field_exists (added for nested field)
}
}
]
}
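The rewrite described above can be sketched on a simplified query model. `Q`, `Leaf`, `Bool` and `Hybrid` are hypothetical stand-ins for Lucene's `BooleanQuery` and neural-search's `HybridQuery`, and `pushFiltersIntoSubQueries` is an illustrative name, not the PR's actual method. The sketch only handles a bool whose `must` holds exactly one hybrid query.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the filter push-down on a HYPOTHETICAL, simplified query model.
// These records are illustrative stand-ins, not Lucene / neural-search classes.
class HybridRewriteSketch {

    interface Q {}
    record Leaf(String name) implements Q {}
    record Bool(List<Q> must, List<Q> filter) implements Q {}
    record Hybrid(List<Q> queries) implements Q {}

    // bool { must: [hybrid { q1, q2 }], filter: [f...] }
    //   -> hybrid { bool { must: [q1], filter: [f...] }, bool { must: [q2], filter: [f...] } }
    // Any other shape is returned unchanged.
    static Q pushFiltersIntoSubQueries(Q query) {
        if (!(query instanceof Bool parent)) {
            return query; // not a bool wrapper
        }
        if (parent.must().size() != 1 || !(parent.must().get(0) instanceof Hybrid hybrid)) {
            return query; // bool does not wrap exactly one hybrid query
        }
        if (parent.filter().isEmpty()) {
            return query; // nothing to push down
        }
        List<Q> rewritten = new ArrayList<>();
        for (Q sub : hybrid.queries()) {
            // Every sub-query gets its own bool carrying ALL of the parent's filters.
            rewritten.add(new Bool(List.of(sub), List.copyOf(parent.filter())));
        }
        return new Hybrid(rewritten);
    }

    public static void main(String[] args) {
        // Mirrors the example: two alias filters plus the nested-docs filter.
        Q incoming = new Bool(
            List.of(new Hybrid(List.of(new Leaf("sub_query_1"), new Leaf("sub_query_2")))),
            List.of(new Leaf("alias_filter1"), new Leaf("alias_filter2"), new Leaf("field_exists")));
        System.out.println(pushFiltersIntoSubQueries(incoming));
    }
}
```

Copying the full filter list into each sub-query, rather than picking one filter per case, is what lets the alias-filter and nested-field cases share a single code path.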
Good that we caught this. :) Let's fix it. Can we structure the code so that we don't run into this issue again? :D
private static boolean isWrappedHybridQuery(final Query query) {
    return query instanceof BooleanQuery
        && ((BooleanQuery) query).clauses().stream().anyMatch(clauseQuery -> clauseQuery.getQuery() instanceof HybridQuery);
}

@VisibleForTesting
protected Query extractHybridQuery(final SearchContext searchContext, final Query query) {
-   if (hasNestedFieldOrNestedDocs(query, searchContext)
+   if ((hasAliasFilter(query, searchContext) || hasNestedFieldOrNestedDocs(query, searchContext))
Processing is the same for both cases: we need to rewrite the query and add the filter(s) to each sub-query.
cool..
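The single combined condition in the revised `extractHybridQuery` can be illustrated on a hypothetical model. `HybridDetectionSketch` and its records are illustrative only; the real `hasAliasFilter` and `hasNestedFieldOrNestedDocs` consult the `SearchContext`, which this sketch replaces with a boolean flag and with a check for the field-exists filter on `_primary_term` seen in the diff above (the value of `SeqNoFieldMapper.PRIMARY_TERM_NAME`).

```java
import java.util.List;

// Sketch of the combined detection condition on a HYPOTHETICAL query model.
// Not the real Lucene / neural-search classes; SearchContext lookups are
// replaced by a flag (alias filter) and a filter-clause check (nested docs).
class HybridDetectionSketch {

    interface Q {}
    record Leaf(String name) implements Q {}
    record FieldExists(String field) implements Q {}
    record Bool(List<Q> must, List<Q> filter) implements Q {}
    record Hybrid(List<Q> queries) implements Q {}

    // Same string as SeqNoFieldMapper.PRIMARY_TERM_NAME in OpenSearch.
    static final String PRIMARY_TERM_NAME = "_primary_term";

    // Analogue of isWrappedHybridQuery (in this model the hybrid query sits in must).
    static boolean isWrappedHybridQuery(Q q) {
        return q instanceof Bool b && b.must().stream().anyMatch(c -> c instanceof Hybrid);
    }

    // Simplified nested-docs signal: a field-exists filter on _primary_term.
    static boolean hasNestedDocsFilter(Q q) {
        return q instanceof Bool b
            && b.filter().stream().anyMatch(c -> c instanceof FieldExists fe && PRIMARY_TERM_NAME.equals(fe.field()));
    }

    // One condition covers alias filter, nested docs, or both: every case
    // takes the same rewrite path instead of the first matching branch.
    static boolean needsFilterRewrite(Q q, boolean aliasFilterPresent) {
        return isWrappedHybridQuery(q) && (aliasFilterPresent || hasNestedDocsFilter(q));
    }

    public static void main(String[] args) {
        Q hybrid = new Hybrid(List.of(new Leaf("sub_query_1"), new Leaf("sub_query_2")));
        Q both = new Bool(List.of(hybrid), List.of(new Leaf("alias_filter1"), new FieldExists(PRIMARY_TERM_NAME)));
        System.out.println(needsFilterRewrite(both, true));    // alias filter and nested docs
        System.out.println(needsFilterRewrite(hybrid, false)); // plain top-level hybrid query
    }
}
```

With one condition instead of an `if`/`else if` chain, a query carrying both an alias filter and the nested-docs filter can no longer be claimed by the nested branch alone, which was the failure found in review.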
src/main/java/org/opensearch/neuralsearch/search/query/HybridQueryPhaseSearcher.java (outdated, resolved)
Merged commit fbbca1a into opensearch-project:main
* Add support for index alias with filter Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit fbbca1a)
…#672) * Add support for index alias with filter Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit fbbca1a) Co-authored-by: Martin Gaievski <[email protected]>
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.13 2.13
# Navigate to the new working tree
cd .worktrees/backport-2.13
# Create a new branch
git switch --create backport/backport-670-to-2.13
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 fbbca1a2126316b71d7a8ff183b4e79d5625432e
# Push it to GitHub
git push --set-upstream origin backport/backport-670-to-2.13
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.13
Then, create a pull request where the
* Add support for index alias with filter Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit fbbca1a) Signed-off-by: Martin Gaievski <[email protected]>
…#676) * Add support for index alias with filter (cherry picked from commit fbbca1a) Signed-off-by: Martin Gaievski <[email protected]> Co-authored-by: Martin Gaievski <[email protected]>
Description
Allow hybrid queries to be executed against an index alias that has a filter. Today such queries are blocked: core wraps the hybrid query in a top-level bool query to apply the alias filter, while by design the hybrid query must be the top-level query.
To change this we do the following:
Enhance the query phase searcher's logic for checking whether the incoming query is a "hybrid query": add a special case for when an index alias filter is present and the incoming query is a bool query that wraps a hybrid query.
Change the logic for the nested fields case to match the alias filter case: instead of simply removing the filter query, rewrite the hybrid query and add the filter query to every sub-query.
In addition to the new integration tests, I ran a few scenarios manually. Below is an example similar to the one reported in the original GitHub issue:
Create an index with keyword and integer fields and ingest the following 8 documents:
Create a new index alias with a filter:
Run the following query against the index alias:
The following is the result of the query. The filter has been applied even though it is not part of the query itself; it is pulled in based on the alias name:
Issues Resolved
#627
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.