Allowing execution of hybrid query on index alias with filters #670

@martin-gaievski martin-gaievski commented Apr 4, 2024

Description

This change allows a hybrid query to be executed against an index alias that has a filter. Today such queries are blocked: the alias filter is incorporated by wrapping the hybrid query in a top-level bool query, and by design a hybrid query must itself be the top-level query.

To support this, we do the following:

  • extract the hybrid query from the compound query that carries the alias filter
  • wrap each sub-query of the hybrid query in a bool query of the following form:
  bool : {
     must: [
       <original_sub_query>
     ],
     filter: {
       <alias_filter>
     }
  }
  • rewrite the hybrid query into the following form:
 hybrid: {
      queries: [
          { sub_query1_wrapped_into_bool },
          { sub_query2_wrapped_into_bool }
      ]
 }
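
The three steps above can be sketched in plain Java. This is a minimal model and not the plugin's code: the `Query`, `Term`, `Bool`, and `Hybrid` types are hypothetical stand-ins for the Lucene and neural-search query classes, and `pushFilterIntoSubQueries` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

final class AliasFilterRewriteSketch {
    // Hypothetical stand-ins for the real query classes; names are illustrative only.
    interface Query {}
    record Term(String field, String value) implements Query {}
    record Bool(List<Query> must, List<Query> filter) implements Query {}
    record Hybrid(List<Query> queries) implements Query {}

    // Input:  bool { must: [hybrid], filter: [alias_filter] }
    // Output: hybrid { queries: [ bool { must: [sub], filter: [alias_filter] }, ... ] }
    // Any other query shape is returned unchanged.
    static Query pushFilterIntoSubQueries(Query query) {
        if (!(query instanceof Bool)) {
            return query;
        }
        Bool outer = (Bool) query;
        if (outer.must().size() != 1 || !(outer.must().get(0) instanceof Hybrid)) {
            return query;
        }
        // Step 1: extract the hybrid query from the wrapping bool query.
        Hybrid hybrid = (Hybrid) outer.must().get(0);
        // Step 2: wrap each sub-query in its own bool query that carries the filter.
        List<Query> wrapped = new ArrayList<>();
        for (Query sub : hybrid.queries()) {
            wrapped.add(new Bool(List.of(sub), outer.filter()));
        }
        // Step 3: the hybrid query becomes the top-level query again.
        return new Hybrid(wrapped);
    }
}
```

The point of the design is that the filter still restricts every sub-query's matches, while the hybrid query stays at the top level as the query phase searcher requires.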

Part of this change is enhanced logic in the query phase searcher for checking whether the incoming query is a hybrid query. We add a special case for when an index alias filter is present and the incoming query is a bool query that wraps the hybrid query.

We're also changing the logic for the nested fields case to make it similar to alias filters. Instead of simply removing the filter query, we rewrite the hybrid query and add the filter query to every sub-query.

In addition to the new integ tests, I've run a few scenarios manually. Below is an example similar to the one reported in the original GH issue:

Create an index with keyword and integer fields and ingest the following 8 documents:

POST /_bulk

{ "index": { "_index": "my-nlp-index" } }
{ "category": "permission", "doc_keyword": "workable", "doc_index": 4976, "doc_price": 100}
{ "index": { "_index": "my-nlp-index" } }
{ "category": "sister", "doc_keyword": "angry", "doc_index": 2231, "doc_price": 200 }
{ "index": { "_index": "my-nlp-index" } }
{ "category": "hair", "doc_keyword": "likeable", "doc_price": 25 }
{ "index": { "_index": "my-nlp-index" } }
{ "category": "editor", "doc_index": 9871, "doc_price": 30 }
{ "index": { "_index": "my-nlp-index" } }
{ "category": "statement", "doc_keyword": "entire", "doc_index": 8242, "doc_price": 350  } 
{ "index": { "_index": "my-nlp-index" } }
{ "category": "statement", "doc_keyword": "idea", "doc_index": 5212, "doc_price": 200  } 
{ "index": { "_index": "my-nlp-index" } }
{ "category": "editor", "doc_keyword": "bubble", "doc_index": 1298, "doc_price": 130 } 
{ "index": { "_index": "my-nlp-index" } }
{ "category": "editor", "doc_keyword": "bubble", "doc_index": 521, "doc_price": 75  } 

Create a new index alias with a filter:

POST /_aliases
{
    "actions": [
        {
            "add": {
                "index": "my-nlp-index",
                "alias": "alias_filter_1",
                "filter": {
                    "bool": {
                        "must_not": {
                            "term": {
                                "category": "statement"
                            }
                        }
                    }
                }
            }
        }
    ]
}

Run the following query against the index alias:

GET /alias_filter_1/_search?search_pipeline=nlp-search-pipeline
{
    "query": {
        "hybrid": {
            "queries": [
                {
                    "range": {
                        "doc_index": {
                            "gte": 20,
                            "lte": 100
                        }
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "match": {
                                    "doc_keyword": "likeable"
                                }
                            },
                            {
                                "term": {
                                    "category": "statement"
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

The following is the result of the query. The filter has been applied even though it is not part of the query itself; it is pulled in based on the alias name:

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.5,
        "hits": [
            {
                "_index": "my-nlp-index",
                "_id": "jNJ7p44B1ntc6ZjxQK8q",
                "_score": 0.5,
                "_source": {
                    "category": "hair",
                    "doc_keyword": "likeable",
                    "doc_price": 25
                }
            }
        ]
    }
}
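
To see why exactly one hit comes back, the example can be replayed in memory. The sketch below is plain Java with no OpenSearch dependency. The `Doc` record and the predicates are hypothetical stand-ins that assume exact-match semantics for the keyword fields, and scoring is ignored.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class AliasFilterWalkthrough {
    // Hypothetical in-memory document; null marks a field absent from the bulk request.
    record Doc(String category, String docKeyword, Integer docIndex, int docPrice) {}

    // The 8 documents from the bulk request above.
    static final List<Doc> DOCS = List.of(
        new Doc("permission", "workable", 4976, 100),
        new Doc("sister", "angry", 2231, 200),
        new Doc("hair", "likeable", null, 25),
        new Doc("editor", null, 9871, 30),
        new Doc("statement", "entire", 8242, 350),
        new Doc("statement", "idea", 5212, 200),
        new Doc("editor", "bubble", 1298, 130),
        new Doc("editor", "bubble", 521, 75));

    // Alias filter: must_not term category = "statement".
    static final Predicate<Doc> ALIAS_FILTER = d -> !"statement".equals(d.category());
    // Sub-query 1: range 20 <= doc_index <= 100.
    static final Predicate<Doc> RANGE =
        d -> d.docIndex() != null && d.docIndex() >= 20 && d.docIndex() <= 100;
    // Sub-query 2: doc_keyword = "likeable" OR category = "statement".
    static final Predicate<Doc> SHOULD =
        d -> "likeable".equals(d.docKeyword()) || "statement".equals(d.category());

    // A document is a hit if some sub-query matches it and it passes the alias filter,
    // mirroring the rewritten form where every sub-query carries the filter.
    static List<Doc> search() {
        return DOCS.stream()
            .filter(d -> (RANGE.test(d) && ALIAS_FILTER.test(d))
                      || (SHOULD.test(d) && ALIAS_FILTER.test(d)))
            .collect(Collectors.toList());
    }
}
```

No document has a `doc_index` between 20 and 100, so the range sub-query matches nothing. The second sub-query matches the `likeable` document and both `statement` documents, and the alias filter then removes the two `statement` documents, leaving the single `hair` hit shown in the response.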

Issues Resolved

#627

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@martin-gaievski martin-gaievski added the backport 2.x, Bug Fixes, and v2.14.0 labels Apr 4, 2024
@martin-gaievski martin-gaievski force-pushed the add_support_alias_with_filter_to_hybrid_query branch from 389ee6b to c23b638 Compare April 4, 2024 04:48

codecov bot commented Apr 4, 2024

Codecov Report

Attention: Patch coverage is 92.59259%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.42%. Comparing base (cc6a6b2) to head (c17a0ff).

Files                                                    Patch %   Lines
...org/opensearch/neuralsearch/query/HybridQuery.java    92.30%    0 Missing and 1 partial ⚠️
...lsearch/search/query/HybridQueryPhaseSearcher.java    90.90%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #670      +/-   ##
============================================
+ Coverage     84.04%   84.42%   +0.37%     
- Complexity      744      750       +6     
============================================
  Files            59       59              
  Lines          2313     2324      +11     
  Branches        374      375       +1     
============================================
+ Hits           1944     1962      +18     
+ Misses          214      213       -1     
+ Partials        155      149       -6     


@martin-gaievski
Member Author

The BWC tests will keep failing until all dependent repos are switched to the 2.14 snapshot version, in particular knn, ml-commons, and common-utils. This PR in the neural-search repo also needs to be merged: #653

@martin-gaievski martin-gaievski force-pushed the add_support_alias_with_filter_to_hybrid_query branch 2 times, most recently from 2561ea1 to 273fe4e Compare April 4, 2024 16:01
@martin-gaievski martin-gaievski marked this pull request as ready for review April 4, 2024 16:02
@martin-gaievski martin-gaievski changed the title from "Fixed exception when running hybrid query on index alias with filter" to "Running hybrid query on index alias with filters" Apr 4, 2024
@martin-gaievski martin-gaievski removed the Bug Fixes label Apr 4, 2024
@martin-gaievski martin-gaievski changed the title from "Running hybrid query on index alias with filters" to "Allowing execution of hybrid query on index alias with filters" Apr 5, 2024
@martin-gaievski martin-gaievski force-pushed the add_support_alias_with_filter_to_hybrid_query branch from 273fe4e to fce70fb Compare April 5, 2024 16:33
&& clause.getQuery() instanceof FieldExistsQuery
&& SeqNoFieldMapper.PRIMARY_TERM_NAME.equals(((FieldExistsQuery) clause.getQuery()).getField())
);
} else if (hasAliasFilter(query, searchContext)) {
Collaborator

Rather than if/else, I think we should have separate if cases for the different conditions. What happens when the query has both nested fields and an alias filter?

Member Author

I didn't test this scenario; let me check how core constructs the query. I hope it isn't double wrapped and that both filters are instead added to one parent bool query.

Member Author

You were right @navneet1v, we're not handling the scenario with both nested fields and an index alias filter properly. The nested fields check is the first one that catches the query; it kicks in, and then the query fails. I'm reworking that logic now: whether we have nested fields, an alias filter, or both, we'll do the same thing, which is to get all filters from the parent bool and apply them to every sub-query of the hybrid query.
Example: if both nested and alias filters are present, the system sends us this query:

  bool : {
     must: [
       hybrid_query
     ],
     filter: {
       alias_filter1,
       alias_filter2,
       field_exists (added for nested field)
     }
  }

We'll rewrite it into the following form:

 hybrid: {
      queries: [
          bool : {
              must: [
                 <sub_query_1>
               ],
               filter: {
                  alias_filter1,
                  alias_filter2,
                  field_exists (added for nested field)
                }
          },
          bool : {
              must: [
                 <sub_query_2>
               ],
               filter: {
                  alias_filter1,
                  alias_filter2,
                  field_exists (added for nested field)
                }
          }
      ]
 }
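
A minimal sketch of that combined handling, in plain Java. The `Clause`, `Occur`, `Bool`, `Hybrid`, and `Named` types are hypothetical stand-ins loosely modeled on Lucene's boolean clauses, not the plugin's classes: every FILTER clause of the parent bool is collected, whatever added it, and copied onto each sub-query.

```java
import java.util.ArrayList;
import java.util.List;

final class CombinedFilterRewriteSketch {
    // Hypothetical model types; names are illustrative only.
    enum Occur { MUST, FILTER }
    interface Query {}
    record Named(String name) implements Query {}
    record Clause(Query query, Occur occur) {}
    record Bool(List<Clause> clauses) implements Query {}
    record Hybrid(List<Query> subQueries) implements Query {}

    static Query rewrite(Bool parent) {
        Hybrid hybrid = null;
        List<Clause> filters = new ArrayList<>();
        for (Clause clause : parent.clauses()) {
            if (clause.occur() == Occur.FILTER) {
                // Alias filters and the nested-field exists filter are treated alike.
                filters.add(clause);
            } else if (clause.query() instanceof Hybrid) {
                hybrid = (Hybrid) clause.query();
            }
        }
        if (hybrid == null) {
            return parent; // not a wrapped hybrid query, leave it untouched
        }
        List<Query> rewritten = new ArrayList<>();
        for (Query sub : hybrid.subQueries()) {
            List<Clause> clauses = new ArrayList<>();
            clauses.add(new Clause(sub, Occur.MUST));
            clauses.addAll(filters); // same filters on every sub-query
            rewritten.add(new Bool(clauses));
        }
        return new Hybrid(rewritten);
    }
}
```

Because the rewrite never asks where a filter came from, a single code path covers nested fields, alias filters, and the combination of the two.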

Collaborator

Good that we were able to catch this. :) Let's fix it. Can we structure our code so that we don't face this issue again? :D

@martin-gaievski martin-gaievski force-pushed the add_support_alias_with_filter_to_hybrid_query branch from 5a6669e to c58ea08 Compare April 5, 2024 22:10
 private static boolean isWrappedHybridQuery(final Query query) {
     return query instanceof BooleanQuery
         && ((BooleanQuery) query).clauses().stream().anyMatch(clauseQuery -> clauseQuery.getQuery() instanceof HybridQuery);
 }

 @VisibleForTesting
 protected Query extractHybridQuery(final SearchContext searchContext, final Query query) {
-    if (hasNestedFieldOrNestedDocs(query, searchContext)
+    if ((hasAliasFilter(query, searchContext) || hasNestedFieldOrNestedDocs(query, searchContext))
Member Author

Processing is similar for both cases: we need to rewrite the query and add the filter(s) to each sub-query.

Collaborator

cool..

@martin-gaievski martin-gaievski merged commit fbbca1a into opensearch-project:main Apr 8, 2024
54 of 60 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 8, 2024
* Add support for index alias with filter

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit fbbca1a)
martin-gaievski added a commit that referenced this pull request Apr 8, 2024
…#672)

* Add support for index alias with filter

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit fbbca1a)

Co-authored-by: Martin Gaievski <[email protected]>
@opensearch-trigger-bot
Contributor

The backport to 2.13 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.13 2.13
# Navigate to the new working tree
cd .worktrees/backport-2.13
# Create a new branch
git switch --create backport/backport-670-to-2.13
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 fbbca1a2126316b71d7a8ff183b4e79d5625432e
# Push it to GitHub
git push --set-upstream origin backport/backport-670-to-2.13
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.13

Then, create a pull request where the base branch is 2.13 and the compare/head branch is backport/backport-670-to-2.13.

opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 8, 2024
* Add support for index alias with filter

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit fbbca1a)
martin-gaievski added a commit that referenced this pull request Apr 8, 2024
* Add support for index alias with filter

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit fbbca1a)
Signed-off-by: Martin Gaievski <[email protected]>
martin-gaievski added a commit that referenced this pull request Apr 8, 2024
…#676)

* Add support for index alias with filter


(cherry picked from commit fbbca1a)

Signed-off-by: Martin Gaievski <[email protected]>
Co-authored-by: Martin Gaievski <[email protected]>
Labels: backport 2.x, backport 2.13, v2.14.0