Adding Reciprocal Rank Fusion (RRF) in hybrid query #1086

Merged: 16 commits, Jan 14, 2025

@ryanbogan (Member) commented Jan 9, 2025

Description

Merges the feature branch for Reciprocal Rank Fusion (RRF) now that we have App Sec sign-off.
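
For readers unfamiliar with the technique, here is a minimal sketch of how Reciprocal Rank Fusion combines ranked result lists. It is not the code merged in this PR. The function name `rrf_fuse`, the sample document IDs, and the default `rank_constant` of 60 (a value commonly used in the RRF literature) are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, rank_constant=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document receives 1 / (rank_constant + rank) from every list it
    appears in (rank starts at 1), and the contributions are summed.
    Hypothetical helper for illustration; not the plugin's API.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Two sub-query result lists, as a hybrid query might produce (made-up IDs).
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([keyword_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.6f}")
```

Because only ranks enter the formula, the raw scores of the sub-queries do not need to be on comparable scales, which is the main appeal of RRF over score-based normalization.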

Contains changes from the following PRs:

Related Issues

#659

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • [] API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@yuye-aws
Member

@ryanbogan Can you resolve the conflicting files?

@yuye-aws
Member

@martin-gaievski I remember there was a PR towards the feature branch. Do you know how to compare these two PRs?

@martin-gaievski
Member

> @martin-gaievski I remember there was a PR towards the feature branch. Do you know how to compare these two PRs?

While a feature is in development, we merge every PR into the feature branch. Once we're code complete and other things like App Sec are done, we merge everything from the feature branch to main. This PR is exactly that; @ryanbogan has listed all PRs previously merged to the feature branch in the description.

Johnsonisaacn and others added 7 commits January 13, 2025 11:50
…874)

* initial commit of RRF

Signed-off-by: Isaac Johnson <[email protected]>

Co-authored-by: Varun Jain <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
…874)

* initial commit of RRF

Signed-off-by: Isaac Johnson <[email protected]>

Co-authored-by: Varun Jain <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
* Initial unit test implementation

Signed-off-by: Ryan Bogan <[email protected]>

---------
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
* Integrate explainability for hybrid query into RRF processor

Signed-off-by: Martin Gaievski <[email protected]>
* add impl

Signed-off-by: zhichao-aws <[email protected]>

* add UT

Signed-off-by: zhichao-aws <[email protected]>

* rename pruneType; UT

Signed-off-by: zhichao-aws <[email protected]>

* changelog

Signed-off-by: zhichao-aws <[email protected]>

* ut

Signed-off-by: zhichao-aws <[email protected]>

* add it

Signed-off-by: zhichao-aws <[email protected]>

* change on 2-phase

Signed-off-by: zhichao-aws <[email protected]>

* UT

Signed-off-by: zhichao-aws <[email protected]>

* it

Signed-off-by: zhichao-aws <[email protected]>

* rename

Signed-off-by: zhichao-aws <[email protected]>

* enhance: more detailed error message

Signed-off-by: zhichao-aws <[email protected]>

* refactor to prune and split

Signed-off-by: zhichao-aws <[email protected]>

* changelog

Signed-off-by: zhichao-aws <[email protected]>

* fix UT cov

Signed-off-by: zhichao-aws <[email protected]>

* address review comments

Signed-off-by: zhichao-aws <[email protected]>

* enlarge score diff range

Signed-off-by: zhichao-aws <[email protected]>

* address comments: check lowScores non null instead of flag

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
yizheliu-amazon and others added 9 commits January 13, 2025 13:20
…nested objects (#1040)

* Fix bug where ingestion failed for input document containing list of nested objects

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments to use better method name/implementation

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments: modify the test case to have doc with various fields

Signed-off-by: Yizhe Liu <[email protected]>

---------

Signed-off-by: Yizhe Liu <[email protected]>
* add support for builder constructor in neural query builder

Signed-off-by: will-hwang <[email protected]>

* create custom builder class to enforce valid neural query builder instantiation

Signed-off-by: will-hwang <[email protected]>

* refactor code to remove duplicate

Signed-off-by: will-hwang <[email protected]>

* include new constructor in qa packages

Signed-off-by: will-hwang <[email protected]>

* refactor code to remove unnecessary code

Signed-off-by: will-hwang <[email protected]>

* fix bug in neural query builder instantiation

Signed-off-by: will-hwang <[email protected]>

---------

Signed-off-by: will-hwang <[email protected]>
* add hybrid search with rescore IT

Signed-off-by: will-hwang <[email protected]>

* remove rescore in hybrid search IT

Signed-off-by: will-hwang <[email protected]>

* remove previous version checks in build file

Signed-off-by: will-hwang <[email protected]>

* removing version checks only in rolling upgrade tests

Signed-off-by: will-hwang <[email protected]>

* remove newly added tests in restart test

Signed-off-by: will-hwang <[email protected]>

* Revert "remove newly added tests in restart test"

This reverts commit 0987831.

Signed-off-by: will-hwang <[email protected]>

---------

Signed-off-by: will-hwang <[email protected]>
…t has dot in field name (#1062)

* Fix bug where document embedding fails to be generated due to document has dot in field name

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments

Signed-off-by: Yizhe Liu <[email protected]>

---------

Signed-off-by: Yizhe Liu <[email protected]>
…874)

* initial commit of RRF

Signed-off-by: Isaac Johnson <[email protected]>

Co-authored-by: Varun Jain <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
* Initial unit test implementation

Signed-off-by: Ryan Bogan <[email protected]>

---------
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
* Integrate explainability for hybrid query into RRF processor

Signed-off-by: Martin Gaievski <[email protected]>
@martin-gaievski force-pushed the feature/rrf-score-normalization-v2 branch from a661ca3 to 312c7f7 on January 13, 2025 23:17

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 87.66234% with 19 lines in your changes missing coverage. Please review.

Project coverage is 80.51%. Comparing base (b084838) to head (312c7f7).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rocessor/normalization/ScoreNormalizationUtil.java | 41.17% | 6 Missing and 4 partials ⚠️ |
| ...pensearch/neuralsearch/processor/RRFProcessor.java | 84.21% | 0 Missing and 6 partials ⚠️ |
| ...rmalization/MinMaxScoreNormalizationTechnique.java | 60.00% | 2 Missing ⚠️ |
| ...essor/normalization/RRFNormalizationTechnique.java | 97.72% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1086      +/-   ##
============================================
+ Coverage     80.38%   80.51%   +0.12%     
- Complexity     1157     1211      +54     
============================================
  Files            87       93       +6     
  Lines          4018     4141     +123     
  Branches        682      697      +15     
============================================
+ Hits           3230     3334     +104     
- Misses          533      540       +7     
- Partials        255      267      +12     
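
As a quick sanity check, the figures in this report can be recomputed from the numbers it prints. The sketch below copies the hits, misses, partials, and line counts from the report. The total of 154 patch lines is an inference (it is the count that makes 19 uncovered lines come out to 87.66234%), not a number stated by Codecov.

```python
# (missing, partial) counts per file, copied from the report's file table.
uncovered_by_file = {
    "ScoreNormalizationUtil.java": (6, 4),
    "RRFProcessor.java": (0, 6),
    "MinMaxScoreNormalizationTechnique.java": (2, 0),
    "RRFNormalizationTechnique.java": (0, 1),
}
uncovered = sum(missing + partial for missing, partial in uncovered_by_file.values())

# Project totals from the coverage diff: (hits, misses, partials, lines).
base = (3230, 533, 255, 4018)  # main
head = (3334, 540, 267, 4141)  # this PR


def coverage_percent(hits, lines):
    """Line coverage as a percentage of total lines."""
    return 100.0 * hits / lines


base_cov = coverage_percent(base[0], base[3])
head_cov = coverage_percent(head[0], head[3])

# Assumption: 154 patch lines, inferred from the reported patch percentage.
patch_lines = 154
patch_cov = coverage_percent(patch_lines - uncovered, patch_lines)

print(f"uncovered patch lines: {uncovered}")
print(f"base coverage: {base_cov:.4f}%")
print(f"head coverage: {head_cov:.4f}%")
print(f"patch coverage (inferred total): {patch_cov:.5f}%")
```

In both columns, hits, misses, and partials add up to the reported line count, so the table is internally consistent.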


@martin-gaievski added the backport 2.x and v2.19.0 labels Jan 14, 2025
@martin-gaievski merged commit f6d8a12 into main Jan 14, 2025
77 of 79 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1086-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f6d8a12de3dca307b8f2638bbeb6fcd930ee304e
# Push it to GitHub
git push --set-upstream origin backport/backport-1086-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1086-to-2.x.

@ryanbogan added and removed the backport 2.x label Jan 14, 2025
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 14, 2025
* Reciprocal Rank Fusion (RRF) normalization technique in hybrid query (#874)

---------
Signed-off-by: Isaac Johnson <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit f6d8a12)
ryanbogan added a commit to ryanbogan/neural-search that referenced this pull request Jan 14, 2025
…ct#1086)

* Reciprocal Rank Fusion (RRF) normalization technique in hybrid query (opensearch-project#874)

---------
Signed-off-by: Isaac Johnson <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
vibrantvarun pushed a commit that referenced this pull request Jan 15, 2025
…1103)

* Adding Reciprocal Rank Fusion (RRF) in hybrid query (#1086)

* Reciprocal Rank Fusion (RRF) normalization technique in hybrid query (#874)

---------
Signed-off-by: Isaac Johnson <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>

* Fix failing compile

Signed-off-by: Ryan Bogan <[email protected]>

* Fix test compile

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
Labels: backport 2.x, v2.19.0
9 participants