Feature/rrf score normalization v2 #1089

Closed
wants to merge 26 commits into from

Conversation

martin-gaievski
Member

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Johnsonisaacn and others added 26 commits December 17, 2024 15:49
…874)

* initial commit of RRF

Signed-off-by: Isaac Johnson <[email protected]>

Co-authored-by: Varun Jain <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
* Initial unit test implementation

Signed-off-by: Ryan Bogan <[email protected]>

---------
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
* Integrate explainability for hybrid query into RRF processor

Signed-off-by: Martin Gaievski <[email protected]>
* add impl

Signed-off-by: zhichao-aws <[email protected]>

* add UT

Signed-off-by: zhichao-aws <[email protected]>

* rename pruneType; UT

Signed-off-by: zhichao-aws <[email protected]>

* changelog

Signed-off-by: zhichao-aws <[email protected]>

* ut

Signed-off-by: zhichao-aws <[email protected]>

* add it

Signed-off-by: zhichao-aws <[email protected]>

* change on 2-phase

Signed-off-by: zhichao-aws <[email protected]>

* UT

Signed-off-by: zhichao-aws <[email protected]>

* it

Signed-off-by: zhichao-aws <[email protected]>

* rename

Signed-off-by: zhichao-aws <[email protected]>

* enhance: more detailed error message

Signed-off-by: zhichao-aws <[email protected]>

* refactor to prune and split

Signed-off-by: zhichao-aws <[email protected]>

* changelog

Signed-off-by: zhichao-aws <[email protected]>

* fix UT cov

Signed-off-by: zhichao-aws <[email protected]>

* address review comments

Signed-off-by: zhichao-aws <[email protected]>

* enlarge score diff range

Signed-off-by: zhichao-aws <[email protected]>

* address comments: check lowScores non null instead of flag

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
* Allow empty string for field in field map

Signed-off-by: Yizhe Liu <[email protected]>

* Allow empty string when validation

Signed-off-by: Yizhe Liu <[email protected]>

* Add to change log

Signed-off-by: Yizhe Liu <[email protected]>

* Update CHANGELOG to: Support empty string for fields in text embedding processor

Signed-off-by: Yizhe Liu <[email protected]>

---------

Signed-off-by: Yizhe Liu <[email protected]>
…nested objects (#1040)

* Fix bug where ingestion failed for input document containing list of nested objects

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments to use better method name/implementation

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments: modify the test case to have doc with various fields

Signed-off-by: Yizhe Liu <[email protected]>

---------

Signed-off-by: Yizhe Liu <[email protected]>
…es (#1043)

* Fixed mismatch between document source and score fields when sorting is enabled in hybrid query

Signed-off-by: Martin Gaievski <[email protected]>
* add support for builder constructor in neural query builder

Signed-off-by: will-hwang <[email protected]>

* create custom builder class to enforce valid neural query builder instantiation

Signed-off-by: will-hwang <[email protected]>

* refactor code to remove duplicate

Signed-off-by: will-hwang <[email protected]>

* include new constructor in qa packages

Signed-off-by: will-hwang <[email protected]>

* refactor code to remove unnecessary code

Signed-off-by: will-hwang <[email protected]>

* fix bug in neural query builder instantiation

Signed-off-by: will-hwang <[email protected]>

---------

Signed-off-by: will-hwang <[email protected]>
* add hybrid search with rescore IT

Signed-off-by: will-hwang <[email protected]>

* remove rescore in hybrid search IT

Signed-off-by: will-hwang <[email protected]>

* remove previous version checks in build file

Signed-off-by: will-hwang <[email protected]>

* removing version checks only in rolling upgrade tests

Signed-off-by: will-hwang <[email protected]>

* remove newly added tests in restart test

Signed-off-by: will-hwang <[email protected]>

* Revert "remove newly added tests in restart test"

This reverts commit 0987831.

Signed-off-by: will-hwang <[email protected]>

---------

Signed-off-by: will-hwang <[email protected]>
…t has dot in field name (#1062)

* Fix bug where document embedding fails to be generated due to document has dot in field name

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments

Signed-off-by: Yizhe Liu <[email protected]>

---------

Signed-off-by: Yizhe Liu <[email protected]>
* Add reindex integration tests

Signed-off-by: Andy Qin <“[email protected]”>
* Fix github CI by adding eclipse dependency in formatting.gradle

Signed-off-by: Varun Jain <[email protected]>

* Add changelog

Signed-off-by: Varun Jain <[email protected]>

---------

Signed-off-by: Varun Jain <[email protected]>
…874)

* initial commit of RRF

Signed-off-by: Isaac Johnson <[email protected]>

Co-authored-by: Varun Jain <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
* Initial unit test implementation

Signed-off-by: Ryan Bogan <[email protected]>

---------
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
* Integrate explainability for hybrid query into RRF processor

Signed-off-by: Martin Gaievski <[email protected]>

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 87.33333% with 19 lines in your changes missing coverage. Please review.

Project coverage is 80.47%. Comparing base (b4cb267) to head (a661ca3).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rocessor/normalization/ScoreNormalizationUtil.java | 41.17% | 6 Missing and 4 partials ⚠️ |
| ...pensearch/neuralsearch/processor/RRFProcessor.java | 83.78% | 0 Missing and 6 partials ⚠️ |
| ...rmalization/MinMaxScoreNormalizationTechnique.java | 60.00% | 2 Missing ⚠️ |
| ...essor/normalization/RRFNormalizationTechnique.java | 97.72% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1089      +/-   ##
============================================
+ Coverage     80.19%   80.47%   +0.28%     
- Complexity     1139     1198      +59     
============================================
  Files            87       93       +6     
  Lines          3953     4077     +124     
  Branches        666      681      +15     
============================================
+ Hits           3170     3281     +111     
- Misses          531      536       +5     
- Partials        252      260       +8     
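
The figures in this report can be cross-checked with a few lines of arithmetic. The sketch below is not part of the PR or of Codecov; the helper names `coverage_pct` and `truncate2` are hypothetical, and truncating (rather than rounding) to two decimals is an assumption inferred from the reported 80.47%, since the head ratio is slightly above 80.475%.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage: fully covered lines over all tracked lines."""
    return 100.0 * hits / lines


def truncate2(value: float) -> float:
    """Cut a percentage to two decimals without rounding up (assumed display rule)."""
    return math.floor(value * 100) / 100


# Hits and Lines copied from the coverage diff above: (base, head).
base = truncate2(coverage_pct(3170, 3953))
head = truncate2(coverage_pct(3281, 4077))

# Hits + Misses + Partials should add up to Lines on the head commit.
head_total = 3281 + 536 + 260

# Per-file "Missing" plus "partials" counts from the table of files above.
uncovered_patch_lines = (6 + 4) + (0 + 6) + 2 + (0 + 1)

print(f"base={base}% head={head}% head_total={head_total} "
      f"uncovered_patch_lines={uncovered_patch_lines}")
```

Under these assumptions the per-file counts add up to the 19 uncovered patch lines named in the summary, and the base and head percentages match the first row of the diff.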

