Adding support for generic re-ranker interface and opensearch ml re-ranker for improving search relevancy. #494
@navneet1v @vamshin Reranking
@HenryL27, before I can review this PR, can we make sure that the GH Actions checks are successful?
src/main/java/org/opensearch/neuralsearch/processor/factory/RerankProcessorFactory.java
src/main/java/org/opensearch/neuralsearch/processor/rerank/CrossEncoderRerankProcessor.java
Outdated
src/main/java/org/opensearch/neuralsearch/processor/rerank/RescoringRerankProcessor.java
Outdated
src/main/java/org/opensearch/neuralsearch/processor/rerank/RerankProcessor.java
Outdated
Sure. It's blocked behind opensearch-project/ml-commons#1615, but once that gets merged (which should be soon, right @ylwu-amzn?), this should hopefully do better.
Please go ahead and resolve the conflict too.
@HenryL27, this PR has not been updated with the recent comments I added on your RFC here: #485 (comment). I don't see any response from you on the recommended interface changes, so I'm pasting the comment here. Please check those comments.
So sorry! Thank you for reminding me about this.
Bug I came across: if the reranking_context_field doesn't exist in one of the search results, this fails with an NPE. My thinking is that the correct behavior in this case is to assign such docs the lowest seen score. @martin-gaievski, what do you think?
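A minimal sketch of the lowest-seen-score fallback proposed above, written against plain Java types rather than the PR's actual classes. The class and method names (`LowestScoreFallback`, `scoreWithFallback`) are hypothetical, the scorer function stands in for the model call, and a `null` entry models a hit whose context field is missing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

class LowestScoreFallback {
    /**
     * Hypothetical helper: scores every hit that has a reranking context and
     * gives hits with a missing (null) context the lowest score seen among the
     * scored hits, instead of failing with a NullPointerException.
     */
    static double[] scoreWithFallback(List<String> contexts, ToDoubleFunction<String> scorer) {
        double[] scores = new double[contexts.size()];
        boolean[] missing = new boolean[contexts.size()];
        double lowest = Double.POSITIVE_INFINITY;
        for (int i = 0; i < contexts.size(); i++) {
            String context = contexts.get(i);
            if (context == null) {
                missing[i] = true; // filled in once the lowest score is known
            } else {
                scores[i] = scorer.applyAsDouble(context);
                lowest = Math.min(lowest, scores[i]);
            }
        }
        // If no hit had a context at all, fall back to 0.0 (an arbitrary choice for this sketch).
        double fallback = (lowest == Double.POSITIVE_INFINITY) ? 0.0 : lowest;
        for (int i = 0; i < scores.length; i++) {
            if (missing[i]) {
                scores[i] = fallback;
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Toy scorer: context length stands in for a cross-encoder relevance score.
        double[] scores = scoreWithFallback(Arrays.asList("aa", null, "aaaa"), c -> c.length());
        System.out.println(Arrays.toString(scores));
    }
}
```

Note that this treats a document with no context exactly like the weakest scored match, which is the trade-off questioned in the next comment.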
Do you know why reranking_context doesn't exist? Without more info it's hard to decide what score we should assign. The lowest seen score may not be the best option in some cases: a missing context may mean there are no matches, whereas the lowest score means there is a hit, just with the lowest score.
The context field doesn't exist because it simply wasn't present in that particular document: I was doing a parent-children index, and the parent doesn't have a
Merged opensearch-project/ml-commons#1615, so we can probably run the workflow now?
Took a first pass on the PR and only got as far as DocumentContextSourceFetcher.java. I will do the next review once the above comments are resolved and the code is updated based on the suggestions provided on the RFC.
src/main/java/org/opensearch/neuralsearch/processor/rerank/RerankType.java
Outdated
     * @param label label of a RerankType
     * @return RerankType represented by the label
     */
    public static RerankType from(String label) {
Do we need to create this function? Can we use the RerankType.valueOf() method that Java provides for enum classes?
It's a capitalization thing: .valueOf() wants an exact match, which would mean that I either lowercase my RerankTypes or uppercase the API. Would it be easier to digest if I used a hash map instead? I don't think I should have to call .toUpperCase() on all my strings.
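The hash-map approach could be sketched as a static map from lowercase label to enum constant, built once when the enum loads. This is not the PR's code: the `ml_opensearch` label (derived from the `ML_OPENSEARCH` constant named in the commit history), the `BY_LABEL` map, and the error message are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

enum RerankType {
    // Assumed label; the real enum may use a different string.
    ML_OPENSEARCH("ml_opensearch");

    private final String label;

    // Static fields of an enum are initialized after its constants,
    // so calling values() here is safe. Assumes labels are stored lowercase.
    private static final Map<String, RerankType> BY_LABEL = Collections.unmodifiableMap(
        Arrays.stream(values()).collect(Collectors.toMap(t -> t.label, Function.identity()))
    );

    RerankType(String label) {
        this.label = label;
    }

    public String getLabel() {
        return label;
    }

    /**
     * Case-insensitive lookup: normalizes the input once here instead of
     * requiring every caller to upper-case its string before valueOf().
     */
    public static RerankType from(String label) {
        RerankType type = (label == null) ? null : BY_LABEL.get(label.toLowerCase(Locale.ROOT));
        if (type == null) {
            throw new IllegalArgumentException("Unknown rerank type: " + label);
        }
        return type;
    }
}
```

Compared with `valueOf(label.toUpperCase(Locale.ROOT))`, a map also decouples the external label from the Java constant name, so the two could diverge later without breaking the API.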
src/main/java/org/opensearch/neuralsearch/processor/factory/RerankProcessorFactory.java
Outdated
    private String contextFromSearchHit(final SearchHit hit, final String field) {
        if (hit.getFields().containsKey(field)) {
            return (String) hit.field(field).getValue();
Will this type cast work for integers?
Nope! But String.valueOf(...) does the right thing, right?
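A quick illustration of the difference discussed above, using plain Java objects in place of `hit.field(field).getValue()` (the class name `ContextStringify` is hypothetical). A `(String)` cast only succeeds when the value already is a `String`; `String.valueOf(Object)` converts any value via `toString()`, but it returns the literal string "null" for a null reference rather than throwing.

```java
import java.util.Arrays;
import java.util.List;

class ContextStringify {
    // Mirrors the original snippet: only works if the stored value is already a String.
    static String castToString(Object value) {
        return (String) value;
    }

    // String.valueOf(Object) calls value.toString() for non-null values
    // and returns "null" for a null reference.
    static String stringify(Object value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        List<Object> fieldValues = Arrays.asList("some text", 42, 3.5);
        for (Object value : fieldValues) {
            System.out.println(stringify(value)); // "some text", "42", "3.5"
        }
        try {
            castToString(42); // an Integer cannot be cast to String
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }
    }
}
```

For the missing-field case raised earlier in the thread, this means an explicit null check is still needed before stringifying, otherwise the model would be fed the text "null".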
src/main/java/org/opensearch/neuralsearch/processor/rerank/DocumentContextSourceFetcher.java
Outdated
Signed-off-by: HenryL27
@navneet1v, do we need to add a BWC test here?
As this is the first release of the feature, we don't need it. But after the release, BWC tests need to be added.
Looks like k-NN changes are causing the integration tests to fail. What's going on here?
There was a codec upgrade in OpenSearch due to a Lucene upgrade, and it impacted k-NN. The k-NN PR has already been raised and will be merged soon.
Overall, the code looks good to me. Approving the PR.
As a next step, I will add more details on the RFC about what comes after the PR is approved.
Thanks!
"Only those with write access to this repository can merge pull requests."
Merged commit 3a7903f into opensearch-project:feature/reranker
Will merge it now; it goes to a feature branch. We'll need to perform certain intake activities, like a review with the security team, based on the feature branch; only once that is completed can the code be merged to
Of course, thanks.
…anker for improving search relevancy. (opensearch-project#494)

* Add rerank processor interfaces
* add cross-encoder specific logic and factory
* add unittests
* add integration test
* use string.format() instead of concatenation
* rename generateScoringContext to generateRerankingContext
* add name change in test too. whoops
* start refactoring with contextSourceFetchers
* refactor to use contextSourceFetchers to get context
* rename CrossEncoder to TextSimilarity
* add query_context layer to search ext
* add javadocs
* update to new asyncProcessResponse api
* rename reranktype to ML_OPENSEARCH
* improve error messages for bad rerank type config
* simplify configuration/factory logic
* improve handling for non-flat-string context fields
* rename TextSimilarity files to MLOpenSearch files
* apply spotless after rebase
* update changelog
* after rebase
* Address pr comments and fix XContent in search ext
* move contextSourceFetchers to their own subdirectory
* Apply suggestions from code review (Co-authored-by: Martin Gaievski)
* CR changes
* finish CR comments and fix broken unittest
* fix unittest names

Signed-off-by: HenryL27
Co-authored-by: Martin Gaievski
* Adding support for generic re-ranker interface and opensearch ml re-ranker for improving search relevancy. (#494)
Signed-off-by: HenryL27
Signed-off-by: Martin Gaievski
Co-authored-by: HenryL27
Co-authored-by: Heemin Kim
* Adding support for generic re-ranker interface and opensearch ml re-ranker for improving search relevancy. (opensearch-project#494) (cherry picked from commit 1bb48e2)
Description
Adds a rerank processor interface and a cross-encoder rerank processor implementation.
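To make the shape of a generic re-ranker interface concrete, here is a self-contained sketch in plain Java. It is not the PR's API: `Hit`, `Reranker`, and `TermOverlapReranker` are hypothetical names, the `query_text`/`documents` context keys are assumptions, and the term-overlap scorer is a toy stand-in for the cross-encoder model call. It loosely borrows the `generateRerankingContext` name from the commit history: one step builds a reranking context, another scores it, and the base class re-sorts the hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy stand-in for a search hit (not OpenSearch's SearchHit).
class Hit {
    final String id;
    final Map<String, Object> source;
    float score;

    Hit(String id, Map<String, Object> source, float score) {
        this.id = id;
        this.source = source;
        this.score = score;
    }
}

// Hypothetical generic reranker: subclasses decide how to build the context
// and how to score it; the base class applies the scores and re-sorts.
abstract class Reranker {
    abstract Map<String, Object> generateRerankingContext(List<Hit> hits, String queryText);

    abstract float[] rescore(Map<String, Object> context);

    List<Hit> rerank(List<Hit> hits, String queryText) {
        float[] scores = rescore(generateRerankingContext(hits, queryText));
        List<Hit> reranked = new ArrayList<>(hits);
        for (int i = 0; i < reranked.size(); i++) {
            reranked.get(i).score = scores[i]; // note: updates the shared Hit objects in place
        }
        reranked.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return reranked;
    }
}

// Toy implementation: counts how many query terms appear in one document field.
class TermOverlapReranker extends Reranker {
    private final String field;

    TermOverlapReranker(String field) {
        this.field = field;
    }

    @Override
    Map<String, Object> generateRerankingContext(List<Hit> hits, String queryText) {
        List<String> documents = new ArrayList<>();
        for (Hit hit : hits) {
            Object value = hit.source.get(field);
            documents.add(value == null ? "" : value.toString()); // missing field -> empty text
        }
        Map<String, Object> context = new HashMap<>();
        context.put("query_text", queryText);
        context.put("documents", documents);
        return context;
    }

    @Override
    float[] rescore(Map<String, Object> context) {
        String[] terms = ((String) context.get("query_text")).toLowerCase(Locale.ROOT).split("\\s+");
        @SuppressWarnings("unchecked")
        List<String> documents = (List<String>) context.get("documents");
        float[] scores = new float[documents.size()];
        for (int i = 0; i < documents.size(); i++) {
            String doc = documents.get(i).toLowerCase(Locale.ROOT);
            for (String term : terms) {
                if (doc.contains(term)) {
                    scores[i] += 1.0f;
                }
            }
        }
        return scores;
    }
}
```

Keeping context generation separate from scoring is what would let another model type or context source plug in without touching the re-sorting logic.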
Search with
Issues Resolved
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.