-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate max_token_score field of neural_sparse query #478
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## 2.x #478 +/- ##
============================================
+ Coverage 80.95% 85.53% +4.58%
- Complexity 512 516 +4
============================================
Files 41 40 -1
Lines 1591 1521 -70
Branches 247 238 -9
============================================
+ Hits 1288 1301 +13
+ Misses 197 112 -85
- Partials 106 108 +2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the coverage thing.
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
Show resolved
Hide resolved
@zhichao-aws can you share some light around how lucene is taking care of this? Does lucene has added this capability? how does this impact the customer queries and search relevancy who is using this field. |
@zhichao-aws can you also add results of the tests which has been done, to ensure that if customer doesn't provide this deprecated field the queries are not impacted. |
@zhichao-aws can you also add details on how we tested the upgrades? |
@VisibleForTesting | ||
static final ParseField MAX_TOKEN_SCORE_FIELD = new ParseField("max_token_score"); | ||
static final ParseField MAX_TOKEN_SCORE_FIELD = new ParseField("max_token_score").withAllDeprecated(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add the @deprecated annotation on top of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
@@ -88,7 +93,6 @@ public void testFromXContent_whenBuiltWithOptionals_thenBuildSuccessfully() { | |||
"VECTOR_FIELD": { | |||
"query_text": "string", | |||
"model_id": "string", | |||
"max_token_score": 123.0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we keep a unit test where we are providing the deprecated field and no impact on queries are happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already have this.
Please refer this lucene PR for more details. In short words, Lucene used So in our plugin, we don't need to provide the score upperbound for FeatureQuery after lucene 9.8. We conducted some tests, this does reduce neural_sparse latency at a large margin. For 8.8 million docs case the latency was reduced to about 1/3 compared with lucene 9.7 case. This optimization doesn't affect the search result, only improves the speed of searching in shards. So it won't hurt the search relevance. Since we still keep the api compatibility in 2.x, users can still use existing queries, we'll ignore the |
The deprecated field was optional in 2.11 release, so customers can always not privide this field. |
We have integ test for setting and doesn't setting this field: We also have unit test to check we still parse this field, but will log a warning. |
Hi @navneet1v , I've added some comments and commits, could you please help review again? |
BTW, this should be merged before we bump version to 2.12.0. For the old code there will be compilation errors due to the changes of lucene internal interface. |
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
force push to rebase 2.x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description
In
neural_sparse
query, max_token_score field was used for sub-clause pruning of WAND scorer (lucene 9.7). Since we'll upgrade to lucene 9.8 in the next release, the inner logic in lucene changed and we don't need this field any more. We need to deprecate this field. To be more specific, in 2.x, user can still set this field but we'll ignore this and give a warning log. In 3.x, we don't parse this field.Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.