Enable numeric sort optimisation for few numerical sort types #6321
Thanks Chaitanya. Can we add some tests around this change? It would also be good to see some perf numbers for it.
server/src/main/java/org/opensearch/index/fielddata/IndexNumericFieldData.java
@gashutos Can you fix the spotless errors?
Bleh. This is a long-standing todo from the Lucene 9 upgrade! The problem really originates from LUCENE-9280, which introduced an optimization that skips non-competitive docs by using the BKD index when sorting by fields other than The wider fix that @reta is referring to is much more involved, as it's going to require logic for handling types when merging results coming from different indexes. So I'm +1 for a quick patch (w/ bugfix backport) to temporarily fix critical performance regressions like sorting by non
Thanks for the review, Bukhtawar! I had already added the results to the description of this PR, but they were hard to read since I didn't format them :) Below are the results from OS_2.3 (ran on managed AOS).
In the worst case, a newly introduced type would not be able to take advantage of the sort optimisation. Correctness would still be intact. The plan is to support all field types later on anyway.
Where would the decider logic live in that case, i.e. whether to apply the optimisation or not? Do you mean moving the switch code block into that contract method?
There won't be a switch, because each numeric type (LONG/INT/...) would implement the method; the types which do not support the optimization(s) would do nothing.
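A minimal sketch of the contract idea described above, using a standalone enum rather than OpenSearch's actual `NumericType`. The names `SortSettings`, `configureSortOptimization`, and `isOptimized` are hypothetical, and the sketch assumes the optimisation is off unless a type opts in:

```java
public class NumericTypeContractSketch {

    /** Hypothetical stand-in for the sort settings a field hands to the searcher. */
    static final class SortSettings {
        boolean pointsOptimization = false; // assumption of this sketch: off unless a type opts in
    }

    /** Illustrative subset of numeric types; every constant is forced to implement the method. */
    enum NumericType {
        LONG {
            @Override
            void configureSortOptimization(SortSettings settings) {
                settings.pointsOptimization = true;
            }
        },
        DOUBLE {
            @Override
            void configureSortOptimization(SortSettings settings) {
                settings.pointsOptimization = true;
            }
        },
        INT {
            @Override
            void configureSortOptimization(SortSettings settings) {
                // optimisation not supported for this type: do nothing
            }
        },
        FLOAT {
            @Override
            void configureSortOptimization(SortSettings settings) {
                // optimisation not supported for this type: do nothing
            }
        };

        /** A new constant will not compile until it decides what to do here. */
        abstract void configureSortOptimization(SortSettings settings);
    }

    /** The caller needs no switch; it simply delegates to the type. */
    static boolean isOptimized(NumericType type) {
        SortSettings settings = new SortSettings();
        type.configureSortOptimization(settings);
        return settings.pointsOptimization;
    }

    public static void main(String[] args) {
        for (NumericType type : NumericType.values()) {
            System.out.println(type + " -> " + isOptimized(type));
        }
    }
}
```

Because the method is abstract, adding a new constant without an implementation is a compile error, which is the guardrail being argued for.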
That would be similar to the boolean in the constructor, right? It's just that 50% of the types won't implement the method (depending on how many types we support at that point in time). How about adding a static set in NumericType,
and we will refer to it here.
The comment here would easily give a hint to a dev who is adding a new numeric type.
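For comparison, the static-set alternative might look roughly like the following. The snippet originally posted with this comment is not preserved here, so this is only an illustrative guess at its shape: the enum is a standalone subset, and `SORT_OPTIMIZED` and `supportsSortOptimization` are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;

public class SortOptimizedTypesSketch {

    /** Illustrative subset of numeric types, not OpenSearch's actual enum. */
    enum NumericType {
        INT,
        LONG,
        DATE,
        DATE_NANOSECONDS,
        FLOAT,
        DOUBLE;

        // Note for anyone adding a numeric type: add it to this set only if
        // its sort type and point type match; otherwise leave it out.
        static final Set<NumericType> SORT_OPTIMIZED = EnumSet.of(LONG, DOUBLE, DATE, DATE_NANOSECONDS);
    }

    /** The call site consults the set instead of switching on the type. */
    static boolean supportsSortOptimization(NumericType type) {
        return NumericType.SORT_OPTIMIZED.contains(type);
    }

    public static void main(String[] args) {
        for (NumericType type : NumericType.values()) {
            System.out.println(type + " -> " + supportsSortOptimization(type));
        }
    }
}
```

With this shape, a newly added type that nobody remembers to put in the set silently falls back to the unoptimised path; nothing forces the author to make a decision.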
I was thinking about something along these lines: It is similar to the flag, but a) we don't have this boolean flag anymore and b) it becomes cleaner in the sense that sort optimizations are delegated directly to the numeric type.
The difference between the comment and the contract: with the comment, the dev has to actually run into it somehow; with the contract, the new numeric type simply has to implement the method, which forces the dev to do something about it.
Signed-off-by: Nicholas Walter Knize <[email protected]>
I agree with this. I pushed a commit to add a deprecated abstract |
Looks good to me. I was just worried about the readability part, but that's fine.
Signed-off-by: Nicholas Walter Knize <[email protected]>
I like this balance of numeric API while keeping the tech debt guardrails in place! Thanks to everyone that helped weigh in but a HUGE thanks to @gashutos for the benchmarks and jumping on this long running todo!! Huge contribution!
Thank you @reta, @nknize, @andrross and @Bukhtawar for all your help in this PR. I will keep you posted about my findings for the rest of the numeric types, which we are not supporting yet. The gain from this optimization is good enough to justify enabling it for the remaining types as well.
This commit restores the sort optimization to use BKD to skip non-competitive docs for numeric types whose BYTES size match between the BKD leaf and doc values encoding. For now this is only LONG, DOUBLE, DATE, and DATE_NANOSECONDS as the remaining NumericTypes use 64bit docvalue encoding while the BKD uses smaller byte encoded space.

This also updates the QueryPhase to remove the long time unnecessary in order doc id check and minDoc boolean query for skipping non-competitive docs that is handled by all Lucene 7.0+ sorted indexes. Existing tests are updated.

Signed-off-by: Nicholas Walter Knize <[email protected]>
Signed-off-by: gashutos <[email protected]>
Co-authored-by: Nicholas Walter Knize <[email protected]>
Co-authored-by: Chaitanya Gohel <[email protected]>
(cherry picked from commit 6bb9e3e)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
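The byte-size rule in this commit message can be sketched as a simple width comparison. The enum below is hypothetical and standalone (it is not OpenSearch's `NumericType`), and the point widths given for INT, FLOAT, and HALF_FLOAT are assumptions based on Lucene's usual point encodings, since the message only says those types use a smaller byte-encoded space:

```java
public class PointWidthSketch {

    /** Per the commit message, doc values encode numerics in 64 bits. */
    static final int DOC_VALUE_BYTES = Long.BYTES;

    /** Illustrative subset of numeric types with the assumed width of their BKD point encoding. */
    enum NumericType {
        LONG(Long.BYTES),
        DOUBLE(Double.BYTES),
        DATE(Long.BYTES),
        DATE_NANOSECONDS(Long.BYTES),
        INT(Integer.BYTES),  // assumption: 4-byte points
        FLOAT(Float.BYTES),  // assumption: 4-byte points
        HALF_FLOAT(2);       // assumption: 2-byte points

        final int pointBytes;

        NumericType(int pointBytes) {
            this.pointBytes = pointBytes;
        }

        /** BKD-based skipping is only considered safe when both encodings have the same width. */
        boolean canSkipWithPoints() {
            return pointBytes == DOC_VALUE_BYTES;
        }
    }

    public static void main(String[] args) {
        for (NumericType type : NumericType.values()) {
            System.out.println(type + " pointBytes=" + type.pointBytes + " optimised=" + type.canSkipWithPoints());
        }
    }
}
```

Under these assumed widths, only the four types named in the commit message pass the check.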
Description
Enabling numeric sort optimisation for the 4 numeric types below.
For the above 4 types, the sort type and the point type are the same, so there should not be any harm in doing this.
Lucene gives us a built-in ability to optimise sorting on certain sort field types where the point type matches, i.e. for fields with data type Date or Long, we will be able to use this optimisation.
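To illustrate why this optimisation helps, here is a toy model of skipping non-competitive documents during a top-k ascending sort. It does not use Lucene: `Block` loosely stands in for a BKD leaf with a known minimum value, and all names are hypothetical. It is a sketch of the general idea only, not Lucene's implementation.

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class CompetitiveSkipSketch {

    /** A group of values with a precomputed minimum, loosely modelled on an index leaf. */
    static final class Block {
        final long[] values;
        final long min;

        Block(long... values) {
            this.values = values;
            long m = Long.MAX_VALUE;
            for (long v : values) {
                m = Math.min(m, v);
            }
            this.min = m;
        }
    }

    static final class Result {
        final long[] topK;  // the k smallest values, ascending
        final int visited;  // how many individual values were inspected

        Result(long[] topK, int visited) {
            this.topK = topK;
            this.visited = visited;
        }
    }

    /** Collects the k smallest values (k must be at least 1), skipping blocks that cannot compete. */
    static Result topKAscending(Block[] blocks, int k) {
        // Max-heap holding the best k values seen so far; its head is the current worst hit.
        PriorityQueue<Long> heap = new PriorityQueue<>(Collections.reverseOrder());
        int visited = 0;
        for (Block block : blocks) {
            // Once k hits exist, a block whose minimum is not below the worst hit is non-competitive.
            if (heap.size() == k && block.min >= heap.peek()) {
                continue;
            }
            for (long v : block.values) {
                visited++;
                if (heap.size() < k) {
                    heap.add(v);
                } else if (v < heap.peek()) {
                    heap.poll();
                    heap.add(v);
                }
            }
        }
        long[] topK = heap.stream().mapToLong(Long::longValue).sorted().toArray();
        return new Result(topK, visited);
    }

    public static void main(String[] args) {
        Block[] blocks = {
            new Block(5, 1, 9), new Block(20, 30, 40), new Block(2, 50, 60), new Block(100, 200, 300)
        };
        Result r = topKAscending(blocks, 2);
        System.out.println("visited " + r.visited + " of 12 values");
    }
}
```

In this toy input, two of the four blocks are skipped entirely, so only half of the values are inspected while the result stays the same as a full scan.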
There already exists a check in our code to enable this optimisation for only Date/Long data types. This was introduced as part of PR https://github.com/opensearch-project/OpenSearch/pull/1974/files
As part of the Lucene 9.0.0 upgrade (PR https://github.com/opensearch-project/OpenSearch/pull/1109/files), the numeric sort optimisations were removed, causing a regression in sort performance. I think they were removed because the method involved was deprecated.
However, Lucene did not say that its numeric optimisation would be deprecated; rather, the newly introduced method for enabling/disabling the optimisation is what is deprecated (apache/lucene@cc58c51). This is because these optimisations are now enabled by default, whereas earlier they were disabled by default.
I tried adding back the code from https://github.com/opensearch-project/OpenSearch/pull/1974/files with sortField.setOptimizeSortWithPoints(true); for the Date and Long data types, and with that we were able to achieve performance similar to what we had before the removal, in OpenSearch 2.3.
Below are the results from OS_2.3. (ran on managed AOS)
Units in ms.
Issues Resolved
[OpenSearch-Project-5534]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.