Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable numeric sort optimisation for few numerical sort types #6321

Merged
merged 10 commits into from
Feb 15, 2023

Conversation

gashutos
Copy link
Contributor

@gashutos gashutos commented Feb 14, 2023

Description

Enabling numeric sort optimisation for below 4 numeric ypes.

  1. DATE
  2. DATE_NANOSECONDS
  3. LONG
  4. DOUBLE

For above 4 types, we have same sort type and point type. So should not be any harm doing like that.

Lucene gives us in-built ability to optimise sorting on certain sort field types where its point type is matching. i.e for fields with data type Date or Long, we will able to use this optimisation.
There already exists a check in our code to enable this optimisation for only Date/Long data types. This was introduced as part of PR https://github.com/opensearch-project/OpenSearch/pull/1974/files
As part of PR (during upgrade of Lucene 9.0.0) https://github.com/opensearch-project/OpenSearch/pull/1109/files, we have removed the numeric sort optimisations thus causing a regression in sort performance. I think this has been removed due to its deprecated method.
but Lucene didn't say it's numeric optimisation would be deprecated, but the method introduced newly to enable/disable optimisation would be deprecated. apache/lucene@cc58c51. Since these optimisations are enabled by default, earlier these optimisations were disabled by default.

I tried adding back the code in https://github.com/opensearch-project/OpenSearch/pull/1974/files with sortField.setOptimizeSortWithPoints(true); for Date and Long data types and with that we were able to achieve similar performance (before the removal) in Open Search 2.3.

Below are the results from OS_2.3. (ran on managed AOS)
Units in ms.

Data Node Type. Sort Mode OS 2.3 OS 2.3 (before)
r6g.2xlarge asc 59 349
r6g.2xlarge desc 170 1646

Issues Resolved

[OpenSearch-Project-5534]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Copy link
Collaborator

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Chaitanya, Can we add some tests around this change. Also might be good to see some perf numbers around this change

Copy link
Member

@andrross andrross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gashutos Can you fix the spotless errors?

@gashutos gashutos changed the title Enable numeric sort optimisation for few numerica sort types Enable numeric sort optimisation for few numerical sort types Feb 15, 2023
@nknize
Copy link
Collaborator

nknize commented Feb 15, 2023

Bleh. This is a long standing todo from the Lucene 9 upgrade!

The problem really originates from LUCENE-9280 which introduced an optimization that skips non-competitive docs by using the BKD index when sorting by fields other than _score. The reason the sortField.setOptimizeSortWithPoints line was introduced is because Lucene 9 expects the BYTE size for the BKD index and docvalues to match along with the byte size of the SortField when comparing for sort. OpenSearch violates this assumption by using the wider byte size for every SortField.Type in NumericType. We saw this same problem in several test failures when working on the Lucene 9 upgrade PR.

The wider fix that @reta is referring to is much more involved as it's going to require logic for handling types when merging results coming from different indexes. So I'm +1 for a quick patch (w/ bugfix backport) to temporarily fix critical performance regressions like sorting by non _score fields in already index sorted indices (e.g., time series). (gripe: it's SUPER frustrating not having nightly benchmarks to expose these regressions to the wider audience so I may just go ahead and adapt LuceneBench python scripts to use OpenSearch-Benchmarks - unless someone else out there wants to pick up that task, hint hint!!!!!)

@gashutos
Copy link
Contributor Author

gashutos commented Feb 15, 2023

Thanks Chaitanya, Can we add some tests around this change. Also might be good to see some perf numbers around this change

Thanks for the review Bukhtawar ! The results I've added already in description of this PR. But they were hard to read since I didnt format those :)

Below are the results from OS_2.3. (ran on managed AOS)
Units in ms.

Data Node Type.         Sort Mode             OS 2.3             OS 2.3(before)
r6g.2xlarge.               asc                264                  428
r6g.2xlarge.               desc               639                  2422

@gashutos
Copy link
Contributor Author

Not often (fingers crossed) but it might, u64 could be considered somewhat esoteric but it would allow people to migrate more easily from Elasticsearch. Also, I think we could go as far as supporting BigInteger for example at some point. To conclude - it might happen, and since we know there are such decisions points buried in code - making them explicit would 100% help later on.

The worst case, newly introduced type would not able to get sort optimisation advantage. Correctness would be still intact. The plan is anyway to support all field types some time later on.

@gashutos
Copy link
Contributor Author

gashutos commented Feb 15, 2023

Holding up on the merge until we sort out the optimization flag decision.

What about this option: add applySortOptimization(SortField field) to numeric type contract, the boolean is gone and each numeric type is in control of any possible optimization it could do?

Where will be the decider logic in that case ? if to apply optimisation or not ? You mean move the switch code block in that contract method ?

@reta
Copy link
Collaborator

reta commented Feb 15, 2023

Where will be the decider logic in that case ? if to apply optimisation or not ? You mean move the switch code block in that contract method ?

There won't be switch because each numeric type (LONG/INT/...) would implement the method, the types which do not support optimization(s) would do nothing.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationStatsIT.testSegmentReplicationStatsResponse

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@gashutos
Copy link
Contributor Author

Where will be the decider logic in that case ? if to apply optimisation or not ? You mean move the switch code block in that contract method ?

There won't be switch because each numeric type (LONG/INT/...) would implement the method, the types which do not support optimization(s) would do nothing.

that would be similar to boolean in ctor right ? Its just 50% of types wont implement the methods. (depending on how many types we support at that point of time).

How about adding a static set in NumericType,

// Lucene optimization enabled numeric types
public static final Set<NumericType> SORTOPTIMIZED_TYPES = new HashSet<>(Arrays.asList(DATE, 
        DATE_NANOSECONDS, 
        DOUBLE, 
        LONG)); 

And we will refer this here
sortField.setOptimizeSortWithPoints(NumericType.SORTOPTIMIZED_TYPES.contains(getNumericType()));

@gashutos
Copy link
Contributor Author

Where will be the decider logic in that case ? if to apply optimisation or not ? You mean move the switch code block in that contract method ?

There won't be switch because each numeric type (LONG/INT/...) would implement the method, the types which do not support optimization(s) would do nothing.

that would be similar to boolean in ctor right ? Its just 50% of types wont implement the methods. (depending on how many types we support at that point of time).

How about adding a static set in NumericType,

// Lucene optimization enabled numeric types
public static final Set SORTOPTIMIZED_TYPES = new HashSet<>(Arrays.asList(DATE,
DATE_NANOSECONDS,
DOUBLE,
LONG));
And we will refer this here
sortField.setOptimizeSortWithPoints(NumericType.SORTOPTIMIZED_TYPES.contains(getNumericType()));

The comment here would easily hint dev who is adding new numerical type.

@reta
Copy link
Collaborator

reta commented Feb 15, 2023

Where will be the decider logic in that case ? if to apply optimisation or not ? You mean move the switch code block in that contract method ?

There won't be switch because each numeric type (LONG/INT/...) would implement the method, the types which do not support optimization(s) would do nothing.

that would be similar to boolean in ctor right ? Its just 50% of types wont implement the methods. (depending on how many types we support at that point of time).

How about adding a static set in NumericType,

// Lucene optimization enabled numeric types
public static final Set<NumericType> SORTOPTIMIZED_TYPES = new HashSet<>(Arrays.asList(DATE, 
        DATE_NANOSECONDS, 
        DOUBLE, 
        LONG)); 

And we will refer this here sortField.setOptimizeSortWithPoints(NumericType.SORTOPTIMIZED_TYPES.contains(getNumericType()));

I was thinking about something along these lines: sortField.setOptimizeSortWithPoints(NumericType.SORTOPTIMIZED_TYPES.contains(getNumericType())); -> getNumericType().applySortOptimization(sortField)

It is similar to the flag but a) we don't have this boolean flag anymore b) it becomes cleaner in a sense that sort optimization are delegated directly to numeric type

@reta
Copy link
Collaborator

reta commented Feb 15, 2023

The comment here would easily hint dev who is adding new numerical type.

The difference between the comment and contract: the dev has to actually run into this comment somehow vs contract - the new numeric type just will have to implement the method, and it will force the dev to do something about it.

@nknize
Copy link
Collaborator

nknize commented Feb 15, 2023

The difference between the comment and contract: the dev has to actually run into this comment somehow vs contract

I agree with this. I pushed a commit to add a deprecated abstract shouldApplyPointSortOptimization method numeric data types have to implement. If set to false the optimization is explicitly disabled. I can add some better javadocs around this but wanted to get it in to see if y'all are good w/ this approach. I think it's less cryptic than the ctor boolean, while accomplishing @reta point that new types have to explicitly think about disabling or enabling the optimization.

@gashutos
Copy link
Contributor Author

The difference between the comment and contract: the dev has to actually run into this comment somehow vs contract

I agree with this. I pushed a commit to add a deprecated abstract shouldApplyPointSortOptimization method numeric data types have to implement. If set to false the optimization is explicitly disabled. I can add some better javadocs around this but wanted to get it in to see if y'all are good w/ this approach. I think it's less cryptic than the ctor boolean, while accomplishing @reta point that new types have to explicitly think about disabling or enabling the optimization.

Looks good to me. I was just afraid about the readability part, but that's fine.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Copy link
Collaborator

@nknize nknize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this balance of numeric API while keeping the tech debt guardrails in place! Thanks to everyone that helped weigh in but a HUGE thanks to @gashutos for the benchmarks and jumping on this long running todo!! Huge contribution!

@gashutos
Copy link
Contributor Author

gashutos commented Feb 15, 2023

I like this balance of numeric API while keeping the tech debt guardrails in place! Thanks to everyone that helped weigh in but a HUGE thanks to @gashutos for the benchmarks and jumping on this long running todo!! Huge contribution!

Thank you @reta & @nknize @andrross @Bukhtawar for all in this PR. I will keep you guys posted about my finding for rest of the numeric types which we are not supporting. The optimization we are gaining is pretty good to enable rest of types as well.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@nknize nknize merged commit 6bb9e3e into opensearch-project:main Feb 15, 2023
@nknize nknize added the backport 2.x Backport to 2.x branch label Feb 15, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 15, 2023
This commit restores the sort optimization to use BKD to skip non-competitive
docs for numeric types whose BYTES size match between the BKD leaf and doc
values encoding. For now this is only LONG, DOUBLE, DATE, and DATE_NANOSECONDS
as the remaining NumericTypes use 64bit docvalue encoding while the BKD uses
smaller byte encoded space.

This also updates the QueryPhase to remove the long time unnecessary in order
doc id check and minDoc boolean query for skipping non-competitive docs that is
handled by all Lucene 7.0+ sorted indexes.

Existing tests are updated.

Signed-off-by: Nicholas Walter Knize <[email protected]>
Signed-off-by: gashutos <[email protected]>

Co-authored-by: Nicholas Walter Knize <[email protected]>
Co-authored-by: Chaitanya Gohel <[email protected]>
(cherry picked from commit 6bb9e3e)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
nknize added a commit that referenced this pull request Feb 15, 2023
…#6330)

This commit restores the sort optimization to use BKD to skip non-competitive
docs for numeric types whose BYTES size match between the BKD leaf and doc
values encoding. For now this is only LONG, DOUBLE, DATE, and DATE_NANOSECONDS
as the remaining NumericTypes use 64bit docvalue encoding while the BKD uses
smaller byte encoded space.

This also updates the QueryPhase to remove the long time unnecessary in order
doc id check and minDoc boolean query for skipping non-competitive docs that is
handled by all Lucene 7.0+ sorted indexes.

Existing tests are updated.

(cherry picked from commit 6bb9e3e)

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Walter Knize <[email protected]>
Co-authored-by: Chaitanya Gohel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch Indexing & Search Performance This is for any performance related enhancements or bugs skip-changelog v3.0.0 Issues and PRs related to version 3.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants