Upgrade Spark 3.5.1 #525

Merged: 13 commits, Aug 8, 2024

Collaborator

@penghuo penghuo commented Aug 6, 2024

Description

Merge feature branch spark-3.5.1.

Test

sbt integtest/awsIntegration

[info] Run completed in 3 minutes, 13 seconds.
[info] Total number of tests run: 2
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0

Perf Test - direct query

No significant difference was observed compared to Spark 3.3 (EMR-S 6.10).

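| id | emrs-p90 | query-p90 |
|----|----------|-----------|
| 1 | 347.619 | 169.222 |
| 2 | 256.612 | 151.244 |
| 3 | 241.754 | 148.445 |
| 4 | 196.452 | 110.116 |
| 5 | 257.463 | 164.604 |
| 6 | 286.919 | 175.081 |
| 7 | 317.325 | 219.362 |
| 8 | 256.688 | 153.681 |
| 9 | 257.500 | 153.506 |
| 10 | 256.969 | 155.818 |
| 11 | 256.854 | 160.758 |
| 12 | 286.868 | 190.403 |
| 13 | 256.809 | 153.911 |
| 14 | 271.859 | 167.041 |
| 15 | 317.254 | 217.635 |
| 16 | 332.309 | 225.800 |
| 17 | 241.652 | 149.872 |
| 18 | 256.916 | 146.927 |
| 19 | 242.132 | 139.627 |
| 20 | 287.283 | 182.994 |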

Perf Test - direct query on OpenSearch index

| SQL Query | Spark 3.3 | Spark 3.5 |
|-----------|-----------|-----------|
| `SELECT COUNT(*) FROM dev.default.logs-181998` | 17021.0 | 20119.0 |
| `SELECT COUNT(*) FROM dev.default.logs-181998 WHERE status <> 0` | 18507.0 | 19686.0 |
| `SELECT COUNT(*), AVG(size) FROM dev.default.logs-181998` | 18281.0 | 21861.0 |
| `SELECT AVG(CAST(size AS BIGINT)) FROM dev.default.logs-181998` | 19109.0 | 199915.0 |
| `SELECT MIN(@timestamp), MAX(@timestamp) FROM dev.default.logs-181998` | 18420.0 | 19448.0 |
| `SELECT status, COUNT(*) FROM dev.default.logs-181998 WHERE status <> 0 GROUP BY status ORDER BY COUNT(*) DESC` | 19731.0 | 20694.0 |
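
To make the Spark 3.3 vs. Spark 3.5 timings above easier to compare, here is a minimal Python sketch that computes the relative change per query. The numbers are copied from the table. The short query labels and the helper names `relative_change` and `summarize` are hypothetical and not part of this PR, and the timing unit is not stated in the PR, so the script only reports ratios.

```python
# Timings copied from the OpenSearch index perf table in this PR.
# The unit is not stated in the PR, so only relative changes are computed.
# The short labels are illustrative stand-ins for the full SQL queries.
timings = [
    ("COUNT(*)", 17021.0, 20119.0),
    ("COUNT(*) WHERE status <> 0", 18507.0, 19686.0),
    ("COUNT(*), AVG(size)", 18281.0, 21861.0),
    ("AVG(CAST(size AS BIGINT))", 19109.0, 199915.0),
    ("MIN/MAX(@timestamp)", 18420.0, 19448.0),
    ("GROUP BY status ORDER BY COUNT(*)", 19731.0, 20694.0),
]


def relative_change(baseline: float, candidate: float) -> float:
    """Return (candidate - baseline) / baseline; positive means slower."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline


def summarize(rows):
    """Pair each query label with its Spark 3.3 -> 3.5 relative change."""
    return [(label, relative_change(old, new)) for label, old, new in rows]


if __name__ == "__main__":
    for label, change in summarize(timings):
        print(f"{label:<36} {change:+.1%}")
```

Rows whose relative change stands far apart from the rest are worth re-checking against the raw measurements before drawing conclusions.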

Issues Resolved

#352

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

penghuo and others added 10 commits May 15, 2024 09:17
* Support Spark 3.4.1

Signed-off-by: Peng Huo <[email protected]>

* Ignore FlintSparkWindowingFunctionITSuite and IcebergIT

Signed-off-by: Peng Huo <[email protected]>

---------

Signed-off-by: Peng Huo <[email protected]>
* Fix IcebergIT and Refactor SessionCatalog

Signed-off-by: Peng Huo <[email protected]>

* update format

Signed-off-by: Peng Huo <[email protected]>

* fix UT

Signed-off-by: Peng Huo <[email protected]>

* address comments

Signed-off-by: Peng Huo <[email protected]>

---------

Signed-off-by: Peng Huo <[email protected]>
…earch-project#349)

* enable Iceberg IT

Signed-off-by: Peng Huo <[email protected]>

* push down read-padding on char type

Signed-off-by: Peng Huo <[email protected]>

---------

Signed-off-by: Peng Huo <[email protected]>
* Bump Spark version

Signed-off-by: Chen Dai <[email protected]>

* Ignore broken IT temporarily

Signed-off-by: Chen Dai <[email protected]>

* Fix broken IT

Signed-off-by: Chen Dai <[email protected]>

---------

Signed-off-by: Chen Dai <[email protected]>
Signed-off-by: Peng Huo <[email protected]>
Signed-off-by: Peng Huo <[email protected]>
Signed-off-by: Peng Huo <[email protected]>
@penghuo penghuo self-assigned this Aug 6, 2024
@penghuo penghuo added the 0.5 label Aug 6, 2024
@penghuo penghuo marked this pull request as ready for review August 7, 2024 03:24
Member

@LantaoJin LantaoJin left a comment

LGTM

Collaborator

@dai-chen dai-chen left a comment

Leaving this PR as is so the author can decide whether to squash.

Collaborator

@noCharger noCharger left a comment

Thanks for the change. Can any data be shared for the items below?

* Performance test
* Direct SparkSQL query performance
* Flint index building performance, including skipping/CV/MV
* Query acceleration performance, including skipping/CV

@penghuo penghuo merged commit d6e71fa into opensearch-project:main Aug 8, 2024
4 checks passed
@penghuo penghuo mentioned this pull request Aug 8, 2024
18 tasks
seankao-az added a commit that referenced this pull request Aug 17, 2024