Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test cases for ORC writing according to options orc.compress and compression [databricks] #8785

Merged
merged 6 commits into from
Jul 29, 2023

Conversation

res-life
Copy link
Collaborator

@res-life res-life commented Jul 24, 2023

closes #8781
closes #8782

ORC writing first honor "compression" option, then "orc.compress" option.

Signed-off-by: Chong Gao [email protected]

@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

321cdh and 330cdh compile failed. Will fix later.

@res-life
Copy link
Collaborator Author

build

1 similar comment
@res-life
Copy link
Collaborator Author

build

@res-life res-life requested review from revans2 and jlowe July 25, 2023 10:56
@res-life res-life changed the title Add test cases for ORC writing according to options orc.compress and … Add test cases for ORC writing according to options orc.compress and compression [databricks] Jul 27, 2023
@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

build

@res-life res-life requested a review from mythrocks July 27, 2023 12:21
revans2
revans2 previously approved these changes Jul 27, 2023
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be upmerged to solve merge conflicts but otherwise lgtm.

@@ -92,4 +102,76 @@ class OrcQuerySuite extends SparkQueryCompareTestSuite {
) {
frame => frame
}

private def getOrcFilePostfix(compression: String): String =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private def getOrcFilePostfix(compression: String): String =
private def getOrcFileSuffix(compression: String): String =

// the reader is not a AutoCloseable for Spark CDH, so use `withResourceIfAllowed`
// 321cdh uses lower ORC: orc-core-1.5.1.7.1.7.1000-141.jar
// 330cdh uses lower ORC: orc-core-1.5.1.7.1.8.0-801.jar
withResourceIfAllowed(OrcFile.createReader(orcFilePath, conf)) { reader =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a useful utility. It makes the code quite streamlined.

checkCompressType(None, Some(orcCompress))
}

// make paris, e.g.: [("UNCOMPRESSED", "NONE"), ("NONE", "SNAPPY"), ("SNAPPY", "ZSTD") ... ]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// make paris, e.g.: [("UNCOMPRESSED", "NONE"), ("NONE", "SNAPPY"), ("SNAPPY", "ZSTD") ... ]
// make pairs, e.g.: [("UNCOMPRESSED", "NONE"), ("NONE", "SNAPPY"), ("SNAPPY", "ZSTD") ... ]

mythrocks
mythrocks previously approved these changes Jul 27, 2023
Copy link
Collaborator

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff. Some minor nitpicks, but it looks pretty good to me.

@res-life res-life dismissed stale reviews from mythrocks and revans2 via a38b750 July 28, 2023 02:04
@res-life
Copy link
Collaborator Author

build

@res-life res-life merged commit f436da2 into NVIDIA:branch-23.08 Jul 29, 2023
27 checks passed
@sameerz sameerz added the test Only impacts tests label Jul 31, 2023
thirtiseven pushed a commit to thirtiseven/spark-rapids that referenced this pull request Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Only impacts tests
Projects
None yet
5 participants