Add ORC writing test cases for dictionary compression [databricks] #8798

Merged: 5 commits into NVIDIA:branch-23.08 on Jul 27, 2023

Conversation

res-life
Collaborator

closes #8797

Add ORC writing test cases for dictionary compression

Signed-off-by: Chong Gao [email protected]

@res-life
Collaborator Author

build

@res-life
Collaborator Author

This should now be able to pass the premerge checks.

revans2 previously approved these changes Jul 25, 2023
Collaborator

@revans2 revans2 left a comment

Just a few nits and follow-on issues that I think would be good to address.

@res-life
Collaborator Author

build

@revans2 changed the title from "Add ORC writing test cases for dictionary compression" to "Add ORC writing test cases for dictionary compression [databricks]" on Jul 26, 2023
@revans2
Collaborator

revans2 commented Jul 26, 2023

build

revans2 previously approved these changes Jul 26, 2023
// get GPU encoding info
val gpuEncodings = withGpuSparkSession(getEncodings)

assertResult(cpuEncodings)(gpuEncodings)
Collaborator

I'm a little nervous that these will not always be equal, but the test is passing, so I think we are fine.

Collaborator Author

The expected result is:
(DICTIONARY_V2, 2) // for column 1
(DICTIONARY_V2, 3) // for column 2
Here, 2 and 3 are the dictionary sizes.
This assertion will always be safe.
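
To illustrate why the comparison can be stable, here is a minimal sketch, not the test code from this PR. It assumes that a dictionary-encoded column stores each distinct value once, so the dictionary size equals the number of distinct values, and that both writers pick dictionary encoding for the column. `ColumnEncodingInfo`, `encodingOf`, `encodingsOf`, and the sample data are hypothetical names and values made up for the example; this is a simplified model and does not call the real ORC API.

```scala
// Simplified stand-in for ORC column encoding metadata; this is NOT the ORC API.
object DictionaryEncodingSketch {
  // Hypothetical model of one column's encoding: encoding kind plus dictionary size.
  final case class ColumnEncodingInfo(kind: String, dictionarySize: Int)

  // Assumption: the dictionary holds each distinct value exactly once,
  // so its size is the number of distinct values in the column.
  def encodingOf(values: Seq[String]): ColumnEncodingInfo =
    ColumnEncodingInfo("DICTIONARY_V2", values.distinct.size)

  // Compute the modeled encoding for every column.
  def encodingsOf(columns: Seq[Seq[String]]): Seq[ColumnEncodingInfo] =
    columns.map(encodingOf)

  // Made-up sample data: column 1 has 2 distinct values, column 2 has 3.
  val columns: Seq[Seq[String]] = Seq(
    Seq("a", "b", "a", "b"),
    Seq("x", "y", "z", "x")
  )

  def main(args: Array[String]): Unit = {
    // Stand-ins for the CPU and GPU writers: the same rows seen in a different order.
    val cpuEncodings = encodingsOf(columns)
    val gpuEncodings = encodingsOf(columns.map(_.reverse))

    // Under the stated assumptions, row order does not change the dictionary size.
    assert(cpuEncodings == gpuEncodings)
    cpuEncodings.foreach(println)
  }
}
```

Under this model the result depends only on the set of distinct values per column. If one writer fell back to direct encoding for a column, the two results would differ, which is the case the reviewer's concern would cover.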

@res-life
Collaborator Author

build

@res-life
Collaborator Author

build

@jlowe jlowe merged commit 299393a into NVIDIA:branch-23.08 Jul 27, 2023
27 checks passed
@sameerz sameerz added the test Only impacts tests label Jul 27, 2023
4 participants