Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1748333 Fix Iceberg decimal type schema parser #864

Merged
merged 4 commits into from
Oct 21, 2024

Conversation

sfc-gh-alhuang
Copy link
Contributor

@sfc-gh-alhuang sfc-gh-alhuang commented Oct 17, 2024

Summary

This PR addresses the issue where the scanner encounters an error when scanning delta-encoded decimal values with a precision greater than 18.

Changes

  1. Removed sfVer metadata in Iceberg mode: This prevents the scanner from applying optimizations specific to Snowflake-written Parquet files, which was the root cause of the issue with scanning.
  2. Add tests for multiple random generated decimals.

}
list.add(
new BigDecimal(
RandomStringUtils.randomNumeric(intPart)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes it impossible to reproduce failures locally. Why do we need this randomNumeric when we already generated a random number for intpart and floatpart using the seeded RNG in lines 412 and 413?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The length of decimal is greater than int/long.max, thus we need string generator for this. Switch to generator with consistent seed.

@sfc-gh-alhuang sfc-gh-alhuang force-pushed the alhuang-iceberg-data-type-parser branch from f0286a1 to 7787aa5 Compare October 21, 2024 18:50
bigDecimals_38_10);
}

private static List<Object> randomBigDecimal(int count, int precision, int scale) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure i fully understand what this is trying to do (comments will help?). Lets get this merged in to unblock KC folks and I'll follow up with you offline on this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments

Copy link
Collaborator

@sfc-gh-hmadan sfc-gh-hmadan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, test code we'll discuss after this gets merged.

@sfc-gh-alhuang sfc-gh-alhuang enabled auto-merge (squash) October 21, 2024 21:35
@sfc-gh-alhuang sfc-gh-alhuang merged commit 1fe05a5 into master Oct 21, 2024
45 checks passed
@sfc-gh-alhuang sfc-gh-alhuang deleted the alhuang-iceberg-data-type-parser branch October 21, 2024 22:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants