Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration data type test for new table format #850

Merged
merged 11 commits into from
Oct 9, 2024
Merged

Conversation

sfc-gh-alhuang
Copy link
Contributor

@sfc-gh-alhuang sfc-gh-alhuang commented Oct 2, 2024

  1. Created data type ITs for Iceberg table and verified through local VM. This test should be disabled until the sever side change is release.
  2. Bug fix for FileColumnProperties. The Iceberg mode will fill in default value for corresponding type now if all values in the column are null.
  3. Created AbstractDataTypeTest.testIcebergIngestAndQuery for multi row testing.


@Test
public void testBoolean() throws Exception {
testIcebergIngestion("boolean", true, new BooleanProvider());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: whenever you have time in the next few weeks, lets introduce a class called ITTestcaseBuilder that has methods withDataType, withInputValue, withExpectedValue, withVerificationQuery.

THe provider business is funky as it only supports doing exact value verification with a where value = {expectedValue}. I'd much rather write the sql filter directly in the testcase instead of have code generate it, for flexibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to jira

return tableName;
}

protected String createIcebergTable(String dataType) throws SQLException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add an argument for boolean nullable and we can make sure the sdk does proper nullability checking for columns marked non-nullable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added nullable test.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see nullable was removed, why so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed and use <datatype> not null in datatype testing. By doing this we can set nullable on any sub-columns on structured data type.

Assert.assertThrows(
SFException.class,
() -> testIcebergIngestion("boolean", new Object(), true, new BooleanProvider()));
Assert.assertEquals(ErrorCode.INVALID_FORMAT_ROW.getMessageCode(), ex.getVendorCode());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(dupe) lets add nullability tests too - sending in null should work for every column, but when its a non-null column then the sdk should throw.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added nullable test for non structured data type. Will fix structured data type null count in next PR.

/** Whether the column is an Iceberg column */
private final boolean isIcebergColumn;
/** Primitive type of the column, only used for Iceberg columns */
private final PrimitiveType primitiveType;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pl update the PR description to explain there's a bug fix in this PR too.

1. Remove hidden polymorphic behavior of RowBufferStats, 
2. remove isIceberg
3. Refactoring in FileColumnProperties - @sfc-gh-alhuang For strings, i see that there is a truncateUp for the Max stat and 
truncateDown for min stat. Please reivew this part carefully and compare with what it used to be before we started making changes, want to make sure we aren't regressing FDNs on this front.
@sfc-gh-alhuang
Copy link
Contributor Author

sfc-gh-alhuang commented Oct 8, 2024

@sfc-gh-hmadan Confirmed that the string min/max values for Iceberg table EP info also apply the 32 bytes round down/up truncate as FDN did.

@sfc-gh-alhuang sfc-gh-alhuang enabled auto-merge (squash) October 9, 2024 19:21
@sfc-gh-alhuang sfc-gh-alhuang merged commit a6339aa into master Oct 9, 2024
46 checks passed
@sfc-gh-alhuang sfc-gh-alhuang deleted the alhuang-IT branch October 9, 2024 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants