SNOW-1708577 Parquet V2 support for new table format #851
@@ -124,4 +124,8 @@ public Optional<Integer> getMaxRowGroups() {
  public String getParquetMessageTypeName() {
    return isIcebergMode ? PARQUET_MESSAGE_TYPE_NAME : BDEC_PARQUET_MESSAGE_TYPE_NAME;
  }

  public boolean isEnableDictionaryEncoding() {
    return isIcebergMode;
  }
}

Review comment (on isEnableDictionaryEncoding): This might be dependent on the storage serialization policy too; let's verify. No need to hold up the PR.
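For context on where this new flag lands: in parquet-mr, dictionary encoding is toggled on the `ParquetProperties` builder. The sketch below is only an illustration of that wiring, not code from this PR; the helper class and method names are hypothetical, and the idea of passing the flag straight through is an assumption, while `ParquetProperties.builder().withDictionaryEncoding(...)` is the standard parquet-mr API.

```java
import org.apache.parquet.column.ParquetProperties;

// Illustrative only (not part of this PR): where a flag like
// isEnableDictionaryEncoding() would typically be consumed when the
// flusher builds its parquet-mr writer configuration.
final class DictionaryEncodingSketch {
  static ParquetProperties writerProps(boolean enableDictionaryEncoding) {
    return ParquetProperties.builder()
        // Enabled for Iceberg mode in this PR; the reviewer notes it may also
        // need to depend on the storage serialization policy.
        .withDictionaryEncoding(enableDictionaryEncoding)
        .build();
  }
}
```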
@@ -5,6 +5,7 @@
package net.snowflake.ingest.utils;

import java.util.Arrays;
import org.apache.parquet.column.ParquetProperties;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

/** Contains all the constants needed for Streaming Ingest */
@@ -71,9 +72,31 @@ public class Constants {
  public static final String DROP_CHANNEL_ENDPOINT = "/v1/streaming/channels/drop/";
  public static final String REGISTER_BLOB_ENDPOINT = "/v1/streaming/channels/write/blobs/";

  public static final int PARQUET_MAJOR_VERSION = 1;
  public static final int PARQUET_MINOR_VERSION = 0;

  /**
   * Iceberg table serialization policy. Use v2 parquet writer for optimized serialization,
   * otherwise v1.
   */
  public enum IcebergSerializationPolicy {
    COMPATIBLE,
    OPTIMIZED;

    public ParquetProperties.WriterVersion toParquetWriterVersion() {
      switch (this) {
        case COMPATIBLE:
          return ParquetProperties.WriterVersion.PARQUET_1_0;
        case OPTIMIZED:
          return ParquetProperties.WriterVersion.PARQUET_2_0;
        default:
          throw new IllegalArgumentException(
              String.format(
                  "Unsupported ICEBERG_SERIALIZATION_POLICY = '%s', allowed values are %s",
                  this.name(), Arrays.asList(IcebergSerializationPolicy.values())));
      }
    }
  }

  public enum WriteMode {
    CLOUD_STORAGE,
    REST_API,

Review comment (on the mapping of COMPATIBLE to PARQUET_1_0): Does this mean that non-Iceberg tables (which are Snowflake-managed tables, AFAIK) only support Parquet v1?
Reply: The server-side scanner for FDN tables supports Parquet V2. This PR is specific to the Iceberg table feature and does not alter the default behavior for streaming to FDN tables.
Reply: Thanks!
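As a usage illustration (not from the PR diff), the new enum's `toParquetWriterVersion()` would feed the parquet-mr writer configuration roughly as shown below. The `ParquetProperties` builder calls are real parquet-mr API; the surrounding class and wiring are assumed for the example.

```java
import net.snowflake.ingest.utils.Constants.IcebergSerializationPolicy;
import org.apache.parquet.column.ParquetProperties;

// Minimal sketch: map the table's serialization policy to a Parquet writer version.
public class SerializationPolicySketch {
  public static void main(String[] args) {
    IcebergSerializationPolicy policy = IcebergSerializationPolicy.OPTIMIZED;

    ParquetProperties props =
        ParquetProperties.builder()
            // OPTIMIZED -> PARQUET_2_0, COMPATIBLE -> PARQUET_1_0
            .withWriterVersion(policy.toParquetWriterVersion())
            .build();

    System.out.println(props.getWriterVersion()); // prints PARQUET_2_0
  }
}
```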
Review comment: I'd much rather depend on our own constant than a third-party library's constant. I thought I had left a comment on this but don't see it anywhere :(
Reply: OK to take in the next PR too; just remove the import whenever you revert this.
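One way to address this comment (a sketch of the reviewer's suggestion with hypothetical names, not code from the PR) is to keep the enum in `Constants` free of parquet-mr types and do the translation where the writer is constructed, so the third-party import can be dropped from `Constants`:

```java
// Hypothetical refactor sketch: Constants would expose only a project-owned enum ...
enum IcebergSerializationPolicy {
  COMPATIBLE,
  OPTIMIZED
}

// ... and the parquet-mr dependency would be confined to the writer-setup code.
final class WriterVersionMapper {
  static org.apache.parquet.column.ParquetProperties.WriterVersion toWriterVersion(
      IcebergSerializationPolicy policy) {
    switch (policy) {
      case COMPATIBLE:
        return org.apache.parquet.column.ParquetProperties.WriterVersion.PARQUET_1_0;
      case OPTIMIZED:
        return org.apache.parquet.column.ParquetProperties.WriterVersion.PARQUET_2_0;
      default:
        throw new IllegalArgumentException("Unsupported policy: " + policy);
    }
  }
}
```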