SNOW-672156 support specifying compression algorithm to be used for BDEC Parquet files #579
I think the intention of this parameter is to let customers fall back to GZIP if there is any issue with ZSTD, so I suggest we support only GZIP and ZSTD. WDYT?
...in/java/net/snowflake/ingest/streaming/internal/SnowflakeStreamingIngestChannelInternal.java
@@ -51,6 +54,9 @@ public class ParameterProvider {
It reduces memory consumption compared to using Java Objects for buffering.*/
public static final boolean ENABLE_PARQUET_INTERNAL_BUFFERING_DEFAULT = false;

public static final Constants.BdecParquetCompression BDEC_PARQUET_COMPRESSION_ALGORITHM_DEFAULT =
I thought we wanted ZSTD to be the default?
Has ZSTD already been tested with all our tests?
I have tested GZIP, ZSTD, and SNAPPY previously, but I will run the tests again. The SDK has changed a lot since then.
src/test/java/net/snowflake/ingest/streaming/internal/ParameterProviderTest.java
Please hold this PR until we're done with the current release, thanks!
I agree. Do we need to support anything else, given that customers do not have access to the underlying blob?
If other Parquet-supported compression algorithms are tested against the server side, why not allow them?
I narrowed the scope of this PR to just making the compression algorithm configurable. GZIP is the only allowed value, as it has been the default so far. I will have to do more correctness testing and performance evaluation before adding other allowed compression algorithms.
return e;
}
}
throw new IllegalArgumentException(
Could we add a test case for invalid input?
That's a good point, done.
src/test/java/net/snowflake/ingest/streaming/internal/ParameterProviderTest.java
This PR adds support for specifying the compression algorithm to be used for BDEC Parquet files. The allowed value is just GZIP for now.
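
For readers following the review, here is a minimal sketch of how a GZIP-only allow-list with invalid-input rejection could look. It is not the PR's actual code: the class `BdecCompressionSketch`, the method name `fromName`, the case-insensitive matching, and the error message wording are all assumptions made for illustration. Only the overall shape (loop over the enum values, `return e`, then throw `IllegalArgumentException`) is taken from the diff fragment in the review.

```java
import java.util.Arrays;

// Hypothetical wrapper class, used only to keep this sketch self-contained.
public class BdecCompressionSketch {

  /** Stand-in for Constants.BdecParquetCompression; GZIP is the only allowed value for now. */
  public enum BdecParquetCompression {
    GZIP;

    /**
     * Resolves a user-supplied name to an allowed algorithm.
     * Case-insensitive matching is an assumption of this sketch.
     */
    public static BdecParquetCompression fromName(String name) {
      for (BdecParquetCompression e : values()) {
        if (e.name().equalsIgnoreCase(name)) {
          return e;
        }
      }
      // Any value outside the allow-list (including null) is rejected.
      throw new IllegalArgumentException(
          String.format(
              "Unsupported compression algorithm '%s', allowed values are %s",
              name, Arrays.toString(values())));
    }
  }

  public static void main(String[] args) {
    // Valid input resolves to the enum constant.
    System.out.println(BdecParquetCompression.fromName("gzip"));

    // Invalid input is rejected, which is the case the reviewer asked to test.
    try {
      BdecParquetCompression.fromName("ZSTD");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Keeping the allow-list in the enum itself means that enabling another algorithm later, once it has been tested, would only require adding a constant.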