SNOW-983635 Allow ZSTD compression algorithm #654
Maybe add zstd-jni as a dependency in README.md. Edit: done.
DEW testing
DEW tests using a local release of the Snowpipe Streaming SDK with my changes and ZSTD as the default compression algorithm passed.
eed81af to d671c1f
How I ran the precommits
I ran a precommit with my branch rebased on SDK release 2.0.5.
Results
4 tests failed that also failed on master. 9 tests failed because of reduced byte counts. 1 test failed because of a higher byte count. 1 test failed unexpectedly. Is it a problem that zstd sometimes leads to a higher byte count?
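As background for the byte-count question: no lossless codec produces smaller output for every input, so a test that hard-codes an expected compressed size can move in either direction when the codec changes. The sketch below does not reproduce the failing test and does not use zstd; it uses the JDK's GZIP stream (a swapped-in stand-in) only to show that framing overhead can make the output larger than a tiny input. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of the SDK.
final class CompressedSizeDemo {

    /** Returns the number of bytes produced by GZIP-compressing the input. */
    public static int gzipSize(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the GZIP stream flushes the deflate data and writes the trailer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    public static void main(String[] args) {
        byte[] tiny = {42};                   // 1 byte: header and trailer dominate
        byte[] repetitive = new byte[10_000]; // highly compressible
        Arrays.fill(repetitive, (byte) 'a');

        System.out.println("tiny:       " + tiny.length + " -> " + gzipSize(tiny));
        System.out.println("repetitive: " + repetitive.length + " -> " + gzipSize(repetitive));
    }
}
```

If the affected tests assert exact byte counts, one option is to make the expected value depend on the configured compression algorithm rather than treating a higher count as a regression in itself.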
Thanks for the change! I think we need more test coverage here; for example, we need coverage to make sure it works with other data types. We probably already have tests for most of them, to which you can just add a parameter, as you did in StreamingIngestBigFilesIT.java.
@@ -17,6 +17,7 @@ The Snowflake Ingest Service SDK depends on the following libraries:

* snowflake-jdbc (3.13.30 to 3.13.33)
* slf4j-api
* com.github.luben:zstd-jni (1.5.0-1)
Why did we decide to treat this as one of the special dependencies? @sfc-gh-lsembera has more context here if you need more help.
Hi, the problem is that zstd-jni can't be shaded. Because of that we can't include it in our sdk's shaded jar. It has to be provided separately when customers want to build their application with our sdk. I thought this was reason enough to list it with our dependencies.
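Because the jar has to be supplied by the application rather than bundled in the shaded SDK jar, an application may want to check early that it is actually on the classpath. The following is a minimal sketch under stated assumptions, not something the SDK does: `ZstdAvailability` and `isClassPresent` are hypothetical names, and the only fact relied on is that zstd-jni ships the class `com.github.luben.zstd.Zstd` in its original, unrelocated package.

```java
// Hypothetical helper, not part of the SDK.
final class ZstdAvailability {

    // Entry-point class of zstd-jni under its original (unrelocated) package.
    public static final String ZSTD_CLASS = "com.github.luben.zstd.Zstd";

    /**
     * Returns true if the named class can be located on the classpath.
     * The class is not initialized, so no native library is loaded here.
     */
    public static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, ZstdAvailability.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isClassPresent(ZSTD_CLASS)) {
            System.out.println("zstd-jni found on the classpath");
        } else {
            System.out.println("zstd-jni missing: add com.github.luben:zstd-jni to the application");
        }
    }
}
```

A check like this only confirms that the class can be found; it does not verify that the native library loads on the current platform.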
@sfc-gh-lsembera Any idea if we can do better here? Or any potential issue? I'm concerned that we keep adding to this list
I think it is good if we shade fewer dependencies, and this change goes in that direction. Another advantage of the unshaded approach here is that if the customer does not want to use zstd and wants a smaller distribution, they can exclude the zstd jar from their Maven project.
Looks like we stop relocating com.github.luben; is there any concern that this is a behavior change?
What's our plan for making ZSTD the default?
@sfc-gh-tzhang In the protocol of our Streaming Ingest Repeating Eng Sync on Dec 5th we wrote
I think this also applies to making ZSTD the default.
src/test/java/net/snowflake/ingest/streaming/internal/StreamingIngestIT.java
…hecking for unused dependencies
…fault
…ZSTD
7d03433 to ff91de2
Thank you.
Please consider me as approved if you can check with @sfc-gh-lsembera to see if we can do better on the ZSTD dependency, thanks!
Ah, thanks for the reminder. We can do this; is this something you want to start if you have free time?
pom.xml
@@ -691,6 +694,9 @@
<configuration>
  <failOnWarning>true</failOnWarning>
  <ignoreNonCompile>true</ignoreNonCompile>
  <ignoredDependencies>
    <ignoredDependency>com.github.luben:zstd-jni</ignoredDependency>
Why is this needed?
We declare zstd-jni as a dependency even though it's not directly used in our code. If we didn't ignore zstd-jni here, analyze-only would fail due to an unused dependency.
In that case, can we use the runtime Maven scope?
pom.xml
@@ -1092,6 +1095,12 @@
    <exclude>org/slf4j/**</exclude>
  </excludes>
</filter>
<filter>
We are not shading this dependency, why is this filter needed?
…ncies filtered from shading, because it's already excluded
lgtm, thanks!
Looks like we stop relocating com.github.luben; is there any concern that this is a behavior change?
The README.md of the ZSTD library says the following. This is the reason we can't shade it.
Test setting ZSTD as default compression algorithm