Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-983635 Allow ZSTD compression algorithm #654

Merged
merged 30 commits into from
Feb 1, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
a7927d5
SNOW-983635 Set ZSTD as default compression algorithm
sfc-gh-lthiede Dec 5, 2023
5db8db3
SNOW-983635 Print supported Compression algos
sfc-gh-lthiede Dec 21, 2023
326f9b4
SNOW-983635 Test creating a variable of type RecyclingBufferPool
sfc-gh-lthiede Dec 21, 2023
fe06a7f
SNOW-983635 Add ZSTD dependency
sfc-gh-lthiede Dec 21, 2023
189e894
SNOW-983635 Add version number to zstd dependency
sfc-gh-lthiede Dec 21, 2023
c95d2fe
SNOW-983635 Sort pom
sfc-gh-lthiede Dec 21, 2023
86c5dc5
SNOW-983635 Use same version of zstd dependency as parquet
sfc-gh-lthiede Dec 21, 2023
efaf824
SNOW-983635 Reexecute probably failing static member initialization
sfc-gh-lthiede Dec 21, 2023
4cee250
SNOW-983635 Print UnsatisfiedLinkError
sfc-gh-lthiede Dec 22, 2023
01f7337
SNOW-983635 Don't relocated com.github.luben.zstd
sfc-gh-lthiede Dec 22, 2023
df2938f
SNOW-983635 Don't use luben.zstd directly
sfc-gh-lthiede Dec 22, 2023
cb52f29
SNOW-983635 Hopefully correctly exclude zstd library from shading
sfc-gh-lthiede Dec 22, 2023
c996081
SNOW-983635 Explicitly exclude zstd from shading in another part of p…
sfc-gh-lthiede Jan 4, 2024
6f8053b
SNOW-983635 Add tests for ZSTD compression
sfc-gh-lthiede Jan 9, 2024
4b48a85
SNOW-983635 Remove test that only made sure that I configured the com…
sfc-gh-lthiede Jan 9, 2024
67c3a69
SNOW-983635 Add zstd-jni to dependencies in pom.xml
sfc-gh-lthiede Jan 15, 2024
7f09641
SNOW-983635 Sort pom
sfc-gh-lthiede Jan 16, 2024
24148e3
SNOW-983635 Remove direct zstd dependency
sfc-gh-lthiede Jan 16, 2024
7143018
SNOW-983635 Add zstd-jni as explicit dependency but exclude it from c…
sfc-gh-lthiede Jan 16, 2024
ab25c68
Delete dependency-reduced-pom.xml
sfc-gh-lthiede Jan 22, 2024
601d01b
SNOW-983635 Add zstd-jni dependency to README.md and make ZSTD non de…
sfc-gh-lthiede Jan 22, 2024
a8cdc30
SNOW-983635 Remove unnecessary new line
sfc-gh-lthiede Jan 23, 2024
ff91de2
SNOW-983635 Parameterise StreamingIngestIT.java to run with GZIP and …
sfc-gh-lthiede Jan 23, 2024
0e743c8
SNOW-983635 Add even more datatype tests with ZSTD
sfc-gh-lthiede Jan 29, 2024
17c4b25
SNOW-983635 consistently use compile in public pom
sfc-gh-lthiede Jan 29, 2024
94c62bf
SNOW-983635 Fix AbstractDataTypeTest and remove zstd-jni from depende…
sfc-gh-lthiede Jan 29, 2024
584e75c
SNOW-983635 Make zstd-jni a runtime dependency
sfc-gh-lthiede Jan 29, 2024
4574c8e
SNOW-983635 Sort dependencies is pom
sfc-gh-lthiede Jan 29, 2024
fa71d94
SNOW-983635 Temporarily set ZSTD as default again to test recent changes
sfc-gh-lthiede Jan 29, 2024
a95b87f
SNOW-983635 Revert setting ZSTD as default compression algorithm
sfc-gh-lthiede Jan 29, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ The Snowflake Ingest Service SDK depends on the following libraries:

* snowflake-jdbc (3.13.30 to 3.13.33)
* slf4j-api
* com.github.luben:zstd-jni (1.5.0-1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we decide to treat this as one of the special dependencies? @sfc-gh-lsembera has more context here if you need more help

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, the problem is that zstd-jni can't be shaded. Because of that we can't include it in our sdk's shaded jar. It has to be provided separately when customers want to build their application with our sdk. I thought this was reason enough to list it with our dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfc-gh-lsembera Any idea if we can do better here? Or any potential issue? I'm concerned that we keep adding to this list

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is good if shade fewer dependencies, and this change goes in that direction. Another advantage of the unshaded approach here is that if the customer does not want to use zstd and wants smaller distribution, they can exclude the zstd jar from their maven project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we stop relocating com.github.luben, is there any concern that this is a behavior change?


These dependencies will be fetched automatically by build systems like Maven or Gradle. If you don't build your project
using a build system, please make sure these dependencies are on the classpath.
Expand Down
13 changes: 7 additions & 6 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -356,7 +356,6 @@

<!-- All of our needed dependencies -->
<dependencies>

<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
Expand Down Expand Up @@ -389,7 +388,6 @@
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>jsr305</artifactId>
Expand Down Expand Up @@ -473,6 +471,12 @@
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</dependency>
<dependency>
<groupId>com.github.luben</groupId>
<artifactId>zstd-jni</artifactId>
<version>1.5.0-1</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
Expand Down Expand Up @@ -954,6 +958,7 @@
<excludes>
<exclude>net.snowflake:snowflake-jdbc</exclude>
<exclude>org.slf4j:slf4j-api</exclude>
<exclude>com.github.luben:zstd-jni</exclude>
</excludes>
</artifactSet>
<relocations>
Expand Down Expand Up @@ -1034,10 +1039,6 @@
<pattern>com.ctc</pattern>
<shadedPattern>${shadeBase}.com.ctc</shadedPattern>
</relocation>
<relocation>
<pattern>com.github.luben</pattern>
<shadedPattern>${shadeBase}.com.github.luben</shadedPattern>
</relocation>
<relocation>
<pattern>com.thoughtworks</pattern>
<shadedPattern>${shadeBase}.com.thoughtworks</shadedPattern>
Expand Down
6 changes: 6 additions & 0 deletions public_pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -48,5 +48,11 @@
<version>1.7.36</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.github.luben</groupId>
<artifactId>zstd-jni</artifactId>
<version>1.5.0-1</version>
sfc-gh-lthiede marked this conversation as resolved.
Show resolved Hide resolved
<scope>runtime</scope>
</dependency>
</dependencies>
</project>
3 changes: 2 additions & 1 deletion src/main/java/net/snowflake/ingest/utils/Constants.java
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,8 @@ public static BdecVersion fromInt(int val) {
* CompressionCodecName, but we want to control and allow only specific values of that.
*/
public enum BdecParquetCompression {
GZIP;
GZIP,
ZSTD;

public CompressionCodecName getCompressionCodec() {
return CompressionCodecName.fromConf(this.name());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -308,6 +308,17 @@ public void testValidCompressionAlgorithmsAndWithUppercaseLowerCase() {
Constants.BdecParquetCompression.GZIP,
parameterProvider.getBdecParquetCompressionAlgorithm());
});
List<String> zstdValues = Arrays.asList("ZSTD", "zstd", "Zstd", "zStd");
zstdValues.forEach(
v -> {
Properties prop = new Properties();
Map<String, Object> parameterMap = getStartingParameterMap();
parameterMap.put(ParameterProvider.BDEC_PARQUET_COMPRESSION_ALGORITHM, v);
ParameterProvider parameterProvider = new ParameterProvider(parameterMap, prop);
Assert.assertEquals(
Constants.BdecParquetCompression.ZSTD,
parameterProvider.getBdecParquetCompressionAlgorithm());
});
}

@Test
Expand All @@ -322,7 +333,7 @@ public void testInvalidCompressionAlgorithm() {
} catch (IllegalArgumentException e) {
Assert.assertEquals(
"Unsupported BDEC_PARQUET_COMPRESSION_ALGORITHM = 'invalid_comp', allowed values are"
+ " [GZIP]",
+ " [GZIP, ZSTD]",
e.getMessage());
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

import static net.snowflake.ingest.TestUtils.verifyTableRowCount;
import static net.snowflake.ingest.utils.Constants.ROLE;
import static net.snowflake.ingest.utils.ParameterProvider.BDEC_PARQUET_COMPRESSION_ALGORITHM;

import java.sql.Connection;
import java.sql.ResultSet;
Expand All @@ -17,8 +18,13 @@
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameter;
import org.junit.runners.Parameterized.Parameters;

/** Ingest large amount of rows. */
@RunWith(Parameterized.class)
public class StreamingIngestBigFilesIT {
private static final String TEST_DB_PREFIX = "STREAMING_INGEST_TEST_DB";
private static final String TEST_SCHEMA = "STREAMING_INGEST_TEST_SCHEMA";
Expand All @@ -29,6 +35,14 @@ public class StreamingIngestBigFilesIT {
private Connection jdbcConnection;
private String testDb;

@Parameters(name = "{index}: {0}")
public static Object[] compressionAlgorithms() {
return new Object[] {"GZIP", "ZSTD"};
}

@Parameter
public String compressionAlgorithm;

@Before
public void beforeAll() throws Exception {
testDb = TEST_DB_PREFIX + "_" + UUID.randomUUID().toString().substring(0, 4);
Expand All @@ -51,6 +65,7 @@ public void beforeAll() throws Exception {
if (prop.getProperty(ROLE).equals("DEFAULT_ROLE")) {
prop.setProperty(ROLE, "ACCOUNTADMIN");
}
prop.setProperty(BDEC_PARQUET_COMPRESSION_ALGORITHM, compressionAlgorithm);
client =
(SnowflakeStreamingIngestClientInternal<?>)
SnowflakeStreamingIngestClientFactory.builder("client1").setProperties(prop).build();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
import static net.snowflake.ingest.utils.Constants.BLOB_NO_HEADER;
import static net.snowflake.ingest.utils.Constants.COMPRESS_BLOB_TWICE;
import static net.snowflake.ingest.utils.Constants.REGISTER_BLOB_ENDPOINT;
import static net.snowflake.ingest.utils.ParameterProvider.BDEC_PARQUET_COMPRESSION_ALGORITHM;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.core.Is.is;

Expand Down Expand Up @@ -45,10 +46,15 @@
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameter;
import org.junit.runners.Parameterized.Parameters;
import org.mockito.ArgumentMatchers;
import org.mockito.Mockito;

/** Example streaming ingest sdk integration test */
@RunWith(Parameterized.class)
public class StreamingIngestIT {
private static final String TEST_TABLE = "STREAMING_INGEST_TEST_TABLE";
private static final String TEST_DB_PREFIX = "STREAMING_INGEST_TEST_DB";
Expand All @@ -64,6 +70,14 @@ public class StreamingIngestIT {
private Connection jdbcConnection;
private String testDb;

@Parameters(name = "{index}: {0}")
public static Object[] compressionAlgorithms() {
return new Object[] {"GZIP", "ZSTD"};
}

@Parameter
public String compressionAlgorithm;

@Before
public void beforeAll() throws Exception {
testDb = TEST_DB_PREFIX + "_" + UUID.randomUUID().toString().substring(0, 4);
Expand All @@ -90,7 +104,7 @@ public void beforeAll() throws Exception {

// Test without role param
prop = TestUtils.getProperties(Constants.BdecVersion.THREE, true);

prop.setProperty(BDEC_PARQUET_COMPRESSION_ALGORITHM, compressionAlgorithm);
client =
(SnowflakeStreamingIngestClientInternal<?>)
SnowflakeStreamingIngestClientFactory.builder("client1").setProperties(prop).build();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,11 +22,17 @@
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClientFactory;
import net.snowflake.ingest.utils.Constants;
import static net.snowflake.ingest.utils.ParameterProvider.BDEC_PARQUET_COMPRESSION_ALGORITHM;
import net.snowflake.ingest.utils.SFException;
import org.junit.After;
import org.junit.Assert;
import org.junit.Before;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameter;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public abstract class AbstractDataTypeTest {
private static final String SOURCE_COLUMN_NAME = "source";
private static final String VALUE_COLUMN_NAME = "value";
Expand Down Expand Up @@ -56,6 +62,14 @@ public abstract class AbstractDataTypeTest {
private SnowflakeStreamingIngestClient client;
private static final ObjectMapper objectMapper = new ObjectMapper();

@Parameters(name = "{index}: {0}")
public static Object[] compressionAlgorithms() {
return new Object[] {"GZIP", "ZSTD"};
}

@Parameter
public String compressionAlgorithm;

@Before
public void before() throws Exception {
databaseName = String.format("SDK_DATATYPE_COMPATIBILITY_IT_%s", getRandomIdentifier());
Expand All @@ -70,6 +84,7 @@ public void before() throws Exception {
if (props.getProperty(ROLE).equals("DEFAULT_ROLE")) {
props.setProperty(ROLE, "ACCOUNTADMIN");
}
props.setProperty(BDEC_PARQUET_COMPRESSION_ALGORITHM, compressionAlgorithm);
client = SnowflakeStreamingIngestClientFactory.builder("client1").setProperties(props).build();
}

Expand Down
Loading