
SNOW-987122 Upgrade JDBC to 3.14.5 and Catch new exception type for renewing expired S3 token #677

Merged
merged 11 commits into master from japatel-SNOW-987122-renew-token-exception on Feb 21, 2024

Conversation

@sfc-gh-japatel (Collaborator) commented Feb 8, 2024


Changes

@sfc-gh-xhuang (Contributor)

can close #660

@sfc-gh-psaha (Contributor)

I expect the large values test to fail until SNOW-1003775 is fixed, but you can work around it by changing the tests to read back the length of the values instead of the values themselves. Note that GitHub Actions will just crash on these failures and you won't see a proper test failure in the logs. @sfc-gh-japatel

@sfc-gh-japatel (Collaborator, Author)

I expect the large values test to fail until SNOW-1003775 is fixed, but you can work around it by changing the tests to read back the length of the values instead of the values themselves. Note that GitHub Actions will just crash on these failures and you won't see a proper test failure in the logs. @sfc-gh-japatel

Ah yes, that's what I am seeing: FlushServiceTests run indefinitely with no errors either! Thanks for pointing it out.

@sfc-gh-japatel sfc-gh-japatel force-pushed the japatel-SNOW-987122-renew-token-exception branch from 4cf2792 to 5a71a12 on February 10, 2024 02:15
@sfc-gh-japatel sfc-gh-japatel marked this pull request as ready for review February 12, 2024 21:07
@sfc-gh-japatel sfc-gh-japatel requested review from sfc-gh-tzhang and a team as code owners February 12, 2024 21:07
@sfc-gh-japatel (Collaborator, Author)

Snyk checks can be ignored since it is complaining about the pom of the e2e library.
[Screenshot 2024-02-12 at 4.50.07 PM]

@sfc-gh-tzhang (Contributor) left a comment

Left some comments, PTAL, thanks!

README.md (review thread resolved)
pom.xml (review thread resolved)
Comment on lines 230 to 233
if (e instanceof SnowflakeSQLLoggedException) {
return ((SnowflakeSQLLoggedException) e).getErrorCode()
== S3_OPERATION_ERROR.getMessageCode();
} else if (e instanceof SnowflakeSQLException) {
Contributor

This is highly error-prone. The reason we need to fix it again is that the JDBC driver is throwing a new exception, and we have no control over whether they will update the exception type or add a new code path that throws a new type of exception. I think we should just catch all exceptions and always retry a few times. + @sfc-gh-lsembera to see if he has a better idea.

Collaborator Author

I don't think catching all exceptions is a good idea here.
I agree that this is error-prone and not safe for future versions, but ideally the JDBC interface should not change its response. (I think of it as a non-backward-compatible change.)
If you check the JDBC code (handleS3Exception), it catches all ClientExceptions and ServerExceptions.

It's better that we catch the specific exception and work with the JDBC team!
Thoughts?

Contributor

ideally the JDBC interface should not change its response.

Agree that catch-all is not ideal, but how would you guarantee that there won't be any non-compatible change in the future? We don't have anyone who looks into every JDBC PR to make sure this won't happen. In fact, the same issue has happened twice in a few months: the first fix (by Lukas) was done because they updated the exception type, and the second fix (this PR) is needed because they added a new place that throws a different exception.

@sfc-gh-japatel (Collaborator, Author) commented Feb 14, 2024

but how would you guarantee that there won't be any non-compatible change in the future?

I think this is something that the library owners should worry about. As a consumer, we have to go through this set of changes every time we upgrade, because the version we want to use fixes other issues.

That said, I don't have a strong preference for either approach; I just feel that catch-all suppresses all exceptions without knowing what the root cause is.

@sfc-gh-lsembera (Contributor) commented Feb 14, 2024

We are already retrying all exceptions up to 5 times, see here. What would have to change is that we would also attempt to refresh the token on each unknown exception, which we don't do currently; we only refresh the token if isCredentialsExpiredException is true. Given the recurring issues with this logic and the big impact it has (users are unable to ingest until they restart their applications; not even a channel reopen helps), I think we should be more resilient and handle all exceptions. WDYT about catching all unknown exceptions just like now, keeping the 5 retries, but attempting to refresh the token after the first exception out of 5, regardless of whether isCredentialsExpiredException is true?
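
For illustration, a minimal sketch of this proposal, assuming hypothetical helper names (putRemote, refreshStageMetadata) rather than the SDK's actual internal API:

public class UploadRetrySketch {
  private static final int MAX_UPLOAD_RETRIES = 5;

  public void uploadWithRetries(byte[] blob, String fullFilePath) {
    int retryCount = 0;
    while (true) {
      try {
        putRemote(blob, fullFilePath);
        return;
      } catch (Exception e) {
        if (retryCount >= MAX_UPLOAD_RETRIES) {
          throw new RuntimeException(
              "Upload failed after " + MAX_UPLOAD_RETRIES + " retries", e);
        }
        if (retryCount == 0) {
          // Refresh the stage metadata (and with it the S3 token) on the very first
          // failure, regardless of the exception type.
          refreshStageMetadata();
        }
        retryCount++;
      }
    }
  }

  // Placeholder: upload the serialized blob to the stage location.
  private void putRemote(byte[] blob, String fullFilePath) {}

  // Placeholder: re-fetch stage metadata/credentials from Snowflake.
  private void refreshStageMetadata() {}
}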

Collaborator Author

Or, more like, refresh the token on the final attempt regardless of the exception?

Contributor

The benefit of doing it on the final attempt would be that we issue fewer token renewals, but on the other hand it would introduce latency, because the token would only be renewed on the 5th retry. I don't know which one is a bigger concern.

Collaborator Author

Given the frequency of these issues (so far about once every few hours), I think it might be okay to do it on the first attempt?

Contributor

+1 on always refreshing the token on the first exception, and then retrying 5 times if the exception persists.

Collaborator Author

Yes, it's implemented, PTAL! Thanks.

@sfc-gh-tzhang (Contributor) commented Feb 13, 2024

Snyk checks can be ignored since it is complaining about the pom of the e2e library. [Screenshot 2024-02-12 at 4.50.07 PM]

I don't think this can be ignored, since we didn't see this issue before, and I assume Snyk will keep failing and we will miss real issues if it stays like this.

@sfc-gh-xhuang (Contributor)

Snyk checks can be ignored since it is complaining about the pom of the e2e library. [Screenshot 2024-02-12 at 4.50.07 PM]

I don't think this can be ignored, since we didn't see this issue before, and I assume Snyk will keep failing and we will miss real issues if it stays like this.

I already created a Jira last week, SNOW-1045928, and assigned it to @sfc-gh-jfan to help investigate.

@@ -65,7 +65,7 @@
<shadeBase>net.snowflake.ingest.internal</shadeBase>
<slf4j.version>1.7.36</slf4j.version>
<snappy.version>1.1.10.4</snappy.version>
<snowjdbc.version>3.13.30</snowjdbc.version>
<snowjdbc.version>3.14.5</snowjdbc.version>
Contributor

Seems like the fix for https://github.com/snowflakedb/snowflake-sdks-drivers-issues-teamwork/issues/819 is on its way; could we wait for it and skip the tricks with large strings?

Collaborator Author

Yeah, we can wait now that it is pushed, but we will have to wait until the next release of the JDBC driver. Are we okay with that, @sfc-gh-xhuang?

Contributor

Even if they release next week, there's no guarantee it will be a simple upgrade either.
The large strings issue only affects reads and this test; it doesn't affect write behavior, which is the only thing we use.

Contributor

+1. We are just a bystander for that bug. It is not in our code, and they were able to reproduce it themselves. I support not waiting for that fix to land.

Contributor

JDBC plans to release next week, but please consider the trade-off of using 3.14.6 vs. today's 3.14.5.

@@ -310,15 +310,15 @@ public void testValidCompressionAlgorithmsAndWithUppercaseLowerCase() {
});
List<String> zstdValues = Arrays.asList("ZSTD", "zstd", "Zstd", "zStd");
Contributor

Unrelated and possibly unhelpful question, but I don't understand the need for this test.
Don't we just normalize the zstdValue input? Why do we only test 4 case variants of ZSTD? There's ZStd, zSTd, etc.
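
For what it's worth, a small sketch of the kind of case normalization this question assumes; the enum and fromName helper here are hypothetical, not the SDK's actual API:

import java.util.Locale;

// Hypothetical enum illustrating case-insensitive parsing of a compression algorithm name.
enum ParquetCompressionSketch {
  GZIP,
  ZSTD;

  static ParquetCompressionSketch fromName(String name) {
    // Upper-casing the input makes "zstd", "Zstd", "zSTd", ... all resolve to ZSTD,
    // so exhaustively testing every case variant adds little coverage.
    return valueOf(name.toUpperCase(Locale.ROOT));
  }
}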

@sfc-gh-japatel (Collaborator, Author)

Folks, updated the PR with:

  1. Kept JDBC 3.14.5 and modified the test.
  2. No special exception handling; catch all exceptions and refresh the token on the very first exception, then keep retrying.

Thanks, PTAL!

@sfc-gh-japatel sfc-gh-japatel force-pushed the japatel-SNOW-987122-renew-token-exception branch from 0bdab12 to 2529d72 on February 16, 2024 00:11
@sfc-gh-japatel sfc-gh-japatel force-pushed the japatel-SNOW-987122-renew-token-exception branch from eeb6828 to 5e4d15d on February 20, 2024 21:04
@sfc-gh-tzhang (Contributor) left a comment

LGTM, left one minor comment, thanks!

// for the first exception, we always perform a metadata refresh.
logger.logInfo(
"Stage metadata need to be refreshed due to upload error: {} on first retry attempt",
e.getMessage());
Contributor

Can you move this out to the log below? Also, we need to improve that log, since logging only the message might miss some information:

logger.logInfo(
          "Retrying upload, attempt {}/{} {}", retryCount, maxUploadRetries, e.getMessage());

@sfc-gh-japatel (Collaborator, Author) commented Feb 21, 2024

This is preferable, since it indicates that we are refreshing metadata. I could add the stack trace, but that would be too much on every attempt.

We are already printing your suggested log line below.

@sfc-gh-japatel sfc-gh-japatel merged commit f8c8e3f into master Feb 21, 2024
12 checks passed
@sfc-gh-japatel sfc-gh-japatel deleted the japatel-SNOW-987122-renew-token-exception branch February 21, 2024 18:51