SNOW-987122 Upgrade JDBC to 3.14.5 and Catch new exception type for renewing expired S3 token #677
Conversation
Can close #660.
I expect the large-values test to fail until SNOW-1003775 is fixed, but you can work around it by changing the tests to read back the length of the values instead of the values themselves. Note that GitHub Actions will just crash on these failures and you won't see a proper test failure in the logs. @sfc-gh-japatel
Ahh yes, that's what I am seeing: FlushServiceTests run indefinitely with no errors either! Thanks for pointing it out.
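For reference, a minimal sketch of the length-based workaround mentioned above, assuming a plain JUnit/JDBC test; the connection URL, table, and column names are illustrative assumptions, not existing test code:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.junit.Assert;
import org.junit.Test;

public class LargeValueLengthCheck {
  // Placeholder connection details; real tests would reuse the existing profile-based helper.
  private static final String JDBC_URL = "jdbc:snowflake://<account>.snowflakecomputing.com";

  @Test
  public void readBackLengthInsteadOfValue() throws Exception {
    long expectedLength = 4L * 1024 * 1024; // e.g. a 4 MB string was ingested earlier
    try (Connection conn = DriverManager.getConnection(JDBC_URL);
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT LENGTH(large_col) FROM large_value_table")) {
      Assert.assertTrue(rs.next());
      // Assert only on the length, not the value itself, so the JDBC read issue
      // (SNOW-1003775) is not triggered.
      Assert.assertEquals(expectedLength, rs.getLong(1));
    }
  }
}
```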
Force-pushed from 4cf2792 to 5a71a12
e2e-jar-test/standard/src/test/java/net/snowflake/StandardIngestE2ETest.java
Left some comments, PTAL, thanks!
src/test/java/net/snowflake/ingest/streaming/internal/StreamingIngestBigFilesIT.java
src/test/java/net/snowflake/ingest/streaming/internal/DataValidationUtilTest.java
if (e instanceof SnowflakeSQLLoggedException) {
  return ((SnowflakeSQLLoggedException) e).getErrorCode()
      == S3_OPERATION_ERROR.getMessageCode();
} else if (e instanceof SnowflakeSQLException) {
This is highly error-prone. The reason we need to fix it again is that the JDBC driver is throwing a new exception, and we have no control over whether they will update the exception type or add a new code path that throws yet another type of exception. I think we should just catch all exceptions and always retry a few times. + @sfc-gh-lsembera to see if he has a better idea.
I don't think catching all exceptions is a good idea here.
I agree that this is error-prone and not safe for future versions, but ideally the JDBC interface should not change its response. (I think of it as a non-backward-compatible change.)
If you check the JDBC code (handleS3Exception), it catches all ClientException and ServerException.
It's better if we catch the specific exception and work with the JDBC team!
Thoughts?
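One possible middle ground, sketched below under the assumption that SnowflakeSQLLoggedException extends SnowflakeSQLException in the driver: matching the base type by error code would cover both the old and the new exception type. This is an illustrative helper, not code from this PR:

```java
import net.snowflake.client.jdbc.SnowflakeSQLException;

public class S3ErrorCheckSketch {
  // Illustrative helper only. Assumes SnowflakeSQLLoggedException extends
  // SnowflakeSQLException, so checking the base type by error code covers both.
  static boolean looksLikeS3OperationError(Exception e, int s3OperationErrorCode) {
    if (e instanceof SnowflakeSQLException) {
      // getErrorCode() is inherited from java.sql.SQLException.
      return ((SnowflakeSQLException) e).getErrorCode() == s3OperationErrorCode;
    }
    return false;
  }
}
```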
ideally the JDBC interface should not change its response.
Agree that catch-all is not ideal, but how would you guarantee that there won't be any non-compatible change in the future? We don't have anyone who looks into every JDBC PR to make sure this won't happen. In fact, the same issue has happened twice in a few months: the first fix (by Lukas) was done because they updated the exception type, and the second fix (this PR) is because they added a new place that throws a different exception.
but how would you guarantee that there won't be any non-compatible change in the future?
I think this is something that the library owners should worry about. We as consumers would have to go through this set of changes every time we upgrade, because a version we want to use fixes other issues.
That said, I don't have a strong preference for either of them; I just feel that catch-all suppresses all exceptions without knowing what the root cause is.
We are already retrying all exceptions up to 5 times, see here. What would have to change is that we would also attempt to refresh the token on each unknown exception, which we don't do currently; we only refresh the token if isCredentialsExpiredException is true. Given the recurring issues with this logic and the big impact it is having (users are unable to ingest until they restart their applications; not even reopening the channel helps), I think we should be more resilient and handle all exceptions. WDYT about catching all unknown exceptions, just like now, keep retrying 5 times, but attempt to refresh the token after the first exception out of 5, regardless of whether isCredentialsExpiredException is true or not?
Or, more like: refresh the token on the final attempt, regardless of the exception?
The benefit of doing it on the final attempt is that we would issue fewer token renewals, but on the other hand it would introduce latency because the token would only be renewed on the 5th retry. I don't know which one is the bigger concern.
Given the frequency of these issues (which so far have happened about once in a few hours), I think it might be okay to do it on the first attempt?
+1 on always refreshing the token on the first exception, and then retrying 5 times if the exception persists.
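A minimal sketch of that approach; upload(), refreshStageMetadata(), and maxUploadRetries are illustrative stand-ins for the real StreamingIngestStage members, not the actual implementation:

```java
public class UploadRetrySketch {
  private final int maxUploadRetries = 5;

  public void putWithRetries(byte[] blob) throws Exception {
    int retryCount = 0;
    while (true) {
      try {
        upload(blob);
        return;
      } catch (Exception e) {
        retryCount++;
        if (retryCount == 1) {
          // Refresh stage metadata / token on the very first failure,
          // regardless of the exception type.
          refreshStageMetadata();
        }
        if (retryCount > maxUploadRetries) {
          throw e;
        }
        System.out.printf(
            "Retrying upload, attempt %d/%d %s%n", retryCount, maxUploadRetries, e.getMessage());
      }
    }
  }

  private void upload(byte[] blob) throws Exception {
    // Stand-in for the actual PUT of the blob to the stage.
  }

  private void refreshStageMetadata() {
    // Stand-in for re-fetching stage credentials/metadata from Snowflake.
  }
}
```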
Yes, it's implemented, PTAL! Thanks.
I had already created a Jira last week and assigned it to @sfc-gh-jfan to help investigate.
@@ -65,7 +65,7 @@
     <shadeBase>net.snowflake.ingest.internal</shadeBase>
     <slf4j.version>1.7.36</slf4j.version>
     <snappy.version>1.1.10.4</snappy.version>
-    <snowjdbc.version>3.13.30</snowjdbc.version>
+    <snowjdbc.version>3.14.5</snowjdbc.version>
Seems like the fix for https://github.com/snowflakedb/snowflake-sdks-drivers-issues-teamwork/issues/819 is on its way; could we wait for it instead of doing the tricks with large strings?
Yeah, we can wait now that it is pushed, but we will have to wait until the next release of JDBC.
Are we okay with that, @sfc-gh-xhuang?
Let me see when the next release of JDBC is:
https://snowflake.slack.com/archives/C054J469AN4/p1707957136445859
Even if they release next week, there's no guarantee it will be a simple upgrade either.
The large-strings issue only affects reads and this test; it doesn't affect write behavior, which is the only thing we use.
+1. We are just bystanders for that bug: it is not in our code, and they were able to reproduce it themselves. I support not waiting for that fix to land.
JDBC plans to release next week, but please consider the trade-off of trying to use 3.14.6 vs. today's 3.14.5.
@@ -310,15 +310,15 @@ public void testValidCompressionAlgorithmsAndWithUppercaseLowerCase() {
        });
    List<String> zstdValues = Arrays.asList("ZSTD", "zstd", "Zstd", "zStd");
Unrelated and possibly unhelpful question, but I don't understand the need for this test.
Do we not just normalize the zstdValue input? Why do we only test 4 case variants of ZSTD? There's ZStd, zSTd, etc.
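For what it's worth, a sketch of the kind of normalization the question assumes; the enum and method names are illustrative, not the SDK's actual compression-algorithm API:

```java
import java.util.Locale;

public class CompressionAlgorithmSketch {
  enum Algorithm {
    GZIP,
    ZSTD
  }

  static Algorithm fromName(String name) {
    // Upper-casing first means every case variant ("ZSTD", "zstd", "zStd", "zSTd", ...)
    // resolves to the same constant, which is why testing only four spellings is
    // mostly redundant once normalization exists.
    return Algorithm.valueOf(name.trim().toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    for (String v : new String[] {"ZSTD", "zstd", "Zstd", "zStd", "zSTd"}) {
      System.out.println(v + " -> " + fromName(v));
    }
  }
}
```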
Folks, updated the PR with:
Thanks, PTAL!
Force-pushed from 0bdab12 to 2529d72
e2e-jar-test/fips/src/test/java/net/snowflake/FipsIngestE2ETest.java
src/main/java/net/snowflake/ingest/streaming/internal/StreamingIngestStage.java
SNOW-987122 Upgrade JDBC to 3.14.5 and Catch new exception type for renewing expired S3 token
Force-pushed from eeb6828 to 5e4d15d
LGTM, left one minor comment, thanks!
// for the first exception, we always perform a metadata refresh.
logger.logInfo(
    "Stage metadata need to be refreshed due to upload error: {} on first retry attempt",
    e.getMessage());
Can you move out the log below? Also, we need to improve the log, since logging only the message might miss some information:
logger.logInfo(
"Retrying upload, attempt {}/{} {}", retryCount, maxUploadRetries, e.getMessage());
This is preferable since it indicates that metadata is being refreshed. I can add the stack trace, but that would be too much on every attempt.
We are already printing your suggested log line below.
SNOW-987122 Upgrade JDBC to 3.14.5 and Catch new exception type for renewing expired S3 token
Changes
Here is the reason why we were not able to run for more than 1 hour when updating JDBC to the latest available version: https://github.com/snowflakedb/snowflake-jdbc/pull/1473/files
@sfc-gh-lsembera did add retry logic, but it is only compatible with version 3.13.30.
A few changes are because of the formatter.
The isCredentialsExpiredException function is removed!
Fixed a failing test for now; will fix it properly once the next version of JDBC is released: https://github.com/snowflakedb/snowflake-sdks-drivers-issues-teamwork/issues/819
Ran the longRunningTest present in the E2E jar. Look at this commit's E2E runs (took > 2 hours to run): 35c8468