
SNOW-987122 Upgrade JDBC to 3.14.5 and Catch new exception type for renewing expired S3 token #677

Merged
merged 11 commits into master from japatel-SNOW-987122-renew-token-exception on Feb 21, 2024

Conversation

@sfc-gh-japatel (Collaborator) commented Feb 8, 2024


Changes

@sfc-gh-xhuang (Contributor)

can close #660

@sfc-gh-psaha (Contributor)

I expect the large values test to fail until SNOW-1003775 is fixed, but you can work around it by changing the tests to read back the length of the values instead of the values themselves. Note that GitHub Actions will just crash on these failures and you won't see a proper test failure in the logs. @sfc-gh-japatel

@sfc-gh-japatel (Collaborator, Author)

I expect the large values test to fail until SNOW-1003775 is fixed, but you can work around it by changing the tests to read back the length of the values instead of the values themselves. Note that GitHub Actions will just crash on these failures and you won't see a proper test failure in the logs. @sfc-gh-japatel

Ah yes, that's what I am seeing: FlushServiceTests run indefinitely with no errors either! Thanks for pointing it out.

@sfc-gh-japatel sfc-gh-japatel force-pushed the japatel-SNOW-987122-renew-token-exception branch from 4cf2792 to 5a71a12 on February 10, 2024 02:15
@sfc-gh-japatel sfc-gh-japatel marked this pull request as ready for review February 12, 2024 21:07
@sfc-gh-japatel sfc-gh-japatel requested review from sfc-gh-tzhang and a team as code owners February 12, 2024 21:07
@sfc-gh-japatel (Collaborator, Author)

Snyk checks can be ignored since it is complaining about the pom of the e2e library.
[Screenshot 2024-02-12 at 4.50.07 PM]

@sfc-gh-tzhang (Contributor) left a comment

Left some comments, PTAL, thanks!

README.md (review thread resolved)
pom.xml (review thread resolved)
Comment on lines 230 to 233
if (e instanceof SnowflakeSQLLoggedException) {
return ((SnowflakeSQLLoggedException) e).getErrorCode()
== S3_OPERATION_ERROR.getMessageCode();
} else if (e instanceof SnowflakeSQLException) {
Contributor

This is highly error-prone. The reason we need to fix it again is that the JDBC driver is throwing a new exception, and we have no control over whether they will update the exception type or add a new code path that throws a new type of exception. I think we should just catch all exceptions and always retry a few times. + @sfc-gh-lsembera to see if he has a better idea.

Collaborator Author

I don't think catching all exceptions is a good idea here.
I agree that this is error-prone and not safe for future versions, but ideally the JDBC interface should not change its response. (I think of it as a non-backward-compatible change.)
If you check the JDBC code (handleS3Exception), it catches all ClientExceptions and ServerExceptions.

It's better that we catch the specific exception and work with the JDBC team!
Thoughts?

Contributor

ideally the JDBC interface should not change its response.

Agree that catch-all is not ideal, but how would you guarantee that there won't be any non-compatible change in the future? We don't have anyone who looks into every JDBC PR to make sure this won't happen. In fact, the same issue has happened twice in a few months: the first fix (by Lukas) was done because they updated the exception type, and the second fix (this PR) is needed because they added a new place that throws a different exception.

@sfc-gh-japatel (Collaborator, Author) commented Feb 14, 2024

but how would you guarantee that there won't be any non-compatible change in the future?

I think this is something that the library owners should worry about. As a consumer, we have to go through this set of changes every time we upgrade, because the version we want to use fixes other issues.

That said, I don't have a strong preference for either approach; I just feel that catch-all suppresses all exceptions without knowing what the root cause is.

@sfc-gh-lsembera (Contributor) commented Feb 14, 2024

We are already retrying all exceptions up to 5 times, see here. What would have to change is that we would also attempt to refresh the token on each unknown exception, which we don't do currently; we only refresh the token if isCredentialsExpiredException is true. Given the recurring issues with this logic and the big impact it has (users are unable to ingest until they restart their applications; not even a channel reopen helps), I think we should be more resilient and handle all exceptions. WDYT about catching all unknown exceptions just like now, keeping the 5 retries, but attempting to refresh the token after the first exception out of 5, regardless of whether isCredentialsExpiredException is true?
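
For illustration, a minimal sketch of this proposal, assuming hypothetical helper names (putRemote, refreshStageMetadata) rather than the SDK's actual internal API:

public class UploadRetrySketch {
  private static final int MAX_UPLOAD_RETRIES = 5;

  public void uploadWithRetries(byte[] blob, String fullFilePath) {
    int retryCount = 0;
    while (true) {
      try {
        putRemote(blob, fullFilePath);
        return;
      } catch (Exception e) {
        if (retryCount >= MAX_UPLOAD_RETRIES) {
          throw new RuntimeException(
              "Upload failed after " + MAX_UPLOAD_RETRIES + " retries", e);
        }
        if (retryCount == 0) {
          // Refresh the stage metadata (and with it the S3 token) on the very first
          // failure, regardless of the exception type.
          refreshStageMetadata();
        }
        retryCount++;
      }
    }
  }

  // Placeholder: upload the serialized blob to the stage location.
  private void putRemote(byte[] blob, String fullFilePath) {}

  // Placeholder: re-fetch stage metadata/credentials from Snowflake.
  private void refreshStageMetadata() {}
}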

Collaborator Author

Or, more like, refresh the token on the final attempt regardless of the exception?

Contributor

The benefit of doing it on the final attempt would be that we issue fewer token renewals, but on the other hand it would introduce latency, because the token would only be renewed on the 5th retry. I don't know which one is a bigger concern.

Collaborator Author

Given the frequency of these issues (so far about once every few hours), I think it might be okay to do it on the first attempt?

Contributor

+1 on always refreshing the token on the first exception, and then retrying 5 times if the exception persists.

Collaborator Author

Yes, it's implemented, PTAL! Thanks.

@sfc-gh-tzhang (Contributor) commented Feb 13, 2024

Snyk checks can be ignored since it is complaining about the pom of the e2e library. [Screenshot 2024-02-12 at 4.50.07 PM]

I don't think this can be ignored, since we didn't see this issue before, and I assume Snyk will keep failing and we will miss real issues if it stays like this.

@sfc-gh-xhuang (Contributor)

Snyk checks can be ignored since it is complaining about the pom of the e2e library. [Screenshot 2024-02-12 at 4.50.07 PM]

I don't think this can be ignored, since we didn't see this issue before, and I assume Snyk will keep failing and we will miss real issues if it stays like this.

I already created a Jira last week, SNOW-1045928, and assigned it to @sfc-gh-jfan to help investigate.

@@ -65,7 +65,7 @@
<shadeBase>net.snowflake.ingest.internal</shadeBase>
<slf4j.version>1.7.36</slf4j.version>
<snappy.version>1.1.10.4</snappy.version>
<snowjdbc.version>3.13.30</snowjdbc.version>
<snowjdbc.version>3.14.5</snowjdbc.version>
Contributor

Seems like the fix for https://github.com/snowflakedb/snowflake-sdks-drivers-issues-teamwork/issues/819 is on its way; could we wait for it and skip the tricks with large strings?

Collaborator Author

Yeah, we can wait now that it is pushed, but we will have to wait until the next release of the JDBC driver. Are we okay with that, @sfc-gh-xhuang?

Contributor

Even if they release next week, there's no guarantee it will be a simple upgrade either.
The large strings issue only affects reads and this test; it doesn't affect write behavior, which is the only thing we use.

Contributor

+1. We are just a bystander for that bug. It is not in our code, and they were able to reproduce it themselves. I support not waiting for that fix to land.

Contributor

JDBC plans to release next week, but please consider the trade-off of using 3.14.6 vs. today's 3.14.5.

@@ -310,15 +310,15 @@ public void testValidCompressionAlgorithmsAndWithUppercaseLowerCase() {
});
List<String> zstdValues = Arrays.asList("ZSTD", "zstd", "Zstd", "zStd");
Contributor

Unrelated and possibly unhelpful question, but I don't understand the need for this test.
Don't we just normalize the zstdValue input? Why do we only test 4 case variants of ZSTD? There's ZStd, zSTd, etc.
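
For what it's worth, a small sketch of the kind of case normalization this question assumes; the enum and fromName helper here are hypothetical, not the SDK's actual API:

import java.util.Locale;

// Hypothetical enum illustrating case-insensitive parsing of a compression algorithm name.
enum ParquetCompressionSketch {
  GZIP,
  ZSTD;

  static ParquetCompressionSketch fromName(String name) {
    // Upper-casing the input makes "zstd", "Zstd", "zSTd", ... all resolve to ZSTD,
    // so exhaustively testing every case variant adds little coverage.
    return valueOf(name.toUpperCase(Locale.ROOT));
  }
}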

@sfc-gh-japatel (Collaborator, Author)

Folks, updated the PR with:

  1. Kept JDBC 3.14.5 and modified the test.
  2. No special exception handling; catch all exceptions and refresh the token on the very first exception, then keep retrying.

Thanks, PTAL!

@sfc-gh-japatel sfc-gh-japatel force-pushed the japatel-SNOW-987122-renew-token-exception branch from 0bdab12 to 2529d72 on February 16, 2024 00:11
@sfc-gh-japatel sfc-gh-japatel force-pushed the japatel-SNOW-987122-renew-token-exception branch from eeb6828 to 5e4d15d on February 20, 2024 21:04
@sfc-gh-tzhang (Contributor) left a comment

LGTM, left one minor comment, thanks!

// for the first exception, we always perform a metadata refresh.
logger.logInfo(
"Stage metadata need to be refreshed due to upload error: {} on first retry attempt",
e.getMessage());
Contributor

Can you move this out to the log below? Also, we need to improve that log, since logging only the message might miss some information:

logger.logInfo(
          "Retrying upload, attempt {}/{} {}", retryCount, maxUploadRetries, e.getMessage());

@sfc-gh-japatel (Collaborator, Author) commented Feb 21, 2024

This is preferable, since it indicates that we are refreshing metadata. I could add the stack trace, but that would be too much on every attempt.

We are already printing your suggested log line below.

@sfc-gh-japatel sfc-gh-japatel merged commit f8c8e3f into master Feb 21, 2024
12 checks passed
@sfc-gh-japatel sfc-gh-japatel deleted the japatel-SNOW-987122-renew-token-exception branch February 21, 2024 18:51