
SNOW-999335: Spark snowflake read results in certificate issue #1591

Closed
dyang108 opened this issue Jan 3, 2024 · 13 comments


dyang108 commented Jan 3, 2024

Please answer these questions before submitting your issue.
This information is required to debug the issue accurately. Thanks!

  1. What version of JDBC driver are you using?
    3.14.2

  2. What operating system and processor architecture are you using?
    amazon linux

  3. What version of Java are you using?
    11

  4. What did you do?
    Reading from Snowflake with the Snowflake JDBC driver in Spark results in an S3 SSL certificate error. It seems our Spark job is trying to access a Snowflake customer staging bucket and failing:

Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 1296) (10.0.24.229 executor 5): net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver internal error: Max retry reached for the download of #chunk0 (Total chunks: 17) retry=7, error=net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver encountered communication error. Message: Certificate for <sfc-va-ds1-customer-stage.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com].
    at net.snowflake.client.jdbc.RestRequest.execute(RestRequest.java:237)
    at net.snowflake.client.jdbc.DefaultResultStreamProvider.getResultChunk(DefaultResultStreamProvider.java:122)
    at net.snowflake.client.jdbc.DefaultResultStreamProvider.getInputStream(DefaultResultStreamProvider.java:39)
    at net.snowflake.client.jdbc.SnowflakeChunkDownloader$2.call(SnowflakeChunkDownloader.java:975)
    at net.snowflake.client.jdbc.SnowflakeChunkDownloader$2.call(SnowflakeChunkDownloader.java:889)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for <sfc-va-ds1-customer-stage.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
    at net.snowflake.client.jdbc.internal.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:507)
    at net.snowflake.client.jdbc.internal.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:437)
    at net.snowflake.client.jdbc.internal.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
    at net.snowflake.client.jdbc.internal.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    at net.snowflake.client.jdbc.internal.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
    at net.snowflake.client.jdbc.internal.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
    at net.snowflake.client.jdbc.internal.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at net.snowflake.client.jdbc.internal.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at net.snowflake.client.jdbc.internal.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at net.snowflake.client.jdbc.internal.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at net.snowflake.client.jdbc.internal.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at net.snowflake.client.jdbc.internal.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at net.snowflake.client.jdbc.internal.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
    at net.snowflake.client.jdbc.RestRequest.execute(RestRequest.java:222)
    ... 8 more
.
    at net.snowflake.client.jdbc.SnowflakeChunkDownloader.getNextChunkToConsume(SnowflakeChunkDownloader.java:601)
    at net.snowflake.client.core.SFArrowResultSet.fetchNextRowUnsorted(SFArrowResultSet.java:232)
    at net.snowflake.client.core.SFArrowResultSet.fetchNextRow(SFArrowResultSet.java:209)
    at net.snowflake.client.core.SFArrowResultSet.next(SFArrowResultSet.java:344)
    at net.snowflake.client.jdbc.SnowflakeResultSetV1.next(SnowflakeResultSetV1.java:92)
    at net.snowflake.spark.snowflake.io.ResultIterator.hasNext(SnowflakeResultSetRDD.scala:152)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
    at org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.hasNext(InMemoryRelation.scala:118)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:302)
    at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1508)
    at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1435)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1499)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1322)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:376)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:327)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:136)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

I reverted to 3.12.12, which fixed the issue for me, but I wanted to flag this going forward. Let me know if this belongs in the spark-snowflake project instead.

@dyang108 dyang108 added the bug label Jan 3, 2024
@github-actions github-actions bot changed the title Spark snowflake read results in certificate issue SNOW-999335: Spark snowflake read results in certificate issue Jan 3, 2024
sfc-gh-wfateem (Collaborator) commented Jan 11, 2024

@dyang108 thanks for reporting this. Do you know whether you have a proxy server in your environment, and whether your S3 connection is expected to go through it or bypass it?
The changes between 3.12.12 and 3.14.2 are substantial, and there have been significant changes to proxy configuration in particular.

Can you open a support case and provide the Spark logs after adding the following JVM argument, please?
-Djavax.net.debug=ssl,handshake

You would need to add that to the Spark driver's extra Java options, for instance:
--conf spark.driver.extraJavaOptions='-Djavax.net.debug=ssl,handshake'

sfc-gh-wfateem (Collaborator) commented

@dyang108 do you still need help with this?

dyang108 (Author) commented

I downgraded to 3.12.12 as a workaround; I believe this remains an issue on 3.14.2.

I'm not sure whether we have a proxy configured. If we do, this is the first I've heard of it.

sfc-gh-wfateem (Collaborator) commented

@dyang108 It's definitely interesting that your issue goes away after you downgrade to 3.12.12, but I'm not sure why that would be the case, and that's a fairly old version. Based on the error stack, the problem was raised by the Apache HTTP client code when verifying the hostname, and that library changed from v4.5.5 to v4.5.14 between those two JDBC driver versions. However, I don't see any difference in the implementation of that method in either of those branches:

Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for <sfc-va-ds1-customer-stage.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
    at net.snowflake.client.jdbc.internal.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:507)
    at net.snowflake.client.jdbc.internal.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:437)
    at net.snowflake.client.jdbc.internal.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
    at net.snowflake.client.jdbc.internal.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    at net.snowflake.client.jdbc.internal.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)

The challenge here is that this isn't something I'm able to reproduce, so it's hard to say what might be going on. While the JDBC driver version change clearly triggered a problem in your case, there's no way for us to debug it without additional information.

A good place to start is to review the output produced by the following JVM argument when reproducing the issue:
-Djavax.net.debug=ssl,handshake

Will you be able to open a support case to share that information with us? Otherwise, I'm not entirely sure how else we can look into this.

@sfc-gh-wfateem sfc-gh-wfateem added the status-information_needed Additional information is required from the reporter label Feb 21, 2024
@sfc-gh-wfateem sfc-gh-wfateem self-assigned this Feb 23, 2024
sfc-gh-wfateem (Collaborator) commented

I'm going to close this issue for now. If you're able to provide additional information to help us debug it, please feel free to reopen it.

sfc-gh-wfateem (Collaborator) commented

Reopening this case, since we received more information from a different user who experienced the same problem.

richard-axual commented

I've encountered this issue today with the Kafka Connector 2.2.1 and 2.1.2, which use JDBC driver versions 3.24.5 and 3.13.30 respectively.
This might be an interaction with the JVM; I'm running my Connect workers in a container using a Red Hat UBI OpenJDK image:

> java -version
openjdk version "21.0.2" 2024-01-16 LTS
OpenJDK Runtime Environment (Red_Hat-21.0.2.0.13-1) (build 21.0.2+13-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-21.0.2.0.13-1) (build 21.0.2+13-LTS, mixed mode)

Maybe this can help you track down the issue

sfc-gh-wfateem (Collaborator) commented

@dyang108 @richard-axual we debugged this extensively with a user experiencing the same issue, and based on our findings it had to do with the Apache HTTP client's usage of the public-suffix-list.txt file. PR #1690 addresses that issue and was included in version 3.15.1, which the user confirmed resolved their problem.
Can you test with JDBC driver version 3.15.1 and let us know whether that addresses your problem?
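The public-suffix-list behavior described above can be illustrated with a toy sketch. Hostname verifiers that consult the public suffix list (as Apache HttpClient's DefaultHostnameVerifier does) refuse to honor a wildcard whose base domain is itself listed as a public suffix, so `*.s3.amazonaws.com` can fail to match `sfc-va-ds1-customer-stage.s3.amazonaws.com` even though the labels line up. The class name, method, and miniature suffix list below are invented for illustration; this is not the driver's or HttpClient's actual implementation.

```java
import java.util.Set;

// Simplified model of wildcard SAN matching with a public-suffix check.
// Illustrative only; real verifiers use the full public-suffix-list.txt.
public class WildcardSanCheck {

    // Toy public suffix list; s3.amazonaws.com appears on the real list.
    static final Set<String> PUBLIC_SUFFIXES = Set.of("s3.amazonaws.com", "com");

    static boolean matches(String host, String pattern) {
        String h = host.toLowerCase();
        String p = pattern.toLowerCase();
        if (!p.startsWith("*.")) {
            return h.equals(p);
        }
        String base = p.substring(2); // e.g. "s3.amazonaws.com"
        // A PSL-aware verifier vetoes wildcards whose base domain is a
        // public suffix, even when the labels would otherwise match.
        if (PUBLIC_SUFFIXES.contains(base)) {
            return false;
        }
        // The wildcard may cover exactly one additional label.
        return h.endsWith("." + base)
                && h.indexOf('.') == h.length() - base.length() - 1;
    }

    public static void main(String[] args) {
        String host = "sfc-va-ds1-customer-stage.s3.amazonaws.com";
        // Label-wise the wildcard would match, but the PSL check vetoes it,
        // reproducing the SSLPeerUnverifiedException symptom in the report.
        System.out.println(matches(host, "*.s3.amazonaws.com")); // false
        System.out.println(matches(host, host));                 // true
    }
}
```

Under this model, a change to how the driver ships or consults public-suffix-list.txt (as #1690 does) alters the verifier's view of which wildcards are acceptable, which would be consistent with the symptom disappearing in 3.15.1.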

richard-axual commented

@sfc-gh-wfateem Thanks for the update and explanation.
I've only encountered this issue with the Snowflake Kafka Connector, which embeds the JDBC driver, so unfortunately I cannot verify this until the Kafka connector updates the dependency as well.

sfc-gh-wfateem (Collaborator) commented

@richard-axual was your issue consistent, or was it a one-time problem you experienced with the Snowflake Kafka Connector? The issue we're discussing here is a consistent failure once the JDBC driver version is upgraded.

richard-axual commented

@sfc-gh-wfateem The error with the updated version of the connector is consistent, but since it occurs in a different intermediate project, I cannot guarantee it's the same problem.

sfc-gh-wfateem (Collaborator) commented

Thanks @richard-axual.
We spent quite a bit of time trying to figure this out, especially since it wasn't reproducible on our side. The best fix we have, which we believe addresses the issue reported here, is in #1690.
If you have a dev environment where you can consistently reproduce the issue, I suggest rebuilding the Kafka connector with the newer JDBC driver version to test whether it actually resolves the issue.

sfc-gh-wfateem (Collaborator) commented

I'm going to close this issue on the assumption that #1690 addresses it. If not, please feel free to reopen it.

@sfc-gh-wfateem sfc-gh-wfateem removed the status-information_needed Additional information is required from the reporter label Apr 27, 2024