ES-hadoop is not compatible with spark 3.5.1 #2210

Open

edward-capriolo-db opened this issue Apr 2, 2024 · 10 comments

@edward-capriolo-db commented Apr 2, 2024

What kind of issue is this?

  • [x] Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
    The easier it is to track down the bug, the faster it is solved.
  • Feature Request. Start by telling us what problem you’re trying to solve.
    Often a solution already exists! Don’t send pull requests to implement new features without
    first getting our support. Sometimes we leave features out on purpose to keep the project small.

Issue description

Spark 3.5.1 has changed some UDF code in catalyst, which breaks a number of applications built against older versions of Spark.

Steps to reproduce

Code:

es.writeStream().... 
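Since the snippet above is truncated, here is a minimal sketch of the kind of structured-streaming write that exercises this path (the host, index name, and checkpoint path are hypothetical; per the stack trace, any streaming write through the "es" format goes through EsStreamQueryWriter):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("es-repro").getOrCreate()

    // Built-in "rate" source: emits (timestamp, value) rows, one per second.
    val stream = spark.readStream.format("rate").load()

    val query = stream.writeStream
      .format("es")                                        // elasticsearch-spark streaming sink
      .option("es.nodes", "localhost:9200")                // hypothetical cluster address
      .option("checkpointLocation", "/tmp/es-checkpoint")  // hypothetical path
      .start("demo-index")                                 // hypothetical index name

    query.awaitTermination()

Against Spark 3.5.1 this fails on the executors as soon as the first micro-batch is written, producing the trace below.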

Stack trace:

2024-04-01 21:49:13 ERROR streaming.MicroBatchExecution:97 - Query reconquery [id = 4ead2d05-8e7f-4d9f-bbd2-9153441d2cb5, runId = dfabec28-8824-46d1-b573-5a49b5352ccd] terminated with error
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 8) (lonasworkd1.uk.db.com executor 2): java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;
    at org.elasticsearch.spark.sql.streaming.EsStreamQueryWriter.<init>(EsStreamQueryWriter.scala:50)
    at org.elasticsearch.spark.sql.streaming.EsSparkSqlStreamingSink.$anonfun$addBatch$5(EsSparkSqlStreamingSink.scala:72)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
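
The NoSuchMethodError comes from a binary-incompatible change in catalyst: up to Spark 3.4, RowEncoder.apply(StructType) returned an ExpressionEncoder[Row], and that is the method EsStreamQueryWriter (compiled against an older Spark) calls at line 50. Spark 3.5 removed that overload as part of the encoder refactor. A rough sketch of old vs. new, assuming I am reading the 3.5 sources correctly (RowEncoder.encoderFor and the public Encoders.row appear to be the replacements):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val schema = StructType(Seq(StructField("name", StringType)))

    // Spark <= 3.4 (what elasticsearch-spark 8.13.0 was compiled against):
    // val encoder: ExpressionEncoder[Row] = RowEncoder(schema)

    // Spark 3.5: that overload is gone; wrap the new AgnosticEncoder instead,
    // or use the public API added in 3.5:
    val encoder: ExpressionEncoder[Row] = ExpressionEncoder(RowEncoder.encoderFor(schema))
    // val publicEncoder = org.apache.spark.sql.Encoders.row(schema)

Because the call is resolved at link time, a jar compiled against 3.4 fails on 3.5 regardless of source compatibility.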

Version Info

OS: Linux
JVM : JDK8/11
Hadoop/Spark:
ES-Hadoop :

    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-spark-30_${scala.version}</artifactId>
        <version>8.13.0</version>
    </dependency>

ES: 7.x latest.


@masseyke (Member) commented Apr 2, 2024

Thanks for the report!

@edward-capriolo-db (Author) commented

Also, the dependencies bring in an old protobuf that sets off OSS vulnerability scanning. Spark itself pulls in protobuf 3.19.6:

    [INFO] | | +- org.apache.spark:spark-network-common_2.12:jar:3.5.1:compile
    [INFO] | | | \- com.google.crypto.tink:tink:jar:1.9.0:compile
    [INFO] | | |    +- com.google.code.gson:gson:jar:2.8.9:compile
    [INFO] | | |    +- com.google.protobuf:protobuf-java:jar:3.19.6:compile
    [INFO] | | |    \- joda-time:joda-time:jar:2.12.5:compile

while elasticsearch-spark-30 pins protobuf 2.5.0:

    [INFO] +- org.elasticsearch:elasticsearch-spark-30_2.12:jar:8.13.0:compile
    [INFO] |  +- org.scala-lang:scala-reflect:jar:2.12.17:compile
    [INFO] |  +- commons-logging:commons-logging:jar:1.1.1:compile
    [INFO] |  +- javax.xml.bind:jaxb-api:jar:2.3.1:runtime
    [INFO] |  \- com.google.protobuf:protobuf-java:jar:2.5.0:compile
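
Until that transitive version is bumped upstream, a common workaround (standard Maven mechanics, not an officially sanctioned fix, and only safe if your code path does not actually need protobuf 2.x at runtime) is to exclude it and manage a newer version yourself:

    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-spark-30_${scala.version}</artifactId>
        <version>8.13.0</version>
        <exclusions>
            <exclusion>
                <groupId>com.google.protobuf</groupId>
                <artifactId>protobuf-java</artifactId>
            </exclusion>
        </exclusions>
    </dependency>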

masseyke mentioned this issue Apr 15, 2024
@masseyke (Member) commented

Upgrading to Spark 3.5 is going to be tricky because of compiler errors like the following, caused by a breaking change in the Spark API:

[Error] /Users/kmassey/workspace/elasticsearch-hadoop/spark/core/src/main/scala/org/elasticsearch/spark/package.scala:34:42: Symbol 'type org.apache.spark.internal.Logging' is missing from the classpath.
This symbol is required by 'class org.apache.spark.SparkContext'.
Make sure that type Logging is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
A full rebuild may help if 'SparkContext.class' was compiled against an incompatible version of org.apache.spark.internal.
[Error] /Users/kmassey/workspace/elasticsearch-hadoop/spark/core/src/main/scala/org/elasticsearch/spark/rdd/EsSpark.scala:25:8: Symbol 'type org.apache.spark.internal.Logging' is missing from the classpath.
This symbol is required by 'class org.apache.spark.rdd.RDD'.
Make sure that type Logging is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
A full rebuild may help if 'RDD.class' was compiled against an incompatible version of org.apache.spark.internal.
[Error] /Users/kmassey/workspace/elasticsearch-hadoop/spark/core/src/main/scala/org/elasticsearch/spark/cfg/SparkSettingsManager.java:21:8: Symbol 'type org.apache.spark.internal.Logging' is missing from the classpath.
This symbol is required by 'class org.apache.spark.SparkConf'.
Make sure that type Logging is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
A full rebuild may help if 'SparkConf.class' was compiled against an incompatible version of org.apache.spark.internal.
three errors found

I think we'll have to move several more classes from our spark core package down into the various spark-version-specific packages.

@edward-capriolo-db (Author) commented

These are unavoidable; previously in Hive we made "shim layers" and used reflection to deal with breaking API changes. I will look into at least getting it working, and then we can see what the change set is.
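
For reference, that shim approach usually looks something like the sketch below (a hypothetical helper, not es-hadoop code): resolve whichever encoder factory exists in the running Spark via reflection, so one binary works on both 3.4 and 3.5.

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
    import org.apache.spark.sql.types.StructType

    // Hypothetical shim: try the pre-3.5 RowEncoder.apply first, then fall
    // back to the Encoders.row method introduced in Spark 3.5.
    object RowEncoderShim {
      def encoderFor(schema: StructType): ExpressionEncoder[Row] = {
        try {
          // Spark <= 3.4: RowEncoder$.MODULE$.apply(schema)
          val cls = Class.forName("org.apache.spark.sql.catalyst.encoders.RowEncoder$")
          val module = cls.getField("MODULE$").get(null)
          cls.getMethod("apply", classOf[StructType])
            .invoke(module, schema)
            .asInstanceOf[ExpressionEncoder[Row]]
        } catch {
          case _: NoSuchMethodException =>
            // Spark 3.5+: public Encoders.row(schema); in classic (non-Connect)
            // Spark the returned Encoder[Row] is an ExpressionEncoder.
            Class.forName("org.apache.spark.sql.Encoders")
              .getMethod("row", classOf[StructType])
              .invoke(null, schema)
              .asInstanceOf[ExpressionEncoder[Row]]
        }
      }
    }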

@ps-rterzman commented
Are there any updates on that?

@masseyke (Member) commented

We recently added support for 3.4.3, but we have not dealt with the big changes in 3.5 yet.

@chandaku commented
@masseyke I am facing this issue with Spark 3.5.2. Any update on this so far?

@masseyke (Member) commented

No update yet, sorry.

@chandaku commented
@masseyke Thanks for the confirmation.
