ES-hadoop is not compatible with spark 3.5.1 #2210

Open

edward-capriolo-db opened this issue Apr 2, 2024 · 10 comments

@edward-capriolo-db commented Apr 2, 2024

What kind of issue is this?

  • [x] Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
    The easier it is to track down the bug, the faster it is solved.
  • Feature Request. Start by telling us what problem you’re trying to solve.
    Often a solution already exists! Don’t send pull requests to implement new features without
    first getting our support. Sometimes we leave features out on purpose to keep the project small.

Issue description

Spark 3.5.1 has changed some UDF code in catalyst, which breaks a number of applications built against older versions of Spark.

Steps to reproduce

Code:

es.writeStream().... 
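Since the snippet above is truncated, here is a minimal sketch of the kind of structured-streaming write that exercises this path (the host, index name, and checkpoint path are hypothetical; per the stack trace, any streaming write through the "es" format goes through EsStreamQueryWriter):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("es-repro").getOrCreate()

    // Built-in "rate" source: emits (timestamp, value) rows, one per second.
    val stream = spark.readStream.format("rate").load()

    val query = stream.writeStream
      .format("es")                                        // elasticsearch-spark streaming sink
      .option("es.nodes", "localhost:9200")                // hypothetical cluster address
      .option("checkpointLocation", "/tmp/es-checkpoint")  // hypothetical path
      .start("demo-index")                                 // hypothetical index name

    query.awaitTermination()

Against Spark 3.5.1 this fails on the executors as soon as the first micro-batch is written, producing the trace below.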

Stack trace:

2024-04-01 21:49:13 ERROR streaming.MicroBatchExecution:97 - Query reconquery [id = 4ead2d05-8e7f-4d9f-bbd2-9153441d2cb5, runId = dfabec28-8824-46d1-b573-5a49b5352ccd] terminated with error
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 8) (lonasworkd1.uk.db.com executor 2): java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;
    at org.elasticsearch.spark.sql.streaming.EsStreamQueryWriter.<init>(EsStreamQueryWriter.scala:50)
    at org.elasticsearch.spark.sql.streaming.EsSparkSqlStreamingSink.$anonfun$addBatch$5(EsSparkSqlStreamingSink.scala:72)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
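
The NoSuchMethodError comes from a binary-incompatible change in catalyst: up to Spark 3.4, RowEncoder.apply(StructType) returned an ExpressionEncoder[Row], and that is the method EsStreamQueryWriter (compiled against an older Spark) calls at line 50. Spark 3.5 removed that overload as part of the encoder refactor. A rough sketch of old vs. new, assuming I am reading the 3.5 sources correctly (RowEncoder.encoderFor and the public Encoders.row appear to be the replacements):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val schema = StructType(Seq(StructField("name", StringType)))

    // Spark <= 3.4 (what elasticsearch-spark 8.13.0 was compiled against):
    // val encoder: ExpressionEncoder[Row] = RowEncoder(schema)

    // Spark 3.5: that overload is gone; wrap the new AgnosticEncoder instead,
    // or use the public API added in 3.5:
    val encoder: ExpressionEncoder[Row] = ExpressionEncoder(RowEncoder.encoderFor(schema))
    // val publicEncoder = org.apache.spark.sql.Encoders.row(schema)

Because the call is resolved at link time, a jar compiled against 3.4 fails on 3.5 regardless of source compatibility.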

Version Info

OS: Linux
JVM : JDK8/11
Hadoop/Spark:
ES-Hadoop :

    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-spark-30_${scala.version}</artifactId>
        <version>8.13.0</version>
    </dependency>

ES: 7.x latest.


@masseyke (Member) commented Apr 2, 2024

Thanks for the report!

@edward-capriolo-db (Author) commented

Also, the dependencies bring in an old protobuf that sets off OSS vulnerability scanning. Spark itself pulls in protobuf 3.19.6:

    [INFO] | | +- org.apache.spark:spark-network-common_2.12:jar:3.5.1:compile
    [INFO] | | | \- com.google.crypto.tink:tink:jar:1.9.0:compile
    [INFO] | | |    +- com.google.code.gson:gson:jar:2.8.9:compile
    [INFO] | | |    +- com.google.protobuf:protobuf-java:jar:3.19.6:compile
    [INFO] | | |    \- joda-time:joda-time:jar:2.12.5:compile

while elasticsearch-spark-30 pins protobuf 2.5.0:

    [INFO] +- org.elasticsearch:elasticsearch-spark-30_2.12:jar:8.13.0:compile
    [INFO] |  +- org.scala-lang:scala-reflect:jar:2.12.17:compile
    [INFO] |  +- commons-logging:commons-logging:jar:1.1.1:compile
    [INFO] |  +- javax.xml.bind:jaxb-api:jar:2.3.1:runtime
    [INFO] |  \- com.google.protobuf:protobuf-java:jar:2.5.0:compile
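
Until that transitive version is bumped upstream, a common workaround (standard Maven mechanics, not an officially sanctioned fix, and only safe if your code path does not actually need protobuf 2.x at runtime) is to exclude it and manage a newer version yourself:

    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-spark-30_${scala.version}</artifactId>
        <version>8.13.0</version>
        <exclusions>
            <exclusion>
                <groupId>com.google.protobuf</groupId>
                <artifactId>protobuf-java</artifactId>
            </exclusion>
        </exclusions>
    </dependency>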

masseyke mentioned this issue Apr 15, 2024
@masseyke (Member) commented

Upgrading to Spark 3.5 is going to be tricky because of compiler errors like the following, caused by a breaking change in the Spark API:

[Error] /Users/kmassey/workspace/elasticsearch-hadoop/spark/core/src/main/scala/org/elasticsearch/spark/package.scala:34:42: Symbol 'type org.apache.spark.internal.Logging' is missing from the classpath.
This symbol is required by 'class org.apache.spark.SparkContext'.
Make sure that type Logging is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
A full rebuild may help if 'SparkContext.class' was compiled against an incompatible version of org.apache.spark.internal.
[Error] /Users/kmassey/workspace/elasticsearch-hadoop/spark/core/src/main/scala/org/elasticsearch/spark/rdd/EsSpark.scala:25:8: Symbol 'type org.apache.spark.internal.Logging' is missing from the classpath.
This symbol is required by 'class org.apache.spark.rdd.RDD'.
Make sure that type Logging is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
A full rebuild may help if 'RDD.class' was compiled against an incompatible version of org.apache.spark.internal.
[Error] /Users/kmassey/workspace/elasticsearch-hadoop/spark/core/src/main/scala/org/elasticsearch/spark/cfg/SparkSettingsManager.java:21:8: Symbol 'type org.apache.spark.internal.Logging' is missing from the classpath.
This symbol is required by 'class org.apache.spark.SparkConf'.
Make sure that type Logging is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
A full rebuild may help if 'SparkConf.class' was compiled against an incompatible version of org.apache.spark.internal.
three errors found

I think we'll have to move several more classes from our spark core package down into the various spark-version-specific packages.

@edward-capriolo-db (Author) commented

These are unavoidable; previously in Hive we made "shim layers" and used reflection to deal with breaking API changes. I will look into at least getting it working, and then we can see what the change set is.
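
For reference, that shim approach usually looks something like the sketch below (a hypothetical helper, not es-hadoop code): resolve whichever encoder factory exists in the running Spark via reflection, so one binary works on both 3.4 and 3.5.

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
    import org.apache.spark.sql.types.StructType

    // Hypothetical shim: try the pre-3.5 RowEncoder.apply first, then fall
    // back to the Encoders.row method introduced in Spark 3.5.
    object RowEncoderShim {
      def encoderFor(schema: StructType): ExpressionEncoder[Row] = {
        try {
          // Spark <= 3.4: RowEncoder$.MODULE$.apply(schema)
          val cls = Class.forName("org.apache.spark.sql.catalyst.encoders.RowEncoder$")
          val module = cls.getField("MODULE$").get(null)
          cls.getMethod("apply", classOf[StructType])
            .invoke(module, schema)
            .asInstanceOf[ExpressionEncoder[Row]]
        } catch {
          case _: NoSuchMethodException =>
            // Spark 3.5+: public Encoders.row(schema); in classic (non-Connect)
            // Spark the returned Encoder[Row] is an ExpressionEncoder.
            Class.forName("org.apache.spark.sql.Encoders")
              .getMethod("row", classOf[StructType])
              .invoke(null, schema)
              .asInstanceOf[ExpressionEncoder[Row]]
        }
      }
    }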

@ps-rterzman commented
Are there any updates on that?

@masseyke (Member) commented

We recently added support for 3.4.3, but we have not dealt with the big changes in 3.5 yet.

@chandaku commented
@masseyke I am facing this issue with Spark 3.5.2. Any update on this so far?

@masseyke (Member) commented

No update yet, sorry.

@chandaku commented
@masseyke Thanks for the confirmation.
