
Spark dependency is not compatible, resulting in a compile error #2199

dimitarKiryakov opened this issue Feb 16, 2024 · 1 comment

@dimitarKiryakov

What kind of issue is this?

  • Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
    The easier it is to track down the bug, the faster it is solved.
  • Feature Request. Start by telling us what problem you’re trying to solve.
    Often a solution already exists! Don’t send pull requests to implement new features without
    first getting our support. Sometimes we leave features out on purpose to keep the project small.

Issue description

When adding the following dependency with Spark 3.5.0, mvn clean install fails at the compile stage:

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-spark-30_2.13</artifactId>
            <version>8.12.1</version>
        </dependency>

We use Java 17 for the build.

Steps to reproduce

Code (pom.xml):

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>data-streaming</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
        <scala.binary.version>2.12</scala.binary.version>
        <spark.version>3.5.0</spark.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql-kafka-0-10_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.3.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-aws</artifactId>
            <version>3.3.4</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-spark-30_${scala.binary.version}</artifactId>
            <version>8.12.1</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>

        <resources>
            <resource>
                <directory>resources</directory>
            </resource>
        </resources>
    </build>

</project>
  1. Add some code that uses org.apache.spark.sql.Row or org.apache.spark.sql.types.* (a minimal sketch is shown after this list).
  2. Execute mvn clean install; it fails with a compile error.
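
For reference, a minimal class along these lines is enough to trigger the failure; the class and field names below are hypothetical, not taken from the original report, and assume the pom above built with Java 17:

import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical minimal reproducer: any class that references the Spark SQL
// Row/StructType API is enough to hit the compile error described above.
public class Repro {
    public static void main(String[] args) {
        StructType schema = DataTypes.createStructType(new StructField[]{
                DataTypes.createStructField("id", DataTypes.LongType, false),
                DataTypes.createStructField("name", DataTypes.StringType, true)
        });
        Row row = RowFactory.create(1L, "test");
        System.out.println(schema.treeString());
        System.out.println(row);
    }
}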

Excluding spark-catalyst resolves the issue, but the dependency should still be fixed:

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-spark-30_${scala.binary.version}</artifactId>
            <version>8.12.1</version>
            <exclusions>
                <exclusion>
                    <artifactId>spark-catalyst_2.12</artifactId>
                    <groupId>org.apache.spark</groupId>
                </exclusion>
            </exclusions>
        </dependency>
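
As a sanity check (not part of the original report), the standard Maven dependency report shows which spark-catalyst artifact, and with which Scala suffix, the connector pulls in transitively; that is the artifact the exclusion above removes:

        mvn dependency:tree -Dincludes=org.apache.spark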

Version Info

OS: M1 Mac
JVM: 17
Hadoop/Spark: 3.5.0
ES-Hadoop: 8.12.1
ES: 8.12.1

@masseyke
Member

Can you paste the compiler error here? Also, I notice that you have a variable called scala.binary.version that is set to 2.12, but you are using elasticsearch-spark-30_2.13 (that last part is the Scala version, 2.13). You might run into compatibility problems mixing Scala 2.12 and 2.13.
