
ClassNotFoundException: com.audienceproject.spark.dynamodb.datasource.DynamoWriterFactory #103

michaelmaitland commented Jul 30, 2021

Scala version: 2.12.10

I submit the job to an AWS EMR on EKS virtual cluster as follows:

aws emr-containers start-job-run \
--virtual-cluster-id xxx \
--name spark-pi \
--execution-role-arn arn:aws:iam::xxx:role/xxx \
--release-label emr-6.2.0-latest \
--job-driver '{
    "sparkSubmitJobDriver": {
        "entryPoint": "s3://xxx/spark-scripts/xxx-spark.py",
        "entryPointArguments" : ["s3://xxx/data/xxx"],
        "sparkSubmitParameters": "--conf spark.executor.instances=2 --conf spark.executor.memory=2G --conf spark.executor.cores=2 --conf spark.driver.cores=1 --packages com.audienceproject:spark-dynamodb_2.12:1.1.2"
        }
    }'
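
The entry-point script (xxx-spark.py) isn't included here; the relevant part is the DataFrame write through the connector, roughly like the sketch below (the input format, columns, and table name are placeholders, not the real script):

import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("xxx-spark").getOrCreate()

# Input location comes from entryPointArguments; reading it as Parquet is a guess.
df = spark.read.parquet(sys.argv[1])

# Write to DynamoDB through the connector. The executor-side write path is what
# needs com.audienceproject.spark.dynamodb.datasource.DynamoWriterFactory on the
# executor classpath.
(df.write
    .format("dynamodb")
    .option("tableName", "my-table")  # placeholder table name
    .save())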

Running the job results in the following exception on the executors:

21/07/30 17:33:24 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.0.137.191, executor 2): java.lang.ClassNotFoundException: com.audienceproject.spark.dynamodb.datasource.DynamoWriterFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1986)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1850)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2160)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2093)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1655)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

When I get a shell into the driver pod's container, I see com.audienceproject_spark-dynamodb_2.12-1.1.2.jar under .ivy2, and the spark-kubernetes-driver logs for that pod show that the dependency was resolved:

:: resolution report :: resolve 2886ms :: artifacts dl 497ms
	:: modules in use:
	...
	com.audienceproject#spark-dynamodb_2.12;1.1.2 from central in [default]
	...
	:: evicted modules:
	...

It looks like someone had a similar issue here: https://githubmemory.com/repo/audienceproject/spark-dynamodb/issues/45, but the start-job-run request is the only place where I specify the dependency:

        "sparkSubmitParameters": "--conf spark.executor.instances=2 --conf spark.executor.memory=2G --conf spark.executor.cores=2 --conf spark.driver.cores=1 --packages com.audienceproject:spark-dynamodb_2.12:1.1.2"

Do I have to install the package anywhere else?
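
For reference, the only other place I can think of to declare the coordinates is on the SparkSession builder inside the script itself (spark.jars.packages is the config behind --packages). A sketch; I haven't verified whether this makes any difference for the executor pods:

from pyspark.sql import SparkSession

# Hypothetical alternative: declare the connector coordinates in the script
# instead of (or in addition to) --packages in sparkSubmitParameters.
spark = (SparkSession.builder
    .appName("xxx-spark")
    .config("spark.jars.packages", "com.audienceproject:spark-dynamodb_2.12:1.1.2")
    .getOrCreate())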

This issue also looks similar, but it was caused by depending on an unreleased Maven artifact, whereas I am using 2.12:1.1.2, which is released: https://githubmemory.com/repo/audienceproject/spark-dynamodb/issues/47

Any ideas on how to solve this?
