Not able to collect data from an RDD on spark cluster #289

Open
deepakjangid123 opened this issue Apr 19, 2018 · 2 comments

deepakjangid123 commented Apr 19, 2018

Hi all,
This is my Spark configuration for the spark-context in Clojure. I am running Spark 2.2.1 and flambo 0.8.2. Flambo is a Clojure DSL for Apache Spark, and flambo 0.8.x is compatible with Spark 2.x.

(ns some-namespace
  (:require
     [flambo.api :as f]
     [flambo.conf :as conf]))

;; Spark-configurations
(def c
  (-> (conf/spark-conf)
      (conf/master "spark://abhi:7077")
      (conf/app-name "My-app")
      ;; ***To add all the jars used in the project***
      (conf/jars (map #(.getPath %) (.getURLs (java.lang.ClassLoader/getSystemClassLoader))))
      (conf/set "spark.cores.max" "4")
      (conf/set "spark.executor.memory" "2g")
      (conf/set "spark.home" (get (System/getenv) "SPARK_HOME")))

;; Initializing spark-context
(def sc (f/spark-context c))

;; Returns data in `org.apache.spark.api.java.JavaRDD` form from a file
(def data (f/text-file sc "Path-to-a-csv-file"))
;; Collecting data from RDD
(.collect data)

I am able to run this piece of code with master local[*] (a rough sketch of that working local-mode variant is at the end of this comment), but when I run against the Spark cluster I get the following error while collecting the JavaRDD.

18/04/19 05:50:39 ERROR util.Utils: Exception encountered
java.lang.ArrayStoreException: [Z
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:90)
	at org.apache.spark.util.collection.PrimitiveVector.$plus$eq(PrimitiveVector.scala:43)
	at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:30)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:216)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:792)
	at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1350)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:231)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:81)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

I am not sure what is going wrong. I have searched for this issue a lot and tried many different things, but I still get the very same error. I have also added all the jars in the Spark configuration, as shown above.
Please let me know if you have any thoughts about this.
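For reference, here is a rough sketch of the local-mode variant that works for me (only the master line differs from the configuration above; the local-conf/local-sc names are just for illustration):

;; Same configuration as above, but with a local master
(def local-conf
  (-> (conf/spark-conf)
      (conf/master "local[*]")
      (conf/app-name "My-app")))

;; Collecting works fine in this mode
(def local-sc (f/spark-context local-conf))
(.collect (f/text-file local-sc "Path-to-a-csv-file"))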

@vishnu2kmohan

@deepakjangid123 Please let us know if you continue to run into this issue with the latest Spark 2.5.0-2.2.1 release.

@anuragpan

Spark depends on memory APIs that were changed in JDK 9, so they are no longer available starting with JDK 9.
You have to set up Java 8 for Spark. For more info, see https://issues.apache.org/jira/browse/SPARK-24421
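A quick way to confirm which JVM the driver and the executors are actually running is the rough sketch below (it assumes the sc from the snippet above and flambo's parallelize/map/collect wrappers). If anything other than 1.8.x shows up, point JAVA_HOME at a JDK 8 install on both the driver and the workers before starting Spark.

;; Driver-side JVM version (Spark 2.2.x expects 1.8.x)
(System/getProperty "java.version")

;; Executor-side JVM versions: run a trivial job so each executor reports its own JVM
(-> (f/parallelize sc [1 2 3 4])
    (f/map (f/fn [_] (System/getProperty "java.version")))
    f/collect
    distinct)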
