
java.lang.NoSuchMethodError: breeze.storage.Zero$.FloatZero()Lbreeze/storage/Zero; #14376

Open
SidWeng opened this issue Aug 22, 2024 · 4 comments
SidWeng commented Aug 22, 2024

Is there an existing issue for this?

  • I have searched the existing issues and did not find a match.

Who can help?

No response

What are you working on?

train a classifier with MPNetEmbeddings

Current Behavior

The following exception is thrown during pipeline.fit():

24/08/22 13:02:34.982 [Executor task launch worker for task 0.0 in stage 2.0 (TID 9)] ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 2.0 (TID 9)
java.lang.NoSuchMethodError: breeze.storage.Zero$.FloatZero()Lbreeze/storage/Zero;
	at com.johnsnowlabs.ml.util.LinAlg$.$anonfun$avgPooling$1(LinAlg.scala:112)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
	at com.johnsnowlabs.ml.util.LinAlg$.avgPooling(LinAlg.scala:112)
	at com.johnsnowlabs.ml.ai.MPNet.getSentenceEmbeddingFromOnnx(MPNet.scala:192)
	at com.johnsnowlabs.ml.ai.MPNet.getSentenceEmbedding(MPNet.scala:74)
	at com.johnsnowlabs.ml.ai.MPNet.$anonfun$predict$1(MPNet.scala:237)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
	at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:198)
	at com.johnsnowlabs.ml.ai.MPNet.predict(MPNet.scala:231)
	at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings.batchAnnotate(MPNetEmbeddings.scala:317)
	at com.johnsnowlabs.nlp.HasBatchedAnnotate.processBatchRows(HasBatchedAnnotate.scala:65)
	at com.johnsnowlabs.nlp.HasBatchedAnnotate.$anonfun$batchProcess$1(HasBatchedAnnotate.scala:53)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithKeys_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Expected Behavior

pipeline.fit() should complete without throwing this exception.

Steps To Reproduce

val documentAssembler = new DocumentAssembler()
  .setInputCol("ref")
  .setOutputCol("document")

val sentenceEmbeddings = MPNetEmbeddings.pretrained("all_mpnet_base_v2", "en")
  .setInputCols(Array("document"))
  .setOutputCol("embeddings")

val docClassifier = new ClassifierDLApproach()
  .setInputCols("embeddings")
  .setOutputCol("category")
  .setLabelColumn("label")
  .setBatchSize(8)
  .setMaxEpochs(1)
  .setLr(5e-3f)
  .setDropout(0.5f)
  .setRandomSeed(44)

val pipeline = new Pipeline()
  .setStages(Array(documentAssembler, sentenceEmbeddings, docClassifier))

val pipelineModel = pipeline.fit(data)

Spark NLP version and Apache Spark

Spark NLP: 5.4.1
Apache Spark: 3.3.0

Type of Spark Application

No response

Java Version

No response

Java Home Directory

No response

Setup and installation

No response

Operating System and Version

Ubuntu 20.04

Link to your project (if available)

No response

Additional Information

No response

SidWeng commented Aug 23, 2024

This turned out to be a dependency conflict with the Breeze library.
After removing the old Breeze version, another exception occurs:

05:38:00.344 [main] ERROR org.apache.spark.broadcast.TorrentBroadcast - Store broadcast broadcast_0 fail, remove all pieces of the broadcast
java.lang.NoClassDefFoundError: breeze/storage/Zero$DoubleZero$
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at org.apache.spark.util.Utils$.classForName(Utils.scala:218)
  at org.apache.spark.serializer.KryoSerializer$.$anonfun$loadableSparkClasses$1(KryoSerializer.scala:537)
  at scala.collection.immutable.List.flatMap(List.scala:366)
  at org.apache.spark.serializer.KryoSerializer$.loadableSparkClasses$lzycompute(KryoSerializer.scala:535)
  at org.apache.spark.serializer.KryoSerializer$.org$apache$spark$serializer$KryoSerializer$$loadableSparkClasses(KryoSerializer.scala:502)
  at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:226)
  at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)
  at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
  at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)
  at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:346)
  at org.apache.spark.serializer.KryoSerializationStream.<init>(KryoSerializer.scala:266)
  at org.apache.spark.serializer.KryoSerializerInstance.serializeStream(KryoSerializer.scala:432)
  at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:319)
  at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:140)
  at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:95)
  at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
  at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:75)
  at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1529)
  at org.apache.spark.SparkContext.$anonfun$hadoopFile$1(SparkContext.scala:1145)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:806)
  at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1137)
  at org.apache.spark.SparkContext.$anonfun$textFile$1(SparkContext.scala:940)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:806)
  at org.apache.spark.SparkContext.textFile(SparkContext.scala:937)
  at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:587)
  at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:465)
  at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:31)
  at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
  at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:515)
  at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:507)
  at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:44)
  at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:41)
  at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.com$johnsnowlabs$nlp$embeddings$ReadablePretrainedMPNetModel$$super$pretrained(MPNetEmbeddings.scala:474)
  at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedMPNetModel.pretrained(MPNetEmbeddings.scala:401)
  at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedMPNetModel.pretrained$(MPNetEmbeddings.scala:400)
  at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.pretrained(MPNetEmbeddings.scala:474)
  at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.pretrained(MPNetEmbeddings.scala:474)
  at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:47)
  at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:47)
  at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.com$johnsnowlabs$nlp$embeddings$ReadablePretrainedMPNetModel$$super$pretrained(MPNetEmbeddings.scala:474)
  at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedMPNetModel.pretrained(MPNetEmbeddings.scala:398)
  at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedMPNetModel.pretrained$(MPNetEmbeddings.scala:397)
  at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.pretrained(MPNetEmbeddings.scala:474)
  ... 79 elided
Caused by: java.lang.ClassNotFoundException: breeze.storage.Zero$DoubleZero$
  at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  ... 128 more

I suspect this is related to Kryo, since I had set KryoSerializer as the default serializer. Everything works fine after I unset KryoSerializer.
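
The serializer switch described above is Spark's spark.serializer property; a minimal sketch of toggling it when launching spark-shell (JavaSerializer is what Spark falls back to when the property is unset):

```shell
# Kryo enabled — the configuration that triggered the NoClassDefFoundError above
"$SPARK_HOME/bin/spark-shell" --conf spark.serializer=org.apache.spark.serializer.KryoSerializer

# Explicit Java serialization — equivalent to unsetting Kryo, the workaround described
"$SPARK_HOME/bin/spark-shell" --conf spark.serializer=org.apache.spark.serializer.JavaSerializer
```

Note that with Kryo, Spark's KryoSerializer pre-registers a list of known MLlib/Breeze classes (visible in the stack trace at KryoSerializer.loadableSparkClasses), which is why a missing or mismatched Breeze class surfaces at broadcast time.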

maziyarpanahi (Member) commented:
Please share information about where you are, which spark is this, what's the environment, and how you are installing and starting SparkSession with Spark NLP.

SidWeng commented Aug 27, 2024

OS: Ubuntu 20.04
Spark: 3.3.0
Java: 1.8.0_412
Installation: put spark-nlp-assembly-5.4.1.jar under SPARK_HOME/jars
Start SparkSession: SPARK_HOME/bin/spark-shell --master spark://master-ip:7077
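
A quick way to spot the Breeze conflict mentioned earlier is to inspect what is on the Spark classpath; a diagnostic sketch, assuming a standard SPARK_HOME layout:

```shell
# Spark ships its own Breeze jars (used by MLlib); listing the jars directory
# shows every Breeze version on the classpath that could conflict with the
# one Spark NLP was compiled against
ls "$SPARK_HOME/jars" | grep -i breeze
```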

maziyarpanahi (Member) commented:
Please use --jars PATH/spark-nlp-assembly-5.4.1.jar explicitly in your spark-shell command and try again. It seems there is a mismatch between Spark NLP and Apache Spark versions.
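
As a sketch, the suggested invocation combined with the reporter's earlier command would look like this (the master URL is from the previous comment; PATH is a placeholder for the jar's location):

```shell
# Pass the Spark NLP assembly explicitly instead of relying on SPARK_HOME/jars
"$SPARK_HOME/bin/spark-shell" \
  --master spark://master-ip:7077 \
  --jars PATH/spark-nlp-assembly-5.4.1.jar
```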

If you can, quickly running the following in your Ubuntu terminal would be a great way to test everything:

conda create -n sparknlp python=3.8 -y
conda activate sparknlp
pip install spark-nlp==5.4.2 pyspark==3.3.1

Then in the same terminal use Python console

$ python
import sparknlp
spark = sparknlp.start()

# rest of your code
