Py4JJavaError: An error occurred while calling o2951.fullAnnotateJava. #13903
Comments
Hi @Ededu1984, this is a bug that we will fix in the next release.
@maziyarpanahi Do you know when the next release is?
@noga-eps We have scheduled the Spark NLP 5.0.2 release for 2-3 days from now (100% by the end of this week).
Hi, this is the code:
documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentencerDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx").setInputCols(["document"]).setOutputCol("sentences")
marian = MarianTransformer.pretrained("opus_mt_mul_en", "xx")
sdf = spark.createDataFrame([[">>deu<< Hallo wie geht es dir Ich bin hubert aus Deutschland"],
marian_pipeline = Pipeline(stages=[documentAssembler, sentencerDL, marian])
Config on Databricks: SparkSession - hive, Version, AppName
Spark NLP version 5.0.2
The code works perfectly in Colab, but I get this error when I try to run it on Databricks.
Runtime version
@Ededu1984 This is a typical mistake made by users on Databricks: they think changing the PyPI version to 5.0.2 is enough. You must also change the Maven version to 5.0.2. As you can see, we no longer have that error in
You just have to make sure your Maven package (the actual core library) is also pointing to
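The PyPI/Maven mismatch described in this comment can be caught with a small check before running a pipeline. The following is only a sketch: the helper names `maven_version` and `versions_match` are hypothetical, and it assumes you copy the Maven coordinate by hand from the cluster's library list (for Spark NLP it typically looks like `com.johnsnowlabs.nlp:spark-nlp_2.12:5.0.2`) and take the Python-side version from `sparknlp.version()`.

```python
def maven_version(coordinate: str) -> str:
    """Return the version part of a 'group:artifact:version' Maven coordinate."""
    parts = coordinate.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'group:artifact:version', got {coordinate!r}")
    return parts[2]


def versions_match(pypi_version: str, coordinate: str) -> bool:
    """True when the Python package and the JVM library report the same version."""
    return pypi_version.strip() == maven_version(coordinate)


if __name__ == "__main__":
    # Illustrative values only: the Python wrapper was upgraded to 5.0.2
    # while the cluster still has the 4.2.8 Maven library installed.
    # On a real cluster, pypi_version would come from sparknlp.version().
    pypi_version = "5.0.2"
    coordinate = "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8"
    if not versions_match(pypi_version, coordinate):
        print(
            f"Version mismatch: PyPI {pypi_version} vs "
            f"Maven {maven_version(coordinate)}; update the Maven library too."
        )
```

Comparing the two strings exactly is a deliberate choice here: the Python wrapper and the JVM library are released together, so any difference is worth flagging.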
Is there an existing issue for this?
Who can help?
I'm trying to reproduce the code
https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/TRANSLATION_MARIAN.ipynb#scrollTo=EYf_9sXDXR4t
My code:
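# Imports the snippet needs; `spark` is the session predefined in a Databricks notebook
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import SentenceDetectorDLModel, MarianTransformer
# Wrap the raw "text" column into Spark NLP documents
documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
# Split each document into sentences (parentheses allow the chained calls to span lines)
sentencerDL = (
    SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(["document"])
    .setOutputCol("sentence")
)
# Translate each sentence from Italian to English
marian = (
    MarianTransformer.pretrained("opus_mt_it_en", "xx")
    .setInputCols(["sentence"])
    .setOutputCol("translation")
)
marian_pipeline = Pipeline(stages=[documentAssembler, sentencerDL, marian])
# Fit on an empty DataFrame: all stages are pretrained, so no training data is needed
light_pipeline = LightPipeline(marian_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))
result = light_pipeline.fullAnnotate("""La Gioconda è un dipinto ad olio del XVI secolo creato da Leonardo. Si tiene al Louvre di Parigi.""")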
The error
Py4JJavaError Traceback (most recent call last)
File :16
13 marian_pipeline = Pipeline(stages=[documentAssembler, sentencerDL, marian])
14 light_pipeline = LightPipeline(marian_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))
---> 16 result = light_pipeline.fullAnnotate("""La Gioconda è un dipinto ad olio del XVI secolo creato da Leonardo. Si tiene al Louvre di Parigi.""")
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-8bb6e45d-4a31-48b9-8f1c-a3e9553fddde/lib/python3.9/site-packages/sparknlp/base/light_pipeline.py:201, in LightPipeline.fullAnnotate(self, target, optional_target)
199 if optional_target == "":
200 if self.__isTextInput(target):
--> 201 result = self.__fullAnnotateText(target)
202 elif self.__isAudioInput(target):
203 result = self.__fullAnnotateAudio(target)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-8bb6e45d-4a31-48b9-8f1c-a3e9553fddde/lib/python3.9/site-packages/sparknlp/base/light_pipeline.py:243, in LightPipeline.__fullAnnotateText(self, target)
240 if type(target) is str:
241 target = [target]
--> 243 for annotations_result in self._lightPipeline.fullAnnotateJava(target):
244 result.append(self.__buildStages(annotations_result))
245 return result
File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
1315 command = proto.CALL_COMMAND_NAME +
1316 self.command_header +
1317 args_command +
1318 proto.END_COMMAND_PART
1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
1322 answer, self.gateway_client, self.target_id, self.name)
1324 for temp_arg in temp_args:
1325 temp_arg._detach()
File /databricks/spark/python/pyspark/errors/exceptions.py:228, in capture_sql_exception.<locals>.deco(*a, **kw)
226 def deco(*a: Any, **kw: Any) -> Any:
227 try:
--> 228 return f(*a, **kw)
229 except Py4JJavaError as e:
230 converted = convert_exception(e.java_exception)
File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
Py4JJavaError: An error occurred while calling o2951.fullAnnotateJava.
: java.lang.ClassCastException: java.util.ArrayList cannot be cast to [Ljava.lang.String;
at com.johnsnowlabs.nlp.annotators.seq2seq.MarianTransformer.batchAnnotate(MarianTransformer.scala:352)
at com.johnsnowlabs.nlp.LightPipeline.processBatchedAnnotator(LightPipeline.scala:202)
at com.johnsnowlabs.nlp.LightPipeline.processAnnotatorModel(LightPipeline.scala:184)
at com.johnsnowlabs.nlp.LightPipeline.$anonfun$fullAnnotateInternal$1(LightPipeline.scala:118)
at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:198)
at com.johnsnowlabs.nlp.LightPipeline.fullAnnotateInternal(LightPipeline.scala:100)
at com.johnsnowlabs.nlp.LightPipeline.fullAnnotate(LightPipeline.scala:49)
at com.johnsnowlabs.nlp.LightPipeline.fullAnnotateJava(LightPipeline.scala:303)
at com.johnsnowlabs.nlp.LightPipeline.$anonfun$fullAnnotateJava$5(LightPipeline.scala:342)
at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:659)
at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
at scala.collection.parallel.mutable.ParArray$Map.tryLeaf(ParArray.scala:650)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:153)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
I'm using spark-nlp on the Databricks environment.
Spark NLP version 4.2.8
Apache Spark version: 3.3.2
What are you working on?
Text translation
Current Behavior
Translate the text
Expected Behavior
Translate the text
Steps To Reproduce
I don't have the link
Spark NLP version and Apache Spark
Spark NLP version 4.2.8
Apache Spark version: 3.3.2
Type of Spark Application
spark-shell
Java Version
No response
Java Home Directory
No response
Setup and installation
No response
Operating System and Version
No response
Link to your project (if available)
No response
Additional Information
No response