-
Notifications
You must be signed in to change notification settings - Fork 712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cannot cast to float #14162
Comments
Cloud you please provide some links:
It's not possible to help with what's provided, we do need these links or at least the end-to-end Colab notebook |
Give me a moment while I get the information you’ll need :) |
Facing the same issue. My setup is interactive spark jupyter notebook with sparkmagic via livy. Here is my setup Spark version 3.2.2.3.2.2
The complete notebook code
Giving error
|
Hi all, We have this toFloat() function here: def toFloat: Float = {
val versionString = parts.length match {
case 1 => parts.head.toString
case 2 => f"${parts.head.toString}.${parts(1).toString}"
case 3 => f"${parts.head.toString}.${parts(1).toString}${parts(2).toString}"
case _ =>
throw new UnsupportedOperationException(
f"Cannot cast to float version ${this.toString()}")
}
versionString.toFloat
} We use it in various places to extract the version of Apache Spark. In this case, here: def getEncoder(inputDataset: Dataset[_], newStructType: StructType): ExpressionEncoder[Row] = {
val sparkVersion = Version.parse(inputDataset.sparkSession.version).toFloat
if (sparkVersion >= 3.5f) {
val expressionEncoderClass =
Class.forName("org.apache.spark.sql.catalyst.encoders.ExpressionEncoder")
val applyMethod = expressionEncoderClass.getMethod("apply", classOf[StructType])
applyMethod.invoke(null, newStructType).asInstanceOf[ExpressionEncoder[Row]]
} else {
try {
// Use reflection to access RowEncoder.apply in older Spark versions
val rowEncoderClass = Class.forName("org.apache.spark.sql.catalyst.encoders.RowEncoder")
val applyMethod = rowEncoderClass.getMethod("apply", classOf[StructType])
applyMethod.invoke(null, newStructType).asInstanceOf[ExpressionEncoder[Row]]
} catch {
case _: Throwable =>
throw new UnsupportedOperationException(
"RowEncoder.apply is not supported in this Spark version.")
}
}
} As you can see, this is required because of Spark not being backward compatible. We either have to drop support for previous versions of Apache Spark, or find a way to adapt conditionally. It seems your ENV, doesn't have Apache Spark the same pattern as the other ENVs. Could you please do cc @danilojsl |
Thanks for the response.
Meanwhile checking with our Spark admin if there is some misconfiguration on our end. Will come back in a few days. |
Thanks, that must be it! We first noticed this in EMR, where they were adding some stuff to the version. This also looks similar, makes it pretty hard to parse really. (I think we can have pattern for Livy if this is the pattern of showing spark version) |
Spoke to our Spark administrator and got to know that in our on-premise deployment there are mild changes made to it to suit the on-premise deployment. The in-house Spark team maintains a version number by appending to the original Spark version. Resulting in the On a separate note, would you be open to a change (PR) to consider only the the first three elements in the If yes, then I could propose such a PR. |
Thanks for looking into this on your side. Either way would work for us, if it's a PR as long as it is still backward compatible I will include it in the next release 👍🏼 |
This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 5 days |
up |
Okay so I think the code needs upddating cause the same issue is with Microsoft Fabric environment where Spark is preinstalled. |
@maziyarpanahi - would you be able to have a look at this? It will become a major issue when people will try to use Spark NLP on Microsoft Fabric. I am amazed that I am the first who has encountered this issue!
|
we are using Fabric just fine. This is not exactly spark-nlp, but the steps are exactly the same: https://nlp.johnsnowlabs.com/docs/en/licensed_install#microsoft-fabric-instructions |
I should be more specific, apologies. I managed to get spark-nlp to work and use document assembler and sentence embedding module. Is it only a problem for the classification module? Edit: I used the public spark-nlp library and used spark.jars parameter as part of the environment set up, not within the notebook. Will follow your route now, just in case there are some differences and let you now. Thanks a lot! I did a lot of seaarch on Fabric and Spar-NLP and this article didn't come up, very helpful. |
Thank you for the details. This was very helpful. Could you please open a new issue so we can track its progress? More an more users start using Fabric, it would be beneficial if it has its own issues so others can follow. |
Is there an existing issue for this?
Who can help?
@maz
What are you working on?
GTE Small EN 5.0.2 En
Current Behavior
Traceback (most recent call last):
File "/path/to/task.py", line 94, in
result = pipeline.fit(data).transform(data)
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 217, in transform
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 278, in _transform
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 217, in transform
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 350, in _transform
File "/opt/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1309, in call
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/opt/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o111.transform.
: java.lang.UnsupportedOperationException: Cannot cast to float version 3.2.0.3.2.2
at com.johnsnowlabs.util.Version.toFloat(Version.scala:36)
at com.johnsnowlabs.nlp.util.SparkNlpConfig$.getEncoder(SparkNlpConfig.scala:28)
at com.johnsnowlabs.nlp.AnnotatorModel._transform(AnnotatorModel.scala:69)
at com.johnsnowlabs.nlp.AnnotatorModel.transform(AnnotatorModel.scala:130)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
Expected Behavior
Process should run
Steps To Reproduce
document = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
tokenizer = Tokenizer()\
.setInputCols(["document"])\
.setOutputCol("token")
embeddings = BertEmbeddings.load("\path\to\local\model\gte_small_en", spark)\
.setInputCols(["document", "token"])\
.setOutputCol("embeddings")
Spark NLP version and Apache Spark
Spark 3.2.0
Spark NLP 5.2.3
Type of Spark Application
spark-submit
Java Version
No response
Java Home Directory
No response
Setup and installation
No response
Operating System and Version
No response
Link to your project (if available)
No response
Additional Information
No response
The text was updated successfully, but these errors were encountered: