Cannot cast to float #14162

ottermegazord · 2024-02-06T05:11:54Z

Is there an existing issue for this?

I have searched the existing issues and did not find a match.

Who can help?

What are you working on?

GTE Small EN 5.0.2 En

Current Behavior

Traceback (most recent call last):
File "/path/to/task.py", line 94, in <module>
result = pipeline.fit(data).transform(data)
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 217, in transform
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 278, in _transform
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 217, in transform
File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 350, in _transform
File "/opt/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1309, in __call__
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/opt/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o111.transform.
: java.lang.UnsupportedOperationException: Cannot cast to float version 3.2.0.3.2.2
at com.johnsnowlabs.util.Version.toFloat(Version.scala:36)
at com.johnsnowlabs.nlp.util.SparkNlpConfig$.getEncoder(SparkNlpConfig.scala:28)
at com.johnsnowlabs.nlp.AnnotatorModel._transform(AnnotatorModel.scala:69)
at com.johnsnowlabs.nlp.AnnotatorModel.transform(AnnotatorModel.scala:130)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)

Expected Behavior

Process should run

Steps To Reproduce

document = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")

tokenizer = Tokenizer()\
.setInputCols(["document"])\
.setOutputCol("token")

embeddings = BertEmbeddings.load("\path\to\local\model\gte_small_en", spark)\
.setInputCols(["document", "token"])\
.setOutputCol("embeddings")

Spark NLP version and Apache Spark

Spark 3.2.0
Spark NLP 5.2.3

Type of Spark Application

spark-submit

Java Version

No response

Java Home Directory

No response

Setup and installation

No response

Operating System and Version

No response

Link to your project (if available)

No response

Additional Information

No response

maziyarpanahi · 2024-02-06T09:14:16Z

Could you please provide some links:

link to the model you are trying to import
link to the notebook you used to export and save the model in Spark NLP
a Colab notebook to show how you install and start Spark NLP session and reproduce the error

It's not possible to help with what's been provided. We need these links, or at least the end-to-end Colab notebook.

ottermegazord · 2024-02-13T05:46:36Z

Give me a moment while I get the information you’ll need :)

vdksoda · 2024-04-20T14:29:14Z

Facing the same issue. My setup is an interactive Spark Jupyter notebook using sparkmagic via Livy.

Here is my setup

Spark version 3.2.2.3.2.2

{
    "driverMemory": "32G",
    "executorMemory": "16G",
    "numExecutors": 20,
    "executorCores": 5,
    "jars": ["/path/to/jars/spark-nlp-assembly-5.3.3.jar"],
    "archives": [
        "/path/to/conda-packs/conda-pack-spark-nlp-mpnetqa.tar.gz#env"
    ],
    "conf": {
        "spark.pyspark.python": "env/bin/python"
    },
    "proxyUser": "some.user"
}

The complete notebook code

%%spark
from pyspark import SparkContext,SparkConf
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.types import *
from pyspark.sql import Row
import re
import sys
from datetime import datetime,timedelta

import sparknlp
print(sparknlp.version())  # prints 5.3.3

import pandas as pd
# from sparknlp.pretrained import PretrainedPipeline
# from sparknlp.annotator import DocumentAssembler, BertSentenceEmbeddings, SentenceDetector, MPNetForQuestionAnswering
# from sparknlp.base import EmbeddingsFinisher
# from pyspark.ml import Pipeline
# from sparknlp.base import LightPipeline, TokenAssembler

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = MPNetForQuestionAnswering.load('/path/to/hdfs/pretrained-models/mpnet_base_question_answering_squad2_en_5.2.4_3.0_1705756189243') \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer") \
    .setCaseSensitive(False)

pipeline = Pipeline().setStages([
    documentAssembler,
    spanClassifier
])

data = spark.createDataFrame([["What's my name?", "My name is Clara and I live in Berkeley."]]).toDF("question", "context")
result = pipeline.fit(data).transform(data)
result.select("answer.result").show(truncate=False)

# # Fit the model to an empty data frame so it can be used on inputs.
# empty_df = spark.createDataFrame([['','']]).toDF("question", "context")
# pipeline_model = pipeline.fit(empty_df)
# light_pipeline = LightPipeline(pipeline_model)


# embed_df = light_pipeline.transform(spark.createDataFrame([["What's my name?", "My name is Clara and I live in Berkeley."]]).toDF("question", "context"))
# # embed_df.createOrReplaceTempView("embed_df")

This gives the following error:

An error was encountered:
An error occurred while calling o70.transform.
: java.lang.UnsupportedOperationException: Cannot cast to float version 3.2.2.3.2.2
	at com.johnsnowlabs.util.Version.toFloat(Version.scala:36)
	at com.johnsnowlabs.nlp.util.SparkNlpConfig$.getEncoder(SparkNlpConfig.scala:28)
	at com.johnsnowlabs.nlp.AnnotatorModel._transform(AnnotatorModel.scala:69)
	at com.johnsnowlabs.nlp.AnnotatorModel.transform(AnnotatorModel.scala:130)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)

maziyarpanahi · 2024-04-22T08:41:38Z

Hi all,

We have this toFloat() function here:

def toFloat: Float = {
    val versionString = parts.length match {
      case 1 => parts.head.toString
      case 2 => f"${parts.head.toString}.${parts(1).toString}"
      case 3 => f"${parts.head.toString}.${parts(1).toString}${parts(2).toString}"
      case _ =>
        throw new UnsupportedOperationException(
          f"Cannot cast to float version ${this.toString()}")
    }
    versionString.toFloat
  }
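
For readers following along in Python, here is a rough model of the logic above. It is only a sketch: `to_float` and its list-of-integers input are stand-ins for the Scala `Version` class, not Spark NLP API. It shows that a three-part version is handled, while anything with more than three parts lands in the error branch.

```python
def to_float(parts):
    """Illustrative Python port of the Scala toFloat above (not Spark NLP API)."""
    if len(parts) == 1:
        version_string = str(parts[0])
    elif len(parts) == 2:
        version_string = f"{parts[0]}.{parts[1]}"
    elif len(parts) == 3:
        # Minor and patch are concatenated after the dot: [3, 2, 0] -> "3.20"
        version_string = f"{parts[0]}.{parts[1]}{parts[2]}"
    else:
        # Zero parts or more than three parts hit the default case
        dotted = ".".join(str(p) for p in parts)
        raise ValueError(f"Cannot cast to float version {dotted}")
    return float(version_string)


print(to_float([3, 2, 0]))  # a stock three-part version
try:
    to_float([3, 2, 2, 3, 2, 2])  # the six-part version from the report above
except ValueError as err:
    print(err)
```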

We use it in various places to extract the version of Apache Spark. In this case, here:

def getEncoder(inputDataset: Dataset[_], newStructType: StructType): ExpressionEncoder[Row] = {
    val sparkVersion = Version.parse(inputDataset.sparkSession.version).toFloat
    if (sparkVersion >= 3.5f) {
      val expressionEncoderClass =
        Class.forName("org.apache.spark.sql.catalyst.encoders.ExpressionEncoder")
      val applyMethod = expressionEncoderClass.getMethod("apply", classOf[StructType])
      applyMethod.invoke(null, newStructType).asInstanceOf[ExpressionEncoder[Row]]
    } else {
      try {
        // Use reflection to access RowEncoder.apply in older Spark versions
        val rowEncoderClass = Class.forName("org.apache.spark.sql.catalyst.encoders.RowEncoder")
        val applyMethod = rowEncoderClass.getMethod("apply", classOf[StructType])
        applyMethod.invoke(null, newStructType).asInstanceOf[ExpressionEncoder[Row]]
      } catch {
        case _: Throwable =>
          throw new UnsupportedOperationException(
            "RowEncoder.apply is not supported in this Spark version.")
      }
    }
  }

As you can see, this check is required because the encoder API is not backward compatible across Spark versions. We either have to drop support for previous versions of Apache Spark, or find a way to adapt conditionally.

It seems the Apache Spark version string in your environment doesn't follow the same pattern as in other environments. Could you please run spark.version and show me the result?

cc @danilojsl

vdksoda · 2024-04-24T14:33:24Z

Thanks for the response.

>>> spark.version
'3.2.2.3.2.2.0-1'

Meanwhile checking with our Spark admin if there is some misconfiguration on our end. Will come back in a few days.
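
A side note on the string above: the error message shows 3.2.2.3.2.2 rather than the full 3.2.2.3.2.2.0-1, which would be consistent with a parser that keeps only the leading all-digit components. Assuming that behaviour (it is an inference from the message, not confirmed from the Spark NLP source), this small sketch counts how many parts survive. The helper name `leading_numeric_parts` is hypothetical.

```python
from itertools import takewhile


def leading_numeric_parts(version):
    """Keep the leading dot-separated components that consist only of digits.

    Assumption: this mirrors how the version string is parsed before the
    float conversion; it is inferred from the error message only.
    """
    return [int(p) for p in takewhile(str.isdigit, version.split("."))]


parts = leading_numeric_parts("3.2.2.3.2.2.0-1")
print(parts, len(parts))
```

Under that assumption six parts remain, which is more than the three the float conversion accepts.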

maziyarpanahi · 2024-04-24T16:38:33Z

Thanks, that must be it! We first noticed this on EMR, where extra components are appended to the version string. This looks similar, and it makes the version pretty hard to parse. (I think we could add a pattern for Livy if this is how it reports the Spark version.)

vdksoda · 2024-04-25T12:47:50Z

I spoke to our Spark administrator and learned that our on-premise deployment carries some minor modifications to Spark. The in-house Spark team tracks these by appending its own version number to the original Spark version, which results in the spark.version shown above. They are checking whether this version number can be overridden on a per-job or per-session basis.

On a separate note, would you be open to a change (PR) that considers only the first three elements of the parts sequence instead of attempting to transform the entire sequence? The original author of this issue hit the same error, and you mentioned seeing the same thing on EMR in your previous comment, so I guess we're not the only ones affected by the current version check.

If yes, then I could propose such a PR.
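
To make the proposal concrete, here is a minimal Python sketch of the idea. The real change would be in the Scala toFloat; `to_float_first_three` is a hypothetical name used only for illustration. It truncates to the first three parts and then applies the existing one-, two-, and three-part rules unchanged.

```python
def to_float_first_three(parts):
    """Sketch of the proposal: ignore everything after the first three parts."""
    head = list(parts)[:3]
    if not head:
        raise ValueError("Cannot cast to float an empty version")
    if len(head) == 1:
        return float(str(head[0]))
    if len(head) == 2:
        return float(f"{head[0]}.{head[1]}")
    # Same concatenation rule as the existing three-part case
    return float(f"{head[0]}.{head[1]}{head[2]}")


print(to_float_first_three([3, 2, 0]))           # three parts: unchanged behaviour
print(to_float_first_three([3, 2, 2, 3, 2, 2]))  # extra parts are ignored
```

Versions with one to three parts produce the same value as before, so the change would stay backward compatible for stock Spark version strings.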

maziyarpanahi · 2024-04-26T13:09:40Z

Thanks for looking into this on your side. Either way works for us. If you open a PR, I will include it in the next release as long as it stays backward compatible 👍🏼

github-actions · 2024-10-24T00:24:43Z

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 5 days

kanishkamaheshwari · 2024-10-25T14:36:41Z

up

annaf-data · 2024-11-27T17:09:26Z

Okay, so I think the code needs updating, because the same issue occurs in the Microsoft Fabric environment, where Spark is preinstalled.
The version string set by Microsoft is, for example, 3.4.3.5.3.20241016.1
which causes the same issue...

annaf-data · 2024-11-27T17:35:47Z

@maziyarpanahi - would you be able to have a look at this? It will become a major issue when people try to use Spark NLP on Microsoft Fabric. I am amazed that I am the first to encounter this issue!
Would this suffice?

def toFloat: Float = {
  val versionString = parts.length match {
    case 1 => parts.head.toString
    case 2 => s"${parts.head}.${parts(1)}"
    case 3 => s"${parts.head}.${parts(1)}${parts(2)}"
    case _ => s"${parts.head}.${parts(1)}"
  }
  
  try {
    versionString.toFloat
  } catch {
    case e: NumberFormatException =>
      throw new UnsupportedOperationException(s"Cannot cast to float version $versionString", e)
  }
}
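
For comparison, a Python rendering of the snippet above, to show what it would return for the Fabric version string. This is a sketch only: `to_float_with_fallback` is a hypothetical name, and the list of integers stands in for the Scala parts sequence.

```python
def to_float_with_fallback(parts):
    """Illustrative Python port of the proposed Scala change (not Spark NLP API)."""
    if len(parts) == 1:
        version_string = str(parts[0])
    elif len(parts) == 2:
        version_string = f"{parts[0]}.{parts[1]}"
    elif len(parts) == 3:
        version_string = f"{parts[0]}.{parts[1]}{parts[2]}"
    else:
        # Four or more parts: fall back to major.minor only
        version_string = f"{parts[0]}.{parts[1]}"
    try:
        return float(version_string)
    except ValueError as err:
        raise ValueError(f"Cannot cast to float version {version_string}") from err


fabric = [3, 4, 3, 5, 3, 20241016, 1]
value = to_float_with_fallback(fabric)
print(value, value >= 3.5)
```

With this variant the Fabric string would map to 3.4, which is below the 3.5 threshold in getEncoder, so the RowEncoder branch would be taken.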

maziyarpanahi · 2024-11-29T09:38:26Z

We are using Fabric just fine. This is not exactly spark-nlp, but the steps are exactly the same:

https://nlp.johnsnowlabs.com/docs/en/licensed_install#microsoft-fabric-instructions

annaf-data · 2024-11-29T11:57:07Z

I should be more specific, apologies. I managed to get spark-nlp to work and use document assembler and sentence embedding module.
The classification module is not working and throws the "Cannot cast to float" error. I tried both the general ClassifierDLModel() and the existing pretrained models; all fail with the same exception: "UnsupportedOperationException: Cannot cast to float version 3.4.3.5.3.20241016.1".

Is it only a problem for the classification module?

Edit: I used the public spark-nlp library and set the spark.jars parameter as part of the environment setup, not within the notebook. I will follow your route now, in case there are some differences, and let you know. Thanks a lot! I did a lot of searching on Fabric and Spark NLP and this article didn't come up; very helpful.

maziyarpanahi · 2024-11-29T12:33:33Z

Thank you for the details. This was very helpful. Could you please open a new issue so we can track its progress? As more and more users start using Fabric, it would be beneficial for it to have its own issue so others can follow.

ottermegazord added the question label Feb 6, 2024

ottermegazord assigned maziyarpanahi Feb 6, 2024

maziyarpanahi added the Requires more input label Feb 6, 2024

maziyarpanahi added bug and removed question Requires more input labels Apr 22, 2024

maziyarpanahi assigned danilojsl Apr 22, 2024

maziyarpanahi added enhancement and removed bug labels Apr 26, 2024

github-actions bot added the Stale label Oct 24, 2024

github-actions bot removed the Stale label Oct 26, 2024

danilojsl mentioned this issue Nov 29, 2024

[SPARKNLP-1096] Adding support to Microsoft Fabric for WordEmbeddings #14467

Open

10 tasks
