
[BUG] udfCompiler produced a wrong analyzed logical plan in a UDF case on Spark 3.4.0+ #10381

Open
GaryShen2008 opened this issue Feb 6, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@GaryShen2008
Collaborator

Describe the bug
When spark.rapids.sql.udfCompiler.enabled=true is set on Spark 3.4.0+, one unit-test case fails with a different result.

Steps/Code to reproduce bug
Start a Spark 3.4.0+ spark-shell with plugin 23.12.1+ and enable the UDF compiler:
spark-shell --conf spark.plugins=com.nvidia.spark.SQLPlugin --conf spark.rapids.sql.udfCompiler.enabled=true

Paste the following code:

import org.apache.spark.sql.{Dataset, Row, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions.{udf => makeUdf}

val myudf: (String, String) => String = (a,b) => {
   if (null==a) {
      a
   } else {
      b
   }
}
val u = makeUdf(myudf)
val dataset = List(("","z")).toDF("x","y")
val result = dataset.withColumn("new", u(col("x"),col("y")))
val ref = dataset.withColumn("new", lit("z"))

result.show()
ref.show()

result.explain(true)

Expected behavior
The test should pass: since x is the empty string (non-null), the UDF should return y, so both result and ref should show "z" in the new column.

Environment details (please complete the following information)

  • Environment location: spark local
  • Spark configuration settings related to the issue: spark.rapids.sql.udfCompiler.enabled=true

Additional context
The issue doesn't happen on Spark 3.3.2 but has been observed since Spark 3.4.0.
The analyzed logical plan becomes wrong on Spark 3.4.0: the UDF is replaced by a bare reference to x#10 instead of the if/else expression.

== Parsed Logical Plan ==
'Project [x#10, y#11, UDF('x, 'y) AS new#14]
+- Project [_1#5 AS x#10, _2#6 AS y#11]
   +- LocalRelation [_1#5, _2#6]

== Analyzed Logical Plan ==
x: string, y: string, new: string
Project [x#10, y#11, x#10 AS new#14]
+- Project [_1#5 AS x#10, _2#6 AS y#11]
   +- LocalRelation [_1#5, _2#6]

== Optimized Logical Plan ==
LocalRelation [x#10, y#11, new#14]

== Physical Plan ==
LocalTableScan [x#10, y#11, new#14]

The output on Spark 3.3.2.

== Parsed Logical Plan ==
'Project [x#10, y#11, UDF('x, 'y) AS new#14]
+- Project [_1#5 AS x#10, _2#6 AS y#11]
   +- LocalRelation [_1#5, _2#6]

== Analyzed Logical Plan ==
x: string, y: string, new: string
Project [x#10, y#11, if (NOT isnotnull(x#10)) x#10 else y#11 AS new#14]
+- Project [_1#5 AS x#10, _2#6 AS y#11]
   +- LocalRelation [_1#5, _2#6]

== Optimized Logical Plan ==
LocalRelation [x#10, y#11, new#14]

== Physical Plan ==
LocalTableScan [x#10, y#11, new#14]
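For comparison, here is a sketch (illustration only, not the plugin's actual output) of the non-UDF column expression that the correct Spark 3.3.2 analyzed plan corresponds to. The names `expected` and `broken` are hypothetical; the expressions are built with the standard Spark Column API:

```scala
import org.apache.spark.sql.functions.{col, when}

// Equivalent of the correct 3.3.2 translation:
// if (NOT isnotnull(x#10)) x#10 else y#11
val expected = when(col("x").isNull, col("x")).otherwise(col("y"))

// Equivalent of the wrong 3.4.0 translation, which drops the
// conditional entirely and just projects the first argument:
val broken = col("x")
```

For the repro row ("", "z"), "x" is non-null, so `expected` evaluates to "z" while `broken` evaluates to "", which matches the differing results described above.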
@GaryShen2008 added the "bug" and "? - Needs Triage" labels on Feb 6, 2024
@mattahrens removed the "? - Needs Triage" label on Feb 6, 2024
@mattahrens added the "? - Needs Triage" label on Nov 1, 2024
@abellina
Collaborator

abellina commented Nov 1, 2024

This issue can cause silent data corruption, according to the information pasted above. If the only way to detect it is to notice, after the fact, that a query's explain output shows a plan the user didn't intend, that's really bad. We should discourage use of the UDF compiler, at least on Spark 3.4.0+.
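A minimal sketch of the suggested mitigation, assuming the config key from the repro above: keep the UDF compiler turned off on affected Spark versions.

```shell
# Workaround sketch: run with the UDF compiler explicitly disabled
# on Spark 3.4.0+ so the UDF executes as-is instead of being compiled
# into a (possibly wrong) Catalyst expression.
spark-shell --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.udfCompiler.enabled=false
```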

@mattahrens removed the "? - Needs Triage" label on Nov 5, 2024
@zpuller
Collaborator

zpuller commented Nov 8, 2024

I'm unable to reproduce this. Using the latest version of the plugin (24.12) on Spark 3.3, 3.4, and 3.5, I always get the following plan:

== Parsed Logical Plan ==
'Project [x#10, y#11, UDF('x, 'y) AS new#14]
+- Project [_1#5 AS x#10, _2#6 AS y#11]
   +- LocalRelation [_1#5, _2#6]

== Analyzed Logical Plan ==
x: string, y: string, new: string
Project [x#10, y#11, UDF(x#10, y#11) AS new#14]
+- Project [_1#5 AS x#10, _2#6 AS y#11]
   +- LocalRelation [_1#5, _2#6]

== Optimized Logical Plan ==
LocalRelation [x#10, y#11, new#14]

== Physical Plan ==
LocalTableScan [x#10, y#11, new#14]

I verified this config as well:

scala> spark.conf.get("spark.rapids.sql.udfCompiler.enabled")
res4: String = true
