Describe the bug
When spark.rapids.sql.udfCompiler.enabled=true is set on Spark 3.4.0+, one unit test fails because the UDF returns a different result than expected.
Steps/Code to reproduce bug
Start a Spark 3.4.0+ spark-shell with plugin 23.12.1+ and enable udfCompiler:
spark-shell --conf spark.plugins=com.nvidia.spark.SQLPlugin --conf spark.rapids.sql.udfCompiler.enabled=true
Paste the code below:
import org.apache.spark.sql.{Dataset, Row, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions.{udf => makeUdf}
val myudf: (String, String) => String = (a, b) => {
  if (null == a) {
    a
  } else {
    b
  }
}
val u = makeUdf(myudf)
val dataset = List(("","z")).toDF("x","y")
val result = dataset.withColumn("new", u(col("x"),col("y")))
val ref = dataset.withColumn("new", lit("z"))
result.show()
ref.show()
result.explain(true)
Expected behavior
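The test should pass: result should match ref, with "z" in the new column, because x is the empty string rather than null.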
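To make the expected value concrete, the UDF body from the reproduction can be evaluated as an ordinary Scala function, without Spark or the plugin. This is only a sketch of the intended semantics; the object name UdfSemantics is made up for illustration.

```scala
// Plain-Scala check of the UDF body from the reproduction; no Spark required.
object UdfSemantics {
  // Same function as in the reproduction steps.
  val myudf: (String, String) => String = (a, b) => {
    if (null == a) a else b
  }

  def main(args: Array[String]): Unit = {
    // x is the empty string, which is not null, so the second argument is returned.
    assert(myudf("", "z") == "z")
    // Only a null first argument returns the first argument (null) itself.
    assert(myudf(null, "z") == null)
    println("new = " + myudf("", "z"))
  }
}
```

For the single input row ("", "z") the new column should therefore hold "z", which is what ref contains.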
Environment details (please complete the following information)
Environment location: spark local
Spark configuration settings related to the issue: spark.rapids.sql.udfCompiler.enabled=true
Additional context
The issue does not occur on Spark 3.3.2 but is observed starting with Spark 3.4.0.
The analyzed logical plan is wrong on Spark 3.4.0: the UDF is replaced by x#10 alone, and the null check is lost.
== Parsed Logical Plan ==
'Project [x#10, y#11, UDF('x, 'y) AS new#14]
+- Project [_1#5 AS x#10, _2#6 AS y#11]
   +- LocalRelation [_1#5, _2#6]
== Analyzed Logical Plan ==
x: string, y: string, new: string
Project [x#10, y#11, x#10 AS new#14]
+- Project [_1#5 AS x#10, _2#6 AS y#11]
   +- LocalRelation [_1#5, _2#6]
== Optimized Logical Plan ==
LocalRelation [x#10, y#11, new#14]
== Physical Plan ==
LocalTableScan [x#10, y#11, new#14]
The output on Spark 3.3.2, where the analyzed plan keeps the null check:
== Parsed Logical Plan ==
'Project [x#10, y#11, UDF('x, 'y) AS new#14]
+- Project [_1#5 AS x#10, _2#6 AS y#11]
   +- LocalRelation [_1#5, _2#6]
== Analyzed Logical Plan ==
x: string, y: string, new: string
Project [x#10, y#11, if (NOT isnotnull(x#10)) x#10 else y#11 AS new#14]
+- Project [_1#5 AS x#10, _2#6 AS y#11]
   +- LocalRelation [_1#5, _2#6]
== Optimized Logical Plan ==
LocalRelation [x#10, y#11, new#14]
== Physical Plan ==
LocalTableScan [x#10, y#11, new#14]
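The difference between the two analyzed plans can be illustrated outside Spark. The sketch below models the expression assigned to new#14 in each plan as a plain Scala function. The names AnalyzedPlanComparison, isNotNull, newOn332 and newOn340 are hypothetical stand-ins, not Catalyst or plugin code.

```scala
// Models the two analyzed expressions for new#14 as plain functions.
// These are illustrative stand-ins, not the Catalyst expressions themselves.
object AnalyzedPlanComparison {
  def isNotNull(s: String): Boolean = s != null

  // Spark 3.3.2: if (NOT isnotnull(x#10)) x#10 else y#11
  def newOn332(x: String, y: String): String = if (!isNotNull(x)) x else y

  // Spark 3.4.0: x#10
  def newOn340(x: String, y: String): String = x

  def main(args: Array[String]): Unit = {
    // The row from the reproduction: x is empty but not null.
    val (x, y) = ("", "z")
    println("3.3.2 new = [" + newOn332(x, y) + "]")
    println("3.4.0 new = [" + newOn340(x, y) + "]")
    // The two expressions agree only when x is null.
    assert(newOn332(null, y) == newOn340(null, y))
    assert(newOn332(x, y) != newOn340(x, y))
  }
}
```

Under this model the two plans give the same value only for rows where x is null. For every other row the 3.4.0 plan returns x where the 3.3.2 plan returns y.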
Based on the plans pasted above, this issue can cause silent data corruption. If the only way to detect it is to notice that explain shows a different plan from the one the user intended, that is really bad. We should discourage use of the UDF compiler, at least on Spark 3.4.0+.
I'm unable to reproduce this. I'm using the latest version of the plugin (24.12), and I tried Spark 3.3, 3.4, and 3.5. I always get the following plan:
== Parsed Logical Plan ==
'Project [x#10, y#11, UDF('x, 'y) AS new#14]
+- Project [_1#5 AS x#10, _2#6 AS y#11]
   +- LocalRelation [_1#5, _2#6]
== Analyzed Logical Plan ==
x: string, y: string, new: string
Project [x#10, y#11, UDF(x#10, y#11) AS new#14]
+- Project [_1#5 AS x#10, _2#6 AS y#11]
   +- LocalRelation [_1#5, _2#6]
== Optimized Logical Plan ==
LocalRelation [x#10, y#11, new#14]
== Physical Plan ==
LocalTableScan [x#10, y#11, new#14]