[SPARK-49605][SQL] Fix the prompt when `ascendingOrder` is `DataTypeMismatch` in `SortArray`

### What changes were proposed in this pull request?
This PR fixes the error message (prompt) reported when `ascendingOrder` in `SortArray` fails type checking with a `DataTypeMismatch`.

### Why are the changes needed?
- For example, run the following code:
```scala
val df = Seq((Array[Int](2, 1, 3), true), (Array.empty[Int], false)).toDF("a", "b")
df.selectExpr("sort_array(a, b)").collect()
```

- Before:
```scala
scala> val df = Seq((Array[Int](2, 1, 3), true), (Array.empty[Int], false)).toDF("a", "b")
val df: org.apache.spark.sql.DataFrame = [a: array<int>, b: boolean]

scala> df.selectExpr("sort_array(a, b)").collect()
org.apache.spark.sql.catalyst.ExtendedAnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "sort_array(a, b)" due to data type mismatch: The second parameter requires the "BOOLEAN" type, however "b" has the type "BOOLEAN". SQLSTATE: 42K09; line 1 pos 0;
'Project [unresolvedalias(sort_array(a#7, b#8))]
+- Project [_1#2 AS a#7, _2#3 AS b#8]
   +- LocalRelation [_1#2, _2#3]

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.dataTypeMismatch(package.scala:73)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$7(CheckAnalysis.scala:331)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$7$adapted(CheckAnalysis.scala:313)
```
<img width="1394" alt="image" src="https://github.com/user-attachments/assets/c0eea384-af29-42c1-9ee5-c65310de6070">

This error message is incorrect and confusing: it claims that the second parameter requires the "BOOLEAN" type, even though `b` already has the type "BOOLEAN". From the code at https://github.com/apache/spark/blob/8023504e69fdd037dea002e961b960fd9fa662ba/sql/api/src/main/scala/org/apache/spark/sql/functions.scala#L7176-L7195 we found that `ascendingOrder` must actually be a foldable expression of type `BooleanType`. A column reference such as `b` has the right type but is not foldable, so the real problem is foldability, not the data type.
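To make the difference between the two messages concrete, here is a minimal, self-contained Scala sketch. It is a hypothetical model, not Spark's Catalyst code: `DataType`, `Expr`, `CheckResult`, `checkBefore`, and `checkAfter` are illustrative names, and the order of the two checks in `checkAfter` is an assumption of this sketch rather than a copy of the real `SortArray.checkInputDataTypes`. It only encodes the rule stated above: `ascendingOrder` must be foldable and of type BOOLEAN.

```scala
// Hypothetical, simplified model: these are NOT Spark's Catalyst classes.
object SortArrayCheckSketch {
  // Only the data types needed for the illustration.
  sealed abstract class DataType(val sqlName: String)
  case object BooleanType extends DataType("BOOLEAN")
  case object IntegerType extends DataType("INT")

  // An expression reduced to what the check inspects: its SQL text,
  // its data type, and whether it can be evaluated at analysis time.
  final case class Expr(sql: String, dataType: DataType, foldable: Boolean)

  sealed trait CheckResult
  case object Success extends CheckResult
  final case class Mismatch(subClass: String, message: String) extends CheckResult

  // Message in the style of the "Before" output.
  private def unexpectedType(e: Expr): Mismatch =
    Mismatch(
      "UNEXPECTED_INPUT_TYPE",
      s"""The second parameter requires the "BOOLEAN" type, however "${e.sql}" has the type "${e.dataType.sqlName}".""")

  // Old behaviour, as the "Before" output suggests: every rejected input,
  // including a non-foldable BOOLEAN column, gets the type-mismatch message.
  def checkBefore(asc: Expr): CheckResult =
    if (asc.foldable && asc.dataType == BooleanType) Success
    else unexpectedType(asc)

  // Fixed behaviour (sketch): report non-foldability separately, so a
  // BOOLEAN column is no longer told that it has the wrong type.
  def checkAfter(asc: Expr): CheckResult =
    if (!asc.foldable)
      Mismatch(
        "NON_FOLDABLE_INPUT",
        s"""the input `ascendingOrder` should be a foldable "BOOLEAN" expression; however, got "${asc.sql}".""")
    else if (asc.dataType != BooleanType) unexpectedType(asc)
    else Success
}
```

For a non-foldable BOOLEAN column such as `b`, `checkBefore` reproduces the contradictory "requires BOOLEAN, has BOOLEAN" message, while `checkAfter` yields the `NON_FOLDABLE_INPUT` text shown in the "After" output. A boolean literal, as in `sort_array(a, false)`, passes both checks.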
- After:
```scala
scala> val df = Seq((Array[Int](2, 1, 3), true), (Array.empty[Int], false)).toDF("a", "b")
val df: org.apache.spark.sql.DataFrame = [a: array<int>, b: boolean]

scala> df.selectExpr("sort_array(a, b)").collect()
org.apache.spark.sql.catalyst.ExtendedAnalysisException: [DATATYPE_MISMATCH.NON_FOLDABLE_INPUT] Cannot resolve "sort_array(a, b)" due to data type mismatch: the input `ascendingOrder` should be a foldable "BOOLEAN" expression; however, got "b". SQLSTATE: 42K09; line 1 pos 0;
'Project [unresolvedalias(sort_array(a#7, b#8))]
+- Project [_1#2 AS a#7, _2#3 AS b#8]
   +- LocalRelation [_1#2, _2#3]

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.dataTypeMismatch(package.scala:73)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$7(CheckAnalysis.scala:331)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$7$adapted(CheckAnalysis.scala:313)
```
<img width="1396" alt="image" src="https://github.com/user-attachments/assets/2c173aab-52b8-4794-8ef0-d14ae269aadc">

### Does this PR introduce _any_ user-facing change?
Yes. When `ascendingOrder` in `SortArray` fails type checking with a `DataTypeMismatch`, the error message is now more accurate.

### How was this patch tested?
- Add new UT.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48082 from panbingkun/SPARK-49605.

Authored-by: panbingkun
Signed-off-by: Max Gekk