What is the bug?
The fillnull command throws an AMBIGUOUS_REFERENCE exception when the datatype of null_replacement differs from that of null_fields (even though they are compatible and Spark's Analyzer would coerce them to the same datatype).
This is actually a latent bug for all cases, but most of them are currently masked by another Spark bug: https://issues.apache.org/jira/browse/SPARK-49782. After upgrading to a Spark version that includes that fix (likely Spark 3.5.4 or later), the AMBIGUOUS_REFERENCE exception will be thrown consistently across a wide range of scenarios, not just the specific cases mentioned above.
How can one reproduce the bug?
Steps to reproduce the behavior:
Create a table with a column of LONG type
create table test (id INT, longV LONG) using CSV OPTIONS (header 'false', delimiter '\t');
Insert a row with null in column longV into that table
insert into test values (1, null);
Run fillnull with null_replacement set to 0 (parsed as an integer-typed Literal)
source=test | fields longV | eval originalLongV = longV | fillnull with 0 in longV;
It throws the exception:
[AMBIGUOUS_REFERENCE] Reference `longV` is ambiguous, could be: [`longV`, `spark_catalog`.`default`.`test`.`longV`].
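For comparison, the same coalesce over a mismatched literal runs fine through plain Spark SQL, because the Analyzer widens the integer literal to the column's BIGINT type. A minimal spark-shell check (illustrative; the exact output is what I would expect on Spark 3.5):
// Illustrative check that the Analyzer coerces the integer literal 0 to the
// column's LONG type; plain Spark SQL handles the expression itself fine,
// so only the PPL-generated plan shape is at fault.
spark.sql("SELECT coalesce(longV, 0) AS longV FROM test").printSchema()
// expected: longV resolves to LongType (the literal 0 is widened to BIGINT)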
What is the expected behavior?
It should run successfully.
The root cause is that we convert the PPL into an ambiguous plan at 'DataFrameDropColumns ['longV]. Spark cannot resolve longV because there are two longV attributes: one from its child Project and another from its grandchild, which derives from the table.
The reason most cases work is a bug in the Spark rule ResolveDataFrameDropColumns: it resolves DataFrameDropColumns's expressions against its grandchildren instead of its children (which is incorrect), so there is only one longV among the grandchildren and no ambiguity arises. In the specific case of a datatype mismatch, however, the plan first goes through the type-coercion rules to align the datatypes, and then through the rule ResolveReferences, which does not have a similar bug.
So it's actually a case of two wrongs making a right.
plan
== Parsed Logical Plan ==
'Project [*]
+- 'DataFrameDropColumns ['longV]
+- 'Project [*, 'coalesce('longV, 0) AS longV#1]
+- 'Project [*, 'longV AS originalLong#0]
+- 'Project ['longV]
+- 'UnresolvedRelation [test4], [], false
== Analyzed Logical Plan ==
org.apache.spark.sql.AnalysisException: [AMBIGUOUS_REFERENCE] Reference `longV` is ambiguous, could be: [`longV`, `spark_catalog`.`default`.`test4`.`longV`].
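For reference, a rough Catalyst-level sketch of the plan shape described above, built by hand. This is for illustration only: alias names are taken from the PPL query, and constructor details may differ slightly between Spark versions.
import org.apache.spark.sql.catalyst.analysis.{UnresolvedAttribute, UnresolvedRelation, UnresolvedStar}
import org.apache.spark.sql.catalyst.expressions.{Alias, Coalesce, Literal}
import org.apache.spark.sql.catalyst.plans.logical.{DataFrameDropColumns, LogicalPlan, Project}

val relation  = UnresolvedRelation(Seq("test"))
// fields longV
val fieldsProj = Project(Seq(UnresolvedAttribute("longV")), relation)
// eval originalLongV = longV
val evalProj = Project(Seq(UnresolvedStar(None),
  Alias(UnresolvedAttribute("longV"), "originalLongV")()), fieldsProj)
// fillnull with 0 in longV  (integer literal vs LONG column)
val fillnullProj = Project(Seq(UnresolvedStar(None),
  Alias(Coalesce(Seq(UnresolvedAttribute("longV"), Literal(0))), "longV")()), evalProj)
// `fillnullProj` outputs two attributes named `longV` (the original via `*` plus the alias),
// so dropping the column by its unresolved name is ambiguous once the drop list is resolved
// against the direct child, as described in the root-cause section above.
val dropped: LogicalPlan = DataFrameDropColumns(Seq(UnresolvedAttribute("longV")), fillnullProj)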
What is your host/environment?
OS: mac
Version: opensearch-spark 0.7.0, spark 3.5.3
Plugins
Do you have any screenshots?
If applicable, add screenshots to help explain your problem.
Do you have any additional context?
Add any other context about the problem.