[BUG] Fillnull command throw AMBIGUOUS_REFERENCE exception #959

Closed
qianheng-aws opened this issue Nov 29, 2024 · 3 comments · Fixed by #960
Assignees
Labels
bug Something isn't working Lang:PPL Pipe Processing Language support

Comments

@qianheng-aws
Contributor

What is the bug?
The fillnull command throws an AMBIGUOUS_REFERENCE exception when the data type of null_replacement differs from that of null_fields (even though the types are compatible and Spark's Analyzer would coerce them to the same data type).

It is actually a potential bug in all cases, but most of them are currently masked by another Spark bug: https://issues.apache.org/jira/browse/SPARK-49782. After upgrading to a Spark version that includes the fix for that bug (potentially Spark 3.5.4 or later), the AMBIGUOUS_REFERENCE exception will be thrown more consistently across a wide range of scenarios, not limited to the specific case described above.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create a table with a column of LONG type:
create table test (id INT, longV LONG) using CSV OPTIONS (header 'false', delimiter '\t');
  2. Insert a row with null in column longV into that table:
insert into test values (1, null);
  3. Run fillnull with a null_replacement of 0 (parsed as a Literal of integer type):
source=test | fields longV | eval originalLongV = longV | fillnull with 0 in longV;

It throws the following exception:

[AMBIGUOUS_REFERENCE] Reference `longV` is ambiguous, could be: [`longV`, `spark_catalog`.`default`.`test`.`longV`].

What is the expected behavior?
It should run successfully.

The root cause is that we convert the PPL query into an ambiguous plan at 'DataFrameDropColumns ['longV]. Spark cannot resolve longV because there are two attributes named longV: one from its child Project and another from its grandchild, which derives from the table.

The reason most cases work is a bug in the Spark rule ResolveDataFrameDropColumns. It resolves the expressions of DataFrameDropColumns against its grandchildren instead of its children (which is incorrect), and since the grandchild's output contains only one longV, there is no ambiguity. In the specific case of a data type mismatch, however, the plan goes through typeCoercionRules first to bring the data types in line, and then through the rule ResolveReferences, which does not have a similar bug.

So it's actually a case of two wrongs making a right.
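
To make the "two wrongs" concrete, here is a minimal toy model in plain Python. It is not Spark code: the `Attribute` class, the `resolve` function, and the two output lists are hypothetical stand-ins for the outputs of the Project nodes in the plan, assumed from the description above. It only illustrates why resolving longV against the grandchild finds one candidate while resolving against the child finds two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a resolved column in a plan node's output."""
    name: str
    qualifier: str  # empty string for columns introduced by an alias


class AmbiguousReference(Exception):
    """Raised when a name matches more than one attribute (simplified model)."""


def resolve(name, output):
    """Resolve a column name against a node's output list, by name only."""
    matches = [a for a in output if a.name.lower() == name.lower()]
    if not matches:
        raise KeyError(name)
    if len(matches) > 1:
        raise AmbiguousReference(f"Reference `{name}` is ambiguous: {matches}")
    return matches[0]


# Column that comes from the table itself.
table_longv = Attribute("longV", "spark_catalog.default.test")
# Column added by `eval originalLongV = longV`.
original_longv = Attribute("originalLongV", "")
# Column added by fillnull: coalesce(longV, 0) AS longV.
filled_longv = Attribute("longV", "")

# Output of the grandchild Project (before fillnull adds its alias).
grandchild_output = [table_longv, original_longv]
# Output of the child Project: everything from below plus the new alias.
child_output = grandchild_output + [filled_longv]

# Resolving against the grandchild (the masking behavior): one match.
dropped = resolve("longV", grandchild_output)
print("resolved against grandchild:", dropped)

# Resolving against the child (the correct behavior): two matches.
try:
    resolve("longV", child_output)
except AmbiguousReference as err:
    print("resolved against child:", err)
```

In this model the grandchild lookup happens to pick the table's longV, which is exactly the column fillnull wants to drop, so the plan works by accident. Once resolution is done against the child, the drop has to identify the original column unambiguously rather than by bare name.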

plan
== Parsed Logical Plan ==
'Project [*]
+- 'DataFrameDropColumns ['longV]
   +- 'Project [*, 'coalesce('longV, 0) AS longV#1]
      +- 'Project [*, 'longV AS originalLong#0]
         +- 'Project ['longV]
            +- 'UnresolvedRelation [test4], [], false

== Analyzed Logical Plan ==
org.apache.spark.sql.AnalysisException: [AMBIGUOUS_REFERENCE] Reference `longV` is ambiguous, could be: [`longV`, `spark_catalog`.`default`.`test4`.`longV`].

What is your host/environment?

  • OS: macOS
  • Version: opensearch-spark 0.7.0, Spark 3.5.3
  • Plugins


@qianheng-aws qianheng-aws added bug Something isn't working untriaged labels Nov 29, 2024
@LantaoJin
Member

Good catch! This bug must be fixed when we upgrade Spark.

@qianheng-aws
Contributor Author

Good catch! This bug must be fixed when we upgrade Spark.

In the current Spark version, it can still be reproduced in the cases mentioned above. Should we prioritize the fix?

@YANG-DB YANG-DB added the Lang:PPL Pipe Processing Language support label Nov 29, 2024
@YANG-DB
Member

YANG-DB commented Nov 29, 2024

@qianheng-aws nice catch!!
IMO let's fix this ASAP

Projects
Status: Done
3 participants