
[SPARK-50133][PYTHON] Support df.argument() for conversion to table argument in Spark Classic #48914

Draft: wants to merge 5 commits into master

xinrong-meng (Member) commented Nov 21, 2024

What changes were proposed in this pull request?

We want to implement a function df.argument(), which returns a Column object representing a table argument. This Column can then be passed as input when calling, for example, a User-Defined Table Function (UDTF).

This PR targets Spark Classic only.

Why are the changes needed?

To reach parity with SQL functionality, specifically, #41750.

Does this PR introduce any user-facing change?

Yes.

For example:

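>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import udtf
>>> df = spark.range(8)  # assumed input; the original does not show how df is built
>>> @udtf(returnType="a: int")
... class TestUDTF:
...     def eval(self, row: Row):
...         if row[0] > 5:
...             yield row[0],
...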
>>> df.argument()
Column<'functiontablesubqueryargumentexpression()'>
>>> TestUDTF(df.argument()).show()
+---+
|  a|
+---+
|  6|
|  7|
+---+

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

Comment on lines 1042 to 1045
self.assertEqual(
    func(df.argument()).collect(),
    [Row(a=6), Row(a=7)],
)
Member

Shall we use assertDataFrameEqual?

Member Author

Good idea! Modified.
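
For reference, the revised assertion might look roughly like the sketch below. It is not the code from the PR: the helper name check_table_argument and the input df = spark.range(8) are assumptions made for illustration, and df.argument() only exists in a build that includes this change. assertDataFrameEqual is taken from pyspark.testing.

```python
def check_table_argument(spark):
    """Sketch of the revised test; `spark` is an active SparkSession."""
    # Imports are local so that defining this helper does not require PySpark.
    from pyspark.sql import Row
    from pyspark.sql.functions import udtf
    from pyspark.testing import assertDataFrameEqual

    @udtf(returnType="a: int")
    class TestUDTF:
        def eval(self, row: Row):
            if row[0] > 5:
                yield row[0],

    # Assumed input: the thread does not show how df is built.
    df = spark.range(8)

    # df.argument() is the API proposed in this PR; it is not available
    # in builds that do not include this change.
    assertDataFrameEqual(TestUDTF(df.argument()), [Row(a=6), Row(a=7)])
```

Passing the DataFrame directly, rather than the result of collect(), lets assertDataFrameEqual do the row comparison and report a readable diff on mismatch.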

xinrong-meng changed the title from "[WIP][SPARK-50133][PYTHON] Support df.argument() for conversion to table argument" to "[SPARK-50133][PYTHON] Support df.argument() for conversion to table argument in Spark Classic" on Nov 22, 2024