Separate evaluation logic from `IR` objects in cudf-polars #17175

rjzamora · 2024-10-24T21:36:34Z

Description

This PR implements the proposal in "[FEA] [Proposal] Separate IR evaluation logic from the IR object in cudf-polars" (#17127).
This change technically "breaks" the existing `IR.evaluate` convention.
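
For readers who have not seen the linked proposal, the following is a minimal sketch of the separation being described, using plain Python lists in place of device dataframes. The `Node`, `Scan`, and `Filter` classes are hypothetical stand-ins and not the cudf-polars classes; only the names `do_evaluate`, `_non_child_args`, and `children` are taken from this PR's discussion.

```python
from typing import Callable, ClassVar


class Node:
    """Hypothetical stand-in for an IR node (not the cudf-polars class)."""

    # Only the signature is advertised here; subclasses supply the logic.
    do_evaluate: ClassVar[Callable[..., list]]

    def __init__(self, *children: "Node"):
        self.children = children
        self._non_child_args: tuple = ()

    def evaluate(self) -> list:
        # The recursion lives here; per-node logic lives in do_evaluate.
        inputs = [child.evaluate() for child in self.children]
        return self.do_evaluate(*self._non_child_args, *inputs)


class Scan(Node):
    def __init__(self, values: list):
        super().__init__()
        self._non_child_args = (values,)

    @classmethod
    def do_evaluate(cls, values: list) -> list:
        return list(values)


class Filter(Node):
    def __init__(self, threshold: int, child: Node):
        super().__init__(child)
        self._non_child_args = (threshold,)

    @classmethod
    def do_evaluate(cls, threshold: int, rows: list) -> list:
        # Non-child arguments come first, evaluated child inputs after.
        return [r for r in rows if r > threshold]


plan = Filter(2, Scan([1, 2, 3, 4]))
result = plan.evaluate()
# The same logic is callable without any node instance.
direct = Filter.do_evaluate(2, [1, 2, 3, 4])
```

Because `do_evaluate` is a classmethod that receives everything it needs as arguments, it can in principle be invoked by something other than the node's own `evaluate` method, which is the point of the separation.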

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

rjzamora · 2024-10-25T17:23:58Z

python/cudf_polars/cudf_polars/dsl/ir.py

+    # Hacky to avoid type-checking issues, just advertise the
+    # signature. Both mypy and pyright complain if we have an abstract
+    # method that takes arbitrary *args, but the subclasses have
+    # tighter signatures. This complaint is correct because the
+    # subclass is not Liskov-substitutable for the superclass.
+    # However, we know do_evaluate will only be called with the
+    # correct arguments by "construction".
+    do_evaluate: Callable[..., DataFrame]


Not sure if I like this any better than just using a catch-all signature. For example, mypy seems happy with:

    @classmethod
    def do_evaluate(cls, *args: Any, **kwargs: Any):
        """
        Evaluate the node (given its evaluated children), and return a dataframe.

        Parameters
        ----------
        args
            Non child arguments followed by any evaluated dataframe inputs.
        kwargs
            Key-word arguments. Should be empty!

        Returns
        -------
        DataFrame (on device) representing the evaluation of this plan node.

        Raises
        ------
        NotImplementedError
            If we couldn't evaluate things. Ideally this should not occur,
            since the translation phase should pick up things that we
            cannot handle.
        """
        raise NotImplementedError(
            f"Evaluation of plan {type(cls).__name__}"
        )  # pragma: no cover

If you remove `kwargs` from the signature (since there are none), it complains, though.

Yes, I agree it's still "off" :)
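
To make the two options in this thread concrete, here is a small sketch that places them side by side. The class names (`AdvertisedBase`, `CatchAllBase`, `SliceA`, `SliceB`) are hypothetical and a plain `list` stands in for `DataFrame`. The sketch only demonstrates the runtime difference; what mypy or pyright report for each form is as described in the comments above and is not re-verified here.

```python
from typing import Any, Callable


class AdvertisedBase:
    # Option 1 (the diff above): declare the attribute's type only.
    # No attribute is actually created on the class at runtime.
    do_evaluate: Callable[..., list]


class CatchAllBase:
    # Option 2 (the suggestion above): a concrete catch-all default.
    @classmethod
    def do_evaluate(cls, *args: Any, **kwargs: Any) -> list:
        raise NotImplementedError(f"Evaluation of plan {cls.__name__}")


class SliceA(AdvertisedBase):
    @classmethod
    def do_evaluate(cls, offset: int, rows: list) -> list:
        return rows[offset:]


class SliceB(CatchAllBase):
    # Tighter signature than the base class default.
    @classmethod
    def do_evaluate(cls, offset: int, rows: list) -> list:
        return rows[offset:]


def call_base(base: type) -> str:
    """Report which exception an un-overridden base raises."""
    try:
        base.do_evaluate()
    except (AttributeError, NotImplementedError) as err:
        return type(err).__name__
    return "no error"


advertised_error = call_base(AdvertisedBase)
catch_all_error = call_base(CatchAllBase)
```

At runtime the annotation-only form fails with `AttributeError` when a subclass forgets to implement `do_evaluate`, whereas the catch-all form fails with the more descriptive `NotImplementedError`.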

rjzamora

It looks like we need to account for _non_child_args elements that are normalized within __init__. (Left a few suggestions, but didn't look for everything yet)
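
A hedged illustration of the concern, assuming a hypothetical `Sort` node (not the cudf-polars one) whose constructor normalizes one of its arguments. If `_non_child_args` captured the raw constructor argument instead of the normalized value, `do_evaluate` would see something different from what the instance attributes hold.

```python
class Sort:
    """Hypothetical node; only `_non_child_args` and `do_evaluate`
    are names taken from this PR."""

    def __init__(self, descending, n_keys: int):
        # Normalization: a single bool is broadcast to one flag per key.
        if isinstance(descending, bool):
            descending = [descending] * n_keys
        self.descending = tuple(descending)
        # Store the normalized value, not the raw constructor argument.
        self._non_child_args = (self.descending,)

    @classmethod
    def do_evaluate(cls, descending: tuple, rows: list) -> list:
        out = list(rows)
        # Sort by the last key first; Python's sort is stable, so
        # earlier keys end up taking precedence.
        for key in reversed(range(len(descending))):
            out.sort(key=lambda r: r[key], reverse=descending[key])
        return out


node = Sort(True, n_keys=2)
ordered = Sort.do_evaluate(*node._non_child_args, [(1, 2), (3, 1), (1, 5)])
```

Here `do_evaluate` indexes `descending` per key, so passing the un-normalized `True` would fail; keeping `_non_child_args` in sync with the normalized attributes avoids that class of bug.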

rjzamora · 2024-10-30T01:31:48Z

@wence- - Just a note (still thinking on this): I've been experimenting with the next steps (beyond this PR). I'm realizing that it may not be terribly useful to separate the IR evaluation logic unless we also do the same for Expr classes. I say this because there seem to be many expressions that require reductions and/or shuffles.

wence- · 2024-10-30T10:12:57Z

> @wence- - Just a note (still thinking on this): I've been experimenting with the next steps (beyond this PR). I'm realizing that it may not be terribly useful to separate the IR evaluation logic unless we also do the same for Expr classes. I say this because there seem to be many expressions that require reductions and/or shuffles.

Can we do that in a separate step? I would like to check the utility of this setup in the "single-partition" evaluation model first.

rjzamora · 2024-10-30T13:45:23Z

> Can we do that in a separate step? I would like to check the utility of this setup in the "single-partition" evaluation model first.

Yes, sorry. I'm not suggesting that we do anything further in this PR (I feel that this is ready for a final review).

The goal of my previous comment was to establish that we may need to do something similar for Expr to (cleanly) enable shuffling/reductions over multiple partitions. Single-partition execution is already easy to implement on top of this PR as it stands.
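
As a rough sketch of what "single-partition execution on top of this PR" could look like, the function below drives evaluation from outside the node classes. Everything here is an assumption for illustration: `Node`, `Source`, `Concat`, and `run_single_partition` are hypothetical names, plain lists stand in for dataframes, and the caching by `id` is only one possible design.

```python
from typing import Callable


class Node:
    """Hypothetical stand-in for an IR node."""

    do_evaluate: Callable[..., list]

    def __init__(self, non_child_args: tuple, children: tuple = ()):
        self._non_child_args = non_child_args
        self.children = children


class Source(Node):
    @classmethod
    def do_evaluate(cls, values: list) -> list:
        return list(values)


class Concat(Node):
    @classmethod
    def do_evaluate(cls, *inputs: list) -> list:
        return [x for rows in inputs for x in rows]


def run_single_partition(node: Node, cache: dict) -> list:
    """Evaluate a plan externally, computing each node object once."""
    key = id(node)
    if key not in cache:
        inputs = [run_single_partition(c, cache) for c in node.children]
        # The executor, not the node, decides when do_evaluate runs.
        cache[key] = type(node).do_evaluate(*node._non_child_args, *inputs)
    return cache[key]


shared = Source(([1, 2],))
plan = Concat((), (shared, shared))
cache: dict = {}
out = run_single_partition(plan, cache)
```

A multi-partition executor would need more than this, for example some way to express the shuffles and reductions mentioned above, which is why the same treatment may be needed for Expr.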

bdice

Minor feedback. Generally looks good to me.

wence- · 2024-11-05T15:56:30Z

/merge

wence- and others added 25 commits October 14, 2024 11:55

Renaming in typing for clarity

6248ec3

Extract abstract base for nodes into new file

8b5aaed

We will use this to provide infrastructure for making IR nodes easier to traverse. Expr nodes already use this facility, but we want to share it.

Use new Node base class for expressions

26b9d7d

Infrastructure for traversal and visitors

ffe460c

And tests of basic functionality.

Use abstract Node infrastructure to define IR nodes

83a60f0

This way we will be able to write generic traversals more easily.

Add tests of traversal over IR nodes

a234e37

Now that we have a uniform child attribute, this is easier.

Overview documentation for visitor pattern/utilities

73019c8

Some grammar fixes

b14b150

Reinstate docstrings for properties

a49846f

Use side-effect free rather than pure

9449b44

Merge remote-tracking branch 'upstream/branch-24.12' into HEAD

8a4197f

separate recursive logic between evaluate_node and evaluate

1e7d22a

updating docstring

1aa4947

Merge remote-tracking branch 'upstream/branch-24.12' into rjzamora/re…

be61b3f

…factor-evaluate

Merge remote-tracking branch 'upstream/branch-24.12' into rjzamora/re…

5d0f92a

…factor-evaluate

Merge branch 'branch-24.12' into wence/fea/polars-uniform-nodes

6525cf8

Merge branch 'branch-24.12' into wence/fea/polars-uniform-nodes

a2eb05d

Grammar

d8f770d

CRTP for type of children

cf286d2

This simplifies things a bit and means we don't need to type the children property everywhere else.

Doc fixes

bc8375c

Add more complicated rewrite test

6260ff9

Merge remote-tracking branch 'wence-/wence/fea/polars-uniform-nodes' …

9005073

…into rjzamora/refactor-evaluate

Merge remote-tracking branch 'upstream/branch-24.12' into rjzamora/re…

de91970

…factor-evaluate

add changes from polars 1.11 upgrade

0483ff1

remove schema from GroupBy._eval_arguments as demonstration

e1e6195

rjzamora added the `2 - In Progress`, `improvement`, `breaking`, and `cudf.polars` labels Oct 24, 2024

rjzamora self-assigned this Oct 24, 2024

rjzamora assigned wence- Oct 25, 2024

wence- added 2 commits October 25, 2024 17:00

Slightly simplify evaluate_node implementations

30ee2e0

Merge remote-tracking branch 'upstream/branch-24.12' into rjzamora/re…

65fe592

…factor-evaluate

wence- force-pushed the rjzamora/refactor-evaluate branch from 5efd67a to 65fe592 Compare October 25, 2024 17:08

wence- marked this pull request as ready for review October 25, 2024 17:09

wence- requested a review from a team as a code owner October 25, 2024 17:09

wence- requested review from bdice and brandon-b-miller October 25, 2024 17:09

rjzamora commented Oct 25, 2024

rjzamora added 6 commits October 25, 2024 12:39

Apply suggestions from code review

7201ef3

remove cloud_options

cf52510

Merge remote-tracking branch 'upstream/branch-24.12' into rjzamora/re…

56d5fa0

…factor-evaluate

update docs

09fa3c4

Merge branch 'branch-24.12' into rjzamora/refactor-evaluate

ea94242

Merge branch 'branch-24.12' into rjzamora/refactor-evaluate

3f4ae3f

Merge branch 'branch-24.12' into rjzamora/refactor-evaluate

069f71c

bdice approved these changes Nov 4, 2024

rjzamora added 4 commits November 4, 2024 07:59

Merge remote-tracking branch 'upstream/branch-24.12' into rjzamora/re…

765a9c5

…factor-evaluate

address code review

e03ac31

remove workaround

a336760

Merge branch 'branch-24.12' into rjzamora/refactor-evaluate

32b1a27

rjzamora added the `4 - Needs Review` label and removed the `2 - In Progress` label Nov 5, 2024

rapids-bot bot merged commit 9d5041c into rapidsai:branch-24.12 Nov 5, 2024
102 checks passed

rjzamora deleted the rjzamora/refactor-evaluate branch November 5, 2024 16:11
