Separate evaluation logic from IR objects in cudf-polars #17175

Merged: 38 commits, Nov 5, 2024

rjzamora (Member) commented Oct 24, 2024

Description

Closes #17127

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

wence- and others added 25 commits October 14, 2024 11:55

  • We will use this to provide infrastructure for making IR nodes easier to traverse. Expr nodes already use this facility, but we want to share it.
  • And tests of basic functionality.
  • This way we will be able to write generic traversals more easily.
  • Now that we have a uniform child attribute, this is easier.
  • This simplifies things a bit and means we don't need to type the children property everywhere else.

rjzamora added the labels 2 - In Progress, improvement, breaking, and cudf.polars on Oct 24, 2024
rjzamora self-assigned this on Oct 24, 2024
wence- marked this pull request as ready for review on October 25, 2024 17:09
wence- requested a review from a team as a code owner on October 25, 2024 17:09

Comment on lines +150 to +157

    # Hacky to avoid type-checking issues, just advertise the
    # signature. Both mypy and pyright complain if we have an abstract
    # method that takes arbitrary *args, but the subclasses have
    # tighter signatures. This complaint is correct because the
    # subclass is not Liskov-substitutable for the superclass.
    # However, we know do_evaluate will only be called with the
    # correct arguments by "construction".
    do_evaluate: Callable[..., DataFrame]

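For readers following the thread, here is a minimal sketch of the pattern the comment describes. It is not the code in `ir.py`: the `DataFrame` stand-in, the `Scan` and `Select` nodes, and the shape of `evaluate` are all hypothetical and chosen only to show a base class that advertises `do_evaluate` as a `Callable[..., DataFrame]` attribute while subclasses implement it with tighter signatures.

```python
from __future__ import annotations

from typing import Any, Callable


class DataFrame:
    """Hypothetical stand-in for the device DataFrame: a dict of columns."""

    def __init__(self, columns: dict[str, list[int]]) -> None:
        self.columns = columns


class IR:
    """Base node that only advertises the signature of do_evaluate."""

    # No implementation here; each subclass supplies a classmethod.
    do_evaluate: Callable[..., DataFrame]

    def __init__(self, *children: IR) -> None:
        self.children = children
        self._non_child_args: tuple[Any, ...] = ()

    def evaluate(self) -> DataFrame:
        # Evaluate children first, then pass non-child arguments followed
        # by the evaluated inputs, so the call is correct "by construction".
        inputs = [child.evaluate() for child in self.children]
        return self.do_evaluate(*self._non_child_args, *inputs)


class Scan(IR):
    def __init__(self, data: dict[str, list[int]]) -> None:
        super().__init__()
        self._non_child_args = (data,)

    @classmethod
    def do_evaluate(cls, data: dict[str, list[int]]) -> DataFrame:
        return DataFrame(dict(data))


class Select(IR):
    def __init__(self, names: list[str], child: IR) -> None:
        super().__init__(child)
        self._non_child_args = (names,)

    @classmethod
    def do_evaluate(cls, names: list[str], df: DataFrame) -> DataFrame:
        return DataFrame({name: df.columns[name] for name in names})


plan = Select(["a"], Scan({"a": [1, 2], "b": [3, 4]}))
result = plan.evaluate()
```

Because `do_evaluate` never touches `self`, the evaluation logic depends only on the arguments it is handed, which is the separation this PR is after.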
Member Author

Not sure if I like this any better than just using a catch-all signature. For example, mypy seems happy with:

    @classmethod
    def do_evaluate(cls, *args: Any, **kwargs: Any):
        """
        Evaluate the node (given its evaluated children), and return a dataframe.

        Parameters
        ----------
        args
            Non child arguments followed by any evaluated dataframe inputs.
        kwargs
            Keyword arguments. Should be empty!

        Returns
        -------
        DataFrame (on device) representing the evaluation of this plan
        node.

        Raises
        ------
        NotImplementedError
            If we couldn't evaluate things. Ideally this should not occur,
            since the translation phase should pick up things that we
            cannot handle.
        """
        raise NotImplementedError(
            f"Evaluation of plan {cls.__name__}"
        )  # pragma: no cover

Contributor

If you remove kwargs from the signature (since there are none), it complains though

Member Author

Yes, I agree it's still "off" :)
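A small sketch of the catch-all alternative discussed in this thread, using hypothetical `Base` and `AddConstant` classes rather than the real IR nodes. As far as I understand the typing specification, a parameter list of exactly `*args: Any, **kwargs: Any` is treated like `...` in `Callable[..., T]`, which would explain why the checker accepts tighter overrides here but complains once `**kwargs` is dropped. That reading is an assumption worth checking against the type checker in use.

```python
from typing import Any


class Base:
    @classmethod
    def do_evaluate(cls, *args: Any, **kwargs: Any) -> int:
        # Catch-all signature: both *args and **kwargs typed as Any.
        raise NotImplementedError(f"Evaluation of plan {cls.__name__}")


class AddConstant(Base):
    @classmethod
    def do_evaluate(cls, constant: int, value: int) -> int:
        # Tighter signature in the subclass.
        return constant + value


total = AddConstant.do_evaluate(1, 2)

try:
    Base.do_evaluate(1, 2)
except NotImplementedError as exc:
    message = str(exc)
```

Either way the base class accepts calls its subclasses would reject, so the Liskov concern raised in the code comment still applies at runtime.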

rjzamora (Member Author) left a comment

It looks like we need to account for _non_child_args elements that are normalized within __init__. (Left a few suggestions, but didn't look for everything yet)
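To illustrate the concern, here is a sketch with a hypothetical `Sort` node (not taken from `ir.py`) whose `descending` argument is normalized in `__init__`. The point is that `_non_child_args` has to be built from the normalized attributes, not from the raw constructor arguments, otherwise `do_evaluate` would see a different value than the node itself stores. Plain lists of dicts stand in for dataframes.

```python
from __future__ import annotations

from typing import Any


class Sort:
    def __init__(self, by: list[str], descending: bool | list[bool]) -> None:
        self.by = tuple(by)
        # Normalization: a scalar flag becomes one flag per sort key.
        if isinstance(descending, bool):
            descending = [descending] * len(by)
        self.descending = tuple(descending)
        # Use the normalized values, not the raw arguments.
        self._non_child_args: tuple[Any, ...] = (self.by, self.descending)

    @classmethod
    def do_evaluate(
        cls,
        by: tuple[str, ...],
        descending: tuple[bool, ...],
        rows: list[dict[str, int]],
    ) -> list[dict[str, int]]:
        out = list(rows)
        # Stable sorts applied from the least to the most significant key.
        for key, desc in reversed(list(zip(by, descending))):
            out.sort(key=lambda row: row[key], reverse=desc)
        return out


rows = [{"a": 1, "b": 2}, {"a": 1, "b": 3}, {"a": 0, "b": 1}]
node = Sort(["a", "b"], descending=[False, True])
sorted_rows = Sort.do_evaluate(*node._non_child_args, rows)
scalar_node = Sort(["a", "b"], descending=True)
```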

5 resolved review threads on python/cudf_polars/cudf_polars/dsl/ir.py (outdated)
rjzamora (Member Author)

@wence- - Just a note (still thinking on this): I've been experimenting with the next steps (beyond this PR). I'm realizing that it may not be terribly useful to separate the IR evaluation logic unless we also do the same for Expr classes. I say this because there seem to be many expressions that require reductions and/or shuffles.

wence- (Contributor) commented Oct 30, 2024

> @wence- - Just a note (still thinking on this): I've been experimenting with the next steps (beyond this PR). I'm realizing that it may not be terribly useful to separate the IR evaluation logic unless we also do the same for Expr classes. I say this because there seem to be many expressions that require reductions and/or shuffles.

Can we do that in a separate step? I would like to check the utility of this setup in the "single-partition" evaluation model first.

rjzamora (Member Author)

> Can we do that in a separate step? I would like to check the utility of this setup in the "single-partition" evaluation model first.

Yes, sorry. I'm not suggesting that we do anything further in this PR (I feel that this is ready for a final review).

The goal of my previous comment was to establish that we may need to do something similar for Expr to (cleanly) enable shuffling/reductions over multiple partitions. Single-partition execution is already easy to implement on top of this PR as it stands.
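A rough sketch of what single-partition execution on top of this separation could look like. The `Node`, `Source`, `Filter`, and `Union` classes and the `evaluate_single_partition` function are hypothetical illustrations, with plain lists standing in for dataframes. The evaluator lives outside the node classes and only needs `children`, `_non_child_args`, and the class-level `do_evaluate`.

```python
from __future__ import annotations

from typing import Any, Callable


class Node:
    # Advertised only; subclasses provide the classmethod.
    do_evaluate: Callable[..., list[int]]

    def __init__(
        self, non_child_args: tuple[Any, ...], children: tuple[Node, ...]
    ) -> None:
        self._non_child_args = non_child_args
        self.children = children


class Source(Node):
    def __init__(self, values: list[int]) -> None:
        super().__init__((values,), ())

    @classmethod
    def do_evaluate(cls, values: list[int]) -> list[int]:
        return list(values)


class Filter(Node):
    def __init__(self, threshold: int, child: Node) -> None:
        super().__init__((threshold,), (child,))

    @classmethod
    def do_evaluate(cls, threshold: int, rows: list[int]) -> list[int]:
        return [row for row in rows if row > threshold]


class Union(Node):
    def __init__(self, *children: Node) -> None:
        super().__init__((), children)

    @classmethod
    def do_evaluate(cls, *inputs: list[int]) -> list[int]:
        return [row for rows in inputs for row in rows]


def evaluate_single_partition(node: Node) -> list[int]:
    """Walk the plan bottom-up; no evaluation state lives on the nodes."""
    inputs = [evaluate_single_partition(child) for child in node.children]
    return type(node).do_evaluate(*node._non_child_args, *inputs)


plan = Union(Filter(2, Source([1, 2, 3, 4])), Source([10]))
result = evaluate_single_partition(plan)
```

A multi-partition executor would presumably replace the recursive function with something that schedules `do_evaluate` calls per partition, which is where the Expr question raised above comes in.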

bdice (Contributor) left a comment

Minor feedback. Generally looks good to me.

3 resolved review threads on python/cudf_polars/cudf_polars/dsl/ir.py (outdated)
rjzamora added the label 4 - Needs Review and removed the label 2 - In Progress on Nov 5, 2024
wence- (Contributor) commented Nov 5, 2024

/merge

rapids-bot (bot) merged commit 9d5041c into rapidsai:branch-24.12 on Nov 5, 2024
102 checks passed
rjzamora deleted the rjzamora/refactor-evaluate branch on November 5, 2024 16:11
Labels
4 - Needs Review, breaking, cudf.polars, improvement, Python
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA] [Proposal] Separate IR evaluation logic from the IR object in cudf-polars
3 participants