Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor how we run FoldConstants #30622

Merged
merged 5 commits into from
Nov 26, 2024

Conversation

ggevay
Copy link
Contributor

@ggevay ggevay commented Nov 25, 2024

This PR refactors how we run FoldConstants in the first commit, and then does some minor cleanups in subsequent commits. (It's best to review it commit-by-commit.)

As discussed in https://github.com/MaterializeInc/database-issues/issues/5346, the problem with the current way we call FoldConstants is that FoldConstants can't inline Lets, which can instead be done by NormalizeLets. Therefore, this PR's first commit creates a function that bundles together FoldConstants with NormalizeLets into a fixpoint loop, so they can alternate when needed, and thus run the constant folding to completion. This new function is now called every time where the old code used to just call FoldConstants.

Additionally, the PR also adds ReduceScalars in the same fixpoint loop, because that const-folds scalar expressions. (There was always a call to ReduceScalars immediately after FoldConstants, so this part is just a refactoring, where we make this "official".)

There is only a trivial slt change, but in the past I've seen chaos in optimizer traces several times, where constant folding happened in a haphazard way rather then running to completion at the first call. So, it was just a matter of luck that things somehow always worked out fine at the end even with the old code in all our slts.

Note that fold_constants_fixpoint() can be quadratic in the number of Lets in the worst case, but I think this is a smaller problem than things not getting const-folded all the way. (In the long term, we could consider changing FoldConstants itself, so that it can do Let inlining on the fly, and then we wouldn't need to alternate with NormalizeLets. But this is not a trivial thing, as evidenced by the complexity of the inlining code in NormalizeLets.)

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ggevay ggevay added A-optimization Area: query optimization and transformation A-CLUSTER Topics related to the CLUSTER layer labels Nov 25, 2024
@ggevay ggevay marked this pull request as ready for review November 25, 2024 19:16
@ggevay ggevay requested a review from a team as a code owner November 25, 2024 19:16
// We need this to ensure that `CollectIndexRequests` gets a normalized plan.
// (For example, `FoldConstants` can break the normalized form by removing all
// references to a Let, see https://github.com/MaterializeInc/database-issues/issues/6371)
Box::new(normalize_lets::NormalizeLets::new(false)),
Box::new(typecheck::Typecheck::new(ctx.typecheck()).disallow_new_globals()),
Box::new(NormalizeLets::new(false)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor, but I think this NormalizeLets is now rendundant, because fold_constants_fixpoint() just above ended with it. That might be great news, in that it means we are a bit closer to having some expectations about the resulting state (the closer we get to ending with something like "canonicalization" the better, imo).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I've deleted the call now!

// We might have arrived at a constant, e.g., due to contradicting literal constraints.
Box::new(fold_constants::FoldConstants {
Box::new(FoldConstants {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not want the fixed point here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no Lets in the fast_path_optimizer, because this only runs on fast path peeks (as determined by create_fast_path_plan).

But we might need to alternate between ReduceScalars and FoldConstants, so now I've added a custom fixpoint loop that does just these two.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! What do you think about a future where normalize_lets is documented not to introduce lets, nor change the structure of let-free expressions (I think this is true / intended), at which point we could just use the general version. It would trade away "performance" but in the interest of "fewer special cases".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it seems ok to me that we have some special things in the fast_path_optimizer. If we were calling more general functions there, then I'd be a little bit afraid of possibly introducing optimizer performance regressions in the fast_path_optimizer when modifying those functions.

Comment on lines -801 to -803
// TODO: One could imagine CSEing multiple occurrences of a global Get
// to make us read from Persist only once.
// See <https://github.com/MaterializeInc/database-issues/issues/6363>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this comment ever right? I thought we dedup reads from persist at the source importing moment. Was it removed because it wasn't right? Or some other reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was never right!

I thought we dedup reads from persist at the source importing moment.

That's correct!

Copy link
Contributor

@frankmcsherry frankmcsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Some comments about unresolved questions, but they seem easy to resolve.

Fwiw, I don't think it is that hard to do constant inlining around Let and LetRec. If the value is constant, then (barring size concerns) inline it for every Get of a Let, or of a LetRec after the binding is formed (so that the collection is constant, rather than empty and then constant).

@ggevay
Copy link
Contributor Author

ggevay commented Nov 26, 2024

Fwiw, I don't think it is that hard to do constant inlining around Let and LetRec. If the value is constant, then (barring size concerns) inline it for every Get of a Let, or of a LetRec after the binding is formed (so that the collection is constant, rather than empty and then constant).

I've created a follow-up issue for this:
https://github.com/MaterializeInc/database-issues/issues/8790

@ggevay ggevay merged commit a558d6b into MaterializeInc:main Nov 26, 2024
79 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-CLUSTER Topics related to the CLUSTER layer A-optimization Area: query optimization and transformation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants