Refactor how we run `FoldConstants` #30622

ggevay · 2024-11-25T19:06:23Z

This PR refactors how we run FoldConstants in the first commit, and then does some minor cleanups in subsequent commits. (It's best to review it commit-by-commit.)

As discussed in https://github.com/MaterializeInc/database-issues/issues/5346, the problem with the current way we call FoldConstants is that FoldConstants can't inline Lets; that is done by NormalizeLets instead. Therefore, this PR's first commit creates a function that bundles FoldConstants and NormalizeLets into a fixpoint loop, so that they can alternate when needed and constant folding runs to completion. This new function is now called everywhere the old code used to call just FoldConstants.

The PR also adds ReduceScalars to the same fixpoint loop, because ReduceScalars const-folds scalar expressions. (There was always a call to ReduceScalars immediately after FoldConstants, so this part is just a refactoring that makes the pairing "official".)
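To make the alternation concrete, here is a minimal, self-contained Rust sketch. It is not Materialize's code: `Expr`, `fold_constants`, `inline_lets`, `substitute`, `fold_constants_fixpoint`, and `example_plan` are hypothetical toy stand-ins for `MirRelationExpr`, FoldConstants, the Let inlining in NormalizeLets, and the new fixpoint function. The toy folder can't see through a `Get`, so folding only finishes once the Let has been inlined. ReduceScalars is left out for brevity; in this model it would simply be one more pass in the loop body.

```rust
/// Toy relational expression; a hypothetical stand-in for `MirRelationExpr`.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    /// A constant collection of rows (one integer per row).
    Constant(Vec<i64>),
    /// A reference to a `Let`-bound name.
    Get(String),
    /// `let name = value in body`; names are assumed to be unique (no shadowing).
    Let { name: String, value: Box<Expr>, body: Box<Expr> },
    Union(Box<Expr>, Box<Expr>),
}

/// Rebuilds `e` after applying `f` to each of its direct children.
fn map_children(e: Expr, f: &dyn Fn(Expr) -> Expr) -> Expr {
    match e {
        Expr::Let { name, value, body } => Expr::Let {
            name,
            value: Box::new(f(*value)),
            body: Box::new(f(*body)),
        },
        Expr::Union(a, b) => Expr::Union(Box::new(f(*a)), Box::new(f(*b))),
        leaf => leaf,
    }
}

/// Toy FoldConstants: folds `Union(Constant, Constant)` bottom-up, but cannot see through `Get`.
fn fold_constants(e: Expr) -> Expr {
    match map_children(e, &fold_constants) {
        Expr::Union(a, b) => match (*a, *b) {
            (Expr::Constant(mut rows), Expr::Constant(more)) => {
                rows.extend(more);
                Expr::Constant(rows)
            }
            (a, b) => Expr::Union(Box::new(a), Box::new(b)),
        },
        other => other,
    }
}

/// Replaces every `Get(name)` in `e` with a copy of `value`.
fn substitute(e: Expr, name: &str, value: &Expr) -> Expr {
    match e {
        Expr::Get(n) if n == name => value.clone(),
        other => map_children(other, &|c: Expr| substitute(c, name, value)),
    }
}

/// Toy stand-in for NormalizeLets' inlining: inlines every `Let` whose value is a constant.
fn inline_lets(e: Expr) -> Expr {
    match map_children(e, &inline_lets) {
        Expr::Let { name, value, body } if matches!(*value, Expr::Constant(_)) => {
            substitute(*body, &name, &value)
        }
        other => other,
    }
}

/// Hypothetical analogue of `fold_constants_fixpoint()`: alternates the passes until a
/// round changes nothing. Returns the result and the number of rounds that were run.
fn fold_constants_fixpoint(mut e: Expr, max_rounds: usize) -> (Expr, usize) {
    for round in 1..=max_rounds {
        let before = e.clone();
        e = inline_lets(fold_constants(e));
        if e == before {
            return (e, round);
        }
    }
    (e, max_rounds)
}

/// `let x = Union([1], [2]) in Union(Get x, [3])`
fn example_plan() -> Expr {
    let c = |rows: Vec<i64>| Box::new(Expr::Constant(rows));
    Expr::Let {
        name: "x".to_string(),
        value: Box::new(Expr::Union(c(vec![1]), c(vec![2]))),
        body: Box::new(Expr::Union(Box::new(Expr::Get("x".to_string())), c(vec![3]))),
    }
}
```

On `example_plan()`, a single `fold_constants` call only folds the Let's value and leaves `Union(Get x, [3])` untouched. The loop reaches `Constant([1, 2, 3])` in three rounds: round 1 folds the value and inlines it, round 2 folds the now-constant union, and round 3 confirms that nothing changes.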

There is only a trivial slt change, but several times in the past I've seen chaos in optimizer traces, where constant folding happened haphazardly instead of running to completion at the first call. So it was just a matter of luck that, with the old code, things always worked out fine in the end in all our slts.

Note that fold_constants_fixpoint() can be quadratic in the number of Lets in the worst case, but I think this is a smaller problem than things not getting const-folded all the way. (In the long term, we could consider changing FoldConstants itself, so that it can do Let inlining on the fly, and then we wouldn't need to alternate with NormalizeLets. But this is not a trivial thing, as evidenced by the complexity of the inlining code in NormalizeLets.)
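A rough way to see where the quadratic bound can come from: if each round of the loop only unblocks one more Let (folding a binding makes it inlinable, and inlining it makes the next binding foldable), then a chain of n dependent Lets needs about n rounds, and every round walks the whole plan. The toy model below is hypothetical (`Binding` and `run_chain` are made up for illustration) and only counts rounds and node visits; the real worst case depends on the plan's shape and on how much NormalizeLets inlines in one go.

```rust
/// State of one binding in a hypothetical chain `let x1 = c in let x2 = f(x1) in ...`,
/// where each binding depends only on the previous one.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Binding {
    /// Already folded to a constant.
    Constant,
    /// Still reads the previous binding through a `Get`, which the folder can't see through.
    UsesGet,
    /// The previous binding's constant has been inlined, so this binding can now be folded.
    UsesInlined,
}

/// Runs the toy fold/inline loop on a chain of `n` bindings until nothing changes.
/// Returns (rounds, visits), counting one visit per binding per pass.
fn run_chain(n: usize) -> (usize, usize) {
    assert!(n >= 2, "the model assumes a chain of at least two bindings");
    let mut bindings = vec![Binding::UsesGet; n];
    bindings[0] = Binding::Constant;
    let (mut rounds, mut visits) = (0, 0);
    loop {
        rounds += 1;
        let before = bindings.clone();
        // Toy FoldConstants: folds bindings whose input has already been inlined.
        for b in bindings.iter_mut() {
            if *b == Binding::UsesInlined {
                *b = Binding::Constant;
            }
        }
        // Toy NormalizeLets: inlines a constant binding into the binding that reads it.
        for i in 1..n {
            if bindings[i - 1] == Binding::Constant && bindings[i] == Binding::UsesGet {
                bindings[i] = Binding::UsesInlined;
            }
        }
        // Each of the two passes walks the whole chain once.
        visits += 2 * n;
        if bindings == before {
            return (rounds, visits);
        }
    }
}
```

In this model, a chain of 10 bindings takes 11 rounds (the last one only confirms that nothing changed) and 220 visits, while a chain of 100 bindings takes 101 rounds and 20,200 visits: the work grows roughly with the square of the number of Lets.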

Motivation

This PR refactors existing code: https://github.com/MaterializeInc/database-issues/issues/5346

Tips for reviewer

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

frankmcsherry · 2024-11-26T11:21:04Z

src/transform/src/lib.rs

            // We need this to ensure that `CollectIndexRequests` gets a normalized plan.
            // (For example, `FoldConstants` can break the normalized form by removing all
            // references to a Let, see https://github.com/MaterializeInc/database-issues/issues/6371)
-            Box::new(normalize_lets::NormalizeLets::new(false)),
-            Box::new(typecheck::Typecheck::new(ctx.typecheck()).disallow_new_globals()),
+            Box::new(NormalizeLets::new(false)),


Very minor, but I think this NormalizeLets is now redundant, because fold_constants_fixpoint() just above ends with it. That might be great news, in that it means we are a bit closer to having some expectations about the resulting state (the closer we get to ending with something like "canonicalization", the better, imo).

Right, I've deleted the call now!

frankmcsherry · 2024-11-26T11:23:31Z

src/transform/src/lib.rs

            // We might have arrived at a constant, e.g., due to contradicting literal constraints.
-            Box::new(fold_constants::FoldConstants {
+            Box::new(FoldConstants {


Do we not want the fixed point here?

There are no Lets in the fast_path_optimizer, because this only runs on fast path peeks (as determined by create_fast_path_plan).

But we might need to alternate between ReduceScalars and FoldConstants, so now I've added a custom fixpoint loop that does just these two.

Cool! What do you think about a future where normalize_lets is documented not to introduce lets, nor to change the structure of let-free expressions (I think this is true / intended), at which point we could just use the general version? It would trade away some "performance" in the interest of "fewer special cases".

Well, it seems ok to me that we have some special things in the fast_path_optimizer. If we were calling more general functions there, then I'd be a little bit afraid of possibly introducing optimizer performance regressions in the fast_path_optimizer when modifying those functions.

frankmcsherry · 2024-11-26T11:24:34Z

src/transform/src/normalize_lets.rs

-                            // TODO: One could imagine CSEing multiple occurrences of a global Get
-                            // to make us read from Persist only once.
-                            // See <https://github.com/MaterializeInc/database-issues/issues/6363>


Was this comment ever right? I thought we dedup reads from persist at the source importing moment. Was it removed because it wasn't right? Or some other reason?

It was never right!

I thought we dedup reads from persist at the source importing moment.

That's correct!

frankmcsherry

Looks good to me. Some comments about unresolved questions, but they seem easy to resolve.

Fwiw, I don't think it is that hard to do constant inlining around Let and LetRec. If the value is constant, then (barring size concerns) inline it for every Get of a Let, or of a LetRec after the binding is formed (so that the collection is constant, rather than empty and then constant).
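The inlining rule in this comment can be sketched as follows. The sketch assumes (as "after the binding is formed" suggests) that LetRec bindings are evaluated in order within each iteration, so a `Get` in a later binding or in the body sees the binding's current value, while a `Get` in the same or an earlier binding sees the previous iteration's value, which starts out empty. `Site`, `can_inline_letrec_get`, and `simulate_b0` are hypothetical names, and size concerns are ignored.

```rust
/// Where a `Get` of a constant LetRec binding occurs (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Site {
    /// Inside the value of the binding at this index.
    Binding(usize),
    /// Inside the LetRec body, which is evaluated after the iteration converges.
    Body,
}

/// Whether a `Get` at `site` of the constant binding at index `bound` can be replaced by
/// the constant without changing any intermediate iteration. For a plain `Let`, every
/// `Get` would qualify.
fn can_inline_letrec_get(bound: usize, site: Site) -> bool {
    match site {
        Site::Body => true,
        // Only bindings evaluated after `bound` in the same iteration see its constant value;
        // `bound` itself and earlier bindings still see the previous (initially empty) value.
        Site::Binding(j) => j > bound,
    }
}

/// Simulates `letrec b0 = Get(b1), b1 = [7]` under the in-order assumption and returns the
/// successive values of `b0`, one per iteration, until nothing changes.
fn simulate_b0() -> Vec<Vec<i64>> {
    // Every binding starts out empty.
    let (mut b0, mut b1): (Vec<i64>, Vec<i64>) = (Vec::new(), Vec::new());
    let mut history = Vec::new();
    loop {
        // b0 is evaluated first, so its Get of b1 sees the previous iteration's value.
        let new_b0 = b1.clone();
        // b1 is a constant, formed after b0 within the same iteration.
        let new_b1: Vec<i64> = vec![7];
        history.push(new_b0.clone());
        if new_b0 == b0 && new_b1 == b1 {
            return history;
        }
        b0 = new_b0;
        b1 = new_b1;
    }
}
```

In the simulation, `b0` is first empty and only then `[7]`, which is the "empty and then constant" behavior the comment describes; inlining `[7]` directly into `b0` would skip that empty first iteration, so only Gets in later bindings and in the body are inlined.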

ggevay · 2024-11-26T13:50:43Z

Fwiw, I don't think it is that hard to do constant inlining around Let and LetRec. If the value is constant, then (barring size concerns) inline it for every Get of a Let, or of a LetRec after the binding is formed (so that the collection is constant, rather than empty and then constant).

I've created a follow-up issue for this:
https://github.com/MaterializeInc/database-issues/issues/8790

ggevay added the A-optimization (Area: query optimization and transformation) and A-CLUSTER (Topics related to the CLUSTER layer) labels Nov 25, 2024

ggevay force-pushed the fold-constants-fixpoint branch from 1b7b051 to 410c378 November 25, 2024 19:13

ggevay marked this pull request as ready for review November 25, 2024 19:16

ggevay requested a review from a team as a code owner November 25, 2024 19:16

ggevay requested a review from frankmcsherry November 25, 2024 19:16

frankmcsherry reviewed Nov 26, 2024


frankmcsherry approved these changes Nov 26, 2024


ggevay added 5 commits November 26, 2024 14:22

Refactor how we call FoldConstants (7ddd36c)

Trivial cleanup (fb27611)

Rename fuse_and_collapse() to make it clear that it's a fixpoint loop (8083672)

Import transforms in transform/src/lib.rs to reduce visual noise (ebbdcfe)

Minor changes in transform/src/lib.rs (66096e1)

ggevay force-pushed the fold-constants-fixpoint branch from 410c378 to 66096e1 November 26, 2024 13:37

ggevay enabled auto-merge November 26, 2024 13:48

ggevay merged commit a558d6b into MaterializeInc:main Nov 26, 2024
79 checks passed
