Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query properties rule filtering #2613

Merged
merged 19 commits into from
Aug 5, 2024
Merged

Conversation

max-hoffman
Copy link
Contributor

@max-hoffman max-hoffman commented Jul 31, 2024

Edit most of the analyzer interfaces to pass a new context object that accumulates query specific properties. Currently the object is called QueryFlags, and accumulates information about the query to inform better rule filtering and more efficient spooling strategies.

The change that has the biggest effect on oltp_point_select perf is the sql.QFlagMax1Row setting, which lets us skip the default results iter boilerplate when we're only returning one row. Added a couple other skips for rules that are easy to whitelist correctly and show prominently on CPU profiles, like aggregations and subqueries.

@max-hoffman max-hoffman requested a review from zachmu August 1, 2024 17:56
Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general idea is sound and good, but this implementation conflates two separate concerns, which is undesirable:

  • The analyzer wants to skip / alter certain steps based on previously determined facts about the plan
  • The row spooler wants to shave some time by avoiding the overhead of a loop when it's not necessary

The first concern should be encapsulated in a data structure local to a given analyzer invocation. Leaking it to the query context as a whole is not ideal. It considerably tightens the analyzer / context contract in a way that will be easy to introduce bugs into.

The second concern seems more appropriate for this implementation, but it's still not great. A better solution imo would be something like a QueryProps top-level node that the analyzer uses to communicate things of this nature to the rest of the engine. A new return type from analysis that contains [Node, Props] would also work well.

But I definitely want to push back on cramming more things into the context / session, it's already quite crowded in there. And more importantly, the lifecycle of that object is much longer / broader than the narrow concerns you're trying to address here.

@@ -499,6 +501,33 @@ func resultForEmptyIter(ctx *sql.Context, iter sql.RowIter, resultFields []*quer
return &sqltypes.Result{Fields: resultFields}, nil
}

// resultForMax1RowIter ensures that an empty iterator returns at most one row
func resultForMax1RowIter(ctx *sql.Context, schema sql.Schema, iter sql.RowIter, resultFields []*querypb.Field) (*sqltypes.Result, error) {
defer trace2.StartRegion(ctx, "Handler.resultForMax1RowIter").End()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use a better import alias for this

// Strict index lookup without a join or subquery scope will return
// at most one row. We could also use some sort of scope counting
// to check for single scope.
ctx.QProps.Set(sql.QPropMax1Row)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of concerned about the lifecycle of these properties introducing correctness issues over time.

Do you have a good sense about e.g.

  • contexts are always created fresh for every new query (even in multiquery mode)
  • subcontexts don't inherit them?

@@ -155,6 +155,8 @@ func replanJoin(ctx *sql.Context, n *plan.JoinNode, a *Analyzer, scope *plan.Sco
j := memo.NewJoinOrderBuilder(m)
j.ReorderJoin(n)

ctx.QProps.Set(sql.QPropInnerJoin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just means there's one or more inner joins somewhere in the query?


package sql

const (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should document these with comments

@max-hoffman
Copy link
Contributor Author

@zachmu I refactored this to pass flags through rule handlers. Equivalent behavior, but the lifecycle is separate from the context now.

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one note on this implementation

qp.Flags.Add(flag)
}

func (qp *QueryFlags) IsSet(flag int) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Everything is set if it's nil" is kind of insane bro

If you want this kind of default behavior, put it in a function up one abstraction layer (in the analyzer package), let this be a dumb data object.

@max-hoffman max-hoffman merged commit f695d9f into main Aug 5, 2024
7 of 8 checks passed
@max-hoffman max-hoffman deleted the max/query-props-rule-filtering branch August 5, 2024 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants