evalengine: serialize to SQL #14337

Merged
merged 5 commits into from
Oct 30, 2023

Conversation

vmg
Collaborator

@vmg vmg commented Oct 23, 2023

Description

As part of some planner refactorings, we'd like to depend more on the evalengine for its type definitions. To accomplish this more efficiently, we want to keep the evalengine AST expressions inside the planner, but these expressions need to be able to be serialized back into SQL for all the plans that will be relayed to the upstream MySQL. This PR implements this functionality by making evalengine.Expr implement sqlparser.Expr.
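
As a rough illustration of the idea (not the code in this PR), an expression node that can be both evaluated and printed back as SQL might look like the sketch below. The names `Expr`, `Literal`, `Add`, `FormatSQL`, and `ToSQL` are hypothetical stand-ins chosen for the example; they are not the evalengine or sqlparser APIs.

```go
package main

import (
	"fmt"
	"strings"
)

// Expr is a hypothetical node type: it can be evaluated against a row
// and it can write itself back out as SQL text.
type Expr interface {
	Eval(row []int64) int64
	FormatSQL(buf *strings.Builder)
}

// Literal is a constant integer expression.
type Literal int64

func (l Literal) Eval(_ []int64) int64 { return int64(l) }

func (l Literal) FormatSQL(buf *strings.Builder) {
	fmt.Fprintf(buf, "%d", int64(l))
}

// Add is a binary addition over two sub-expressions.
type Add struct{ Left, Right Expr }

func (a Add) Eval(row []int64) int64 {
	return a.Left.Eval(row) + a.Right.Eval(row)
}

func (a Add) FormatSQL(buf *strings.Builder) {
	a.Left.FormatSQL(buf)
	buf.WriteString(" + ")
	a.Right.FormatSQL(buf)
}

// ToSQL renders any Expr as a SQL string.
func ToSQL(e Expr) string {
	var buf strings.Builder
	e.FormatSQL(&buf)
	return buf.String()
}

func main() {
	e := Add{Left: Literal(1), Right: Literal(2)}
	fmt.Println(ToSQL(e), "=", e.Eval(nil))
}
```

Keeping evaluation and formatting on the same node type is what lets a planner hold a single tree and still send SQL text upstream.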

cc @dbussink @systay

Related Issue(s)

Part of #14310

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Deployment Notes

@vitess-bot
Contributor

vitess-bot bot commented Oct 23, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added the NeedsDescriptionUpdate, NeedsIssue, and NeedsWebsiteDocsUpdate labels Oct 23, 2023
@github-actions github-actions bot added this to the v19.0.0 milestone Oct 23, 2023
@vmg vmg added the Type: Internal Cleanup and Component: Query Serving labels and removed the NeedsDescriptionUpdate, NeedsWebsiteDocsUpdate, and NeedsIssue labels Oct 23, 2023
@vmg vmg force-pushed the vmg/evalsql branch 2 times, most recently from 1fdf96c to 9345ca7 Compare October 23, 2023 11:35
@dbussink dbussink force-pushed the vmg/evalsql branch 2 times, most recently from 4a4a03f to f9794e2 Compare October 24, 2023 12:19
@vmg vmg requested a review from mattlord as a code owner October 24, 2023 13:57
@vmg vmg force-pushed the vmg/evalsql branch 5 times, most recently from affc1eb to f57f7a5 Compare October 26, 2023 08:28
return 1
}

func (asm *assembler) PushColumn_datetime(offset int) {
Contributor

@dbussink dbussink Oct 26, 2023

@vmg Do we treat timestamp columns the same as datetime?

Nvm, I read further down that we do.

Collaborator Author

Yes, we've always done so in the evalengine. All the existing code does.

Key string
Type sqltypes.Type
Collation collations.ID
dynamicTypeOffset int
Collaborator

maybe a little comment explaining this field?

Collaborator Author

Done 👍

"[COLUMN 4] as weight_string(us.bar)"
"count(*) * count(*) as count(*)",
"count(ue.foo) * count(*) as count(ue.foo)",
":3 as bar",
Collaborator

not clear to me why some offsets are serialized as column names and some as offsets

Collaborator Author

That's up to the planner: if you passed the original column sqlparser.Expr when translating the expression, we'll keep it when serializing. Otherwise, we'll print an offset.
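
A minimal sketch of that fallback, assuming a column node that optionally remembers the SQL it was translated from. The `Column` type and its `Original` field are hypothetical names for this example; only the `[COLUMN n]` output shape is taken from the test expectations quoted above.

```go
package main

import "fmt"

// Column is a hypothetical column reference. Offset indexes into the row;
// Original holds the source SQL text when the planner supplied it.
type Column struct {
	Offset   int
	Original string // empty when no original expression was passed in
}

// SQL prints the original expression when it is known and falls back
// to the positional form otherwise.
func (c Column) SQL() string {
	if c.Original != "" {
		return c.Original
	}
	return fmt.Sprintf("[COLUMN %d]", c.Offset)
}

func main() {
	fmt.Println(Column{Offset: 4}.SQL())
	fmt.Println(Column{Offset: 1, Original: "count(ue.foo)"}.SQL())
}
```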

Collaborator

@systay systay left a comment

lovely stuff

@@ -31,6 +31,9 @@ import (
)

func start(t *testing.T) (utils.MySQLCompare, func()) {
// ensure that the vschema and the tables have been created before running any tests
_ = utils.WaitForAuthoritative(t, keyspaceName, "t1", clusterInstance.VtgateProcess.ReadVSchema)

Contributor

Is this something we also want on the older release branches to avoid flaky tests there?

Collaborator

whenever we have established that the schema tracking is what is introducing the flakiness, this is an excellent way to stop it from flaking

}
}

// NewTupleExpr returns a tuple expression
func NewTupleExpr(exprs ...Expr) TupleExpr {
tupleExpr := make(TupleExpr, 0, len(exprs))
for _, f := range exprs {
- tupleExpr = append(tupleExpr, f)
+ tupleExpr = append(tupleExpr, f.(IR))
Contributor

Assuming this can never be anything but an IR type?

Collaborator Author

Yeah I should lift that into the signature TBH.

Collaborator Author

Fixed
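
The final fix is not shown in this thread. One reading of "lift that into the signature" is to accept the narrower type directly so the compiler enforces it, instead of asserting at run time. The sketch below contrasts the two shapes using simplified stand-in types; `lit`, `newTupleAsserting`, and `newTupleTyped` are hypothetical names, and the `Expr`/`IR` interfaces here are reduced to the minimum needed for the example.

```go
package main

import "fmt"

// Expr and IR are simplified stand-ins: every IR is an Expr,
// but not every Expr is an IR.
type Expr interface{ String() string }

type IR interface {
	Expr
	eval() int
}

// lit is a small type that satisfies IR.
type lit int

func (l lit) String() string { return fmt.Sprint(int(l)) }
func (l lit) eval() int      { return int(l) }

type TupleExpr []IR

// newTupleAsserting takes the wide type and asserts per element.
// It panics at run time if a caller passes a non-IR Expr.
func newTupleAsserting(exprs ...Expr) TupleExpr {
	tuple := make(TupleExpr, 0, len(exprs))
	for _, e := range exprs {
		tuple = append(tuple, e.(IR))
	}
	return tuple
}

// newTupleTyped puts the requirement in the signature, so a non-IR
// argument is rejected at compile time and no assertion is needed.
func newTupleTyped(exprs ...IR) TupleExpr {
	return append(make(TupleExpr, 0, len(exprs)), exprs...)
}

func main() {
	fmt.Println(len(newTupleAsserting(lit(1), lit(2))))
	fmt.Println(len(newTupleTyped(lit(1), lit(2))))
}
```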

@@ -39,7 +45,11 @@ func (asm *assembler) PushColumn_i(offset int) {
asm.adjustStack(1)

asm.emit(func(env *ExpressionEnv) int {
- return push_i(env, env.Row[offset].Raw())
+ col := env.Row[offset]
+ if col.IsNull() {
Contributor

Was it not possible to get a NULL here before, or was this an existing bug that was uncovered? And could we avoid emitting this check if we know the column is NOT NULL?

Collaborator Author

This is an existing bug! It would crash the compiler when a value from a nullable column contains a null (which is, huh, very frequent). The extra null checks can be removed by specializing: we would need to add a dependency on Fields, however, which the compiler doesn't have right now. Alternatively, we'd need to gather the nullability information from the VSchema, which may or may not be supported yet. Definitely something to follow up on a separate PR.
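
To make the shape of the null guard concrete, here is a small self-contained sketch. It is not the assembler code from this PR: `Value`, `pushInt`, and the pointer-based stack are hypothetical simplifications, with a nil slot standing in for SQL NULL.

```go
package main

import (
	"fmt"
	"strconv"
)

// Value is a hypothetical row cell. Null marks SQL NULL; Raw holds
// the textual integer otherwise.
type Value struct {
	Raw  string
	Null bool
}

func (v Value) IsNull() bool { return v.Null }

// pushInt pushes the integer at row[offset] onto the stack.
// A NULL cell becomes a nil slot instead of being parsed.
func pushInt(stack []*int64, row []Value, offset int) ([]*int64, error) {
	col := row[offset]
	if col.IsNull() {
		return append(stack, nil), nil
	}
	n, err := strconv.ParseInt(col.Raw, 10, 64)
	if err != nil {
		return stack, err
	}
	return append(stack, &n), nil
}

func main() {
	row := []Value{{Raw: "42"}, {Null: true}}
	stack, _ := pushInt(nil, row, 0)
	stack, _ = pushInt(stack, row, 1)
	fmt.Println(*stack[0], stack[1] == nil)
}
```

Without the guard, the NULL cell would go straight into the integer parser, which is the kind of failure described above. Skipping the check for columns known to be NOT NULL would need nullability information that this sketch does not model.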

Signed-off-by: Vicent Marti <[email protected]>
Member

@GuptaManan100 GuptaManan100 left a comment

LGTM! 🚀

Comment on lines +28 to +29
t.Skipf("TODO: these tests are not green")

Member

Is this left to be addressed in this PR or do we plan to fix it later?

Collaborator Author

@GuptaManan100 oops just saw this. Will follow up. These tests are no longer relevant, they need to be refactored.

@vmg vmg merged commit 34216a2 into vitessio:main Oct 30, 2023
115 checks passed
@vmg vmg deleted the vmg/evalsql branch October 30, 2023 11:11
4 participants