
bugfix: Normalizing literals in SELECT is broken #15043

Closed
wants to merge 11 commits into from


systay
Collaborator

@systay systay commented Jan 26, 2024

Description

Vitess does auto-parameterization of queries: it replaces literal values with arguments, so that a cached plan can be reused between queries whose only difference is a literal value.
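A toy illustration of the idea, not Vitess's actual Normalizer: the hypothetical `normalize` function below works on whitespace-separated tokens rather than on an AST, and swaps each integer literal for a numbered placeholder. The `:vtg1`-style placeholder names are an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// normalize is a hypothetical, token-based sketch (not Vitess's Normalizer).
// It replaces every integer literal token with a numbered bind variable
// (:vtg1, :vtg2, ...) and returns the rewritten query plus the extracted
// values. The rewritten query can act as a plan-cache key.
func normalize(query string) (string, map[string]int64) {
	bindVars := map[string]int64{}
	tokens := strings.Fields(query)
	for i, tok := range tokens {
		n, err := strconv.ParseInt(tok, 10, 64)
		if err != nil {
			continue // not an integer literal: keep the token unchanged
		}
		name := fmt.Sprintf("vtg%d", len(bindVars)+1)
		bindVars[name] = n
		tokens[i] = ":" + name
	}
	return strings.Join(tokens, " "), bindVars
}
```

With this sketch, `select id from user where id = 5` and `select id from user where id = 7` both normalize to `select id from user where id = :vtg1`, so one cached plan could serve both; only the bind-variable values differ. The sketch also rewrites `select 1 from user` to `select :vtg1 from user`, which is the kind of SELECT-list rewrite discussed in this PR.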

Sometimes this rewrite breaks queries, as in the issue reported below.

In this PR, I wanted to stop the Normalizer from rewriting expressions that are inside unaliased SELECT expressions.
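A minimal sketch of that rule on a hypothetical, heavily simplified AST (the types and `normalizeSelect` below are made up for illustration and are not the real sqlparser API): literals in an unaliased SELECT expression are left untouched, while aliased SELECT expressions and the WHERE clause are still parameterized.

```go
package main

import "fmt"

// Hypothetical, heavily simplified AST; these are not the real sqlparser types.
type Expr interface{ isExpr() }

type Literal struct{ Val int64 }
type BindVar struct{ Name string }
type ColName struct{ Name string }
type Comparison struct{ Left, Right Expr }

func (*Literal) isExpr()    {}
func (*BindVar) isExpr()    {}
func (*ColName) isExpr()    {}
func (*Comparison) isExpr() {}

// AliasedExpr is one entry of the SELECT list; As is "" when no alias was given.
type AliasedExpr struct {
	Expr Expr
	As   string
}

type Select struct {
	Exprs []*AliasedExpr
	Where Expr // may be nil
}

// normalizer replaces literals with bind variables and records their values.
type normalizer struct{ bindVars map[string]int64 }

func (n *normalizer) rewrite(e Expr) Expr {
	switch e := e.(type) {
	case *Literal:
		name := fmt.Sprintf("vtg%d", len(n.bindVars)+1)
		n.bindVars[name] = e.Val
		return &BindVar{Name: name}
	case *Comparison:
		return &Comparison{Left: n.rewrite(e.Left), Right: n.rewrite(e.Right)}
	default:
		return e // column names, bind vars and nil are left as they are
	}
}

// normalizeSelect applies the rule described above: skip unaliased SELECT
// expressions, but parameterize aliased ones and the WHERE clause.
func normalizeSelect(sel *Select) map[string]int64 {
	n := &normalizer{bindVars: map[string]int64{}}
	for _, ae := range sel.Exprs {
		if ae.As == "" {
			continue // unaliased: leave its literals as written
		}
		ae.Expr = n.rewrite(ae.Expr)
	}
	if sel.Where != nil {
		sel.Where = n.rewrite(sel.Where)
	}
	return n.bindVars
}
```

For `select 1 from user` this sketch extracts no bind variables, while for `select 1 as x from t1 where id1 = 1` both literals become bind variables (`vtg1` for the aliased expression, `vtg2` for the WHERE clause).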

To do that, first I needed to clean up our AST a little bit.

Back in the day, we used FuncExpr for all function calls, including count(*). Since then, we have replaced most of these functions with custom AST types (such as CountStar), which means that we no longer need to parse * as an argument to a function call.

Using pure expressions as function arguments simplifies a lot of code, and it means that whenever we are looking at an *sqlparser.AliasedExpr, we know we are in the SELECT clause and nowhere else.
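To illustrate the type-level argument, here is a simplified sketch. The node types are loosely modelled on the names above (`FuncExpr`, `CountStar`, `AliasedExpr`, `StarExpr`), but they are hypothetical stand-ins, not the real sqlparser definitions. Once `FuncExpr.Exprs` is `[]Expr` instead of `[]SelectExpr`, the compiler only lets an `*AliasedExpr` appear where a `SelectExpr` is expected.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified node types; the real sqlparser types differ.

// Expr is any scalar expression.
type Expr interface{ exprString() string }

// SelectExpr is anything allowed in a SELECT list: a star or an aliased expression.
type SelectExpr interface{ selectExprString() string }

type ColName struct{ Name string }
type CountStar struct{}

// FuncExpr arguments are plain expressions ([]Expr rather than []SelectExpr),
// so a *StarExpr or *AliasedExpr cannot be used as a function argument.
type FuncExpr struct {
	Name  string
	Exprs []Expr
}

type StarExpr struct{}
type AliasedExpr struct {
	Expr Expr
	As   string
}

func (c *ColName) exprString() string { return c.Name }
func (*CountStar) exprString() string { return "count(*)" }
func (f *FuncExpr) exprString() string {
	args := make([]string, len(f.Exprs))
	for i, e := range f.Exprs {
		args[i] = e.exprString()
	}
	return f.Name + "(" + strings.Join(args, ", ") + ")"
}

func (*StarExpr) selectExprString() string { return "*" }
func (a *AliasedExpr) selectExprString() string {
	if a.As == "" {
		return a.Expr.exprString()
	}
	return fmt.Sprintf("%s as %s", a.Expr.exprString(), a.As)
}

// selectList renders a SELECT list. Only SelectExpr values can hold an
// *AliasedExpr, so reaching one means we are in the SELECT clause.
func selectList(exprs []SelectExpr) string {
	parts := make([]string, len(exprs))
	for i, e := range exprs {
		parts[i] = e.selectExprString()
	}
	return strings.Join(parts, ", ")
}
```

In this sketch, `&FuncExpr{Name: "count", Exprs: []Expr{&StarExpr{}}}` does not compile, because `*StarExpr` does not implement `Expr`; `count(*)` has to be represented as `&CountStar{}` instead.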

Related Issue(s)

Fixes #15020

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

Contributor

vitess-bot bot commented Jan 26, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Jan 26, 2024
@github-actions github-actions bot added this to the v19.0.0 milestone Jan 26, 2024
@systay systay added Type: Bug Component: Query Serving and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Jan 26, 2024
@@ -2398,7 +2398,7 @@ type (
FuncExpr struct {
Qualifier IdentifierCS
Name IdentifierCI
Exprs SelectExprs
Collaborator Author


Back in the day, we used FuncExpr for all function calls, including count(*). Since then, we have replaced most of these functions with custom AST types (such as CountStar), which means that we no longer need to parse * as an argument to a function call.

Using pure expressions as function arguments simplifies a lot of code, and it means that whenever we are looking at an *sqlparser.AliasedExpr, we know we are in the SELECT clause and nowhere else.

Member


Love this change

@systay systay changed the title bugfix: Literals in SELECT are broken bugfix: Normalizing literals in SELECT is broken Jan 26, 2024
@systay

This comment was marked as outdated.

@@ -170,7 +170,7 @@ func TestSubqueryInAggregation(t *testing.T) {
 	mcmp.Exec("insert into t1(id1, id2) values(0,0),(1,1)")
 	mcmp.Exec("insert into t2(id3, id4) values(1,2),(5,7)")
 	mcmp.Exec(`SELECT max((select min(id2) from t1)) FROM t2`)
-	mcmp.Exec(`SELECT max((select group_concat(id1, id2) from t1 where id1 = 1)) FROM t1 where id1 = 1`)
+	mcmp.Exec(`SELECT max((select group_concat(id1, id2) from t1 where id1 = 1)) as x FROM t1 where id1 = 1`)
Member


Why is an alias needed here?

Collaborator Author


Without the alias, we will not normalize the subquery, since it is in a SELECT expression. We do, however, normalize the WHERE clause, so the planner doesn't know that the outer query and the inner query are going to the same shard. That turns this into a correlated subquery, which we don't support. By adding the alias, we are free to normalize both, and the planner knows it is safe to merge the two sides of the subquery.
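A simplified model of that reasoning, under stated assumptions: the hypothetical `provablySameShard` check assumes the planner must prove at plan time, before bind-variable values are known, that the outer and inner `id1` predicates target the same shard. It also assumes that, when both sides are normalized, both `1` literals become the same bind variable; that is an assumption of this sketch, not a statement about the real normalizer or planner.

```go
package main

// Value is a hypothetical plan-time view of the value compared against a
// sharding column: either a literal or a bind-variable placeholder.
type Value struct {
	IsBindVar bool
	Literal   int64  // meaningful when IsBindVar is false
	BindVar   string // meaningful when IsBindVar is true
}

// provablySameShard reports whether, knowing only the plan (not the bind-variable
// values), both sides can be shown to route to the same shard. Under this
// sketch's assumptions, that holds only for the same literal or the same bind var.
func provablySameShard(outer, inner Value) bool {
	if outer.IsBindVar != inner.IsBindVar {
		// e.g. outer `id1 = :vtg1` vs inner `id1 = 1`: unknown until execution
		return false
	}
	if outer.IsBindVar {
		return outer.BindVar == inner.BindVar
	}
	return outer.Literal == inner.Literal
}
```

In this model, the unaliased query compares a bind variable (outer WHERE) against a literal (inner WHERE) and cannot be merged, while the aliased version compares the same bind variable on both sides and can.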

Member


Can you add a failing query to the unsupported list, if it is not already there?

Collaborator Author


Not very easy to do. This query is unsupported only after the normalizer has had a go at it. It's unsupported because it becomes a correlated subquery whose two sides we can't merge, and we already have other examples of correlated subqueries that aren't supported at the moment.

@systay systay modified the milestones: v19.0.0, v20.0.0 Feb 6, 2024
@systay systay force-pushed the literal-aliased-expr branch from de8fc91 to 3c92aed Compare February 6, 2024 12:55
@systay systay requested a review from harshit-gangal February 6, 2024 12:55
@GrahamCampbell
Contributor

Needs a backport to 19.0 label?

Signed-off-by: Andres Taylor
@systay
Collaborator Author

systay commented Feb 6, 2024

Needs a backport to 19.0 label?

Not sure we want to backport this. @deepthi @harshit-gangal wdyt?


codecov bot commented Feb 6, 2024

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Comparison is base (4e31f60) 67.29% compared to head (08cabf1) 67.27%.

Files                                                 Patch %   Lines
go/vt/vtgate/planbuilder/operators/aggregator.go      0.00%     5 Missing ⚠️
go/vt/vttablet/tabletmanager/vdiff/table_differ.go    0.00%     4 Missing ⚠️
go/vt/vtgate/simplifier/expression_simplifier.go      0.00%     2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15043      +/-   ##
==========================================
- Coverage   67.29%   67.27%   -0.02%     
==========================================
  Files        1560     1560              
  Lines      192123   192109      -14     
==========================================
- Hits       129283   129249      -34     
- Misses      62840    62860      +20     


Member

@GuptaManan100 GuptaManan100 left a comment


Rest LGTM!

@systay systay force-pushed the literal-aliased-expr branch from fd38aab to e831863 Compare February 7, 2024 15:27
@systay systay requested a review from notfelineit as a code owner February 7, 2024 15:27
@systay systay force-pushed the literal-aliased-expr branch from e831863 to 96c3ed3 Compare February 7, 2024 15:28
@frouioui
Member

frouioui commented Feb 7, 2024

Not sure we want to backport this. @deepthi @harshit-gangal wdyt?

I feel like we should, given this is a bug fix, but I have no strong opinions.

EDIT: I think we should also backport to release-18.0, the linked issue (#15020) was using v18.0.1.

Member

@frouioui frouioui left a comment


I think it's a good change. Using sqlparser.Expr instead of the sqlparser.SelectExpr interface is awesome.

@harshit-gangal
Member

Needs a backport to 19.0 label?

Not sure we want to backport this. @deepthi @harshit-gangal wdyt?

Part of this fixes a regression from v17, and part of it adds new support. So, we should backport it as far back as v18.

@harshit-gangal harshit-gangal added Backport to: release-18.0 Backport to: release-19.0 Needs to be back ported to release-19.0 labels Feb 8, 2024
Member

@harshit-gangal harshit-gangal left a comment


Approving with 1 comment about adding an unsupported query test.

Member

@harshit-gangal harshit-gangal left a comment


As discussed offline, this will cause a regression for aggregation subqueries without an alias.

@systay systay mentioned this pull request Feb 26, 2024
@systay systay marked this pull request as draft February 26, 2024 10:51
This was referenced Mar 13, 2024
@frouioui frouioui removed Backport to: release-18.0 Backport to: release-19.0 Needs to be back ported to release-19.0 labels Mar 13, 2024
@systay systay closed this Mar 13, 2024
Linked issue (#15020): Bug Report: The SQL query "select 1 from user union select 2 from user" is causing an error.