Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Avoid creating MoveTables in case of non-empty target tables #16826

Closed
wants to merge 3 commits into from

Conversation

beingnoble03
Copy link
Member

Description

This PR adds a validation check for empty tables in the target keyspace while creating MoveTables workflow.

Related Issue(s)

Screenshot

Screenshot 2024-09-24 at 12 41 44 AM

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

Copy link
Contributor

vitess-bot bot commented Sep 23, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Sep 23, 2024
@github-actions github-actions bot added this to the v21.0.0 milestone Sep 23, 2024
@beingnoble03 beingnoble03 added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VReplication and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Sep 23, 2024
@beingnoble03 beingnoble03 marked this pull request as ready for review September 23, 2024 19:37
@deepthi deepthi requested review from mattlord and removed request for ajm188 September 24, 2024 22:01
go/vt/vtctl/workflow/utils.go Outdated Show resolved Hide resolved
Comment on lines 1039 to 1043
var selectQueries []string
for _, t := range tables {
selectQueries = append(selectQueries, fmt.Sprintf("(select '%s' from %s limit 1)", t, t))
}
query := strings.Join(selectQueries, "union all")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this for every shard? Can't we build the query once and then use it on each shard?

Keep in mind that the size of this query is unbounded, so we could hit max_allowed_packet. I want to say that there's also a limit on the number of UNIONs you can do in a single statement but I didn't find any docs on that in a quick search. For JOINs e.g.:

The maximum number of tables that can be referenced in a single join is 61. This includes a join handled by merging derived tables and views in the FROM clause into the outer query block (see [Section 10.2.2.4, “Optimizing Derived Tables, View References, and Common Table Expressions with Merging or Materialization”](https://dev.mysql.com/doc/refman/8.4/en/derived-table-optimization.html)).

So that may be what we bump up against here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Matt for finding this. Fixed that. But even I couldn't find out the limit of UNIONs on any docs or web search. What do you think should be the limit? Should we break the query down into multiple queries?

mattlord added a commit to planetscale/vitess that referenced this pull request Sep 25, 2024
And defer that work to vitessio#16826

Signed-off-by: Matt Lord <[email protected]>
@beingnoble03 beingnoble03 marked this pull request as draft September 26, 2024 09:36
Comment on lines -483 to +484
re := regexp.MustCompile(qry)
if re.MatchString(qry) {
re := regexp.MustCompile(qry[1:])
if re.MatchString(query) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was probably a mistake.

Copy link

codecov bot commented Sep 26, 2024

Codecov Report

Attention: Patch coverage is 87.30159% with 8 lines in your changes missing coverage. Please review.

Project coverage is 69.45%. Comparing base (95f2e3e) to head (8393dea).
Report is 63 commits behind head on main.

Files with missing lines Patch % Lines
go/vt/vtctl/workflow/utils.go 85.71% 6 Missing ⚠️
go/vt/vtctl/workflow/materializer.go 90.47% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16826      +/-   ##
==========================================
- Coverage   69.51%   69.45%   -0.06%     
==========================================
  Files        1569     1571       +2     
  Lines      202517   203121     +604     
==========================================
+ Hits       140780   141083     +303     
- Misses      61737    62038     +301     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@beingnoble03
Copy link
Member Author

I am working on the failing TestVtctlMigrate e2e test.

@beingnoble03 beingnoble03 marked this pull request as ready for review September 26, 2024 18:44
Comment on lines +286 to +291
alreadyExistingTables := make([]string, len(hasTargetTable))
i := 0
for t := range hasTargetTable {
alreadyExistingTables[i] = t
i++
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can instead use maps.Keys(hasTargetTable) with validateEmptyTables.

Comment on lines +1032 to +1035
for _, t := range tables {
selectQueries = append(selectQueries, fmt.Sprintf("(select '%s' from %s limit 1)", t, t))
}
query := strings.Join(selectQueries, "union all")
Copy link
Contributor

@mattlord mattlord Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a way to handle cases where there are many tables as the number of tables is unbounded. Two factors we'll have to consider:

  1. This one statement could be so long that it is beyond max_allowed_packet
  2. This one statement could affect more than the maximum allowed tables

In either case, the user is stuck and there is no outlet. At least unless we add a new flag that skips this test / allows for tables with existing data OR splitting the workflow up (which itself might be a bit daunting to manually specify hundreds or thousands of tables).

@rohit-nayak-ps
Copy link
Contributor

rohit-nayak-ps commented Sep 27, 2024

There is one issue already in the existing code: We get the list of tables just for one shard in getTablesInKeyspace() However each target shard might have a different set of tables which exist. If the selected shard has the table, but other shards not, then we will end up not creating the target tables in some shards.

This extends to the new check we are adding. If a table is present in the first code then we will also check for data in the table in other shards. If the other shards don't have that table then the check for data will fail with a "table not found"

In addition, the getSchema() call, made while trying to getTablesInKeyspace() that we make is a heavy duty query, just to get the list of tables.

We may want to consider a different approach that fixes all the issues mentioned above:

  • Use the list of tables in the workflow from MaterializeSettings
  • We run select 1 from <table1> limit 1 for each table on each shard. This will give us both the existence of the table (check for table not found ) and a non-empty table.
  • We can run these queries with READ UNCOMMITTED
  • Run in parallel for each shard

For a workflow with 1000s of tables and 1000s of shards this will imply a large number of queries. We may want to add a --no-validations flag to bypass these checks in such cases. But since this is run during the Create step and with a weak isolation level, we may be fine without the flag for now. The flag might be a fail-safe though in case of some bug in this code, as pointed out above.

@noble, I suggest you wait before coding further on this until we reach an agreement on the final approach.

This approach avoids the issue of size of query/number of joined tables in a query that we have with the union approach. So that will also have to be batched and a query with several tables being joined even with a union may have performance impacts which might reduce the effect of the order of magnitude increase in queries.

@beingnoble03
Copy link
Member Author

The changes were moved to: #16874. Closing this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: VReplication Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants