VDiff: Support diffing tables without a defined Primary Key #14794

mattlord · 2023-12-15T22:14:00Z

Description

VReplication supports tables without defined Primary Keys, but VDiff did not. If you executed a VDiff on a workflow that included a table without one, it would fail like this (see issue):

VDiff Summary for customer.commerce2customer (4ecb6a28-9263-404e-a4e9-d737ecbca5e5)
State:        error
              Error: (shard 0) buildPlan: error getting PK column collations for table tnopk: empty list supplied for vars2
RowsCompared: 0
HasMismatch:  false
StartedAt:    2023-12-15 21:29:47

Use "--format=json" for more detailed output

This PR adds support for diffing tables that have no defined Primary Key. It uses a Primary Key equivalent (a unique index on non-null column(s)) if one exists; if not, it uses all of the columns in the table as a substitute. This matches what VReplication does, and what MySQL row-based replication does (although MySQL will first use any index if there is one).
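The fallback order described above can be sketched in Go as follows. This is a minimal illustration, not the code in this PR: the `tableInfo` type and the `comparisonKey` function are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// tableInfo is a hypothetical, simplified view of a table's schema.
type tableInfo struct {
	Columns    []string // every column, in definition order
	PKColumns  []string // columns of the PRIMARY KEY, if one is defined
	PKEColumns []string // columns of a unique index on non-null columns, if any
}

// comparisonKey returns the columns used to identify and order rows:
// the primary key, else a primary key equivalent, else every column.
func comparisonKey(t tableInfo) []string {
	switch {
	case len(t.PKColumns) > 0:
		return t.PKColumns
	case len(t.PKEColumns) > 0:
		return t.PKEColumns
	default:
		return t.Columns
	}
}

func main() {
	// Mirrors the nopk table from the manual test: no PK and no unique index.
	nopk := tableInfo{Columns: []string{"name", "age"}}
	fmt.Println(comparisonKey(nopk)) // prints [name age]
}
```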

Basic manual test:

./101_initial_cluster.sh

mysql < ../common/insert_commerce_data.sql

./201_customer_tablets.sh

mysql commerce -e "create table nopk (name varchar(100), age bigint unsigned)"

for _ in {1..15}; do
  mysql commerce -e "insert into nopk values ('${RANDOM}_person_${RANDOM}', ${RANDOM}); insert into nopk select * from nopk;"
done

vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer create --source-keyspace commerce --all-tables

vtctldclient vdiff --workflow=commerce2customer --target-keyspace=customer create

sleep 10

vtctldclient vdiff --workflow=commerce2customer --target-keyspace=customer show last --verbose

Related Issue(s)

Fixes: Bug Report: VDiff fails on workflow that includes a table without a Primary Key #14793

Checklist

"Backport to:" labels have been added if this change should be back-ported to release branches
If this change is to be back-ported to previous releases, a justification is included in the PR description
Tests were added or are not required
Did the new or modified tests pass consistently locally and on CI?
Documentation was added: Add warning now that we support tables with PKs website#1651

Signed-off-by: Matt Lord <[email protected]>

vitess-bot · 2023-12-15T22:14:03Z

This also more generally adds support for diffing tables without PK columns. Signed-off-by: Matt Lord <[email protected]>

rohit-nayak-ps · 2023-12-19T10:41:01Z

go/vt/mysqlctl/schema.go

@@ -579,13 +579,7 @@ func (mysqld *Mysqld) ApplySchemaChange(ctx context.Context, dbName string, chan
 // defined PRIMARY KEY then it may return the columns for
 // that index if it is likely the most efficient one amongst
 // the available PKE indexes on the table.
-func (mysqld *Mysqld) GetPrimaryKeyEquivalentColumns(ctx context.Context, dbName, table string) ([]string, string, error) {
-	conn, err := getPoolReconnect(ctx, mysqld.dbaPool)


Did the functionality change for this PR require a change in how we acquire a connection here?

When I first created this I had a mysqlctl.Mysqld instance readily available at the call site. While working on this, I realized that I should have changed it when I started using it in vstreamer: I created a new instance there, which is heavy because it creates conn pools etc. (and I wasn't properly calling close in a defer 🤦‍♂️). So I changed it here to use a callback to talk to the DB instead. It's much lighter and far better (and is an already established pattern in the mysqlctl package).

If we adopt this pattern, then we will find ourselves passing exec functions into a large number of functions - not saying it's wrong - just thinking of the code/signature impact.

It's already a pattern in that file/package and elsewhere. I agree that it's not one which should be used w/o good reason.
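For readers unfamiliar with the callback approach discussed in this thread, here is a rough sketch of its shape. It assumes nothing about the real Vitess signatures: `execFunc`, `lookupPKEColumns`, `fakeExec`, and the placeholder query text are all hypothetical and exist only to show how a function can take a query callback instead of owning a connection pool.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// execFunc is a hypothetical callback type: the caller decides how a query
// reaches the database, so the callee needs no connection pool of its own.
type execFunc func(ctx context.Context, query string) ([][]string, error)

// lookupPKEColumns is an illustrative stand-in, not the Vitess function.
// It runs one query through the callback and returns the first field of each row.
func lookupPKEColumns(ctx context.Context, exec execFunc, dbName, table string) ([]string, error) {
	if exec == nil {
		return nil, errors.New("no exec function provided")
	}
	// Placeholder text only; the real lookup query is not reproduced here.
	query := fmt.Sprintf("/* placeholder: find PKE columns for %s.%s */", dbName, table)
	rows, err := exec(ctx, query)
	if err != nil {
		return nil, err
	}
	cols := make([]string, 0, len(rows))
	for _, row := range rows {
		if len(row) > 0 {
			cols = append(cols, row[0])
		}
	}
	return cols, nil
}

// fakeExec returns a callback that ignores the query and yields canned rows,
// which is how a test could exercise the lookup without a database.
func fakeExec(rows [][]string) execFunc {
	return func(ctx context.Context, query string) ([][]string, error) {
		return rows, nil
	}
}

func main() {
	cols, err := lookupPKEColumns(context.Background(), fakeExec([][]string{{"email"}}), "commerce", "nopk")
	fmt.Println(cols, err) // prints [email] <nil>
}
```

The trade-off raised above is visible in the signature: every function on the call path has to accept and forward the callback.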

shlomi-noach

Looks good with one question about allowing full table scans.

Also, it looks like we're implementing similar behaviors in different approaches; the vstreamer/vplayer approach uses different code paths, right?

shlomi-noach · 2023-12-19T17:39:27Z

go/vt/vttablet/tabletmanager/vdiff/table_plan.go

+			tp.table.PrimaryKeyColumns = append(tp.table.PrimaryKeyColumns, pkeCols...)
+		} else {
+			// We use every column together as a substitute PK.
+			tp.table.PrimaryKeyColumns = append(tp.table.PrimaryKeyColumns, tp.table.Columns...)


Does it make sense to allow full scan in VDiff? Should we instead say this table isn't supported? I'm imagining a VDiff running over weeks - as a user of VDiff I'd not want it to run so long.

IMO, yes. We allow it in VReplication as well. This is up to the user -- it's not uncommon to have small tables w/o a primary key or PKE. In some cases all queries against the table will be a full scan. It's not our call to say that you shouldn't be able to move those tables and diff them after doing so.
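To make the cost concrete, here is a simplified sketch of comparing two row sets when every column serves as the key. It is an assumption-laden illustration rather than VDiff's implementation: the `row` type, `compareRows`, `sortRows`, and `diffSorted` are hypothetical, and plain string comparison stands in for collation-aware comparison. Both sides must be fully ordered before the single merge pass, which is why a table with no usable index implies a full scan and sort.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// row holds one table row with every column rendered as a string.
type row []string

// compareRows orders two rows column by column.
func compareRows(a, b row) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		if c := strings.Compare(a[i], b[i]); c != 0 {
			return c
		}
	}
	return len(a) - len(b)
}

// sortRows orders rows by all columns, standing in for ORDER BY on every column.
func sortRows(rows []row) {
	sort.Slice(rows, func(i, j int) bool { return compareRows(rows[i], rows[j]) < 0 })
}

// diffSorted walks two sorted row sets once and counts rows present on only one side.
func diffSorted(source, target []row) (onlySource, onlyTarget, matching int) {
	i, j := 0, 0
	for i < len(source) && j < len(target) {
		switch c := compareRows(source[i], target[j]); {
		case c < 0:
			onlySource++
			i++
		case c > 0:
			onlyTarget++
			j++
		default:
			matching++
			i++
			j++
		}
	}
	onlySource += len(source) - i
	onlyTarget += len(target) - j
	return onlySource, onlyTarget, matching
}

func main() {
	// Duplicate rows are legal when there is no key, as in the manual test above.
	source := []row{{"bob", "30"}, {"alice", "25"}, {"alice", "25"}}
	target := []row{{"alice", "25"}, {"bob", "30"}}
	sortRows(source)
	sortRows(target)
	fmt.Println(diffSorted(source, target)) // prints 1 0 2
}
```

In this sketch a row whose value differs between the two sides is reported as one extra row on each side, because no column is left over to compare once all of them are part of the key.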

mattlord · 2023-12-19T18:06:09Z

Also, it looks like we're implementing similar behaviors in different approaches; the vstreamer/vplayer approach uses different code paths, right?

I'm not sure what you're referring to here. The vreplicator, vstreamer, and now vdiff are all using the same function and code path for managing PKEs. Maybe I'm misunderstanding?

shlomi-noach · 2023-12-19T18:21:58Z

I'm not sure what you're referring to here. The vreplicator, vstreamer, and now vdiff are all using the same function and code path for managing PKEs. Maybe I'm misunderstanding?

No, you were good. I wasn't clear, even to myself. OK, the example I was thinking of was actually Online DDL, where we can use any non-NULLable unique key for the migration. Context: #8364.

In Online DDL, we get the shared unique key of two tables:

https://github.com/vitessio/vitess/pull/8364/files#diff-552cf284bd2c1a9e4ae1c9f9f1c350c7dcc04177ed3a663b0670d084518e7fe1R107

Having read a table's unique keys:

https://github.com/vitessio/vitess/pull/8364/files#diff-552cf284bd2c1a9e4ae1c9f9f1c350c7dcc04177ed3a663b0670d084518e7fe1R205

We populate the Rule with unique key info:

https://github.com/vitessio/vitess/pull/8364/files#diff-5e69c397e15dcfda72bb6fe8dba93b2d11a8d2d7b594ebfbbdcfdc07bfc3791dR165

We then read that as needed to override PK columns:

https://github.com/vitessio/vitess/pull/8364/files#diff-944642971422bb00ed68466f784f8b586f5e800f57e0c46002d6194c517cb1a0R526-R533

etc.

This is what I was thinking of as different approaches to analyzing/using an alternate unique key.
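As a rough picture of what "shared unique key" could mean in the steps above, consider the sketch below. It is only one plausible reading, not the Online DDL code from #8364: the `uniqueKey` type and the `sharedUniqueKey` function are hypothetical, and the matching rule (same columns in the same order, no nullable columns) is an assumption made for the example.

```go
package main

import "fmt"

// uniqueKey is a hypothetical description of one unique index on a table.
type uniqueKey struct {
	Name     string
	Columns  []string
	Nullable bool // true if any covered column allows NULL
}

// sameColumns reports whether two column lists match in content and order.
func sameColumns(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// sharedUniqueKey returns the first non-nullable unique key of the source
// table that also exists, over the same columns, on the target table.
func sharedUniqueKey(source, target []uniqueKey) (uniqueKey, bool) {
	for _, s := range source {
		if s.Nullable {
			continue
		}
		for _, t := range target {
			if !t.Nullable && sameColumns(s.Columns, t.Columns) {
				return s, true
			}
		}
	}
	return uniqueKey{}, false
}

func main() {
	source := []uniqueKey{
		{Name: "email_idx", Columns: []string{"email"}, Nullable: true},
		{Name: "uuid_idx", Columns: []string{"uuid"}},
	}
	target := []uniqueKey{{Name: "uuid_idx", Columns: []string{"uuid"}}}
	key, ok := sharedUniqueKey(source, target)
	fmt.Println(key.Name, ok) // prints uuid_idx true
}
```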

mattlord · 2023-12-19T18:30:39Z

Ah, yes. I have a standing TODO to unify these. My thinking is that we'll probably eventually drop non-native OnlineDDL support -- meaning GH-OST and PTOSC -- and then we can remove OnlineDDL's handling and rely on VReplication. That make sense?

shlomi-noach · 2023-12-19T18:38:06Z

we'll probably eventually drop non-native OnlineDDL support -- meaning GH-OST and PTOSC -- and then we can remove OnlineDDL's handling and rely on VReplication. That make sense?

The two are unrelated: the logic used today for analyzing the shared unique key is only used for vreplication migrations.

In any case, this cleanup is for the future. There's no need to do anything in this PR; I just saw fit to mention this duality.

Don't try to get PK col info from I_S when there are no PK cols

c1887b7

Signed-off-by: Matt Lord <[email protected]>

mattlord added Type: Bug Component: VReplication labels Dec 15, 2023

github-actions bot added this to the v19.0.0 milestone Dec 15, 2023

Add unit test case

4910b07

This also more generally adds support for diffing tables without PK columns. Signed-off-by: Matt Lord <[email protected]>

mattlord changed the title ~~VDiff: Don't try to get PK col info from I_S when there are no PK cols~~ VDiff: Support diffing tables without a defined Primary Key Dec 16, 2023

mattlord force-pushed the vdiff_no_pk branch from 7072bca to cff8648 Compare December 16, 2023 06:06

Add e2e test

c44ba67

Signed-off-by: Matt Lord <[email protected]>

mattlord force-pushed the vdiff_no_pk branch 2 times, most recently from 3926e94 to afaa767 Compare December 16, 2023 19:36

Add same PKE support that vreplication has

9c252d5

Signed-off-by: Matt Lord <[email protected]>

mattlord force-pushed the vdiff_no_pk branch from afaa767 to 9c252d5 Compare December 16, 2023 19:52

mattlord marked this pull request as ready for review December 16, 2023 19:52

mattlord requested review from harshit-gangal, systay, shlomi-noach, rohit-nayak-ps, deepthi and GuptaManan100 as code owners December 16, 2023 19:52

mattlord removed the request for review from deepthi December 16, 2023 19:53

mattlord requested review from deepthi and removed request for systay, GuptaManan100 and harshit-gangal December 16, 2023 19:53

mattlord added 2 commits December 16, 2023 15:40

Fix unit tests

a4eb271

Signed-off-by: Matt Lord <[email protected]>

Merge remote-tracking branch 'origin/main' into vdiff_no_pk

6cced63

Signed-off-by: Matt Lord <[email protected]>

rohit-nayak-ps approved these changes Dec 19, 2023

shlomi-noach approved these changes Dec 19, 2023

mattlord added a commit to planetscale/vitess-website that referenced this pull request Dec 19, 2023

Add warning now that we support tables with PKs

e039de0

Support for which was added in: vitessio/vitess#14794 Signed-off-by: Matt Lord <[email protected]>

mattlord mentioned this pull request Dec 19, 2023

Add warning now that we support tables with PKs vitessio/website#1651

Merged

mattlord removed the NeedsWebsiteDocsUpdate What it says label Dec 19, 2023

mattlord added a commit to planetscale/vitess-website that referenced this pull request Dec 19, 2023

Add warning now that we support tables with PKs

10e5e4d

Support for which was added in: vitessio/vitess#14794 Signed-off-by: Matt Lord <[email protected]>

mattlord added a commit to planetscale/vitess-website that referenced this pull request Dec 20, 2023

Add warning now that we support tables with PKs

a14d4c7

Support for which was added in: vitessio/vitess#14794 Signed-off-by: Matt Lord <[email protected]>

mattlord merged commit be7b670 into vitessio:main Dec 20, 2023
116 of 117 checks passed

mattlord deleted the vdiff_no_pk branch December 20, 2023 02:20

rohit-nayak-ps pushed a commit to vitessio/website that referenced this pull request Dec 20, 2023

Add warning now that we support tables with PKs (#1651)

8aa9aca

Support for which was added in: vitessio/vitess#14794 Signed-off-by: Matt Lord <[email protected]>

ejortegau referenced this pull request in slackhq/vitess Jan 24, 2024

VDiff: Support diffing tables without a defined Primary Key (#14794)

10153c1

Signed-off-by: Matt Lord <[email protected]> Signed-off-by: Eduardo J. Ortega U. <[email protected]>
