-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VDiff: Support diffing tables without a defined Primary Key #14794
Conversation
Signed-off-by: Matt Lord <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
This also more generally adds support for diffing tables without PK columns. Signed-off-by: Matt Lord <[email protected]>
7072bca
to
cff8648
Compare
Signed-off-by: Matt Lord <[email protected]>
3926e94
to
afaa767
Compare
Signed-off-by: Matt Lord <[email protected]>
afaa767
to
9c252d5
Compare
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
@@ -579,13 +579,7 @@ func (mysqld *Mysqld) ApplySchemaChange(ctx context.Context, dbName string, chan | |||
// defined PRIMARY KEY then it may return the columns for | |||
// that index if it is likely the most efficient one amongst | |||
// the available PKE indexes on the table. | |||
func (mysqld *Mysqld) GetPrimaryKeyEquivalentColumns(ctx context.Context, dbName, table string) ([]string, string, error) { | |||
conn, err := getPoolReconnect(ctx, mysqld.dbaPool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did the functionality change for this PR require a change in how we acquire a connection here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I first created this I had a mysqlctl.Mysqld
instance readily available in the callsite. In working on this, I realized that I should have changed it when using it in vstreamer as I created a new instance there, which is heavy as it creates conn pools etc (and I wasn't properly calling close in a defer 🤦♂️), so I changed it here to use a callback to talk to the DB instead. It's much lighter and is far better (and is an already established pattern in the mysqlctl package).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we adopt this pattern, then we will find ourselves passing exec
functions in a large amount of functions - not saying it's wrong - just thinking of the code/signature impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already a pattern in that file/package and elsewhere. I agree that it's not one which should be used w/o good reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good with one question about allowing full table scans.
Also, it looks like we're implementing similar behaviors in different approaches; the vstreamer/vplayer approach uses different code paths, right?
tp.table.PrimaryKeyColumns = append(tp.table.PrimaryKeyColumns, pkeCols...) | ||
} else { | ||
// We use every column together as a substitute PK. | ||
tp.table.PrimaryKeyColumns = append(tp.table.PrimaryKeyColumns, tp.table.Columns...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to allow full scan in VDiff? Should we instead say this table isn't supported? I'm imagining a VDiff running over weeks - as a user of VDiff I'd not want it to run so long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, yes. We allow it in VReplication as well. This is up to the user -- it's not uncommon to have small tables w/o a primary key or PKE. In some cases all queries against the table will be a full scan. It's not our call to say that you shouldn't be able to move those tables and diff them after doing so.
I'm not sure what you're referring to here. The vreplicator, vstreamer, and now vdiff are all using the same function and code path for managing PKEs. Maybe I'm misunderstanding? |
No, you were good. I wasn't clear, even to myself. OK, the example I was thinking of was actually Online DDL, where we can use ay non-NULLable unique key for the migration. Context: #8364. In Online DDL, we get the shared unique key of two tables: Having read a table's unique keys: We populate the Rule with unique key info: We then read that as needed to override PK columns: etc. This is what I was thinking of as different approaches to analyzing/using an alternate unique key. |
Ah, yes. I have a standing TODO to unify these. My thinking is that we'll probably eventually drop non-native OnlineDDL support -- meaning GH-OST and PTOSC -- and then we can remove OnlineDDL's handling and rely on VReplication. That make sense? |
The two are unrelated. Meaning, the logic used today for analyzing shared unique key -- is done only for the vreplication migrations. At any case, this cleanup is for the future - no need to do anything in this PR, I just saw fit to mention this duality. |
Support for which was added in: vitessio/vitess#14794 Signed-off-by: Matt Lord <[email protected]>
Support for which was added in: vitessio/vitess#14794 Signed-off-by: Matt Lord <[email protected]>
Support for which was added in: vitessio/vitess#14794 Signed-off-by: Matt Lord <[email protected]>
Support for which was added in: vitessio/vitess#14794 Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]> Signed-off-by: Eduardo J. Ortega U. <[email protected]>
Description
VReplication supports tables without defined Primary Keys, but VDiff did not. If you executed a VDiff on a workflow that included a table without one, it would fail like this (see issue):
This PR adds support for diffing tables that have no defined Primary Key. It uses a Primary Key equivalent (unique index on non-null column(s)) if one exists, and if not then all of the columns in the table as a substitute — just as does VReplication, and as does MySQL row-based replication (although there it will use any index if there is one first).
Basic manual test:
Related Issue(s)
Checklist