-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Errant GTID detection on the replicas when they set replication source #16833
Conversation
Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
…ationSource RPC Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16833 +/- ##
==========================================
- Coverage 69.43% 69.42% -0.01%
==========================================
Files 1571 1571
Lines 203086 203232 +146
==========================================
+ Hits 141013 141101 +88
- Misses 62073 62131 +58 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (with comments). Are we convinced that setReplicationSourceLocked
is the only path (other than restore.go
which I've noticed is handled as well)?
|
||
// Calculate the difference between the replica and primary GTID sets. | ||
diffSet := replicaGTIDSet.Difference(primaryGTIDSet) | ||
return diffSet.String(), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
newSet[sid] = intervals | ||
} | ||
return newSet | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
name: "Single errant GTID", | ||
replicaPosition: "MySQL56/8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8,8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1,8bc65cca-3fe4-11ed-bbfb-091034d48bd3:34", | ||
primaryPositionStr: "MySQL56/8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-50,8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1-30", | ||
errantGtidWanted: "8bc65cca-3fe4-11ed-bbfb-091034d48bd3:34", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
name: "Multiple errant GTID", | ||
replicaPosition: "MySQL56/8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8,8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1-32,8bc65cca-3fe4-11ed-bbfb-091034d48bd3:3-35", | ||
primaryPositionStr: "MySQL56/8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-50,8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1-30,8bc65cca-3fe4-11ed-bbfb-091034d48bd3:34", | ||
errantGtidWanted: "8bc65cca-3fe4-11ed-bbfb-091034d48b3e:31-32,8bc65cca-3fe4-11ed-bbfb-091034d48bd3:3-33:35", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
name: "Remove unknown UUID", | ||
initialSet: "8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8,8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1:4-24", | ||
uuid: "8bc65c84-3fe4-11ed-a912-257f0fcde6c9", | ||
wantSet: "8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8,8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1:4-24", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -326,7 +323,7 @@ func (tm *TabletManager) restoreDataLocked(ctx context.Context, logger logutil.L | |||
} else if keyspaceInfo.KeyspaceType == topodatapb.KeyspaceType_NORMAL { | |||
// Reconnect to primary only for "NORMAL" keyspaces | |||
params.Logger.Infof("Restore: starting replication at position %v", pos) | |||
if err := tm.startReplication(context.Background(), pos, originalType); err != nil { | |||
if err := tm.startReplication(ctx, pos, originalType); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should note in the release notes that for people running with vtop on k8s, this change means that the replica with broken replication will show as unready. There will be some kind of manual intervention needed to clean up these errant replica tablets, even if using the vtorc feature of marking them as drained.
Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
Thank you for the reviews! I've addressed all review comments and added the release notes changes as requested. I will merge the PR once the tests are green. |
…ication source (vitessio#16833)" This reverts commit eab262e. Signed-off-by: Manan Gupta <[email protected]>
Description
This PR adds errant GTID detection logic in
setReplicationSourceLocked
call. The intent is that we want to prevent any replica that has an errant GTID from even joining the replication stream. This is because we want to prevent this replica from sending semi-sync ACKs, which can land us into some really annoying and hard to deal with situations.To do the errant GTID detection, the tablets first query their own position and then ask the primary for its position. Then they compare the two to see if they have an errant GTID or not. If they do, they fail the RPC without setting the new source.
Related Issue(s)
Checklist
Deployment Notes