Take replication lag into account while selecting primary candidate #14634
Signed-off-by: Manan Gupta <[email protected]>
One suggestion: use Duration to make the final comparison read a bit more high-level. Up to you if you think it's not worth it!
@ajm188 that was a great suggestion. I have made the changes. Thank you! 💕
I like it! I'm not sure about the default value, so will leave that to you.
…dable Signed-off-by: Manan Gupta <[email protected]>
Description
This PR adds an additional dimension to choosing a new primary for PlannedReparentShard. If a tablet has a lot of replication lag, even more than `wait-replicas-timeout`, then waiting for replication to catch up would time out, so it is better not to consider that tablet for the new primary.

A new sub-flag `--tolerable-replication-lag` has been added to the `PlannedReparentShard` command. It lets users specify how much replication lag is acceptable for a tablet to be eligible for promotion when Vitess chooses the new primary.

This feature is opt-in: if the sub-flag is not specified, Vitess ignores replication lag entirely.
A new flag in VTOrc with the same name has been added to control the behaviour of the PlannedReparentShard calls that VTOrc issues.
Related Issue(s)
lag > wait-replicas-timeout
#14633
Checklist
Deployment Notes