Improve error messages in tablet gateway when primary tablets are not serving #125
… serving Signed-off-by: Austen Lacy <[email protected]>
if kev.TargetIsBeingResharded(target) {
    err = vterrors.Errorf(vtrpcpb.Code_CLUSTER_EVENT, "current keyspace is potentially being resharded")
    continue
Shouldn't this return false if it's not actually being resharded? Feel like we should address the bug at that level?
@shanth96 I considered changing the TargetIsBeingResharded logic, but I'm not familiar enough with the logic that determines whether the keyspace is actually being resharded, or whether the keyspace watch events give us enough information to determine that. We can revisit that if we think this change isn't helpful enough.
Fair enough
I have reservations about changing the order of error checks and their messages unless we can get this upstreamed. If we need community support for issues in the future, there's going to be confusion if we mention error messages that don't exist in upstream Vitess.
I'm happy to try upstreaming these changes to see what the maintainers think.
Cool, let's try
I found this PR for a related issue: vitessio#13854
Description
The error message returned from the tablet gateway is misleading. For example, while a primary was down and not serving, Vitess returned "current keyspace is being resharded" to an application. I swapped the tablet checks so that the gateway checks whether the primary is serving before it checks the resharding state.
During a recent incident where a primary MySQL instance was down, Vitess returned "current keyspace is being resharded" error messages to an application, which was very confusing. The tablet serving state, however, showed that the primary was NOT_SERVING. With this change we should at least return "primary is not serving, there could be a reparent operation in progress", which is more valuable than a "potential" resharding message.
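To make the reordering concrete, here is a minimal, self-contained sketch of the check order described above. It is not the tablet gateway code: `keyspaceState`, `classifyUnavailable`, and the two error variables are hypothetical names introduced only for illustration, and the two boolean fields stand in for whatever the keyspace event watcher reports. Only the two error strings are taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// keyspaceState is a hypothetical stand-in for what the keyspace event
// watcher reports about a target. It is not a Vitess type.
type keyspaceState struct {
	primaryNotServing   bool // the primary tablet is reported as NOT_SERVING
	maybeBeingResharded bool // the resharding check returned true
}

// Error strings as they appear in this PR.
var (
	errPrimaryNotServing = errors.New("primary is not serving, there could be a reparent operation in progress")
	errMaybeResharding   = errors.New("current keyspace is potentially being resharded")
)

// classifyUnavailable picks the error to report when no tablets are
// available. The serving state is checked first, so the more specific
// message wins when both conditions hold.
func classifyUnavailable(s keyspaceState) error {
	if s.primaryNotServing {
		return errPrimaryNotServing
	}
	if s.maybeBeingResharded {
		return errMaybeResharding
	}
	return nil
}

func main() {
	// Assumed shape of the incident: the primary is down and the
	// resharding check also fires.
	s := keyspaceState{primaryNotServing: true, maybeBeingResharded: true}
	fmt.Println(classifyUnavailable(s))
}
```

With the checks in the previous order, the same input would produce the resharding message, which is the confusing behavior this PR is trying to avoid.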