-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Online DDL: better error messages in cut-over phase #17052
Online DDL: better error messages in cut-over phase #17052
Conversation
…ErrorFromError Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
@@ -896,7 +896,10 @@ func (e *Executor) cutOverVReplMigration(ctx context.Context, s *VReplStream, sh | |||
defer cancel() | |||
// Wait for target to reach the up-to-date pos | |||
if err := tmClient.VReplicationWaitForPos(ctx, tablet.Tablet, s.id, replication.EncodePosition(pos)); err != nil { | |||
return err | |||
if s, _ := e.readVReplStream(ctx, s.workflow, true); s != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As added bonus, why not also report what position we did land at.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also in the error message from the VReplication Engine FWIW.
if merr, ok := err.(*sqlerror.SQLError); ok { | ||
switch merr.Num { | ||
|
||
if sqlErr, isSQLErr := sqlerror.NewSQLErrorFromError(err).(*sqlerror.SQLError); isSQLErr && sqlErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is required because the underlying SQL error is now likely to be wrapped by a vitess error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we have any wrapping, but I think places where we cast errors directly probably should use https://pkg.go.dev/errors#example-As these days. So if we're changing this, maybe also switch to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sqlerror.NewSQLErrorFromError
is even a bit more elaborate and can parse a MySQL error out of a string. We use that pretty consistently across the repo.
I'd say if we wanted to change that, it would be in a dedicated PR in one giant sweep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sqlerror.NewSQLErrorFromError
It's not about this function though? It's about the cast after?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the cast at that same line... Rich story. We tried to get rid of it altogether, but hit the (weird, IMO) golang
design decision where an interface (SQLError
) can be non-nil
where the underlying implementation is nil
. So we kind of have to use the cast. What we know, though, is that the result of sqlerror.NewSQLErrorFromError
is either a fully unwrapped sqlerror.SQLError
or it is nil
. So no need to use As
on the result of this function.
Reference: #12574
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17052 +/- ##
==========================================
+ Coverage 67.14% 67.15% +0.01%
==========================================
Files 1571 1571
Lines 252060 252064 +4
==========================================
+ Hits 169249 169286 +37
+ Misses 82811 82778 -33 ☔ View full report in Codecov by Sentry. |
@@ -785,7 +785,7 @@ func (e *Executor) killTableLockHoldersAndAccessors(ctx context.Context, tableNa | |||
} | |||
rs, err := conn.Conn.ExecuteFetch(query, -1, true) | |||
if err != nil { | |||
return err | |||
return vterrors.Wrapf(err, "finding queries potentially operating on table") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it not make more sense to say ~ "failed to find queries ..." and similar for some others like the one just below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error is later wrapped in:
Which prepends "failed killing table lock holders and accessors".
@@ -896,7 +896,10 @@ func (e *Executor) cutOverVReplMigration(ctx context.Context, s *VReplStream, sh | |||
defer cancel() | |||
// Wait for target to reach the up-to-date pos | |||
if err := tmClient.VReplicationWaitForPos(ctx, tablet.Tablet, s.id, replication.EncodePosition(pos)); err != nil { | |||
return err | |||
if s, _ := e.readVReplStream(ctx, s.workflow, true); s != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also in the error message from the VReplication Engine FWIW.
Related: #17264 |
Description
This PR adds more wrapping/clarifications to the
error
returned bycutOverVReplMigration()
. The function already logs the required information. However, the error returned by the function is also captured invitess_migrations.message
column, and right nwo it is not informative enough. For example, it might read "due to context timeout" which isn't helpful if you don't know what specific operation was timing out. Again, the logs have the information, but it will be so much nicer to have more clarity in themessage
column.Related Issue(s)
#17264
Checklist
Deployment Notes