VStreams NOT_FOUND Error Retries and Omits Tablet #154
Original discussion for posterity: https://slack-pde.slack.com/archives/C01P84R7L02/p1695329095557079
@makinje16 / @pbibra: are we seeing VStreams use
Sorry, to clarify: we're not streaming from a tablet that is still in Restore; we'd be trying to stream from a tablet that's being replaced due to an issue. Because the current tablet picker version does not perform a health check before choosing a tablet for streaming, we run into this issue.
Ahh I understand, thanks for clarifying 👍
Description
Currently, vstreams fail as expected when trying to stream from a tablet that is being replaced for some reason. However, when the NOT_FOUND error is returned to the vstream client, the client does not omit that tablet from being used during the next retry. This can lead to streams being blocked for multiple hours.
The reason this is added to our fork instead of committed upstream is that it has already been fixed in later versions of Vitess, but bringing that fix in pulled along many dependencies that caused issues with the merge. This change will be added in order to unblock CDC until we are more caught up with upstream.
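The retry behavior described above can be illustrated with a minimal, self-contained sketch. It is not the code in this change and does not use Vitess's actual tablet picker API; `errNotFound`, `pickTablet`, `streamWithRetry`, and the tablet aliases are all hypothetical names chosen for illustration. The idea is simply that a tablet which returned NOT_FOUND is remembered in an ignore set so the next retry picks a different one.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the NOT_FOUND error
// returned to the vstream client.
var errNotFound = errors.New("NOT_FOUND")

// pickTablet is a hypothetical picker: it returns the first candidate
// alias that is not present in the ignore set.
func pickTablet(candidates []string, ignore map[string]bool) (string, error) {
	for _, alias := range candidates {
		if !ignore[alias] {
			return alias, nil
		}
	}
	return "", errors.New("no eligible tablet")
}

// streamWithRetry tries to stream from a picked tablet. On NOT_FOUND it
// records the tablet in the ignore set so the next attempt skips it.
// Any other error is returned to the caller unchanged.
func streamWithRetry(candidates []string, maxRetries int, stream func(alias string) error) (string, error) {
	ignore := make(map[string]bool)
	for attempt := 0; attempt < maxRetries; attempt++ {
		alias, err := pickTablet(candidates, ignore)
		if err != nil {
			return "", err
		}
		err = stream(alias)
		if err == nil {
			return alias, nil
		}
		if errors.Is(err, errNotFound) {
			ignore[alias] = true // omit this tablet on the next retry
			continue
		}
		return "", err
	}
	return "", fmt.Errorf("gave up after %d attempts", maxRetries)
}

func main() {
	// Assumed example data: the first tablet is being replaced.
	candidates := []string{"zone1-100", "zone1-101"}
	stream := func(alias string) error {
		if alias == "zone1-100" {
			return errNotFound
		}
		return nil
	}
	alias, err := streamWithRetry(candidates, 3, stream)
	fmt.Println(alias, err)
}
```

Without the ignore set, the picker in this sketch would return the same failing tablet on every attempt, which mirrors the blocked-stream symptom described above.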
Testing
This change has been in dev for the last several weeks, and it was recently noted that the NOT_FOUND errors have not been seen in over 4 weeks.
Related Issue(s)
Checklist
Deployment Notes