VStreams NOT_FOUND Error Retries and Omits Tablet #154

Merged
merged 4 commits into from
Dec 20, 2023

Conversation

makinje16
Member

@makinje16 makinje16 commented Nov 6, 2023

Description

Currently, vstreams fail as expected when trying to stream from a tablet that is being replaced. However, when the NOT_FOUND error is returned to the vstream client, the client does not omit that tablet from the next retry, so it can be picked again. This can block streams for multiple hours.

This change is added to our fork rather than committed upstream because the problem has already been fixed in later versions of Vitess. When we tried to bring that fix in, it depended on many other changes that caused issues with the merge. This change unblocks CDC until we are more caught up with upstream.
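
For illustration, the retry behaviour described above can be sketched roughly as follows. This is a minimal, self-contained sketch and not the code in this PR: `errNotFound`, `streamFunc`, `pickTablet`, `streamWithRetry`, and the tablet aliases are all hypothetical names, and the real picker and error types in Vitess differ.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the NOT_FOUND error returned to the
// vstream client (hypothetical; not the real Vitess error type).
var errNotFound = errors.New("NOT_FOUND")

// streamFunc attempts to stream from the named tablet (hypothetical signature).
type streamFunc func(tablet string) error

// pickTablet returns the first candidate that is not in the ignore set.
func pickTablet(candidates []string, ignore map[string]bool) (string, bool) {
	for _, t := range candidates {
		if !ignore[t] {
			return t, true
		}
	}
	return "", false
}

// streamWithRetry tries up to maxAttempts times. A tablet that returned
// NOT_FOUND is added to the ignore set so later attempts skip it.
func streamWithRetry(candidates []string, stream streamFunc, maxAttempts int) (string, error) {
	ignore := make(map[string]bool)
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		tablet, ok := pickTablet(candidates, ignore)
		if !ok {
			if lastErr == nil {
				return "", errors.New("no candidate tablets")
			}
			return "", fmt.Errorf("no eligible tablets left: %w", lastErr)
		}
		err := stream(tablet)
		if err == nil {
			return tablet, nil
		}
		lastErr = err
		if errors.Is(err, errNotFound) {
			ignore[tablet] = true // omit this tablet from the next retry
		}
	}
	return "", fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	candidates := []string{"zone1-0000000100", "zone1-0000000101"}
	stream := func(tablet string) error {
		if tablet == "zone1-0000000100" {
			return errNotFound // simulate a tablet that is being replaced
		}
		return nil
	}
	tablet, err := streamWithRetry(candidates, stream, 3)
	fmt.Println(tablet, err)
}
```

Without the ignore set, every retry in this sketch would pick the first candidate again and fail the same way, which is the blocking behaviour described above.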

Testing

This change has been running in dev for the last several weeks, and it was recently noted that the NOT_FOUND errors have not been seen in over 4 weeks.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Deployment Notes

@makinje16 makinje16 requested a review from a team as a code owner November 6, 2023 21:32
@makinje16 makinje16 added the enhancement New feature or request label Nov 7, 2023
@pbibra

pbibra commented Nov 7, 2023

original discussion for posterity: https://slack-pde.slack.com/archives/C01P84R7L02/p1695329095557079
decision to not re-try on this error upstream: vitessio#14224 (comment)

@timvaillancourt
Member

> when trying to stream from a tablet that is still in Restore

@makinje16 / @pbibra: are we seeing VStreams use RESTORE tablets when specifying RDONLY+REPLICA as the allowed tablet types?

@pbibra

pbibra commented Nov 7, 2023

> when trying to stream from a tablet that is still in Restore
>
> @makinje16 / @pbibra: are we seeing VStreams use RESTORE tablets when specifying RDONLY+REPLICA as the allowed tablet types?

Sorry, to clarify: we're not streaming from a tablet that is still in Restore. We'd be trying to stream from a tablet that's being replaced due to an issue. We run into this because the current tablet picker version does not perform a health check before a tablet is chosen for streaming.
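
As a rough illustration of the missing step, a picker that checks health before choosing might look like the sketch below. The `tablet` type, its `Serving` field, and `pickHealthy` are hypothetical and only show the idea; they are not the Vitess tablet picker API.

```go
package main

import "fmt"

// tablet is a simplified, hypothetical view of a tablet record.
type tablet struct {
	Alias   string
	Serving bool // health check result; assumed to be available to the picker
}

// pickHealthy returns the first candidate whose health check reports serving,
// so a tablet that is being replaced is never handed to the stream.
func pickHealthy(candidates []tablet) (tablet, error) {
	for _, t := range candidates {
		if t.Serving {
			return t, nil
		}
	}
	return tablet{}, fmt.Errorf("no healthy tablet among %d candidates", len(candidates))
}

func main() {
	candidates := []tablet{
		{Alias: "zone1-0000000100", Serving: false}, // being replaced
		{Alias: "zone1-0000000101", Serving: true},
	}
	t, err := pickHealthy(candidates)
	fmt.Println(t.Alias, err)
}
```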

@timvaillancourt
Member

> Sorry, to clarify: we're not streaming from a tablet that is still in Restore. We'd be trying to stream from a tablet that's being replaced due to an issue. We run into this because the current tablet picker version does not perform a health check before a tablet is chosen for streaming.

Ahh I understand, thanks for clarifying 👍

@makinje16 makinje16 merged commit 01b463b into slack-vitess-r14.0.5 Dec 20, 2023
241 checks passed
@makinje16 makinje16 deleted the vstream-not-found-patch branch December 20, 2023 20:06