v18: Fix tablets being removed from the healthcheck when a topo server GetTablet call fails #201
This is the v18 fix for this bug: vitessio#15632
The code for getting tablets for healthchecks was significantly altered between v18 and v19, so even though the bug is similar, the fix is different.
Description
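In v18, `TopologyWatcher.loadTablets()` roughly works like this:

1. Create an empty `newTablets` map.
2. Fetch the tablet aliases from the topo server with `getTablets`, then call `GetTablet` for each alias and add the result to the `newTablets` map.
3. For each tablet in `newTablets`: if it is not in `tw.tablets` (the current set of tracked tablets), add it to the healthcheck.
4. For each tablet in `tw.tablets` (the current set of tracked tablets): if it is not in `newTablets`, remove it from the healthcheck.

So the bug in v18 is that if a `GetTablet` call fails, the tablet is not added to `newTablets`, and it is therefore removed from the healthcheck in step 4, when iterating over `tw.tablets`.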
Changes

- Updated `TopologyWatcher.loadTablets()` to handle `GetTablet` errors by backfilling `newTablets` with the current value from `tw.tablets`. This is the same approach used in the v19+ fix.
- Added the `TestGetTabletErrorDoesNotRemoveFromHealthcheck` test from the v19+ fix.
- Added `TestGetTabletNoNodeErrorRemovesFromHealthcheck` to cover the case where a tablet is removed from the topo server between the call to `getTablets` and the call to `GetTablet` for that alias. That scenario is handled differently in v19+, so I added explicit handling for it in this fix.