-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a retry to remove the vttablet directory during upgrade/downgrade backup tests #14753
Conversation
Signed-off-by: Florent Poinsard <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Florent Poinsard <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Only had a small nit about quoting the variable when used (it could potentially have whitespace). FWIW, I also think it's more descriptive to say that we're adding a retry rather than a timeout.
I remember running into this same Directory not empty
error when dealing some other backup related test flakiness some time ago. I think it might have been here: https://github.com/vitessio/vitess/pull/11352/files#diff-4f8ae786620a21df8ced1435fc0318b8abba570dfea4c2177b1ecb725b8c5073L236-R244
Thank you for working on this!
examples/backups/stop_tablets.sh
Outdated
|
||
if grep -q 'Directory not empty' $temp_file; then | ||
echo "Directory not empty, retrying..." | ||
elif [ ! -s $temp_file ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should quote the $temp_file
variable everywhere to be safe (can run shellcheck on the file too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed via 8841067
I will go ahead and fix it, thanks!
Got it! I modified both title and description.
Nice, thank you! This is useful. |
Signed-off-by: Florent Poinsard <[email protected]>
… backup tests (#14753) Signed-off-by: Florent Poinsard <[email protected]>
… backup tests (#14753) Signed-off-by: Florent Poinsard <[email protected]>
…grade/downgrade backup tests (#14753) (#14757) Co-authored-by: Florent Poinsard <[email protected]>
…grade/downgrade backup tests (#14753) (#14756) Signed-off-by: Florent Poinsard <[email protected]> Co-authored-by: Florent Poinsard <[email protected]>
Description
This PR fixes a small issue during the manual upgrade/downgrade backup tests. The issue happens when for some reason the deletion of the VTTablet directory fails due to used files, leading to an error code
39
fromrm -Rf
. This issue can be seen as follows in the logs of theStop tablets
step:This silent error leads to the failure of step
Start new tablet and restore
. Since the VTTablet's directory was partially removed, VTTablet thinks it has to start from an existing data dir (which is not what we want). And because the VTTablet directory has been mostly emptied already, themy.cnf
file does not exist, leading to a fatal error (also visible in the logs):After some local tests, it seems like adding a sleep before the
rm -Rf
is sufficient. I then decided to add a 30 second-long retry loop where we attempt to remove the VTTablet directory every second.Moreover, the list of files to watch in the backups manual workflows have changed to include the
examples/**
path as we use it in our tests.