fix timeout when stopping KVM machine with CRI-O container runtime #19758
base: master
Conversation
/ok-to-test
Hi @prezha, following up on this according to cri-o/cri-o#8609 (comment), this was fixed in crun 1.17. We've updated to crun 1.17 in #19640; is this PR still needed?
hey @spowelljr, I think we do, as we've occasionally seen these timeouts even before (for reasons related to other/different issues); this would also reduce the flakiness of our integration tests.
Force-pushed from 76887a9 to 043f561 (compare)
pkg/drivers/kvm/kvm.go (Outdated)

```diff
 	}
 
 	if d.ExtraDisks > 20 {
 		// Limiting the number of disks to 20 arbitrarily. If more disks are
 		// needed, the logical name generation has to changed to create them if
 		// the form hdaa, hdab, etc
-		return errors.Wrap(err, "cannot create more than 20 extra disks")
+		return errors.New("creating more than 20 extra disks")
```
The before message is a better user-facing error, `cannot create` vs `creating`, although we should have validation for this way before.
agree, reworded it
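As an aside on the reviewer's point about validating this "way before": such a check could live at flag-parsing time rather than in the driver. The sketch below only illustrates that idea; the package, function name, and error wording are hypothetical and not part of this PR.

```go
package config

import "fmt"

// maxExtraDisks mirrors the driver-side limit shown in the diff above.
const maxExtraDisks = 20

// validateExtraDisks is a hypothetical early check that could run when the
// --extra-disks flag is parsed, before the KVM driver is ever created.
func validateExtraDisks(n int) error {
	if n > maxExtraDisks {
		return fmt.Errorf("--extra-disks: at most %d extra disks are supported, got %d", maxExtraDisks, n)
	}
	return nil
}
```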
Co-authored-by: Steven Powell <[email protected]>
kvm2 driver with docker runtime
Times for minikube ingress: 16.6s 19.1s 19.6s 19.5s 17.1s
Times for minikube start: 52.2s 53.5s 49.5s 52.9s 54.3s

docker driver with docker runtime
Times for minikube start: 24.7s 24.4s 21.6s 22.4s 25.0s
Times for minikube ingress: 13.8s 12.3s 12.9s 13.3s 12.8s

docker driver with containerd runtime
Times for minikube start: 23.0s 21.2s 23.9s 22.8s 24.6s
Times for minikube ingress: 22.8s 31.8s 27.8s 22.8s 38.8s
Found a typo, but the tests look good!
```diff
 	}
 
-	return fmt.Errorf("unable to stop vm, current state %q", s.String())
+	return s, fmt.Errorf("timed out waitig for domain %s, current state is %q", method, s)
```
```diff
-	return s, fmt.Errorf("timed out waitig for domain %s, current state is %q", method, s)
+	return s, fmt.Errorf("timed out waiting for domain %s, current state is %q", method, s)
```
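For readers following along, the hunk above suggests a helper that polls the libvirt domain until it reaches a desired state or times out. The following is only a rough sketch of such a helper, assuming the github.com/libvirt/libvirt-go bindings; the name, signature, and polling interval are illustrative rather than the PR's actual code.

```go
package kvm

import (
	"fmt"
	"time"

	libvirt "github.com/libvirt/libvirt-go"
)

// waitForState polls the domain until it reaches the wanted state or the
// timeout elapses, returning the last observed state either way.
func waitForState(dom *libvirt.Domain, want libvirt.DomainState, timeout time.Duration) (libvirt.DomainState, error) {
	deadline := time.Now().Add(timeout)
	for {
		s, _, err := dom.GetState()
		if err != nil {
			return s, fmt.Errorf("getting domain state: %w", err)
		}
		if s == want {
			return s, nil
		}
		if time.Now().After(deadline) {
			return s, fmt.Errorf("timed out waiting for domain state %d, current state is %d", want, s)
		}
		time.Sleep(time.Second)
	}
}
```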
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: prezha, spowelljr

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Here are the top 10 failed tests with the lowest flake rates in each environment.
Besides those, the following environments also have failed tests:
To see the flake rates of all tests by environment, click here.
we noticed in our integration tests that a KVM machine with the CRI-O container runtime will sometimes not shut down gracefully within the given time of two minutes (eg: https://storage.googleapis.com/minikube-builds/logs/19736/36442/KVM_Linux_crio.html#fail_TestStartStop%2fgroup%2fembed-certs%2fserial%2fStop)
there is a known upstream issue tracking this
as we've had similar issues in the past (and increased the timeout value several times), this PR aims to solve the problem by:
in total, this should self-resolve the issue in many cases while reducing the total wait period (a rough sketch of this flow follows below)
bonus:
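To make the intended flow concrete, here is one plausible shape of the graceful-stop-with-retries approach described above. This is a hedged sketch only: the vmDriver interface, method names, and timings are assumptions for illustration, not the PR's actual implementation.

```go
package kvm

import (
	"fmt"
	"log"
	"time"
)

// vmDriver is a stand-in for whatever driver interface the real code uses;
// only the operations needed for this example are included.
type vmDriver interface {
	Stop() error            // request a graceful (ACPI) shutdown
	Kill() error            // force the machine off
	Running() (bool, error) // report whether the machine is still up
}

// stopWithRetries requests a graceful shutdown, waits up to perAttempt for the
// machine to go down, retries a few times, and only then falls back to Kill.
func stopWithRetries(d vmDriver, attempts int, perAttempt time.Duration) error {
	for i := 0; i < attempts; i++ {
		if err := d.Stop(); err != nil {
			return fmt.Errorf("requesting stop: %w", err)
		}
		deadline := time.Now().Add(perAttempt)
		for time.Now().Before(deadline) {
			up, err := d.Running()
			if err != nil {
				return fmt.Errorf("checking state: %w", err)
			}
			if !up {
				return nil
			}
			time.Sleep(time.Second)
		}
		log.Printf("stop attempt %d timed out after %s, retrying", i+1, perAttempt)
	}
	return d.Kill()
}
```

For example, several shorter attempts can keep the worst case below a single long two-minute wait while still giving the guest multiple chances to shut down cleanly before a forced power-off.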