-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(vmm): call KVMCLOCK_CTRL when pausing #4460
Conversation
ab32974
to
8fbd685
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #4460 +/- ##
=======================================
Coverage 84.07% 84.08%
=======================================
Files 251 251
Lines 28052 28060 +8
=======================================
+ Hits 23586 23594 +8
Misses 4466 4466
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
3294d86
to
a8ad920
Compare
a8ad920
to
da947f9
Compare
I created a test that should trigger the guest kernel to detect lockups. I will let the CI run once, so we can make sure that the test works on our CI and then reapply the commits that set the KVM_KVMCLOCK_CTRL bit when we pause vCPUs. These should make the failure go away. |
The test does trigger the lockup: https://buildkite.com/firecracker/firecracker-pr/builds/11546. |
ea28a13
to
18b2f34
Compare
I've pushed again Nikita's commits that add the call the |
1d6683a
to
886eb2e
Compare
The last commit is not related with PR per se. However, it fixes an intermittent issue in the CI which I was hitting in this PR's pipelines runs and I was too lazy to open a separate PR. |
Launch a script in the guest that continuously calls `ls -R /` and on the host side, continuously pause and resume the microVM trying to cause an RCU soft lockup. Signed-off-by: Babis Chalios <[email protected]>
This is to avoid guest kernel panic on resume path due to softlockup detection. Signed-off-by: Nikita Kalyazin <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
This is to be able to call KVMCLOCK_CTRL ioctl in a vCPU thread. Signed-off-by: Nikita Kalyazin <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
Add a metric that counts KVM_KVMCLOCK_CTRL ioctl call failures. These failures might happen because the guest simply doesn't use the KVM clock. In these cases, the metric will increase expectedly. Otherwise, non-zero values will indicate something actually going wrong. Signed-off-by: Babis Chalios <[email protected]>
Mention that we now call KVM_KVMCLOCK_CTRL ioctl on x86_64 after pausing vCPUs. Clarify that failures to call this ioctl are not fatal and that we log the failure and increase a metric when these happen. Signed-off-by: Babis Chalios <[email protected]>
`pidfd_open` will fail if there is not a process with the requested PID. According to `man pidfd_open(2)`, it will return EINVAL when `PID` is not valid and `ESRCH` when the `PID` does not exist. Right now, we were checking only for the latter condition. Change the logic to also care for the former, which materializes as an OSError exception with errno == EINVAL. Signed-off-by: Babis Chalios <[email protected]>
886eb2e
to
bc6417c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I can't approve the change, apparently because the PR originated from me, but it does LGTM.
Changes
Call
KVM_KVMCLOCK_CTRL
when pausing vCPUs. This allows guest kernel's soft lockup watchdog to know that it was the hypervisor that paused the vCPUs and don't trigger an exception.Fixes: #1859
Reason
This is to avoid guest kernel panic on resume path due to soft lockup detection.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md
.PR Checklist
PR.
[ ] API changes follow the Runbook for Firecracker API changes.CHANGELOG.md
.[ ] NewTODO
s link to an issue.contribution quality standards.
rust-vmm
.