feat: always recreate VM Pod on restart
- Use QMP to set action and monitor events
- Add monitor daemon to catch reset/shutdown reason
- Retrieve reset/shutdown from termination log
- Add watcher for VM's Pods
- Refactor: move power state methods to own package

Signed-off-by: Ivan Mikheykin <[email protected]>
diafour committed Feb 13, 2024
1 parent 7bf5e68 commit 717260e
Showing 11 changed files with 437 additions and 103 deletions.
43 changes: 43 additions & 0 deletions docs/internal/vm_power_state.md
@@ -0,0 +1,43 @@
# VM power state

## Reboot differences with Kubevirt

Kubevirt has 2 types of reboot:
1. In-Pod reboot: restart the VM without exiting the QEMU process.
2. External reboot: delete the Kubevirt VirtualMachineInstance and create a new one.

Deckhouse Virtualization promotes the idea that a reboot issued from inside the VM
is equivalent to a reboot issued externally, e.g. with a VirtualMachineOperation.

The only restart possible in Deckhouse Virtualization is to delete the VirtualMachineInstance
and create a new one, applying all changes made to the VirtualMachine spec.

In-Pod reboot is disabled with some additions to the virt-launcher image:
1. The QEMU reboot action is set to shutdown (the equivalent of libvirt's <on_reboot>destroy</on_reboot>),
so the QEMU process exits when a reboot is issued from the guest.
2. QEMU SHUTDOWN events are monitored and written to /dev/termination-log so they can be caught later
to distinguish between guest-reset and guest-shutdown.
These changes are made in images/virt-launcher/scripts/domain-monitor.sh; the sketch below shows one way
the captured reason could be consumed.
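
A minimal sketch of how the captured reason could be classified downstream (the helper and its names are hypothetical, not part of this commit; it only assumes the SHUTDOWN event details recorded by `virsh qemu-monitor-event` contain a reason string such as `guest-reset` or `guest-shutdown`):

```go
package powerstate

import "strings"

// ShutdownReason is the SHUTDOWN event reason recovered from the
// virt-launcher Pod termination message.
type ShutdownReason string

const (
    GuestReset    ShutdownReason = "guest-reset"
    GuestShutdown ShutdownReason = "guest-shutdown"
    Unknown       ShutdownReason = "unknown"
)

// ShutdownReasonFromMessage does a simple substring match on the termination
// message produced by domain-monitor.sh.
func ShutdownReasonFromMessage(msg string) ShutdownReason {
    switch {
    case strings.Contains(msg, string(GuestReset)):
        return GuestReset
    case strings.Contains(msg, string(GuestShutdown)):
        return GuestShutdown
    default:
        return Unknown
    }
}
```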

## The relationship between runPolicy and runStrategy

Deckhouse Virtualization has 4 run policies:

- AlwaysOff - The system is asked to ensure that no VM is running. This is achieved by stopping
any VirtualMachineInstance that is associated with the VM. If a guest is already running,
it will be stopped.
- AlwaysOn - The VM is started immediately after it stops. A stopped VM is scheduled to start when runPolicy is changed to AlwaysOn.
- Manual - The system will not automatically turn the VM on or off; instead, the user manually controls the VM state by creating a VirtualMachineOperation or by issuing reboot or poweroff commands inside the VM.
- AlwaysOnUntilStoppedManually - Similar to AlwaysOn, except that the VM is only restarted if it terminated
in an uncontrolled way (e.g. a crash) due to an infrastructure reason (i.e. the node crashed,
the KVM-related process was OOM-killed). This allows the user to determine when the VM should be shut down by
initiating the shutdown inside the guest or by creating a VirtualMachineOperation.
Note: guest-side crashes (i.e. BSOD) are not covered by this; in such cases liveness checks or the use of a watchdog can help.

AlwaysOff policy is implemented with kubevirt's `runStrategy: Halted`.

AlwaysOn policy is implemented with kubevirt's `runStrategy: Always`.

Manual policy is implemented with kubevirt's `runStrategy: Manual` with the addition of starting the VM on a guest-reset event.

AlwaysOnUntilStoppedManually policy is implemented with kubevirt's `runStrategy: Manual` with the addition of starting the VM on a guest-reset event and stopping the VM on failures.
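
For reference, the mapping above could be summarized in code; this is only an illustrative sketch (the function name is hypothetical), using the `RunStrategy*` constants from kubevirt.io/api/core/v1:

```go
package powerstate

import virtv1 "kubevirt.io/api/core/v1"

// runStrategyForPolicy maps a Deckhouse Virtualization runPolicy to the
// underlying kubevirt runStrategy. Manual and AlwaysOnUntilStoppedManually
// share runStrategy: Manual and differ only in how the controller reacts
// to guest-reset events and failures.
func runStrategyForPolicy(policy string) virtv1.VirtualMachineRunStrategy {
    switch policy {
    case "AlwaysOff":
        return virtv1.RunStrategyHalted
    case "AlwaysOn":
        return virtv1.RunStrategyAlways
    case "Manual", "AlwaysOnUntilStoppedManually":
        return virtv1.RunStrategyManual
    default:
        return virtv1.RunStrategyHalted
    }
}
```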

22 changes: 22 additions & 0 deletions images/virt-launcher/scripts/domain-monitor.sh
@@ -0,0 +1,22 @@
#!/bin/bash

set -eo pipefail

# Wait for the libvirt domain (qemu-kvm process) to appear.
vmName=
while true ; do
  vmName=$(virsh list --name || true)
  if [[ -n $vmName ]]; then
    break
  fi
  sleep 1
done

# Set the reboot action to shutdown, as libvirt would for <on_reboot>destroy</on_reboot>.
echo "Set reboot action to shutdown for domain $vmName"
virsh qemu-monitor-command "$vmName" '{"execute": "set-action", "arguments":{"reboot":"shutdown"}}'


# Redirect SHUTDOWN events to the termination log.
echo "Monitor domain $vmName events"
virsh qemu-monitor-event --domain "$vmName" --loop --event SHUTDOWN > /dev/termination-log
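
Because the container's terminationMessagePath defaults to /dev/termination-log, the event line written above surfaces in the Pod status once the container exits. A rough sketch of reading it back on the controller side (the helper name is hypothetical, not part of this commit):

```go
package powerstate

import corev1 "k8s.io/api/core/v1"

// TerminationMessage returns the first non-empty termination message found in
// the Pod's container statuses, where domain-monitor.sh writes SHUTDOWN events.
func TerminationMessage(pod *corev1.Pod) string {
    for _, cs := range pod.Status.ContainerStatuses {
        if cs.State.Terminated != nil && cs.State.Terminated.Message != "" {
            return cs.State.Terminated.Message
        }
        if cs.LastTerminationState.Terminated != nil && cs.LastTerminationState.Terminated.Message != "" {
            return cs.LastTerminationState.Terminated.Message
        }
    }
    return ""
}
```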
16 changes: 16 additions & 0 deletions images/virt-launcher/scripts/virt-launcher-monitor-wrapper.sh
@@ -0,0 +1,16 @@
#!/bin/bash

# virt-launcher-monitor execution interceptor:
# - Run the domain monitor daemon (qemu customizer) as a child process.
# - Exec the original virt-launcher-monitor in place to start the usual virt-launcher.

echo '{"msg":"Start domain monitor daemon", "level":"info","component":"virt-launcher-monitor-wrapper"}'
nohup bash /scripts/domain-monitor.sh > /var/log/domain-monitor-daemon.log 2>&1 &

# Pass all arguments to the original virt-launcher-monitor.
if [[ ! -f /usr/bin/virt-launcher-monitor-orig ]]; then
  echo '{"msg":"Target /usr/bin/virt-launcher-monitor-orig is absent", "level":"error","component":"virt-launcher-monitor-wrapper"}'
  exit 1
fi
echo '{"msg":"Exec original virt-launcher-monitor", "level":"info","component":"virt-launcher-monitor-wrapper"}'
exec /usr/bin/virt-launcher-monitor-orig "$@"
14 changes: 14 additions & 0 deletions images/virt-launcher/werf.inc.yaml
@@ -8,6 +8,20 @@ import:
  - 'sys'
  to: /
  before: setup
git:
- add: /images/{{ $.ImageName }}
  to: /
  stageDependencies:
    setup:
    - '**/*'
  includePaths:
  - scripts
shell:
  setup:
  # Replace virt-launcher-monitor with script.
  - mv /usr/bin/virt-launcher-monitor /usr/bin/virt-launcher-monitor-orig
  - cp /scripts/virt-launcher-monitor-wrapper.sh /usr/bin/virt-launcher-monitor
  - chmod +x /usr/bin/virt-launcher-monitor
# Source https://github.com/kubevirt/containerized-data-importer/blob/main/cmd/cdi-apiserver/BUILD.bazel
docker:
  ENTRYPOINT: ["/usr/bin/virt-launcher"]
86 changes: 0 additions & 86 deletions images/virtualization-controller/pkg/common/kvvm/util.go
@@ -5,15 +5,12 @@ import (
"fmt"

corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/equality"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/types"
virtv1 "kubevirt.io/api/core/v1"
"sigs.k8s.io/controller-runtime/pkg/client"

"github.com/deckhouse/virtualization-controller/pkg/common/patch"
"github.com/deckhouse/virtualization-controller/pkg/sdk/framework/helper"
"github.com/deckhouse/virtualization-controller/pkg/util"
)

// PatchRunStrategy returns JSON merge patch to set 'runStrategy' field to the desired value
@@ -77,86 +74,3 @@ func DeletePodByKVVMI(ctx context.Context, cli client.Client, kvvmi *virtv1.Virt
}
return helper.DeleteObject(ctx, cli, pod, opts)
}

// GetChangeRequest returns the stop/start patch.
func GetChangeRequest(vm *virtv1.VirtualMachine, changes ...virtv1.VirtualMachineStateChangeRequest) ([]byte, error) {
jp := patch.NewJsonPatch()
verb := patch.PatchAddOp
// Special case: if there's no status field at all, add one.
newStatus := virtv1.VirtualMachineStatus{}
if equality.Semantic.DeepEqual(vm.Status, newStatus) {
newStatus.StateChangeRequests = changes
jp.Append(patch.NewJsonPatchOperation(verb, "/status", newStatus))
} else {
failOnConflict := true
if len(changes) == 1 && changes[0].Action == virtv1.StopRequest {
// If this is a stopRequest, replace all existing StateChangeRequests.
failOnConflict = false
}
if len(vm.Status.StateChangeRequests) != 0 {
if failOnConflict {
return nil, fmt.Errorf("unable to complete request: stop/start already underway")
} else {
verb = patch.PatchReplaceOp
}
}
jp.Append(patch.NewJsonPatchOperation(verb, "/status/stateChangeRequests", changes))
}
if vm.Status.StartFailure != nil {
jp.Append(patch.NewJsonPatchOperation(patch.PatchRemoveOp, "/status/startFailure", nil))
}
return jp.Bytes()
}

// StartKVVM starts kvvm.
func StartKVVM(ctx context.Context, cli client.Client, kvvm *virtv1.VirtualMachine) error {
if kvvm == nil {
return fmt.Errorf("kvvm must not be empty")
}
jp, err := GetChangeRequest(kvvm,
virtv1.VirtualMachineStateChangeRequest{Action: virtv1.StartRequest})
if err != nil {
return err
}
return cli.Status().Patch(ctx, kvvm, client.RawPatch(types.JSONPatchType, jp), &client.SubResourcePatchOptions{})
}

// StopKVVM stops kvvm.
func StopKVVM(ctx context.Context, cli client.Client, kvvmi *virtv1.VirtualMachineInstance, force bool) error {
if kvvmi == nil {
return fmt.Errorf("kvvmi must not be empty")
}
if err := cli.Delete(ctx, kvvmi, &client.DeleteOptions{}); err != nil {
return err
}
if force {
return DeletePodByKVVMI(ctx, cli, kvvmi, &client.DeleteOptions{GracePeriodSeconds: util.GetPointer(int64(0))})
}
return nil
}

// RestartKVVM restarts kvvm.
func RestartKVVM(ctx context.Context, cli client.Client, kvvm *virtv1.VirtualMachine, kvvmi *virtv1.VirtualMachineInstance, force bool) error {
if kvvm == nil {
return fmt.Errorf("kvvm must not be empty")
}
if kvvmi == nil {
return fmt.Errorf("kvvmi must not be empty")
}

jp, err := GetChangeRequest(kvvm,
virtv1.VirtualMachineStateChangeRequest{Action: virtv1.StopRequest, UID: &kvvmi.UID},
virtv1.VirtualMachineStateChangeRequest{Action: virtv1.StartRequest})
if err != nil {
return err
}

err = cli.Status().Patch(ctx, kvvm, client.RawPatch(types.JSONPatchType, jp), &client.SubResourcePatchOptions{})
if err != nil {
return err
}
if force {
return DeletePodByKVVMI(ctx, cli, kvvmi, &client.DeleteOptions{GracePeriodSeconds: util.GetPointer(int64(0))})
}
return nil
}
@@ -0,0 +1,72 @@
package powerstate

import (
    "fmt"

    "github.com/deckhouse/virtualization-controller/pkg/common/patch"
    "k8s.io/apimachinery/pkg/api/equality"
    kvv1 "kubevirt.io/api/core/v1"
)

// BuildPatch creates a patch to request a VM state change by updating the KVVM status.
//
// Some combinations lead to an error so as not to interfere with the kvvm controller:
//
//   current / desired     stop     start  restart(stop+start)
//   stop                  replace  error  error
//   start                 replace  error  error
//   restart(stop+start)   replace  error  error
//   empty                 add      add    add
func BuildPatch(vm *kvv1.VirtualMachine, changes ...kvv1.VirtualMachineStateChangeRequest) ([]byte, error) {
    jp := patch.NewJsonPatch()
    // Special case: if there's no status field at all, add one.
    newStatus := kvv1.VirtualMachineStatus{}
    if equality.Semantic.DeepEqual(vm.Status, newStatus) {
        newStatus.StateChangeRequests = changes
        jp.Append(patch.NewJsonPatchOperation(patch.PatchAddOp, "/status", newStatus))
    } else {
        verb := patch.PatchAddOp
        failOnConflict := true
        if len(changes) == 1 && changes[0].Action == kvv1.StopRequest {
            // If this is a stopRequest, replace all existing StateChangeRequests.
            failOnConflict = false
        }
        if len(vm.Status.StateChangeRequests) != 0 {
            if failOnConflict {
                return nil, fmt.Errorf("unable to complete request: stop/start already underway")
            } else {
                verb = patch.PatchReplaceOp
            }
        }
        jp.Append(patch.NewJsonPatchOperation(verb, "/status/stateChangeRequests", changes))
    }
    if vm.Status.StartFailure != nil {
        jp.Append(patch.NewJsonPatchOperation(patch.PatchRemoveOp, "/status/startFailure", nil))
    }
    return jp.Bytes()
}

// BuildPatchSafeRestart creates a patch to restart a VM only if no other state change requests are present.
// This method respects other operations that were issued during the VM reboot.
func BuildPatchSafeRestart(kvvm *kvv1.VirtualMachine, kvvmi *kvv1.VirtualMachineInstance) ([]byte, error) {
    // Restart only if the current request is empty.
    if len(kvvm.Status.StateChangeRequests) > 0 {
        return nil, nil
    }
    restartRequest := []kvv1.VirtualMachineStateChangeRequest{
        {Action: kvv1.StopRequest, UID: &kvvmi.UID},
        {Action: kvv1.StartRequest},
    }
    jp := patch.NewJsonPatch()

    newStatus := kvv1.VirtualMachineStatus{}
    if equality.Semantic.DeepEqual(kvvm.Status, newStatus) {
        // Add /status if it does not exist.
        newStatus.StateChangeRequests = restartRequest
        jp.Append(patch.NewJsonPatchOperation(patch.PatchAddOp, "/status", newStatus))
    } else {
        // Set stateChangeRequests.
        jp.Append(patch.NewJsonPatchOperation(patch.PatchAddOp, "/status/stateChangeRequests", restartRequest))
    }
    return jp.Bytes()
}
@@ -0,0 +1,86 @@
package powerstate

import (
    "context"
    "fmt"

    kvvmutil "github.com/deckhouse/virtualization-controller/pkg/common/kvvm"
    "github.com/deckhouse/virtualization-controller/pkg/util"
    "k8s.io/apimachinery/pkg/types"
    kvv1 "kubevirt.io/api/core/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// StartVM starts the VM by adding a change request to the KVVM status.
func StartVM(ctx context.Context, cl client.Client, kvvm *kvv1.VirtualMachine) error {
    if kvvm == nil {
        return fmt.Errorf("kvvm must not be empty")
    }
    jp, err := BuildPatch(kvvm,
        kvv1.VirtualMachineStateChangeRequest{Action: kvv1.StartRequest})
    if err != nil {
        return err
    }
    return cl.Status().Patch(ctx, kvvm, client.RawPatch(types.JSONPatchType, jp), &client.SubResourcePatchOptions{})
}

// StopVM stops the VM by deleting the kvvmi.
// It implements force stop by immediately deleting the VM's Pod.
func StopVM(ctx context.Context, cl client.Client, kvvmi *kvv1.VirtualMachineInstance, force bool) error {
    if kvvmi == nil {
        return fmt.Errorf("kvvmi must not be empty")
    }
    if err := cl.Delete(ctx, kvvmi, &client.DeleteOptions{}); err != nil {
        return err
    }
    if force {
        return kvvmutil.DeletePodByKVVMI(ctx, cl, kvvmi, &client.DeleteOptions{GracePeriodSeconds: util.GetPointer(int64(0))})
    }
    return nil
}

// RestartVM restarts the VM by adding stop and start change requests to the KVVM status.
// It implements force stop by immediately deleting the VM's Pod.
func RestartVM(ctx context.Context, cl client.Client, kvvm *kvv1.VirtualMachine, kvvmi *kvv1.VirtualMachineInstance, force bool) error {
    if kvvm == nil {
        return fmt.Errorf("kvvm must not be empty")
    }
    if kvvmi == nil {
        return fmt.Errorf("kvvmi must not be empty")
    }

    jp, err := BuildPatch(kvvm,
        kvv1.VirtualMachineStateChangeRequest{Action: kvv1.StopRequest, UID: &kvvmi.UID},
        kvv1.VirtualMachineStateChangeRequest{Action: kvv1.StartRequest})
    if err != nil {
        return err
    }

    err = cl.Status().Patch(ctx, kvvm, client.RawPatch(types.JSONPatchType, jp), &client.SubResourcePatchOptions{})
    if err != nil {
        return err
    }
    if force {
        return kvvmutil.DeletePodByKVVMI(ctx, cl, kvvmi, &client.DeleteOptions{GracePeriodSeconds: util.GetPointer(int64(0))})
    }
    return nil
}

// SafeRestartVM restarts the VM by adding stop and start change requests to the KVVM status if no other requests are in progress.
func SafeRestartVM(ctx context.Context, cl client.Client, kvvm *kvv1.VirtualMachine, kvvmi *kvv1.VirtualMachineInstance) error {
    if kvvm == nil {
        return fmt.Errorf("kvvm must not be empty")
    }
    if kvvmi == nil {
        return fmt.Errorf("kvvmi must not be empty")
    }

    jp, err := BuildPatchSafeRestart(kvvm, kvvmi)
    if err != nil {
        return err
    }
    if jp == nil {
        return nil
    }
    return cl.Status().Patch(ctx, kvvm, client.RawPatch(types.JSONPatchType, jp), &client.SubResourcePatchOptions{})
}
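
The commit message also mentions a watcher for the VM's Pods that retrieves the reset/shutdown reason from the termination log; that handler is not part of this excerpt. A rough sketch of how it could tie into SafeRestartVM (the function name and the substring check are assumptions):

```go
package powerstate

import (
    "context"
    "strings"

    kvv1 "kubevirt.io/api/core/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// HandleGuestReset reacts to a SHUTDOWN event captured in the virt-launcher
// Pod termination message: on guest-reset the KVVMI is stopped and started
// again, so the VM Pod is always recreated on restart.
func HandleGuestReset(ctx context.Context, cl client.Client, kvvm *kvv1.VirtualMachine, kvvmi *kvv1.VirtualMachineInstance, terminationMsg string) error {
    if !strings.Contains(terminationMsg, "guest-reset") {
        // guest-shutdown and other reasons are handled according to runPolicy.
        return nil
    }
    return SafeRestartVM(ctx, cl, kvvm, kvvmi)
}
```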
