VM endless boot when pinning to the first allowed CPU of the machine-rt slice. #438

Open
eroussy opened this issue Mar 20, 2024 · 1 comment
Labels: bug, Debian

Comments

@eroussy
Member

eroussy commented Mar 20, 2024

Describe the bug
When deploying an RT, isolated VM, if the core chosen for the VM is the first allowed CPU of the machine-rt slice, the VM never boots.
The associated qemu-system-x86 thread takes 100% of one CPU indefinitely.

To Reproduce

  • Deploy a Debian SEAPATH machine (standalone or cluster)
  • Configure the machine-rt and machine-nort allowed CPUs (see my configuration below)
  • Deploy an RT and isolated machine using the first allowed CPU of the machine-rt slice. In my case, I used the first two.
  • Try to access the machine with virsh console rtVM
  • Nothing appears on the console

Allowed CPUs in my Ansible inventory:

isolcpus: "2-7" # CPUs to isolate (isolcpus, irqbalance on Debian 12)
workqueuemask: "0003" # workqueue mask; here it means CPUs 0 and 1 are the only allowed CPUs
cpusystem: "0-1" # CPUs reserved for the system
cpuuser: "0-1" # CPUs reserved for user applications
cpumachines: "2-7" # CPUs reserved for VMs
cpumachinesrt: "4-7" # CPUs reserved for real-time VMs
cpumachinesnort: "2-3" # CPUs reserved for non-real-time VMs
cpuovs: "0-1" # CPUs reserved for OVS
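
As a sanity check (assuming SEAPATH translates these variables into AllowedCPUs= on the corresponding systemd slices), the effective ranges can be queried on the hypervisor:

root@seapath:/home/virtu# systemctl show -p AllowedCPUs machine-rt.slice    # should report 4-7 with this inventory
root@seapath:/home/virtu# systemctl show -p AllowedCPUs machine-nort.slice  # should report 2-3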

My RT VM inventory

all:
  children:
    VMs:
      hosts:
        rtVM:
          ansible_host: 192.168.216.24
          vm_template: "../templates/vm/guest.xml.j2"
          vm_disk: "../vm_images/guest.qcow2"
          vm_features: ["rt", "isolated"]
          cpuset: [4, 5]
          bridges:
            - name: "br0"
              mac_address: "52:54:00:e4:ff:03"

Expected behavior
The VM should boot.
The qemu-system-x86 thread will take 100% of one CPU, but only for a few seconds.

Additional context

On the hypervisor:

root@seapath:/home/virtu# ps -eTo comm,tid,pid,cls,pri,psr | grep -iE "qemu|kvm"
qemu-event       159384  158373  TS  19   0
qemu-system-x86  158603  158603  TS  19   4
CPU 0/KVM        158651  158603  FF  41   4
CPU 1/KVM        158652  158603  FF  41   5
kvm              158626  158626  TS  39   4
kvm-nx-lpage-re  158627  158627  TS  19   4
kvm-pit/158603   158654  158654  TS  19   4

The qemu-system-x86 thread responsible for managing the VM always runs on the first allowed CPU (here, CPU 4).
The VM's first vCPU (CPU 0/KVM) is also pinned to this CPU.
Since the vCPU thread runs with a real-time SCHED_FIFO policy (FF, priority 41) while the management thread is a normal SCHED_OTHER task (TS), the RT vCPU can starve the management thread on that core, and I think the two threads end up blocking each other and preventing the VM from booting.

Also, the first lines of the top output on the hypervisor:

top - 15:17:15 up 17 min,  2 users,  load average: 10.32, 7.45, 4.92
Tasks: 542 total,   2 running, 540 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.6 us,  5.3 sy,  0.0 ni, 89.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  63624.4 total,  59392.4 free,   9856.9 used,   1700.3 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.  53767.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  28143 libvirt+  20   0 3703876 441004  41664 S 100.0   0.7   1:27.42 qemu-system-x86
    154 root     -11   0       0      0      0 S   6.2   0.0   0:00.69 rcuc/13
   1763 ceph      20   0 1217808 300224  36096 S   6.2   0.5   0:04.02 ceph-mgr
   3160 haclust+  20   0   81552  25628  15644 S   6.2   0.0   0:00.60 pacemaker-based
  31122 root      20   0   11640   5376   3264 R   6.2   0.0   0:00.02 top
      1 root      20   0  169984  13788   8796 S   0.0   0.0   0:05.89 systemd

The qemu-system-x86 thread is taking 100% of the CPU.
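
A per-thread view can confirm which thread of this process is spinning, for example:

root@seapath:/home/virtu# top -H -p 28143        # -H lists the individual threads of the qemu process
root@seapath:/home/virtu# pidstat -t -p 28143 1  # per-thread CPU usage every second (from sysstat)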

@eroussy
Member Author

eroussy commented Apr 4, 2024

Here are my investigations so far:

Management thread affinity

The rtVM vCPUs are running with RT priority on cores 4 and 5.
Looking at what else is running on these cores, I find:

root@seapath:~# ps -eTo comm,tid,pid,cls,pri,%cpu,psr  | grep "[4,5]$"
[..] (Linux core management threads)
kworker/3:3-eve  195349  195349  TS  19   0.0   5
qemu-system-x86  197242  197242  TS  19  87.3   4
call_rcu         197265  197242  TS  19   0.0   4
worker           197266  197242  TS  19   0.0   4
vhost-197242     197268  197242  TS  19   0.0   4
IO mon_iothread  197269  197242  TS  19   0.2   4
CPU 0/KVM        197270  197242  FF  41  87.1   4
CPU 1/KVM        197271  197242  FF  41   0.0   5
worker           197398  197242  FF  41   0.0   4
kvm-nx-lpage-re  197267  197267  TS  19   0.0   4

The qemu-system-x86 thread is taking too much CPU, so why does it not move to another core?

root@seapath:~# taskset -cp 197242 #qemu-system
pid 197242's current affinity list: 4-7

The affinity list allows it to move, so why doesn't the scheduler put it on another core?
I don't know whether it's a libvirt bug or a SEAPATH configuration problem.
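
To see how libvirt itself has pinned the threads (as opposed to the raw affinity mask shown by taskset), the pinning can also be queried with virsh:

root@seapath:~# virsh vcpupin rtVM        # pinning of each vCPU
root@seapath:~# virsh emulatorpin rtVM    # CPUs allowed for the emulator (management) thread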

Workaround

We can control where the VM's management thread runs with emulatorpin in libvirt.
This can be done either in the domain XML:

<emulatorpin cpuset='6,7'/>

Or directly on the target with the command virsh emulatorpin rtVM 6,7
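
For reference, a minimal <cputune> sketch combining both pinnings (CPU numbers taken from my inventory; the exact XML produced by the SEAPATH template may differ):

<cputune>
  <vcpupin vcpu='0' cpuset='4'/>
  <vcpupin vcpu='1' cpuset='5'/>
  <emulatorpin cpuset='6,7'/>
</cputune>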

Both of these approaches solve the problem:

  • rtVM vCPUs run on cores 4 and 5
  • management threads run on cores 6 and 7

But it shouldn't be mandatory to specify this.
