Questioning our use of cgroups #439
Comments
Another question to ask is about the ease of use.
I think that in any case we need to have a common way to configure RT capabilities for VMs. Best,
Slices/cgroups (cpuset actually) are the recommended way to do CPU isolation (isolcpus is deprecated), so I think it's nice that SEAPATH is already proposing something with cpusets. Anyway, all those configurations are optional, so in the end I feel we can't really choose (because a SEAPATH user may need both), but it's not really an issue because we don't have to.
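For context, the isolcpus alternative referred to here is a kernel command-line parameter. A quick, illustrative way to check whether it is in effect on a host (the flag list and CPU range below are examples, not SEAPATH defaults):

```sh
# Sketch: check whether CPUs are isolated through the isolcpus= kernel parameter.
grep -o 'isolcpus=[^ ]*' /proc/cmdline    # e.g. isolcpus=domain,managed_irq,2-5
cat /sys/devices/system/cpu/isolated      # CPU list currently isolated (empty if none)
```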
I think you have missed the issue @insatomcat.
We can pin some processes by tweaking the libvirt XML (vcpupin, emulatorpin and iothreadpin), but we can only pin them within the cgroup cpuset, and not all processes can be pinned. The unpinned processes are free to run on all CPUs inside the cpuset, including the pinned CPUs. This is usually not an issue, but if you have KVM RT tasks pinned on all available CPUs, all other non-RT tasks will never be scheduled and the VM will never boot. So to avoid this in our implementation, we have to reserve an extra CPU core just for these processes. There are two ways to solve that: remove all cpusets and use an isolcpus domain, or keep the VM in the machine slice, remove vcpupin from the XML and create a qemu hook that moves the KVM threads to the machine-rt slice and applies pinning and RT priority. Regarding the deprecated isolcpus flag, it is just the recommended way which has changed; I don't know if the kernel PREEMPT_RT patch modifies something in this area. @eroussy if you do not want to use cpusets, just do not set them in the Ansible inventory and add the isolcpus domain kernel parameter.
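As an illustration of the libvirt pinning knobs named above, here is a minimal sketch; the domain name and CPU numbers are placeholders, not taken from the SEAPATH playbooks:

```sh
# Sketch only: show the <cputune> pinning section of a guest's libvirt XML.
# "rt-guest" is a placeholder domain name.
virsh dumpxml rt-guest | sed -n '/<cputune>/,/<\/cputune>/p'
# Illustrative output (CPU numbers are examples):
#   <cputune>
#     <vcpupin vcpu='0' cpuset='2'/>
#     <vcpupin vcpu='1' cpuset='3'/>
#     <emulatorpin cpuset='0-1'/>
#     <iothreadpin iothread='1' cpuset='0-1'/>
#   </cputune>
```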
I do not notice this on my setup. Of course I use isolcpus, since this is still something done by SEAPATH. I have an RT VM with 2 vCPUs:
And if I ignore the core used for "emulation", this is what I see on the 2 dedicated cores:
Basically nothing besides the vCPU processes and bound kthreads...
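A per-core listing like the one discussed here can be obtained with something along these lines (the core numbers are examples):

```sh
# Sketch: list every thread whose current CPU (psr) is core 2 or 3,
# together with its scheduling class and RT priority.
ps -eLo psr,cls,rtprio,pid,tid,comm --sort=psr | awk '$1 == 2 || $1 == 3'
```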
Here you are only looking at the processes running exactly on the two cores you chose for the RT VM. For example, on my setup:
And the processes on cores 4 to 7 (I only display part of it):
The question is: (these questions are also related to issue #438)
@insatomcat you didn't notice it because we have a large cpuset range. Reduce your machine-rt cpuset to have a number of CPU cores equal to the number of your virtual CPUs.
I have a slice with cpuset 2-6,14-18 and a GUEST with 4 vCPUs:
If I look at all the cores, then I see the processes you are talking about, but all on core 16, which is the core chosen by emulatorpin:
What's the emulatorpin setting in your setup?
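For reference, the current pinning of a running guest can be queried directly from virsh (the domain name is a placeholder):

```sh
# Sketch: show where libvirt currently pins the vCPU and emulator threads.
virsh vcpupin GUEST
virsh emulatorpin GUEST
```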
What we want for emulatorpin is to use the non-RT cpuset (0-1,7-13,19-N in your case) to avoid reserving and losing a core for it.
I don't think you can do that while at the same time asking libvirt to use the machine-rt slice for the same guest.
It should be possible using a qemu hook, but to me we should just mention it in the documentation. I suggest indicating in the documentation:
> total of number of "isolated" vcpus + 1

You don't have to isolate all the vCPUs. You may even need a "non-isolated" vCPU for the housekeeping inside the VM.
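To make the qemu hook idea mentioned above a bit more concrete, here is a rough, untested sketch. The hook path is the standard libvirt one, but the thread matching and the priority value are assumptions, not what SEAPATH ships:

```sh
#!/bin/sh
# /etc/libvirt/hooks/qemu -- rough sketch, not the SEAPATH implementation.
# On guest start, give the guest's vCPU threads an RT priority while the
# emulator threads stay SCHED_OTHER. Matching on "CPU x/KVM" thread names
# assumes the guest was started with debug-threads enabled.
GUEST="$1"
OPERATION="$2"
if [ "$OPERATION" = "started" ]; then
    QEMU_PID=$(pgrep -f "guest=$GUEST" | head -n 1)
    for TASK in /proc/"$QEMU_PID"/task/*; do
        case "$(cat "$TASK/comm")" in
            CPU*KVM*) chrt -f -p 20 "$(basename "$TASK")" ;;  # example priority
        esac
    done
fi
```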
I don't think it is a good idea to propose both. Personally,
The only argument I see (for now) in favor of using cgroups is that it protects against hardware processor attacks like Meltdown and Spectre.
Hi all, we need to close this question.
So, regarding the work to do:
@insatomcat @dupremathieu what do you think of that? Did I miss something?
I'm ok with all that.
Great. Maybe @eroussy it's worth documenting it on the LF Energy wiki?
The topic is now covered in this wiki page: https://wiki.lfenergy.org/display/SEAP/Scheduling+and+priorization
Context
There are currently two ways to handle the CPUs to which a VM has access:

- The `isolated` VM feature in the inventory: this feature pins the KVM threads running the VM's vCPUs on the CPUs described in the `cpuset` list. It only pins the KVM threads, not the qemu thread responsible for managing the VM.
- The `machine-rt` or `machine-nort` slice: these cgroups are configured during the Ansible setup, with their allowed CPUs defined in the `cpumachinesrt` and `cpumachinesnort` Ansible variables. Both the KVM and qemu threads of the VM execute on the allowed CPUs (see the sketch after this list).
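Assuming these slices end up as regular systemd slices on a cgroup v2 host (which is how they are described here), their allowed CPUs can be inspected like this:

```sh
# Sketch: show which CPUs the RT and non-RT machine slices may use.
systemctl show -p AllowedCPUs machine-rt.slice
systemctl show -p AllowedCPUs machine-nort.slice
# The same information through the cgroup filesystem (cgroup v2 path):
cat /sys/fs/cgroup/machine-rt.slice/cpuset.cpus
```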
These two configurations have the same purpose but not the same philosophy. They duplicate a feature and don't interact easily with each other.
Plus, the cgroup configuration only exists on Debian for now.
We have to clarify the isolation feature we want on SEAPATH.
Concerns regarding cgroups
I see two problems with these cgroups today:

- All the threads of the VM (both the KVM threads and `qemu-system-x86`) are run on the allowed CPUs of the slice. We may want to run only the KVM threads here.

The second point can cause a problem, for example:
An RT VM with 2 RT vCPUs is placed in the `machine-rt.slice`, and the slice is allowed exactly 2 CPUs.
In that case, the two RT KVM threads will prevent qemu from executing, and the VM will never boot.
We have given the slice exactly the number of CPUs we wanted (here: 2) and it will not work. We should give 3 CPUs to `machine-rt.slice` in order to make it work.
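A quick way to observe this starvation scenario on a host is to look at the scheduling class of each qemu thread of the guest (the domain name is a placeholder):

```sh
# Sketch: list the scheduling class (cls) and RT priority of every thread
# of a guest's qemu process. RT vCPU threads show up as FF (SCHED_FIFO),
# while the emulator threads stay TS (SCHED_OTHER).
QEMU_PID=$(pgrep -f "guest=rt-guest" | head -n 1)
ps -p "$QEMU_PID" -Lo tid,cls,rtprio,psr,comm
```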
Isolation of non-RT VMs
The use of the `machine-nort` cgroup allows isolating the threads of non-RT VMs.
Is it relevant to isolate them if we do not have special RT needs? Wouldn't it be better to let the Linux scheduler handle these VMs on the system's CPUs?
We now need to choose the isolation method we want and use it on both the Debian and Yocto versions.
I leave this question open, feel free to add your remarks below.