Qubes 4 Admin API is way too inefficient #3293
Well, on the Admin API level we prefer to keep the current granularity, even if that means lower performance.
I have managed to get qvm-ls down to around 200ms on a system with 100 defined VMs. Going lower than 100-200ms is probably impossible without rewriting both the client and qubesd in an efficient language like Rust, which is not really worth the effort, at least at the moment. It does so through a combination of changes; see the preliminary pull requests below.
Regarding the qrexec policy issue, I'd suggest using qrexec just to allow/disallow the whole Admin API for specific VMs, and then using permission logic inside qubesd for any finer control, specifically hiding VMs that users should not be able to see (there are already events that can filter VMs and properties). This also allows things like dynamically defining users that own VMs and automatically letting them control only their own VMs, which would be impossible to do easily in qrexec policy. It's also possible to load the qrexec policy in qubesd and apply it to subtasks of requests, although this doesn't seem worth the effort, since it wouldn't support many use cases that require a dynamic security policy anyway. At any rate, I don't think taking several seconds just for something like qvm-ls is a reasonable design for a production-ready system, so some sort of solution seems needed. Preliminary pull requests:

Of course, in addition to qvm-ls, the changes make Qubes faster as a whole, since they optimize several code paths used all over the place (such as the libvirt status queries and the property code).
BTW, current qvm-ls performance with my patch set, with ~100 VMs on a ~3GHz Intel machine (running it in dom0 with the default configuration):

If one really wants to preserve qrexec policy at VM granularity, a possible long-term solution could be to add the GetAll call to get all properties of a single VM (or of the global Qubes object) but not GetAllData, and then replace qvm-ls with a dbus-based tool. In the meantime, one could add GetAllData and have the client automatically fall back to GetAll if GetAllData is not available or denied, and in turn fall back to Get if GetAll is not available (I changed my patches to do this). GetAllData can then later be removed or disabled with no ill effects. Even with dbus-based tools, requiring seconds and thousands of qrexec calls (each of which currently creates at least one process on both the client and dom0) just for the dbus service to get the initial value of all properties doesn't seem acceptable in the long term, so I think we want the GetAll call at least.
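To make that fallback order concrete, here is a minimal client-side sketch. The call() helper and AdminAPIError are hypothetical stand-ins, and GetAllData is only the call proposed above; this is not the actual qubesadmin code.

```python
class AdminAPIError(Exception):
    """Raised by call() when a method is denied or not available."""


def fetch_all_properties(call, vm_names, property_names):
    """Fetch properties using the cheapest Admin API call that works.

    call(method, dest, arg=None) is assumed to perform a single Admin API
    request and raise AdminAPIError if it is denied or unsupported.
    """
    try:
        # 1. One call for everything (the proposed admin.vm.GetAllData).
        return {'*': call('admin.vm.GetAllData', 'dom0')}
    except AdminAPIError:
        pass
    try:
        # 2. One call per VM (admin.vm.property.GetAll).
        return {vm: call('admin.vm.property.GetAll', vm) for vm in vm_names}
    except AdminAPIError:
        pass
    # 3. Worst case: one call per VM per property (admin.vm.property.Get).
    return {vm: {prop: call('admin.vm.property.Get', vm, prop)
                 for prop in property_names}
            for vm in vm_names}
```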
Another thing to consider is that admin.vm.List has pretty much the same policy/security implications as the proposed admin.vm.GetAllData (which is exactly like admin.vm.List but also sends all properties of each VM), since in Qubes properties aren't particularly sensitive (the VM's name and state are, in fact, usually more sensitive than its properties). So a way to prevent a domain from seeing all VMs in admin.vm.List is needed anyway for advanced configurations, and that can be applied to admin.vm.GetAllData as well. Policy can't be applied per-property, but I don't think you can do that currently anyway, even with just admin.vm.property.Get. Thus, I'm thinking there might be no real downside to having GetAllData?
I'm not convinced of their non-sensitivity. For the properties expected to be set by a management VM and read by that same management VM, then fine, whatever. However:
and others (potentially unknowable others from 3rd party extensions) might indeed be sensitive depending on the given user's real-world situation.
+1. As much as I like the idea of being able to put all policy in a separate front layer, such that qubesd isn't even reachable unless a particular action is allowed, I too fail to see how such policies could be expressed without increasing the complexity of the policy format. For example, what if you wanted to say "Allow ..."? It seems to me we'll have to either increase policy format complexity (even beyond {created,managed}-by-x), which is undesirable, or have more policy evaluation also in core-admin (also undesirable), or possibly both. I don't like it to begin with, but if we need to do it for ...
Actually, it is intended that policy can be applied per-property (e.g. /etc/qubes-rpc/policy/admin.vm.property.Get+netvm). More examples in https://www.qubes-os.org/news/2017/06/27/qubes-admin-api/
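For illustration, such a per-property policy file could look roughly like this (the VM name is made up, and any extra options such as a target= redirection should be taken from the linked post rather than from this sketch):

```
# /etc/qubes-rpc/policy/admin.vm.property.Get+netvm
mgmt-vm  $tag:managed-by-mgmt-vm  allow
$anyvm   $anyvm                   deny
```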
Yes, the policy cannot be applied to the value being set. This is the trade-off we've made between flexibility and complexity of the policy. If in some use case you need, for example, to limit which templates can be used by a particular management VM, then you need to write an extension handling the appropriate event in qubesd.
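A very rough sketch of what such an extension might look like follows. The admin-permission event name, the handler keyword arguments, and the raise-to-deny behaviour are assumptions to be checked against the actual core-admin code; this is not a verified recipe.

```python
# Sketch only: a qubesd extension limiting which templates a management VM
# may set.  Event name, handler kwargs and PermissionDenied semantics are
# assumptions, not taken from the real core-admin implementation.

import qubes.api
import qubes.ext

ALLOWED_TEMPLATES = {'fedora-26', 'debian-9'}   # example values only


class LimitTemplates(qubes.ext.Extension):

    @qubes.ext.handler('admin-permission:admin.vm.property.Set')
    def on_property_set(self, vm, event, **kwargs):
        # Which keyword carries the property name and the new value is an
        # assumption here (shown as 'arg' and 'value').
        if kwargs.get('arg') != 'template':
            return
        if kwargs.get('value') not in ALLOWED_TEMPLATES:
            raise qubes.api.PermissionDenied(
                'template {!r} is not allowed for this management VM'
                .format(kwargs.get('value')))
```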
A couple of thoughts: using the local socket from C++ makes the list of requests pretty fast; it takes 30ms to get all the info into my app. The catch here is that this is inside dom0, so no qrexec is used. Adding qrexec seems to do a lot of shell execution on the dom0 side, and then the time can skyrocket; without actually measuring this I can't tell. In essence I do agree with the OP that something can be done to make this much smoother. The main issue I see with the current setup is that the overhead is too big. This is not much different from what the OP concluded, but I disagree that you need new API calls; that doesn't scale either. You'll end up with a million similar-but-not-the-same API calls. A GetAllData call, for example, overlaps in functionality with the property getter call. More API calls are not the best idea, since that works around the problem rather than fixing it. Next week you'll want a GetAllDataExceptNetwork call... So let's focus on the actual overhead instead. What happens today:
This is extremely wasteful, since setting up the actual connection, especially via qrexec, probably takes 90% of the time. What I do in a similar solution is this:
The downside of this approach is that, much like the OP suggested, you need to have more permission management inside qubesd. I don't know if that is already the case, e.g. whether tags are handled in qubesd or in qrexec; this will need to be handled in qubesd. From a security perspective this is a good idea anyway (have security as close to the source as possible to limit the attack surface)... As I did this in a previous project and ended up designing a tagged streaming data format, I will point to that here, as you may want to use it: spec and bindings for several languages (including Python) are here. In that solution I wrote a tag with ID zero as a separator; if that tag comes along, you know the question or answer is 'done'.
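As a toy illustration of the keep-one-connection-open idea (this is not the linked format; an empty line stands in for the tag-0 separator, and the socket path is made up):

```python
# Toy sketch: one connection is opened once, then reused for many queries.
# Each request/reply is a block of lines terminated by a blank line, which
# plays the role of the "tag zero" separator described above.

import socket

SEPARATOR = b'\n\n'


def query(sock, request: bytes) -> bytes:
    """Send one request over an already-open connection and read one reply."""
    sock.sendall(request + SEPARATOR)
    buf = b''
    while SEPARATOR not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError('daemon closed the connection')
        buf += chunk
    reply, _, _rest = buf.partition(SEPARATOR)
    # For simplicity, anything after the separator is ignored here.
    return reply


def main():
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect('/var/run/example-admin.sock')   # hypothetical path
        # Connection setup (and any qrexec hop) is paid once...
        vms = query(sock, b'LIST').decode().split('\n')
        # ...and every further question reuses the same connection.
        for vm in vms:
            print(vm, query(sock, b'PROPS ' + vm.encode()).decode())


if __name__ == '__main__':
    main()
```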
Actually, most of the time is spent in the qrexec-policy process, which is called to evaluate the policy for each call. Each time a new (Python) process is started, libraries are imported, etc. In extreme situations it can take up to 300ms per call...
The above comment is about the non-dom0 case.
That indeed confirms my worries.
Or an in-the-middle solution, where you have a simple daemon that is pre-loaded and can fork off a handling process. This avoids lots of threads and config-file caching issues, while taking advantage of the fact that the forked process doesn't need to load Python or any libraries again. It would be interesting to see how fast that is, and whether it's fast enough. The solution I described is indeed a longer-term refactor, one that would need a separate handshake; it will likely be orders of magnitude faster than the current one.
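A bare-bones sketch of that preload-then-fork pattern (the socket path and the policy function are placeholders, not the real qrexec-policy tool):

```python
# Sketch of the pre-loaded daemon idea: the Python interpreter and all
# libraries are imported once at daemon start; each incoming policy query
# is handled by a cheap fork() instead of spawning a fresh python process.

import os
import signal
import socket


def evaluate_policy(request: bytes) -> bytes:
    # Placeholder for the actual (already imported) policy logic.
    return b'allow' if request else b'deny'


def serve(path='/run/example-policy.sock'):
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)   # auto-reap children
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(16)
    while True:
        conn, _ = server.accept()
        if os.fork() == 0:          # child: all imports are already done
            with conn:
                conn.sendall(evaluate_policy(conn.recv(4096)))
            os._exit(0)
        conn.close()                # parent keeps accepting


if __name__ == '__main__':
    serve()
```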
@DemiMarie can you provide some absolute numbers? For example, on my system:
Allow getting all the VM properties with one call. This greatly improves the performance of applications retrieving many/all of them (qvm-ls, Qubes Manager, etc.) QubesOS/qubes-issues#5415 Fixes QubesOS/qubes-issues#3293
The socket protocol is adjusted to match qrexec socket service protocol. QubesOS/qubes-issues#3293
Power state changes are signaled with events too, so it is possible to cache the power state and update/invalidate the cache with events. Additionally, admin.vm.List returns the power state, so the cache can be populated early. This in particular greatly improves qvm-ls performance by eliminating the admin.vm.CurrentState call entirely. QubesOS/qubes-issues#3293
Both tools issue a large number of Admin API calls and greatly benefit from a cache filled with a single per-VM Admin API call (admin.vm.property.GetAll). In the case of qvm-ls, this also saves multiple admin.vm.CurrentState calls (the power state is given in the admin.vm.List response too). QubesOS/qubes-issues#3293
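Schematically, the caching scheme described in these commit messages amounts to something like the following (event names and the response format are simplified assumptions; the real implementation lives in qubesadmin):

```python
# Sketch: fill the cache with one bulk call per VM, then let events keep
# it fresh instead of re-polling every property.

class PropertyCache:
    def __init__(self, call):
        self._call = call          # call(method, dest) -> str response
        self._props = {}           # vm name -> {property: value}

    def get(self, vm, prop):
        if vm not in self._props:
            # One Admin API call fetches every property of this VM.
            self._props[vm] = self._parse(
                self._call('admin.vm.property.GetAll', vm))
        return self._props[vm][prop]

    def handle_event(self, vm, event, **kwargs):
        # Hook this into the events stream: property changes or VM removal
        # invalidate the cached entry (event names are illustrative).
        if event.startswith(('property-set:', 'property-reset:')):
            self._props.pop(vm, None)
        elif event == 'domain-delete':
            self._props.pop(vm, None)

    @staticmethod
    def _parse(response):
        # Response format is simplified here to "name=value" per line.
        return dict(line.split('=', 1)
                    for line in response.splitlines() if line)
```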
Remove the intermediate qubesd-query-fast proxy process. This requires changing the socket protocol to match what qrexec sends in the header. Fixes QubesOS/qubes-issues#3293
An application that runs an EventsDispatcher can also safely use the cache, which greatly improves performance, because the cache is then properly updated/invalidated when needed. Instead of modifying each application to explicitly enable the cache based on this simple rule, make it implicit when an EventsDispatcher is created. Do not enable caching when the EventsDispatcher is created only temporarily in wait_for_domain_shutdown. QubesOS/qubes-issues#3293
Qubes OS version:
4.0-rc2
Steps to reproduce the behavior:
Run qvm-ls in dom0 and time it.
Expected behavior:
The output should be shown instantly, with time spent under 100ms.
Actual behavior:
It takes more than 2 seconds on my machine.
General notes:
The current Qubes Admin API design, which requires a separate call for each property, just doesn't work.
A single call should exist and be used to get all data for all VMs or for any set of VMs (including a single one).
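For reference, this is roughly the access pattern such a tool has when written against the qubesadmin Python API (property names below are examples only); the point of a bulk call plus caching is that this loop costs a handful of Admin API calls instead of thousands:

```python
# What a qvm-ls-like tool does, expressed with the qubesadmin Python API.
# With per-property calls this loop is O(VMs x properties) round trips;
# with a bulk call plus a cache it should be closer to O(VMs).

import qubesadmin

app = qubesadmin.Qubes()
for vm in app.domains:
    print(vm.name,
          vm.klass,
          vm.get_power_state(),
          getattr(vm, 'netvm', None),
          vm.label)
```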
Proposed design: