monitoring: expose Prometheus-friendly metrics #228
Comments
Can you elaborate on the use-case? Which information would you want to be returned? Why do you want this information to be returned? Who processes this information? And especially who interprets it?
The use case is monitoring dbus-broker on a fleet of server nodes that are part of a large cluster with a centralized monitoring solution.

I am not looking for a specific piece of information at this point (the MVP above has two basic examples though). I'd want a stable way to monitor dbus-broker, leaving the developers (you) free to expose whatever they think is useful to track from a stability/performance/capacity point of view.

I want this metrics information to be exposed/returned in order to do whitebox monitoring of dbus-broker. That is, asking dbus-broker directly about its relevant internal state, and tracking it over time. This interface would be polled periodically (e.g. every minute) and the result recorded externally.

This information is processed by the monitoring system (e.g. Prometheus) and recorded in a time-series database. It can be used for proactive alerting, performance tracking, post-mortem analysis, capacity planning, dashboarding, and more.

It is aggregated, interpreted and queried by the monitoring solution itself. That is, as long as there is a standard way (see datastructure above) to retrieve those metrics, the rest of the logic is decoupled from this. Any monitoring solution can be used to consume it.

As a concrete example using the unrelated service above, these metrics can be used for cluster dashboarding and live performance/status querying.
I looked into Prometheus a bit more. In general, I like the concept of aggregating metrics. I am also fine with keeping close to Prometheus semantics, as it seems to be a quite established utility in that field. I am, though, a bit worried about using the Prometheus data-format in

Anyway, let's put my personal opinion on that aside. Fact is,

With this in mind, it would be rather natural to return information natively marshaled as D-Bus messages. A simply

The best approach would be to agree with
A very simple interface that allows querying the current metrics. This could then be easily extended with more data in the future. The labels can be easily converted into Prometheus labels.

I am wondering whether to include information about the service that is managed externally. For instance, the start-time sounds like something to query from the service-file, rather than from the running service. I mean, this is information not under control of dbus-broker, but only under control of its parent.

A simple set of metrics to start with would be

For everything we then add on top, we would have to discuss whether we can get the data without calculating anything at runtime.

Anyway, comments welcome! I am open to suggestions!
@dvdhrm ack, I see your point about being wary of locking yourself into an externally-defined exposition format; it's a legitimate one. Your suggested approach means there needs to be a smarter middleman/proxy acting as the exporting point for Prometheus. That's a perfectly valid pattern and I'd be fine with it.

Regarding the dbus signature, if you want to make

Regarding service info / start timestamp, you have a good point that this usually belongs to the service manager. However, I think duplicating it here too won't hurt and would make metrics analysis simpler. One difference though is that here it is pretty much an O(1) call, while getting the same through a systemd exporter requires walking the service hierarchy and filtering through properties.
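The middleman/proxy pattern mentioned above could look roughly like the following sketch. It is not taken from dbus-broker or from any existing exporter: `collect` stands in for whatever D-Bus call ends up returning the metrics, `fake_collect` and the metric names are hypothetical placeholders, and the rendering is deliberately minimal (it assumes label values contain no quotes, backslashes or newlines).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, List, Tuple

# One sample as it might come back from the broker:
# (metric name, labels, value). This shape is an assumption.
Sample = Tuple[str, Dict[str, str], float]


def to_text(samples: List[Sample]) -> str:
    """Render samples as Prometheus text-format lines (no escaping)."""
    lines = []
    for name, labels, value in samples:
        label_part = ""
        if labels:
            pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            label_part = "{" + pairs + "}"
        lines.append(f"{name}{label_part} {float(value)}")
    return "".join(line + "\n" for line in lines)


def make_handler(collect: Callable[[], List[Sample]]):
    """Build a request handler that calls `collect` on every scrape."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = to_text(collect()).encode("utf-8")
            self.send_response(200)
            # Content type used by the Prometheus text exposition format.
            self.send_header(
                "Content-Type", "text/plain; version=0.0.4; charset=utf-8"
            )
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return MetricsHandler


def fake_collect() -> List[Sample]:
    # Hypothetical stand-in for the real D-Bus query.
    return [
        ("example_start_time_seconds", {}, 0.0),
        ("example_peers", {"bus": "system"}, 2.0),
    ]
```

To try it locally, something like `HTTPServer(("127.0.0.1", 9100), make_handler(fake_collect)).serve_forever()` would serve the placeholder metrics on `/metrics`; the port is an arbitrary choice. Since `collect` runs once per scrape, the broker is only queried as often as the monitoring system polls.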
This is possibly a followup on #220.
It would be nice to have some Prometheus-compatible ways to query internal metrics from dbus-broker.
For reference, Prometheus has its own exposition format which is basically a well-known datastructure over a plaintext response to an HTTP GET.
While the transport and encoding are likely not useful here, the underlying datastructure is:
metric_key -> metric_value, where:
- metric_value is a f64
- metric_key is metric name + map(label_key, label_value), where (metric_key, label_key, label_value) are strings

I'm somehow asking for an interface similar to org.freedesktop.DBus.Debug.Stats.GetStats(), but returning a datastructure equivalent to the one above or, even better, directly the Prometheus textual format.

My initial MVP for metrics to query here would be:
For reference in case this looks very fuzzy, I have an unrelated service implementing something similar (minus the dbus part) and the result can be (temporarily) observed here.
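To make the metric_key -> metric_value structure described above more concrete, here is a minimal sketch of how it could be modelled and rendered into Prometheus-style text lines. The type alias, function names and metric names are all hypothetical and only serve as illustration; they do not reflect anything dbus-broker exposes.

```python
from typing import Dict, FrozenSet, Tuple

# metric_key = metric name + map(label_key, label_value).
# A frozenset of (label_key, label_value) pairs keeps the key hashable.
MetricKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def make_key(name: str, labels: Dict[str, str]) -> MetricKey:
    return (name, frozenset(labels.items()))


def escape_label_value(value: str) -> str:
    # The text format escapes backslash, double quote and newline
    # inside label values.
    return (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )


def render(metrics: Dict[MetricKey, float]) -> str:
    """Render the mapping as one `name{labels} value` line per metric."""
    lines = []
    ordered = sorted(metrics.items(), key=lambda kv: (kv[0][0], sorted(kv[0][1])))
    for (name, labels), value in ordered:
        if labels:
            body = ",".join(
                f'{k}="{escape_label_value(v)}"' for k, v in sorted(labels)
            )
            lines.append(f"{name}{{{body}}} {float(value)}")
        else:
            lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical metrics, purely for illustration.
example = {
    make_key("example_peers", {}): 3.0,
    make_key("example_bytes", {"direction": "in"}): 10.5,
}
```

Calling `render(example)` yields one line per metric, sorted by name. The same mapping could equally be marshaled over D-Bus instead of being rendered as text, since only the data structure matters here.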