API: Add request metrics for disaster recovery #13825
Heads up @mionaalex - the "Documentation" label was applied to this issue.
@tomponline Please take a look at this when you can, I would appreciate some comments on the overall structure of the solution so far.
You can convert this to a checklist in GH so we can see progress as you tick them off.
9f0c24a to ee0e814
c0cfe23 to 8a77d1e
@mseralessandri @tomponline This is ready for a full review.
@tomponline There is one caveat that should be mentioned. I am using the 400 status on operations to determine whether the request result is a server error. But the 400 status may be too broad, since it can also cover some types of client errors (e.g. trying to add a block device to a container). I am not sure whether I should handle that differently, for example by analyzing the operation in more detail instead of just checking the status code.
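To make the caveat concrete, here is a minimal sketch of the status-based check described above. It is not the code from this PR: the function name, the result type and its label values are hypothetical, and treating any non-400 status as a success is a simplifying assumption.

```go
package main

import "fmt"

// opStatusFailure mirrors the 400 operation status mentioned in the comment.
const opStatusFailure = 400

// requestResult is a hypothetical label value for a completed request.
type requestResult string

const (
	resultSucceeded   requestResult = "succeeded"    // hypothetical value
	resultErrorServer requestResult = "error_server" // hypothetical value
)

// resultFromOperation derives the request result from the final status code
// of the operation the request spawned. Every 400 is reported as a server
// error, which is exactly the caveat: a failure caused by bad client input
// (such as adding a block device to a container) cannot be told apart here.
func resultFromOperation(statusCode int) requestResult {
	if statusCode == opStatusFailure {
		return resultErrorServer
	}

	// Assumption for this sketch: anything else counts as a success.
	return resultSucceeded
}

func main() {
	for _, code := range []int{200, 400} {
		fmt.Printf("operation status %d -> %s\n", code, resultFromOperation(code))
	}
}
```

A finer-grained version would need more input than the status code alone, for example the error attached to the operation.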
f7af662 to 09b329b
Signed-off-by: hamistao <[email protected]>
This is useful to mark the request that spawned that operation as completed when the operation is done. Signed-off-by: hamistao <[email protected]>
Uses the callback function when the operation finishes to mark the request that spawned the operation as completed. Signed-off-by: hamistao <[email protected]>
As a consequence of the introduction of the parameter on Render, those fields have become obsolete and should be substituted and removed. Signed-off-by: hamistao <[email protected]>
This ensures that:
1. Internal metrics are not cached and are always up to date. Before this change, the new values were computed but only the older cached values were included in the endpoint output.
2. Internal metrics are included when there are no instances in the default project. Previously, if no instances were present, the metric set for the default project was not initialized, so the internal metrics had no set to be included in.
This is included in this PR so that the tests won't fail due to the metrics' values not being updated quickly enough. Signed-off-by: hamistao <[email protected]>
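The commit that uses the operation's completion callback to mark the originating request as completed can be illustrated with a small sketch. This is not the LXD implementation: `requestTracker`, `operation` and all of their methods are hypothetical names, and the real operation type is considerably richer.

```go
package main

import (
	"fmt"
	"sync"
)

// requestTracker is a hypothetical holder for the two metric values.
type requestTracker struct {
	mu        sync.Mutex
	ongoing   int
	completed int
}

// begin records that a request has arrived.
func (t *requestTracker) begin() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.ongoing++
}

// complete moves one request from ongoing to completed.
func (t *requestTracker) complete() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.ongoing--
	t.completed++
}

// snapshot returns the current values.
func (t *requestTracker) snapshot() (ongoing int, completed int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.ongoing, t.completed
}

// operation is a hypothetical background operation with a completion callback.
type operation struct {
	run    func() error
	onDone func(err error)
	done   chan struct{}
}

func newOperation(run func() error, onDone func(err error)) *operation {
	return &operation{run: run, onDone: onDone, done: make(chan struct{})}
}

// start runs the operation in the background and invokes onDone when it ends.
func (o *operation) start() {
	go func() {
		defer close(o.done)
		err := o.run()
		if o.onDone != nil {
			o.onDone(err)
		}
	}()
}

// wait blocks until the operation and its callback have finished.
func (o *operation) wait() { <-o.done }

func main() {
	tracker := &requestTracker{}
	tracker.begin() // The request arrives and spawns an operation.

	op := newOperation(
		func() error { return nil },
		func(error) { tracker.complete() }, // Mark the request completed only now.
	)
	op.start()

	// The HTTP handler would return here while the request still counts as ongoing.
	op.wait()

	ongoing, completed := tracker.snapshot()
	fmt.Printf("ongoing=%d completed=%d\n", ongoing, completed)
}
```

The point of the design is that the request is counted as ongoing for the whole lifetime of the operation, not just until the handler returns.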
@tomponline The requested changes were made.
The API rates metrics include `lxd_api_requests_completed_total` and `lxd_api_requests_ongoing`. These metrics can be consumed by an observability tool deployed externally (for example, the [Canonical Observability Stack](https://charmhub.io/topics/canonical-observability-stack) or another third-party tool) to help identify failures or overload on a LXD server. You can set thresholds on the observability tools for these metrics' values to trigger alarms and take programmatic actions.
These metrics consider all endpoints in the [LXD REST API](../api), with the exception of the `/` endpoint. Requests using an invalid URL are also counted. Requests against the metrics server are also counted. Both introduced metrics include a label `entity_type` based on the main entity type that the endpoint is operating on.
Requests against the metrics server are also counted.
What entity type is used for this?
Since the type is determined by the endpoint, and the metrics server only handles the /1.0 and the metrics endpoints, it would be `typeServer`. Same as the regular REST server.
Thanks!
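As a rough illustration of the documented behaviour discussed in this thread, the sketch below keeps both metrics per `entity_type` and skips the `/` endpoint. Only the two metric names and the `entity_type` label come from the documentation text; the `apiMetrics` type, its methods, the `shouldCount` helper and the label values `server` and `instance` are assumptions made for the example, and the output only approximates the text format served by `/1.0/metrics`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// shouldCount reports whether a request path is included in the metrics.
// Per the documentation text, only the "/" endpoint is excluded.
func shouldCount(path string) bool {
	return path != "/"
}

// apiMetrics is a hypothetical, non-thread-safe store keyed by entity_type.
type apiMetrics struct {
	ongoing   map[string]int
	completed map[string]int
}

func newAPIMetrics() *apiMetrics {
	return &apiMetrics{ongoing: map[string]int{}, completed: map[string]int{}}
}

func (m *apiMetrics) requestStarted(entityType string) {
	m.ongoing[entityType]++
}

func (m *apiMetrics) requestCompleted(entityType string) {
	m.ongoing[entityType]--
	m.completed[entityType]++
}

// writeSeries emits one line per label value, sorted for stable output.
func writeSeries(b *strings.Builder, name string, values map[string]int) {
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, k := range keys {
		fmt.Fprintf(b, "%s{entity_type=%q} %d\n", name, k, values[k])
	}
}

// render returns both metrics in a simplified text exposition format.
func (m *apiMetrics) render() string {
	var b strings.Builder
	writeSeries(&b, "lxd_api_requests_completed_total", m.completed)
	writeSeries(&b, "lxd_api_requests_ongoing", m.ongoing)
	return b.String()
}

func main() {
	m := newAPIMetrics()

	// Hypothetical requests; the entity types are illustrative label values.
	requests := []struct{ path, entityType string }{
		{"/", ""},
		{"/1.0", "server"},
		{"/1.0/metrics", "server"},
		{"/1.0/instances", "instance"},
	}

	for _, r := range requests {
		if !shouldCount(r.path) {
			continue
		}

		m.requestStarted(r.entityType)
		m.requestCompleted(r.entityType)
	}

	fmt.Print(m.render())
}
```

In this sketch, requests to `/1.0` and `/1.0/metrics` both land on the `server` label, matching the answer given above for the metrics server.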
Starting from canonical/lxd#13825, when using `Render()` we also have to pass the request in addition to the response writer.
This introduces API rates metrics for Disaster Recovery. The related spec should be published once this is merged.
Here is a raw output example of `/1.0/metrics` with the new metrics: https://pastebin.canonical.com/p/wvhcnK7tq6/