Add capability to publish metrics to prometheus #2684
Draft
Description
One of the feature requests is to add system metrics so that FLARE's runtime metrics can be monitored via Prometheus + Grafana or other monitoring systems.
In this PR, we propose components that can listen to the ReservedTopic.APP_METRICS topic and publish the metrics to the metrics server for Prometheus scraping. This plugin is optional, so the system can run with or without it.
This PR doesn't provide the metrics themselves, but adds the capability to publish metrics easily. To illustrate this capability, we define a few metrics, such as get_task and submit_update, to make sure it works as expected.
Here are the pieces that make this work:
In its callback, the MetricCollector simply posts the metrics received from the DataBus to the /update_metrics endpoint of the Prometheus HTTP metrics server.
We developed a custom handler that takes the newly updated metrics, dynamically defines a metric if it is not found, and updates the metric's value. The Prometheus client library automatically keeps every Prometheus metric in a REGISTRY up to date. Then start_http_server (which comes with the Prometheus client library) automatically publishes the REGISTRY at the /metrics endpoint for the Prometheus server to scrape.
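The update-then-expose flow described above can be sketched with the Python standard library alone. This is an illustrative stand-in, not the handler from this PR: the `METRICS` dict plays the role of the Prometheus client REGISTRY, and the JSON payload shape (`name`, `value`, `labels`), the helper names and the port are all assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the Prometheus client REGISTRY:
# maps (metric name, sorted label pairs) -> latest value.
METRICS = {}


def update_metric(name, value, labels=None):
    """Define the series if it is not known yet, then set its value."""
    label_pairs = tuple(sorted((k, str(v)) for k, v in (labels or {}).items()))
    METRICS[(name, label_pairs)] = float(value)


def render_metrics():
    """Render all series in the Prometheus text exposition format."""
    lines = []
    for (name, label_pairs), value in sorted(METRICS.items()):
        label_text = ",".join(f'{k}="{v}"' for k, v in label_pairs)
        series = f"{name}{{{label_text}}}" if label_text else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumed payload: {"name": ..., "value": ..., "labels": {...}}
        if self.path != "/update_metrics":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        update_metric(payload["name"], payload["value"], payload.get("labels"))
        self.send_response(204)
        self.end_headers()

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocking helper; the port number is an arbitrary choice."""
    HTTPServer(("localhost", port), MetricsHandler).serve_forever()
```

Calling `serve()` would accept POSTs on /update_metrics and answer scrapes on /metrics. With the real client library, the dict and the rendering function would be replaced by the library's own metric objects and start_http_server.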
With these two parts, any metrics record can be made known to Prometheus simply by calling:
self.data_bus.publish([ReservedTopic.APP_METRICS], metrics_data)
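To make the publish/subscribe relationship concrete, here is a minimal in-process sketch. The `DataBus` class, the `APP_METRICS` constant, the callback signature and the shape of the metrics record are hypothetical stand-ins for illustration, not FLARE's actual implementation; only the `publish([topic], data)` call shape is taken from the line above.

```python
from collections import defaultdict

# Hypothetical stand-in for ReservedTopic.APP_METRICS.
APP_METRICS = "app_metrics"


class DataBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topics, callback):
        # Register the same callback for every listed topic.
        for topic in topics:
            self._subscribers[topic].append(callback)

    def publish(self, topics, datum):
        # Deliver the datum to every subscriber of every listed topic.
        for topic in topics:
            for callback in list(self._subscribers.get(topic, [])):
                callback(topic, datum, self)


received = []


def collect(topic, datum, data_bus):
    """Plays the role of the collector callback: here it only records the data."""
    received.append((topic, datum))


bus = DataBus()
bus.subscribe([APP_METRICS], collect)
bus.publish([APP_METRICS], {"name": "get_task_count", "value": 1})
```

Because the publisher only knows the topic, the collector can be left out entirely and publishing still succeeds, which matches the optional nature of the plugin.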
Note: once the metrics are published at the /metrics endpoint, the Prometheus server retrieves them (by scraping, i.e. an HTTP GET on /metrics) and displays them in the Prometheus web UI (default port 9090). Prometheus can then serve as a Grafana data source for visualization. All we need to do is start the Prometheus server (./prometheus) and start Grafana with some configuration, and we can visualize the metrics in Grafana.
This is independent of Prometheus. We can use this
CollectTimeContext collects the count, error_count and time_taken metric values for each action. Before we publish them, we need to flatten the labels and values.
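A minimal sketch of what such a context manager and the flattening step could look like follows. Only the class name and the three metric names come from the description; the constructor argument, the `flatten` helper and the record layout are assumptions made for illustration.

```python
import time


class CollectTimeContext:
    """Records count, error_count and time_taken for one action."""

    def __init__(self, action):
        self.action = action
        self.metrics = {"count": 0, "error_count": 0, "time_taken": 0.0}

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.metrics["count"] += 1
        if exc_type is not None:
            self.metrics["error_count"] += 1
        self.metrics["time_taken"] = time.perf_counter() - self._start
        return False  # do not swallow the exception


def flatten(action, metrics, labels):
    """Hypothetical helper: one flat record per metric value."""
    return [
        {"name": f"{action}_{key}", "value": value, "labels": dict(labels)}
        for key, value in metrics.items()
    ]


with CollectTimeContext("get_task") as ctx:
    pass  # the timed action would run here

records = flatten(ctx.action, ctx.metrics, {"site": "site-1"})
```

Each entry in `records` is then a self-contained record with a single name, value and label set, which is a convenient shape for publishing on the metrics topic.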
Types of changes
./runtest.sh
.