Add capability to publish metrics to prometheus #2684

Draft
wants to merge 8 commits into
base: main

chesterxgchen
Collaborator

@chesterxgchen chesterxgchen commented Jul 7, 2024

Description

One feature request is to expose system metrics so that FLARE's runtime metrics can be monitored via Prometheus + Grafana or other monitoring systems.

In this PR, we propose components that listen to the ReservedTopic.APP_METRICS topic and publish the metrics to a metrics server for Prometheus to scrape. This plugin is optional, so the system can run with or without it.
This PR doesn't define the metrics themselves; it adds the capability to publish metrics easily. To illustrate this capability, we define a few metrics, such as get_task and submit_update, to make sure it works as expected.


Here are the pieces that make this work:

  1. MetricsCollector: this collector subscribes a callback to the ReservedTopic.APP_METRICS topic in the DataBus and receives a callback whenever the topic is published.

In the callback, the MetricsCollector simply posts the metrics received from the DataBus to the /update_metrics endpoint of the Prometheus HTTP metrics server.
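
As a rough illustration of this flow (not the code in this PR), the sketch below wires a collector to a topic-based bus and forwards each published payload to /update_metrics. SimpleDataBus, APP_METRICS, post_json, and the server URL are hypothetical stand-ins; the real DataBus and ReservedTopic in FLARE may have different signatures.

```python
import json
from typing import Any, Callable, Dict, List
from urllib import request

APP_METRICS = "APP_METRICS"  # hypothetical stand-in for ReservedTopic.APP_METRICS


class SimpleDataBus:
    """Hypothetical minimal pub/sub bus, used only to make the sketch runnable."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str, Any], None]]] = {}

    def subscribe(self, topics: List[str], callback: Callable[[str, Any], None]) -> None:
        for topic in topics:
            self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topics: List[str], datum: Any) -> None:
        for topic in topics:
            for callback in self._subscribers.get(topic, []):
                callback(topic, datum)


def post_json(url: str, payload: dict) -> None:
    """POST the payload as JSON to the given URL."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=5) as resp:
        resp.read()


class MetricsCollector:
    """Subscribes to the metrics topic and forwards each payload to the server."""

    def __init__(self, data_bus, server_url: str, post=post_json) -> None:
        self._url = f"{server_url}/update_metrics"
        self._post = post  # injectable so the collector can be tested offline
        data_bus.subscribe([APP_METRICS], self._on_metrics)

    def _on_metrics(self, topic: str, metrics: dict) -> None:
        self._post(self._url, metrics)
```

With this shape, the code that produces metrics only needs a reference to the bus; it never talks to the metrics server directly, which is what keeps the plugin optional.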

  2. Prometheus HTTP metrics server:
    We developed a custom handler that takes the newly updated metrics, dynamically defines any metric that is not found, and updates the metric values. The Prometheus client library automatically keeps every Prometheus metric in a REGISTRY up to date. Then start_http_server (which comes with the Prometheus client library) automatically publishes the REGISTRY at the /metrics endpoint for the Prometheus server to scrape.
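
To make the "define if missing, then update, then expose" idea concrete, here is a standard-library-only sketch. It is an assumption-laden stand-in, not this PR's handler: the PR relies on the Prometheus client library's REGISTRY and start_http_server, whereas this sketch keeps values in a plain dict, treats every metric as a gauge, renders the Prometheus text exposition format by hand, and serves both endpoints from one server. The names update_metrics, render_metrics, MetricsHandler, and make_server are hypothetical.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

_metrics = {}  # metric name -> latest value
_lock = threading.Lock()


def update_metrics(payload: dict) -> None:
    """Create each metric the first time it is seen, then overwrite its value."""
    with _lock:
        for name, value in payload.items():
            _metrics[name] = float(value)


def render_metrics() -> str:
    """Render every metric as a gauge in the Prometheus text exposition format."""
    lines = []
    with _lock:
        for name in sorted(_metrics):
            lines.append(f"# TYPE {name} gauge")
            lines.append(f"{name} {_metrics[name]}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Receives metric updates, e.g. from a collector.
        if self.path != "/update_metrics":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        update_metrics(json.loads(self.rfile.read(length)))
        self.send_response(204)
        self.end_headers()

    def do_GET(self):
        # Serves the current values for scraping.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Build the server; call serve_forever() on the result to run it."""
    return ThreadingHTTPServer((host, port), MetricsHandler)
```

With the Prometheus client library, the dict and the rendering function would be replaced by metric objects held in the REGISTRY, and the GET side by start_http_server.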

With these two parts, any metrics record can be made known to Prometheus simply by publishing it to the DataBus:

self.data_bus.publish([ReservedTopic.APP_METRICS], metrics_data)

Note: once the metrics are published at the /metrics endpoint, the Prometheus server retrieves them (scrapes them via HTTP GET) from /metrics and displays them in the Prometheus web UI (default port 9090). Prometheus can then serve as a data source for Grafana to visualize the metrics. All we need to do is start the Prometheus server (./prometheus) and start Grafana with some configuration.

  3. To help collect the count, error count, and time taken for an action, we define a timing context: CollectTimeContext.
    This is independent of Prometheus. It can be used as follows:

    try:
        with CollectTimeContext() as context:
            ...  # your normal code
    finally:
        self.publish_app_metrics(context.metrics, metrics_group)
  
    def publish_app_metrics(self, metrics: dict, metric_group: str):
        # Flatten each metric name into a "<group>_<name>" label.
        metrics_data = {}
        for metric_name, metrics_value in metrics.items():
            label = f"{metric_group}_{metric_name}"
            metrics_data[label] = metrics_value

        self.data_bus.publish([ReservedTopic.APP_METRICS], metrics_data)

CollectTimeContext collects the count, error_count, and time_taken metric values for each action. Before publishing, we need to flatten the labels and values.
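
For readers who want a feel for how such a context might behave, the following is a minimal sketch written under assumptions, not the class added in this PR: it assumes the metrics live in a dict with the three keys named above, that time_taken is measured in seconds, and that exceptions are counted but re-raised. The flatten helper is a hypothetical standalone version of the label flattening.

```python
import time


class CollectTimeContext:
    """Sketch of a context manager that records count, error_count, and time_taken."""

    def __init__(self) -> None:
        self.metrics = {"count": 0, "error_count": 0, "time_taken": 0.0}

    def __enter__(self):
        self._start = time.perf_counter()
        self.metrics["count"] += 1
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        # Elapsed wall-clock time in seconds (assumed unit).
        self.metrics["time_taken"] = time.perf_counter() - self._start
        if exc_type is not None:
            self.metrics["error_count"] += 1
        return False  # let the exception propagate to the caller


def flatten(metrics: dict, metric_group: str) -> dict:
    """Prefix each metric name with its group, e.g. get_task_count."""
    return {f"{metric_group}_{name}": value for name, value in metrics.items()}
```

Because the context returns False from `__exit__`, a failing action still raises as usual, and the surrounding try/finally is what guarantees the metrics get published either way.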


Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

@chesterxgchen chesterxgchen marked this pull request as draft July 7, 2024 04:47
@chesterxgchen chesterxgchen marked this pull request as ready for review July 19, 2024 04:39
@chesterxgchen chesterxgchen marked this pull request as draft July 25, 2024 23:16
@chesterxgchen chesterxgchen marked this pull request as ready for review August 17, 2024 02:51
@chesterxgchen chesterxgchen marked this pull request as draft August 17, 2024 02:53