
Remove gauge metric #1055

Open
danielstankw opened this issue Aug 7, 2024 · 7 comments

Comments

@danielstankw

danielstankw commented Aug 7, 2024

Hi all,
I have written a custom exporter for calculating the cost of using a node in AWS.
Here is how it works:
Let's assume I have 3 nodes, each costing $5/hour. When I plot sum(cost_metric{}) in Grafana I get $15 (3 × $5). Say that after 2 hours one of the nodes gets deleted (autoscaling). In that case the total cost should drop to $10.

The problem is that in my case the metric is preserved: even though the node has been deleted, its cost is kept, so the dashboard displays $15 instead of dropping to $10.

How would I go about solving that problem?

cost_metric = Gauge(
    "cost_metric",
    "Cost of running an instance for 1 hour",
    ["node_name", "instance_type"],
)
...

node_names = get_nodes()
for node_name in node_names:
    node_info = get_node_info(node_name)
    if node_info is None:
        continue

    logging.info(f"Updating metrics for node: {node_name}")

    # labels section
    labels = node_info["metadata"]["labels"]
    instance_type = labels.get("beta.kubernetes.io/instance-type", "unknown")
    cost = get_cost_of_instance(instance_type)

    if cost is not None:
        cost_metric.labels(node_name=node_name, instance_type=instance_type).set(cost)

I tried

I collected the previous and current nodes in a dict and then wanted to remove the ones that no longer exist, but the issue is that:

cost_metric.remove(node_name=node_name, instance_type=instance_type)

Traceback (most recent call last):
  File "/home/XXXX/projects/main.py", line 154, in <module>
    previous_nodes = update_metrics(previous_nodes)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eksohio/projects/main.py", line 131, in update_metrics
    cost.remove(node_name=node_name, instance_type=instance_type)
TypeError: MetricWrapperBase.remove() got an unexpected keyword argument 'node_name'
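(For anyone hitting the same traceback: `remove()` takes the label *values* positionally, in the order the labels were declared, not as keyword arguments. A minimal sketch, using made-up node and instance values:)

```python
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()
cost_metric = Gauge(
    "cost_metric",
    "Cost of running an instance for 1 hour",
    ["node_name", "instance_type"],
    registry=registry,
)
cost_metric.labels(node_name="node-1", instance_type="m5.large").set(5.0)

# remove() expects positional label values in declaration order:
cost_metric.remove("node-1", "m5.large")
```

This deletes the child for that exact label combination, so it disappears from subsequent scrapes.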
@danielstankw
Author

I can set the cost to 0 and bypass it that way, but that still leaves a metric that is no longer needed being preserved, consuming space over time. :/

@csmarchbanks
Member

Hello, this sounds like the use case for a custom collector: https://prometheus.github.io/client_python/collector/custom/. You will only add metrics for the nodes that you want to include in the output so no extra series will be left around.

@danielstankw
Author

@csmarchbanks thanks for the hint.
Would you be able to elaborate a bit more on how that would work?

@csmarchbanks
Member

That would work by running your get_nodes and other logic during each scrape, with cost_metric existing only for the lifetime of that scrape. That way, if an instance disappears it will simply not appear in the next scrape's output. Adapting the example a bit for your case (I have not run/tested this, but it should give the idea):

from prometheus_client.core import GaugeMetricFamily, REGISTRY
from prometheus_client.registry import Collector

class CustomCollector(Collector):
    def collect(self):
        cost_metric = GaugeMetricFamily(
            "cost_metric",
            "Cost of running an instance for 1 hour",
            labels=["node_name", "instance_type"],
        )

        node_names = get_nodes()
        for node_name in node_names:
            node_info = get_node_info(node_name)
            # ... collect label info, cost, etc. from your code.
            if cost is not None:
                # metric families take label values positionally via add_metric()
                cost_metric.add_metric([node_name, instance_type], cost)

        yield cost_metric

REGISTRY.register(CustomCollector())

@danielstankw
Author

@csmarchbanks
I will test it out, thanks a ton for taking your time and providing an example :)

@danielstankw
Author

danielstankw commented Nov 1, 2024

@csmarchbanks I have implemented an exporter as suggested.
The issue I am facing now is that because I expose the metric every 10 minutes, I can't see it in the Grafana dashboard or Prometheus at all times, only at specific intervals.

What I mean by that is as follows:
The metric is available for 5 minutes, with a 10 minute gap. So if I query the metric between the time it was available and the next scrape, I get an empty dashboard.
I would want to see the metric at all times, but it should automatically update when e.g. a node gets deleted.
[screenshot]

@junneyang

> @csmarchbanks I have implemented an exporter as suggested. The issue I am facing now is that because I expose the metric every 10 minutes, I can't see it at all times, only at specific intervals. [...] I would want to see the metric at all times, but it should automatically update when e.g. a node gets deleted.

same problem, any suggestion?
