Add metrics with, at least, success rate #13

abhishekmukherg · 2024-07-15T23:16:55Z

Hi! We're interested in onboarding Pod Identity for our clusters. As we plan our installation, we're noticing a lack of observability into the agent, which may affect our ability to operate the system at scale. If I'm reading the code right, the only signals we can get as consumers of the agent are the /healthz and /readyz endpoints (both of which lead to the same probe).

Given how critical the system will be as we onboard it, one further level of detail would be valuable. I think the best case would be the ability to get the success rate per running agent (since, if I understand the code, it's largely an HTTP service).

One thing we could implement is a simple Prometheus/OpenMetrics endpoint that exposes just the 200/300/400/500-class response counts (per the default Go Prometheus client), which would give us the lion's share of what we need from the observability story. It could go deeper into other facets, but... baby steps ;). If we had some confidence that the base metrics could be merged upstream, we could potentially take on the implementation ourselves.

Alternatively, these metrics could go to CloudWatch or something similar, but that's a newer area for me, so I don't know what that would look like.


abhishekmukherg · 2024-07-16T02:12:36Z

One open question that I don't have the opportunity to look up right this moment, but that may need to be solved: whether the monitoring endpoints can be exposed on a wide enough interface/port to actually be scraped by Prometheus servers.

prateekgogia · 2024-08-13T16:28:47Z

> One open question that I don't have the opportunity to look up right this moment, but may need to be solved, is if the monitoring endpoints can be exposed to a wide enough interface/port to actually be monitorable by Prometheuses

We should be able to scrape metrics from this agent through APIServer -> kubelet -> pod identity agent.
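For reference, the Kubernetes API server's `pods/proxy` subresource has the form `/api/v1/namespaces/{namespace}/pods/{name}:{port}/proxy/{path}`. The helper below only builds that URL; the API server address, pod name, and port `2705` in the example are hypothetical, and authentication and RBAC for the proxy subresource are out of scope.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// podProxyURL builds an API server pods/proxy URL targeting a specific pod port:
// {apiServer}/api/v1/namespaces/{ns}/pods/{pod}:{port}/proxy/{path}
func podProxyURL(apiServer, namespace, pod string, port int, path string) string {
	return fmt.Sprintf("%s/api/v1/namespaces/%s/pods/%s:%d/proxy/%s",
		strings.TrimSuffix(apiServer, "/"),
		url.PathEscape(namespace),
		url.PathEscape(pod),
		port,
		strings.TrimPrefix(path, "/"))
}

func main() {
	// Hypothetical values for illustration only.
	fmt.Println(podProxyURL("https://api.example.com", "kube-system",
		"eks-pod-identity-agent-abcde", 2705, "/metrics"))
}
```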

abhishekmukherg · 2024-08-13T20:05:16Z

Excellent, thank you for the response. We'll keep this ticket updated as we approach this. It's looking like Sept-Oct is when we'll be able to pick up the work.

pkruk · 2024-09-12T15:14:12Z

Hi :)
I'm also missing this one :) Would exposing a Prometheus endpoint be OK for you? I prepared a WIP here :) If that direction works for you, I could implement the rest of the logic :)

abhishekmukherg changed the title ~~Add metrics with at least success rate~~ Add metrics with, at least, success rate Jul 15, 2024

pkruk mentioned this issue Sep 12, 2024

feat: Add prometheus endpoint #19

Merged

taraspos mentioned this issue Sep 30, 2024

fix: use common pod selector in metrics service #24

Merged
