Add metrics with, at least, success rate #13

Open
abhishekmukherg opened this issue Jul 15, 2024 · 4 comments
Comments

abhishekmukherg commented Jul 15, 2024

Hi! We're interested in onboarding Pod Identity for our clusters. As we plan our installation, we feel a lack of observability into the agent, which may affect our ability to operate the system at scale. If I'm reading the code right, the only signals we can get as consumers of the agent are the /healthz and /readyz endpoints (both of which lead to the same probe).

Given the criticality of the system as we onboard it, it would be valuable for us to get one further level of detail. The best case, I think, would be the ability to get a success rate per running agent (since, if I understand the code, it's largely an HTTP service).

One thing we could implement is a simple Prometheus/OpenMetrics endpoint that exposes just request counts by 200/300/400/500 status class (per the default Go Prometheus client), which would give us the lion's share of what we need out of the observability story. It could go deeper into other facets, but... baby steps ;). If we had some confidence that the base metrics could be integrated upstream, we could possibly take on the implementation work ourselves.
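To make the idea concrete, here is a minimal sketch of counting responses by status class and rendering them in the Prometheus text exposition format. It deliberately uses only the Go standard library instead of the Prometheus client, so that it stays self-contained; a real implementation would more likely use the client library. The metric name `agent_http_requests_total`, the `code` label, and the `/fail` demo handler are assumptions made for illustration and are not taken from the agent's code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
)

// statusCounter counts completed requests by HTTP status class ("2xx", "5xx", ...).
type statusCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func newStatusCounter() *statusCounter {
	return &statusCounter{counts: make(map[string]uint64)}
}

// statusClass maps a status code to its class label, e.g. 404 -> "4xx".
func statusClass(code int) string {
	if code < 100 || code > 599 {
		return "unknown"
	}
	return fmt.Sprintf("%dxx", code/100)
}

// observe records one finished request with the given status code.
func (c *statusCounter) observe(code int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[statusClass(code)]++
}

// render returns the counters in the Prometheus text exposition format,
// sorted by class so the output is stable.
func (c *statusCounter) render() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	classes := make([]string, 0, len(c.counts))
	for class := range c.counts {
		classes = append(classes, class)
	}
	sort.Strings(classes)
	var b strings.Builder
	// Hypothetical metric name, chosen for this sketch only.
	b.WriteString("# TYPE agent_http_requests_total counter\n")
	for _, class := range classes {
		fmt.Fprintf(&b, "agent_http_requests_total{code=%q} %d\n", class, c.counts[class])
	}
	return b.String()
}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	code int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.code = code
	r.ResponseWriter.WriteHeader(code)
}

// instrument wraps next so that every response is counted by status class.
// A handler that never calls WriteHeader is counted as 200.
func (c *statusCounter) instrument(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, code: http.StatusOK}
		next.ServeHTTP(rec, r)
		c.observe(rec.code)
	})
}

func main() {
	counter := newStatusCounter()
	// Hypothetical stand-in for the agent's handler: it fails on /fail.
	handler := counter.instrument(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/fail" {
			http.Error(w, "boom", http.StatusInternalServerError)
			return
		}
		fmt.Fprintln(w, "ok")
	}))
	// Simulate three requests in-process instead of opening a port.
	for _, path := range []string{"/", "/", "/fail"} {
		handler.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, path, nil))
	}
	// In a real service this text would be served from a /metrics endpoint.
	fmt.Print(counter.render())
}
```

With counters shaped like this, a success rate could be derived on the Prometheus side, for example with a query along the lines of `sum(rate(agent_http_requests_total{code="2xx"}[5m])) / sum(rate(agent_http_requests_total[5m]))`, again assuming the hypothetical metric name above.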

Alternatively, these metrics could go to CloudWatch or something similar, but that's more of a new area for me, so I don't know what that would look like.

abhishekmukherg changed the title from "Add metrics with at least success rate" to "Add metrics with, at least, success rate" on Jul 15, 2024
@abhishekmukherg
Author

One open question, which I can't look into right now but which may need to be solved, is whether the monitoring endpoints can be exposed on a wide enough interface/port to actually be scrapable by Prometheus instances.

@prateekgogia

> One open question, which I can't look into right now but which may need to be solved, is whether the monitoring endpoints can be exposed on a wide enough interface/port to actually be scrapable by Prometheus instances.

We should be able to scrape metrics from this agent via the path API server -> kubelet -> Pod Identity agent.

@abhishekmukherg
Author

Excellent, thank you for the response. We'll keep this ticket updated as we approach this. It looks like we'll be able to pick up the work around the September-October timeframe.

@pkruk
Contributor

pkruk commented Sep 12, 2024

Hi :)
I'm also missing this one :) I was wondering whether exposing a Prometheus endpoint would be OK for you? I prepared a WIP here :) If that direction is good for you, I could implement the rest of the logic :)
