Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add prometheus endpoint #19

Merged
merged 7 commits into from
Sep 19, 2024
Merged

feat: Add prometheus endpoint #19

merged 7 commits into from
Sep 19, 2024

Conversation

pkruk
Copy link
Contributor

@pkruk pkruk commented Sep 12, 2024

Issue #13 ,

Description of changes:

I’m currently missing a metrics endpoint in the pod-identity-agent, which is essential for creating alerts and dashboards. I was wondering if you would consider supporting a Prometheus endpoint for this purpose.

I started working on implementing in eks-pod-identity-code but I would like to ask you for your opinion, before I will start implementing more metrics and logic :)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@kmala
Copy link

kmala commented Sep 16, 2024

Thanks @pkruk for the contribution. The approach looks good and i can take a look once you have the changes ready.

@pkruk
Copy link
Contributor Author

pkruk commented Sep 17, 2024

thanks! I will then continue my work and let you know!

@pkruk pkruk changed the title [wip] feat: Add prometheus endpoint feat: Add prometheus endpoint Sep 18, 2024
@pkruk
Copy link
Contributor Author

pkruk commented Sep 18, 2024

I added two more metrics and I tested it locally in one of our test clusters. I will be very happy to hear feedback :)!

Name: "pod_identity_cache_non_recoverable_error",
Help: "Removing credentials from cache, got non recoverable error",
})
promCacheHit = promauto.NewCounterVec(prometheus.CounterOpts{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would suggest to rename to promCacheState as one of the state is hit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, fixed :)

@@ -52,6 +54,18 @@ type internalClock func() time.Time
// type assertion
var _ credentials.CredentialRetriever = &cachedCredentialRetriever{}

var (
promCacheNonRecoverableError = promauto.NewCounter(prometheus.CounterOpts{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can just keep this as pod_identity_cache_errors and may be we might want to capture both recoverable and non-recoverable errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed :) thx!

@pkruk
Copy link
Contributor Author

pkruk commented Sep 19, 2024

Screenshot 2024-09-19 at 17 19 11

@kmala kmala merged commit dbfb48f into aws:main Sep 19, 2024
@kmala
Copy link

kmala commented Sep 19, 2024

@pkruk thanks for the contribution

@pkruk
Copy link
Contributor Author

pkruk commented Sep 20, 2024

thank you :)!

@pkruk pkruk deleted the metrics branch September 20, 2024 05:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants