[kube-prometheus-stack] Alert "KubeClientCertificateExpiration" expression output showing wrong values #3441

Open
jaisegrg opened this issue May 29, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@jaisegrg

jaisegrg commented May 29, 2023

Describe the bug: a clear and concise description of what the bug is.

The kube-prometheus-stack Helm chart is installed in an AKS cluster, but the "KubeClientCertificateExpiration" alert shows wrong values in its expression output. I validated the "kube-apiserver" certificate expiration against that output by converting the output value (in seconds) to days, but it does not match the alerts.

Version: kube-prometheus-stack-45.8.1 v0.63.0

```yaml
  - alert: KubeClientCertificateExpiration
    annotations:
      description: A client certificate used to authenticate to kubernetes apiserver
        is expiring in less than 7.0 days.
      runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeclientcertificateexpiration
      summary: Client certificate is about to expire.
    expr: apiserver_client_certificate_expiration_seconds_count{job="apiserver"}
      > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
      < 604800
    for: 5m
    labels:
      severity: warning

  - alert: KubeClientCertificateExpiration
    annotations:
      description: A client certificate used to authenticate to kubernetes apiserver
        is expiring in less than 24.0 hours.
      runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeclientcertificateexpiration
      summary: Client certificate is about to expire.
    expr: apiserver_client_certificate_expiration_seconds_count{job="apiserver"}
      > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
      < 86400
    for: 5m
    labels:
      severity: critical
```
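Both rules compare against thresholds expressed in seconds (604800 for the warning, 86400 for the critical rule). A minimal Python sketch of the seconds-to-days conversion described above, handy when checking an alert value by hand; the function and constant names are hypothetical:

```python
SECONDS_PER_DAY = 86400

def seconds_to_days(seconds: float) -> float:
    """Convert a time-to-expiry value in seconds into days."""
    return seconds / SECONDS_PER_DAY

# Thresholds taken from the two rules.
WARNING_THRESHOLD_S = 604800
CRITICAL_THRESHOLD_S = 86400

if __name__ == "__main__":
    print(seconds_to_days(WARNING_THRESHOLD_S))        # 7.0 days
    print(seconds_to_days(CRITICAL_THRESHOLD_S) * 24)  # 24.0 hours
```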

[screenshot]

What's your helm version?

version.BuildInfo{Version:"v3.8.0", GitCommit:"d14138609b01886f544b2025f5000351c9eb092e", GitTreeState:"clean", GoVersion:"go1.17.5"}

What's your kubectl version?

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:38:05Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"fd6aae27a28fca7e8b996d7201b0da6fbf6f732a", GitTreeState:"clean", BuildDate:"2023-04-08T13:27:20Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

Which chart?

prometheus-community/kube-prometheus-stack

What's the chart version?

kube-prometheus-stack-45.8.1 v0.63.0

What happened?

No response

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

```shell
helm upgrade --install prometheus-central \
  --namespace monitoring \
  prometheus-community/kube-prometheus-stack
```

Anything else we need to know?

No response

@jaisegrg jaisegrg added the bug Something isn't working label May 29, 2023
@zeritti zeritti changed the title Alert "KubeClientCertificateExpiration" expression output showing wrong values [kube-prometheus-stack] Alert "KubeClientCertificateExpiration" expression output showing wrong values May 29, 2023
@ykfq

ykfq commented Jun 14, 2023

I faced the same issue after restarting Prometheus. After upgrading Prometheus to the latest 2.24.0 and restarting it again, the problem disappeared.

@stale

stale bot commented Aug 7, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label Aug 7, 2023
@koooge
Contributor

koooge commented Jul 5, 2024

Hi there, I encountered the same issue, and I don't think the expr query works as expected.
I think the apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0 and on(job) part is not correct. As a result, the value of the whole query does not decrease but increases monotonically. I know the fix should go into https://github.com/kubernetes-monitoring/kubernetes-mixin
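The behavior described above follows from PromQL's operator rules: comparison operators bind more tightly than `and`, so the rule parses as `(count > 0) and on(job) (quantile < 604800)`, and `and` keeps the sample values of its left-hand side. The alert condition is therefore evaluated correctly, but the reported value is the ever-growing `_count` counter rather than the seconds-to-expiry estimate. Below is a minimal Python model of those semantics (not Prometheus code); the helper names and sample numbers are hypothetical:

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Vector = List[Tuple[Labels, float]]

def compare(vec: Vector, op: str, threshold: float) -> Vector:
    """Filtering comparison (no `bool` modifier): drop samples that fail."""
    checks = {">": lambda v: v > threshold, "<": lambda v: v < threshold}
    return [(lbls, v) for lbls, v in vec if checks[op](v)]

def and_on(left: Vector, right: Vector, on: List[str]) -> Vector:
    """Model of `left and on(<on>) right`: keep left samples whose `on`
    labels match at least one right sample. Values come from the LEFT side."""
    def key(lbls: Labels) -> Tuple[str, ...]:
        # A missing label matches as the empty string, as in PromQL.
        return tuple(lbls.get(name, "") for name in on)
    right_keys = {key(lbls) for lbls, _ in right}
    return [(lbls, v) for lbls, v in left if key(lbls) in right_keys]

# Hypothetical inputs: a growing _count counter and a quantile estimate.
count_vec: Vector = [({"job": "apiserver", "instance": "10.0.0.1:443"}, 48213.0)]
quantile_vec: Vector = [({"job": "apiserver"}, 112320.0)]  # 1.3 days

# Original rule: (count > 0) and on(job) (quantile < 604800)
original = and_on(compare(count_vec, ">", 0),
                  compare(quantile_vec, "<", 604800), ["job"])
# Reordered: quantile (with threshold) on the left, count check on the right
reordered = and_on(compare(quantile_vec, "<", 604800),
                   compare(count_vec, ">", 0), ["job"])

print(original[0][1])   # 48213.0, the counter value
print(reordered[0][1])  # 112320.0, the seconds-to-expiry estimate
```

Both versions select the same `job`, so the alert fires under the same conditions; only the value attached to the alert changes.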

@stale stale bot removed the lifecycle/stale label Jul 5, 2024
@koooge
Contributor

koooge commented Jul 5, 2024

As a workaround, this worked for me:

```promql
histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
and on (job) apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0
```

refs kubernetes-monitoring/kubernetes-mixin#941
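With the reordered expression the value is the `histogram_quantile()` result, which is still an estimate. The histogram records the remaining lifetime of client certificates used to authenticate requests, and Prometheus locates the bucket holding the requested rank and interpolates linearly inside it, so a value converted to days will generally not equal the exact expiry of any single certificate. A simplified Python sketch of that interpolation for classic buckets, assuming (upper bound, cumulative rate) pairs ending in `+Inf`; the function name and bucket values are hypothetical:

```python
import math
from typing import List, Tuple

def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Simplified model of PromQL's classic-bucket quantile estimation."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]  # the +Inf bucket holds the overall rate
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative value reaches the rank.
    i = next(idx for idx, (_, cum) in enumerate(buckets) if cum >= rank)
    if i == len(buckets) - 1:
        # Rank falls in the +Inf bucket: return the highest finite bound.
        return buckets[-2][0]
    upper, cum = buckets[i]
    if i == 0:
        if upper <= 0:
            return upper
        lower, below = 0.0, 0.0
    else:
        lower, below = buckets[i - 1]
    # Assume observations are spread linearly within the bucket.
    return lower + (upper - lower) * (rank - below) / (cum - below)

# Hypothetical per-second rates for bucket bounds of 1, 7 and 30 days.
buckets = [(86400, 0.0), (604800, 0.2), (2592000, 1.0), (math.inf, 1.0)]
estimate = histogram_quantile(0.01, buckets)
print(round(estimate))  # 112320 seconds, i.e. 1.3 days
```

Here the true lifetimes could be anywhere between 1 and 7 days; the 1.3-day figure is purely an artifact of interpolation within that bucket.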

3 participants