I installed the cni-metrics-helper via the Helm chart (v1.18.5) on a 1.30 EKS cluster; all of the addons were updated within roughly the last week.
My logs show a failure when it attempts to pull metrics from the aws-node pods:
{"level":"info","ts":"2024-10-14T20:06:11.547Z","caller":"cni-metrics-helper/main.go:69","msg":"Constructed new logger instance"}
{"level":"info","ts":"2024-10-14T20:06:11.548Z","caller":"runtime/proc.go:271","msg":"Starting CNIMetricsHelper. Sending metrics to CloudWatch: false, Prometheus: true, LogLevel DEBUG, me
tricUpdateInterval 30"}
{"level":"info","ts":"2024-10-14T20:06:41.588Z","caller":"runtime/proc.go:271","msg":"Collecting metrics ..."}
{"level":"info","ts":"2024-10-14T20:06:41.689Z","caller":"metrics/cni_metrics.go:211","msg":"Total aws-node pod count: 5"}
{"level":"debug","ts":"2024-10-14T20:06:41.689Z","caller":"metrics/metrics.go:439","msg":"Total TargetList pod count: 5"}
{"level":"error","ts":"2024-10-14T20:08:51.287Z","caller":"metrics/metrics.go:399","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-n929t:61678)"}
{"level":"error","ts":"2024-10-14T20:11:02.359Z","caller":"metrics/metrics.go:399","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-xlz6m:61678)"}
{"level":"error","ts":"2024-10-14T20:13:13.431Z","caller":"metrics/metrics.go:399","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-6kvmk:61678)"}
{"level":"error","ts":"2024-10-14T20:15:24.503Z","caller":"metrics/metrics.go:399","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-8gnpw:61678)"}
{"level":"error","ts":"2024-10-14T20:17:35.575Z","caller":"metrics/metrics.go:399","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-fj6n5:61678)"}
{"level":"info","ts":"2024-10-14T20:17:35.575Z","caller":"runtime/proc.go:271","msg":"Collecting metrics ..."}
{"level":"info","ts":"2024-10-14T20:17:35.576Z","caller":"metrics/cni_metrics.go:211","msg":"Total aws-node pod count: 5"}
{"level":"debug","ts":"2024-10-14T20:17:35.576Z","caller":"metrics/metrics.go:439","msg":"Total TargetList pod count: 5"}
{"level":"error","ts":"2024-10-14T20:19:46.647Z","caller":"metrics/metrics.go:399","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-6kvmk:61678)"}
{"level":"error","ts":"2024-10-14T20:21:57.719Z","caller":"metrics/metrics.go:399","msg":"grabMetricsFromTarget: Failed to grab CNI endpoint: the server is currently unable to handle the request (get pods aws-node-8gnpw:61678)"}
What's the scale of your cluster? Is the node healthy when the helper tries to pull metrics from the failing pods?
We have not seen this behavior recently. Is there anything installed on the node that might be blocking the connections? Any NetworkPolicy?
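To rule out a NetworkPolicy, a quick check is:

```sh
# Look for NetworkPolicies in every namespace
kubectl get networkpolicy --all-namespaces
```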
I'm working on reproducing the error you've encountered by setting up an EKS environment similar to yours. To better understand your setup, could you please provide the following information:
1. EKS Cluster Setup
Could you share details about how your EKS cluster was created?
2. Helm Chart Installation Method
Which approach are you using to install the helm chart?
I have manually installed the Helm chart with a configuration matching yours. The additional context you provide will help me better understand and troubleshoot the issue you're experiencing.
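For reference, a typical install from the eks-charts repository looks like the following (a sketch; the release name and the `--set` key assume the chart's defaults and its `env` values map, not your actual setup):

```sh
helm repo add eks https://aws.github.io/eks-charts
helm repo update

# Install into kube-system with Prometheus output enabled
helm install cni-metrics-helper eks/cni-metrics-helper \
  --namespace kube-system \
  --set env.USE_PROMETHEUS=true
```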
I am having a similar issue to the one reported in #1912.
My config for Helm is:
Other than that, there is little additional configuration. The Helm release targets the kube-system namespace.
The ClusterRoleBinding seems correct.
The ServiceAccount is in the right place.
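These are roughly the checks I ran (object names assumed to match the chart defaults):

```sh
# Confirm the ClusterRoleBinding references the ServiceAccount in kube-system
kubectl get clusterrolebinding cni-metrics-helper -o yaml

# Confirm the ServiceAccount exists where the deployment expects it
kubectl -n kube-system get serviceaccount cni-metrics-helper
```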
I traced the call down to https://github.com/aws/amazon-vpc-cni-k8s/blob/master/cmd/cni-metrics-helper/metrics/metrics.go#L89
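Since that call goes through the API server's pod proxy, I also tried to separate "the pod endpoint is down" from "the control plane can't reach the node". aws-node runs with hostNetwork, and `kubectl debug node/` pods share the node's network namespace, so the port can be hit directly on a node (a sketch; the node name is a placeholder):

```sh
# Curl the metrics port from the node itself, bypassing the API server proxy
kubectl debug node/ip-10-0-0-1.ec2.internal -it --image=curlimages/curl -- \
  curl -s http://localhost:61678/metrics
```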
We have Istio installed on this cluster, but it's not in the kube-system namespace.
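To make sure sidecar injection isn't interfering anywhere relevant, I checked the namespace labels (kube-system should show no injection):

```sh
# Show which namespaces have Istio sidecar injection enabled
kubectl get namespaces -L istio-injection
```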
There was talk in the other issue about needing to set the REGION and the cluster name. I set those manually to see if it would help, and no dice.
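For the record, this is roughly how I set them (a sketch; the variable names follow the other issue's suggestion, the deployment name assumes the chart default, and the values are placeholders):

```sh
# Set the region and cluster identifiers on the helper deployment
kubectl -n kube-system set env deployment/cni-metrics-helper \
  AWS_REGION=us-west-2 AWS_CLUSTER_ID=my-cluster
```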
There isn't a security group rule preventing inter-node communication on ports above 1024.
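Since the proxy path runs from the API server to the node, I also looked at the cluster security group rules (a sketch; the cluster name is a placeholder):

```sh
# Find the EKS cluster security group and inspect its inbound rules
SG=$(aws eks describe-cluster --name my-cluster \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text)
aws ec2 describe-security-groups --group-ids "$SG" \
  --query 'SecurityGroups[0].IpPermissions'
```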
Thanks!