eks pod identity agent is in CrashLoopBackOff due to Readiness/Liveness probe failure. #26

Open
imbohyun1 opened this issue Oct 9, 2024 · 0 comments

imbohyun1 commented Oct 9, 2024

Issue:
Some eks-pod-identity-agent pods are in CrashLoopBackOff because their readiness/liveness probes fail with "InternalServerError: unexpected status code recieved, expected 404, got 503". The container keeps starting and then being terminated. I would like to know what could be causing this.

$ kubectl get po -n kube-system -o wide | grep eks-pod-identity-agent
eks-pod-identity-agent-8j5v7                    1/1     Running            0                 4d22h   172.20.92.8      ip-172-20-92-8.ap-northeast-2.compute.internal    <none>           <none>
eks-pod-identity-agent-bj22m                    0/1     CrashLoopBackOff   1428 (2m4s ago)   3d10h   172.20.92.78     ip-172-20-92-78.ap-northeast-2.compute.internal   <none>           <none>
eks-pod-identity-agent-fgl5s                    1/1     Running            0                 4d22h   172.20.92.70     ip-172-20-92-70.ap-northeast-2.compute.internal   <none>           <none>
eks-pod-identity-agent-gs76f                    1/1     Running            0                 4d22h   172.20.92.62     ip-172-20-92-62.ap-northeast-2.compute.internal   <none>           <none>
eks-pod-identity-agent-kgspj                    1/1     Running            0                 4d22h   172.20.92.89     ip-172-20-92-89.ap-northeast-2.compute.internal   <none>           <none>
eks-pod-identity-agent-qg4ps                    1/1     Running            0                 4d22h   172.20.92.82     ip-172-20-92-82.ap-northeast-2.compute.internal   <none>           <none>
eks-pod-identity-agent-tqpkl                    0/1     CrashLoopBackOff   1438 (4m4s ago)   3d10h   172.20.92.76     ip-172-20-92-76.ap-northeast-2.compute.internal   <none>           <none>
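
To compare the failing nodes against the healthy ones, the listing can be reduced to just the crash-looping pods and the nodes they run on. This is a minimal sketch that assumes the `kubectl get po -o wide` output was saved as text; the `crashlooping` helper and the `ROW` pattern are hypothetical names, not part of any tool.

```python
import re

# Three rows copied from the listing above (column padding collapsed).
SAMPLE = """\
eks-pod-identity-agent-8j5v7 1/1 Running 0 4d22h 172.20.92.8 ip-172-20-92-8.ap-northeast-2.compute.internal <none> <none>
eks-pod-identity-agent-bj22m 0/1 CrashLoopBackOff 1428 (2m4s ago) 3d10h 172.20.92.78 ip-172-20-92-78.ap-northeast-2.compute.internal <none> <none>
eks-pod-identity-agent-tqpkl 0/1 CrashLoopBackOff 1438 (4m4s ago) 3d10h 172.20.92.76 ip-172-20-92-76.ap-northeast-2.compute.internal <none> <none>
"""

# The RESTARTS column may carry a "(… ago)" suffix containing spaces,
# so a plain whitespace split would misalign the later columns.
ROW = re.compile(
    r"^(?P<pod>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\([^)]*\))?\s+(?P<age>\S+)\s+(?P<ip>\S+)\s+(?P<node>\S+)"
)


def crashlooping(listing):
    """Return (pod, node, restarts) for every row whose STATUS is CrashLoopBackOff."""
    rows = []
    for line in listing.splitlines():
        m = ROW.match(line.strip())
        if m and m.group("status") == "CrashLoopBackOff":
            rows.append((m.group("pod"), m.group("node"), int(m.group("restarts"))))
    return rows


if __name__ == "__main__":
    for pod, node, restarts in crashlooping(SAMPLE):
        print(f"{pod} on {node}: {restarts} restarts")
```

The resulting node names are the ones worth inspecting side by side with a healthy node.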

The log of eks-pod-identity-agent-tqpkl shows that the agent started and successfully handled an EKS Auth request, but it was then terminated because the probes kept failing.

2024-10-07T21:13:52.995199207+09:00 stderr F 2024/10/07 12:13:52 Running command:
2024-10-07T21:13:52.995245408+09:00 stderr F Command env: (log-file=, also-stdout=false, redirect-stderr=true)
2024-10-07T21:13:52.995251278+09:00 stderr F Run from directory: 
2024-10-07T21:13:52.995255848+09:00 stderr F Executable path: /eks-pod-identity-agent
2024-10-07T21:13:52.995260298+09:00 stderr F Args (comma-delimited): /eks-pod-identity-agent,server,--port,80,--cluster-name,xxxxxxx,--probe-port,2703
2024-10-07T21:13:52.996410134+09:00 stderr F 2024/10/07 12:13:52 Now listening for interrupts
2024-10-07T21:13:52.999479468+09:00 stdout F 2024/10/07 12:13:52 Setting logging verbosity level to: info (4)
2024-10-07T21:13:53.000405541+09:00 stdout F {"bind-addr":"[fd00:ec2::23]:80","level":"info","msg":"Starting server...","time":"2024-10-07T12:13:53Z"}
2024-10-07T21:13:53.000429722+09:00 stdout F {"bind-addr":"localhost:2703","level":"info","msg":"Starting server...","time":"2024-10-07T12:13:53Z"}
2024-10-07T21:13:53.000672385+09:00 stdout F {"bind-addr":"169.254.170.23:80","level":"info","msg":"Starting server...","time":"2024-10-07T12:13:53Z"}

2024-10-07T21:13:53.028863236+09:00 stdout F {"client-addr":"127.0.0.1:35778","level":"warning","msg":"Failed probe: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:13:53Z"}
2024-10-07T21:13:53.029003738+09:00 stdout F {"client-addr":"127.0.0.1:35778","level":"error","msg":"InternalServerError: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:13:53Z"}
2024-10-07T21:13:53.702324601+09:00 stdout F {"client-addr":"100.64.245.48:55592","cluster-name":"xxxxxxx","level":"info","msg":"handling new request request from 100.64.245.48:55592","time":"2024-10-07T12:13:53Z"}
2024-10-07T21:13:53.702430933+09:00 stdout F {"client-addr":"100.64.245.48:55592","cluster-name":"xxxxxxx","level":"info","msg":"Calling EKS Auth to fetch credentials","time":"2024-10-07T12:13:53Z"}
2024-10-07T21:13:53.867604399+09:00 stdout F {"client-addr":"100.64.245.48:55592","cluster-name":"xxxxxxx","fetched_role_arn":"arn:aws:sts::xxxxxxxxxx:assumed-role/teleport-agent/xxxxxxx","fetched_role_id":"xxxxxxxxxxxx:eks-nxxxxxxxxxxx","level":"info","msg":"Successfully fetched credentials from EKS Auth","request_time_ms":165,"time":"2024-10-07T12:13:53Z"}
2024-10-07T21:13:53.867633869+09:00 stdout F {"client-addr":"100.64.245.48:55592","cluster-name":"xxxxxxx","level":"info","msg":"Storing creds in cache","refreshTtl":10800000000000,"time":"2024-10-07T12:13:53Z"}
2024-10-07T21:13:54.033368273+09:00 stdout F {"client-addr":"127.0.0.1:35782","level":"warning","msg":"Failed probe: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:13:54Z"}
2024-10-07T21:13:54.033396293+09:00 stdout F {"client-addr":"127.0.0.1:35782","level":"error","msg":"InternalServerError: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:13:54Z"}
2024-10-07T21:14:02.540275555+09:00 stdout F {"client-addr":"127.0.0.1:50366","level":"warning","msg":"Failed probe: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:02Z"}
2024-10-07T21:14:02.540305586+09:00 stdout F {"client-addr":"127.0.0.1:50366","level":"error","msg":"InternalServerError: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:02Z"}
2024-10-07T21:14:12.542465432+09:00 stdout F {"client-addr":"127.0.0.1:39010","level":"warning","msg":"Failed probe: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:12Z"}
2024-10-07T21:14:12.542494662+09:00 stdout F {"client-addr":"127.0.0.1:39010","level":"error","msg":"InternalServerError: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:12Z"}
2024-10-07T21:14:22.540732136+09:00 stdout F {"client-addr":"127.0.0.1:40738","level":"warning","msg":"Failed probe: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:22Z"}
2024-10-07T21:14:22.540780867+09:00 stdout F {"client-addr":"127.0.0.1:40738","level":"error","msg":"InternalServerError: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:22Z"}
2024-10-07T21:14:22.541252734+09:00 stdout F {"client-addr":"127.0.0.1:40734","level":"warning","msg":"Failed probe: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:22Z"}
2024-10-07T21:14:22.541263384+09:00 stdout F {"client-addr":"127.0.0.1:40734","level":"error","msg":"InternalServerError: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:22Z"}
2024-10-07T21:14:32.541967053+09:00 stdout F {"client-addr":"127.0.0.1:55296","level":"warning","msg":"Failed probe: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:32Z"}
2024-10-07T21:14:32.541999734+09:00 stdout F {"client-addr":"127.0.0.1:55296","level":"error","msg":"InternalServerError: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:32Z"}
2024-10-07T21:14:32.543430297+09:00 stdout F {"client-addr":"127.0.0.1:55302","level":"warning","msg":"Failed probe: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:32Z"}
2024-10-07T21:14:32.543443708+09:00 stdout F {"client-addr":"127.0.0.1:55302","level":"error","msg":"InternalServerError: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:32Z"}
2024-10-07T21:14:42.542296537+09:00 stdout F {"client-addr":"127.0.0.1:39784","level":"warning","msg":"Failed probe: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:42Z"}
2024-10-07T21:14:42.542326987+09:00 stdout F {"client-addr":"127.0.0.1:39784","level":"error","msg":"InternalServerError: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:42Z"}
2024-10-07T21:14:42.542331237+09:00 stdout F {"client-addr":"127.0.0.1:39788","level":"warning","msg":"Failed probe: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:42Z"}
2024-10-07T21:14:42.542334937+09:00 stdout F {"client-addr":"127.0.0.1:39788","level":"error","msg":"InternalServerError: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:42Z"}
2024-10-07T21:14:42.551913734+09:00 stdout F {"client-addr":"127.0.0.1:39800","level":"warning","msg":"Failed probe: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:42Z"}
2024-10-07T21:14:42.551941895+09:00 stdout F {"client-addr":"127.0.0.1:39800","level":"error","msg":"InternalServerError: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:14:42Z"}
2024-10-07T21:14:42.552485932+09:00 stderr F 2024/10/07 12:14:42 Got signal: terminated. Sending down to process (PID: 11)
2024-10-07T21:14:42.552497472+09:00 stderr F 2024/10/07 12:14:42 Signalled process 11 successfully.
2024-10-07T21:14:42.552691435+09:00 stdout F {"bind-addr":"[fd00:ec2::23]:80","level":"info","msg":"Shutting down server...","time":"2024-10-07T12:14:42Z"}
2024-10-07T21:14:42.552721276+09:00 stdout F {"bind-addr":"[fd00:ec2::23]:80","level":"info","msg":"Server gracefully stopped","time":"2024-10-07T12:14:42Z"}
2024-10-07T21:14:42.552726196+09:00 stdout F {"bind-addr":"localhost:2703","level":"info","msg":"Shutting down server...","time":"2024-10-07T12:14:42Z"}
2024-10-07T21:14:42.552810057+09:00 stdout F {"bind-addr":"169.254.170.23:80","level":"info","msg":"Shutting down server...","time":"2024-10-07T12:14:42Z"}
2024-10-07T21:14:42.552844757+09:00 stdout F {"bind-addr":"169.254.170.23:80","level":"info","msg":"Server gracefully stopped","time":"2024-10-07T12:14:42Z"}
2024-10-07T21:14:42.552853478+09:00 stdout F {"bind-addr":"localhost:2703","level":"info","msg":"Server gracefully stopped","time":"2024-10-07T12:14:42Z"}
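
The pattern in this log (probe failures starting right after startup, then a SIGTERM) can be summarized per container run to compare it across pods. The following is a rough sketch, assuming the log file uses the `<timestamp> <stream> <tag> <message>` layout shown above; `parse_cri_line` and `summarize` are hypothetical helper names, and the sample lines are copied from the log with only the fields the sketch needs.

```python
import json
from datetime import datetime

# A few lines copied from the log above.
SAMPLE_LOG = [
    '2024-10-07T21:13:53.000405541+09:00 stdout F {"bind-addr":"[fd00:ec2::23]:80","level":"info","msg":"Starting server...","time":"2024-10-07T12:13:53Z"}',
    '2024-10-07T21:13:53.028863236+09:00 stdout F {"client-addr":"127.0.0.1:35778","level":"warning","msg":"Failed probe: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:13:53Z"}',
    '2024-10-07T21:13:54.033368273+09:00 stdout F {"client-addr":"127.0.0.1:35782","level":"warning","msg":"Failed probe: unexpected status code recieved, expected 404, got 503","time":"2024-10-07T12:13:54Z"}',
    '2024-10-07T21:14:42.552485932+09:00 stderr F 2024/10/07 12:14:42 Got signal: terminated. Sending down to process (PID: 11)',
]


def parse_cri_line(line):
    """Split one log line into (timestamp, stream, message)."""
    ts, stream, _tag, msg = line.split(" ", 3)
    # Timestamps carry nanoseconds; trim to microseconds for datetime.
    head, tz = ts[:-6], ts[-6:]
    date_part, frac = head.split(".")
    return datetime.fromisoformat(f"{date_part}.{frac[:6]}{tz}"), stream, msg


def summarize(lines):
    """Count failed probes and measure time from first server start to SIGTERM."""
    failed, first_start, terminated = 0, None, None
    for line in lines:
        if not line.strip():
            continue
        ts, _stream, msg = parse_cri_line(line)
        if "Got signal: terminated" in msg:
            terminated = terminated or ts
            continue
        try:
            rec = json.loads(msg)
        except json.JSONDecodeError:
            continue  # plain-text wrapper lines are not JSON
        if not isinstance(rec, dict):
            continue
        text = rec.get("msg", "")
        if text == "Starting server...":
            first_start = first_start or ts
        elif text.startswith("Failed probe"):
            failed += 1
    seconds = None
    if first_start and terminated:
        seconds = (terminated - first_start).total_seconds()
    return {"failed_probes": failed, "seconds_until_sigterm": seconds}


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

Run against the full log above, this would show the probes failing from the first second after startup and the container receiving SIGTERM roughly 50 seconds later, even though the EKS Auth request in between succeeded.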

There was no resource contention on the worker nodes, and all nodes share the same configuration, but eks-pod-identity-agent fails to run on only some of them.
