[EKS] [Managed Workers]: Send kubelet logs to CloudWatch #903
Comments
Hi folks, we now have the ability to solve the above ask. Check out this blog to learn more - https://aws.amazon.com/blogs/containers/fluent-bit-integration-in-cloudwatch-container-insights-for-eks/
I don't see how that addresses the above.
Is this already doable?
We had an EKS node unexpectedly change its status.
The same situation happened to us recently.
Did you solve this, @joeynaor @michaelswierszcz? I'm thinking it's related to aws/amazon-vpc-cni-k8s#2808.
We ended up scraping kubelet logs with our existing logging infrastructure (fluent-bit -> loki).
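For anyone wanting the same approach but shipping straight to CloudWatch (the original ask here), a minimal Fluent Bit sketch would tail the kubelet systemd unit and forward it with the `cloudwatch_logs` output plugin. The region and log group/stream names below are placeholders, and the node's instance role needs `logs:CreateLogGroup`, `logs:CreateLogStream`, and `logs:PutLogEvents`:

```ini
# Sketch only: read kubelet's journald entries and ship them to CloudWatch Logs.
[INPUT]
    Name            systemd
    Tag             kubelet.*
    Systemd_Filter  _SYSTEMD_UNIT=kubelet.service
    Read_From_Tail  On

[OUTPUT]
    Name               cloudwatch_logs
    Match              kubelet.*
    region             us-east-1
    log_group_name     /eks/my-cluster/kubelet
    log_stream_prefix  node-
    auto_create_group  On
```

Running Fluent Bit as a DaemonSet with the host journal mounted means logs keep flowing even when nodes are short-lived, which sidesteps the autoscaler problem mentioned below.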
@tooptoop4 Our workaround was to disable "delete on terminate" for the EKS nodes' disks. After an incident, we mounted the disk of the faulty node to a regular EC2 instance and inspected the logs. In our case, the only incident since was caused by a hardware failure on AWS's side, confirmed by AWS support.
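For reference, disabling delete-on-termination is a launch-template setting. A sketch of the relevant launch template data fragment is below; the device name is AMI-dependent (`/dev/xvda` is typical for Amazon Linux EKS AMIs, so treat it as a placeholder):

```json
{
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/xvda",
      "Ebs": { "DeleteOnTermination": false }
    }
  ]
}
```

The trade-off is that every terminated node leaves an orphaned EBS volume behind, so you'd want a cleanup process if nodes churn frequently.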
Community Note
Tell us about your request
Would be great for kubelet / other managed worker node logs to be sent to CloudWatch.
Which service(s) is this request for?
EKS (Managed worker nodes)
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Developers are already using the control plane logs (specifically the audit log) to assist in debugging, but occasionally the platform team has to step in and SSH into worker nodes to pull kubelet logs.
It would be super helpful if these were sent to CloudWatch like the control plane logs.
Are you currently working around this issue?
The workaround is to SSH into the worker node, but obviously this has some limitations; for example, when using cluster-autoscaler the node might not live for very long.
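A sketch of that workaround, assuming the node has the SSM agent and an instance profile with `AmazonSSMManagedInstanceCore` (so no SSH keys or open ports are needed); `<instance-id>` is a placeholder:

```shell
# Open a shell on the node via Session Manager instead of SSH.
aws ssm start-session --target <instance-id>

# On the node: EKS AMIs run kubelet as a systemd unit, so pull its logs
# from the journal for the window around the incident.
journalctl -u kubelet --since "1 hour ago" --no-pager
```

This still only works while the node exists, which is exactly why streaming kubelet logs to CloudWatch, as requested above, would be preferable.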