Log agent not running as root sometimes fails to tail log files #1140
Comments
Having the same issue.
This issue was marked stale due to lack of activity.
I understand this will be harder to fix than some, given that it is intermittent, but the workaround is elevated permissions, which reduces security, so perhaps this will make the cut to be looked at one day?
The problem is that the CloudWatch agent doesn't log an error if it can't read a particular log config. It silently skips it and doesn't write anything out. Exactly as the OP described, a simple flip of a group owner on a directory, or a permissions change from an RPM/DEB, and boom, all of your logs can stop being posted and NOTHING is said about it. The simple fix is to put a log statement in the CloudWatch agent so we don't spend months pulling our hair out staring at AWS permissions wondering "what permission does this thing NOT have?!"
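A minimal sketch of the kind of warning being asked for, assuming a hypothetical helper that checks each configured path up front; this is illustrative Go, not the agent's actual code:

```go
// Hypothetical sketch (not the agent's real code): probe each configured
// log file and emit a visible warning instead of skipping it silently.
package main

import (
	"log"
	"os"
)

// checkLogTargets is a made-up helper name; the point is only that an
// unreadable path should produce a log line rather than silence.
func checkLogTargets(paths []string) {
	for _, p := range paths {
		f, err := os.Open(p)
		if err != nil {
			log.Printf("W! cannot open configured log file %s: %v (collection for this target will be skipped)", p, err)
			continue
		}
		f.Close()
	}
}

func main() {
	checkLogTargets([]string{"/var/log/syslog", "/var/log/myapp/app.log"})
}
```

Even just a warning like this would save the hours of staring at IAM and file permissions described above.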
I think this is the same issue as #943.
I had a configuration that had been running fine for perhaps years. Over the summer I made upgrades, e.g. moving to Ubuntu 22.04, and I probably also absorbed the latest AMI.
The service I was running would periodically get stuck (I don't think this was related to AWS), so I used CloudWatch to look for a log message indicating it was sick. I'd then use a Lambda to kill the sick server and have it restart.
But then in early November, I found that the server would periodically stop emitting log messages, meaning I didn't know it was stuck. I didn't root-cause this issue properly; instead I created another alarm to check that I was receiving log messages, and if I was not, I would kill the server and restart it, along the lines of the sketch below.
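Roughly what that alarm looks like, sketched with the Go SDK; the log group name, thresholds and alarm action ARN here are placeholders, not my actual values:

```go
// Sketch (aws-sdk-go v1): alarm when a log group stops receiving events.
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudwatch"
)

func main() {
	svc := cloudwatch.New(session.Must(session.NewSession()))

	_, err := svc.PutMetricAlarm(&cloudwatch.PutMetricAlarmInput{
		AlarmName:  aws.String("my-service-logs-gone-quiet"),
		Namespace:  aws.String("AWS/Logs"),
		MetricName: aws.String("IncomingLogEvents"),
		Dimensions: []*cloudwatch.Dimension{
			{Name: aws.String("LogGroupName"), Value: aws.String("/my/service/log-group")},
		},
		Statistic:          aws.String("Sum"),
		Period:             aws.Int64(300),
		EvaluationPeriods:  aws.Int64(3),
		Threshold:          aws.Float64(1),
		ComparisonOperator: aws.String("LessThanThreshold"),
		// Treat "no data at all" as a problem too.
		TreatMissingData: aws.String("breaching"),
		// e.g. an SNS topic that triggers the Lambda which recycles the instance.
		AlarmActions: []*string{aws.String("arn:aws:sns:us-east-1:123456789012:restart-sick-server")},
	})
	if err != nil {
		log.Fatalf("failed to create alarm: %v", err)
	}
}
```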
I've since upgraded another ECS task to Ubuntu 22.04 and found the same problem is happening with that service too.
If I log in to the server, I find cwagent struggling to read the log files, yet tailing the same logs with sudo -u cwagent works without any problem.
Killing the agent the first time didn't fix the issue; killing it a second time did. Between these attempts I made no changes to the permissions. I conclude there must be a software fault in the CloudWatch Logs agent and recommend further investigation.
In the meantime I'll amend my cwagent configuration so the agent runs as root, which is not really a preferred configuration; the relevant setting is sketched below.
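For reference, the agent's JSON configuration has a run_as_user setting in its agent section; something along these lines (rest of the file omitted) should make it run as root, though this is a sketch from memory rather than a tested config:

```json
{
  "agent": {
    "run_as_user": "root"
  }
}
```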
FYI, I happen to work for Amazon, but this project is not related to my Amazon employment. If AWS tech staff want to contact me, you'll find me in the corporate directory.