Log agent not running as root sometimes fails to tail log files #1140

Open
NigelSwinson opened this issue Apr 17, 2024 · 4 comments

NigelSwinson commented Apr 17, 2024

I think this is the same issue as #943.

I had a configuration that had been running fine for perhaps years. Over the summer I made upgrades, e.g. moving to Ubuntu 22.04, and I probably also picked up the latest AMI.

The service I was running would periodically get stuck (I don't think this was related to AWS), so I used CloudWatch to look for a log message indicating it was sick. I'd then use a Lambda to kill the sick server and have it restart.

But then in early November, I found that the server would periodically stop emitting log messages, which meant I couldn't tell when it was stuck. I didn't root-cause this properly; instead I created another alarm to check that I was still receiving log messages, and if I was not, I would kill the server and restart it.
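
The second alarm described above amounts to a silence check: if the newest log event is older than some threshold, treat the server as unhealthy. A minimal sketch of that decision in Python, assuming timestamps in epoch milliseconds. The function name `is_log_silent` and the 15-minute default are hypothetical and not taken from the setup described in this issue:

```python
from typing import Optional


def is_log_silent(last_event_ms: Optional[int], now_ms: int,
                  max_silence_s: int = 900) -> bool:
    """Return True when no log event arrived within max_silence_s seconds.

    last_event_ms: epoch milliseconds of the newest log event, or None
                   if no event has ever been seen (hypothetical input).
    now_ms:        current time in epoch milliseconds.
    """
    if last_event_ms is None:
        # Nothing has ever been received: treat as silent.
        return True
    return (now_ms - last_event_ms) > max_silence_s * 1000
```

A check like this only detects that logs have stopped; it cannot distinguish a stuck service from an agent that has lost access to the files.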

I've since upgraded another ECS task to Ubuntu 22.04 and found the same problem is happening with that service too.

If I log in to the server, I find cwagent struggling to read log files.

2024-04-17T01:02:42Z E! [inputs.logfile] Failed to tail file /var/log/syslog with error: open /var/log/syslog: permission denied
2024-04-17T01:02:42Z E! [inputs.logfile] Failed to tail file /var/log/mailman/smtp.log with error: open /var/log/mailman/smtp.log: permission denied
2024-04-17T01:02:42Z E! [inputs.logfile] Failed to tail file /var/log/mailman/bounce.log with error: open /var/log/mailman/bounce.log: permission denied
2024-04-17T01:02:42Z E! [inputs.logfile] Failed to tail file /var/log/mailman/mailman.log with error: open /var/log/mailman/mailman.log: permission denied
2024-04-17T01:02:42Z E! [inputs.logfile] Failed to tail file /var/log/mailman/http.log with error: open /var/log/mailman/http.log: permission denied
2024-04-17T01:02:42Z E! [inputs.logfile] Failed to tail file /var/log/uwsgi-backend.log with error: open /var/log/uwsgi-backend.log: permission denied
2024-04-17T01:02:42Z E! [inputs.logfile] Failed to tail file /var/log/uwsgi.log with error: open /var/log/uwsgi.log: permission denied

But sudo -u cwagent has no problem at all tailing the logs:

root@405f80ef7893:/var/log/amazon/amazon-cloudwatch-agent# sudo -u cwagent tail /var/log/syslog
Apr 16 20:17:01 405f80ef7893 CRON[879]: (root) CMD ([880]    cd / && run-parts --report /etc/cron.hourly)
Apr 16 20:17:01 405f80ef7893 CRON[879]: (root) END ([880]    cd / && run-parts --report /etc/cron.hourly)
Apr 16 21:17:01 405f80ef7893 CRON[899]: (root) CMD ([900]    cd / && run-parts --report /etc/cron.hourly)
Apr 16 21:17:01 405f80ef7893 CRON[899]: (root) END ([900]    cd / && run-parts --report /etc/cron.hourly)
Apr 16 22:17:01 405f80ef7893 CRON[918]: (root) CMD ([919]    cd / && run-parts --report /etc/cron.hourly)
Apr 16 22:17:01 405f80ef7893 CRON[918]: (root) END ([919]    cd / && run-parts --report /etc/cron.hourly)
Apr 16 23:17:01 405f80ef7893 CRON[937]: (root) CMD ([938]    cd / && run-parts --report /etc/cron.hourly)
Apr 16 23:17:01 405f80ef7893 CRON[937]: (root) END ([938]    cd / && run-parts --report /etc/cron.hourly)
Apr 17 00:17:01 405f80ef7893 CRON[958]: (root) CMD ([959]    cd / && run-parts --report /etc/cron.hourly)
Apr 17 00:17:01 405f80ef7893 CRON[958]: (root) END ([959]    cd / && run-parts --report /etc/cron.hourly)
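
One thing worth comparing here, offered as an assumption rather than a confirmed cause: `sudo -u cwagent` starts a fresh process with that user's supplementary groups, while the already-running agent keeps whatever credentials it had when it switched to cwagent. On Linux, `/proc/<pid>/status` lists a process's `Uid`, `Gid`, and `Groups`. A minimal sketch that parses that file; the helper name `parse_proc_credentials` is hypothetical:

```python
def parse_proc_credentials(status_text: str) -> dict:
    """Extract the effective uid, effective gid and supplementary groups
    from the text of /proc/<pid>/status (Linux only)."""
    creds = {"euid": None, "egid": None, "groups": []}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        fields = value.split()
        if key == "Uid" and len(fields) >= 2:
            # Field order is: real, effective, saved, filesystem.
            creds["euid"] = int(fields[1])
        elif key == "Gid" and len(fields) >= 2:
            creds["egid"] = int(fields[1])
        elif key == "Groups":
            creds["groups"] = [int(g) for g in fields]
    return creds
```

Reading `/proc/<agent pid>/status` through this helper and comparing `groups` with the output of `id cwagent` would show whether the running agent is missing a group that the `sudo` session has, such as the group that owns /var/log/syslog.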

Killing the agent the first time didn't fix the issue:

2024/04/17 01:04:44 I! Config has been translated into YAML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.yaml
2024/04/17 01:04:44 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2024/04/17 01:04:44 I! Valid Json input schema.
2024/04/17 01:04:44 I! Detected runAsUser: cwagent
2024/04/17 01:04:44 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 999:999
2024/04/17 01:04:44 I! Set HOME: /home/cwagent
2024-04-17T01:04:44Z I! Starting AmazonCloudWatchAgent CWAgent/1.300035.0b547 (go1.22.1; linux; amd64) with log file /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log with log tar>
2024-04-17T01:04:44Z I! AWS SDK log level not set
2024-04-17T01:04:44Z I! creating new logs agent
2024-04-17T01:04:44Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"405f80ef7893", Flush Interval:1s
2024-04-17T01:04:44Z I! [logagent] start logs plugin file paths [/var/log/syslog /var/log/mailman/smtp.log /var/log/mailman/bounce.log /var/log/mailman/mailman.log /var/log/mailman/http.log /var/>
2024-04-17T01:04:44Z I! [logagent] starting
2024-04-17T01:04:44Z I! [logagent] found plugin cloudwatchlogs is a log backend
2024-04-17T01:04:44Z I! [logagent] found plugin logfile is a log collection
2024-04-17T01:04:44Z I! [logagent] start logs plugin file paths [/var/log/syslog /var/log/mailman/smtp.log /var/log/mailman/bounce.log /var/log/mailman/mailman.log /var/log/mailman/http.log /var/>
2024-04-17T01:04:44Z I! [inputs.logfile] turned on logs plugin
2024-04-17T01:04:44Z I! [inputs.logfile] turned on logs plugin
2024-04-17T01:04:45Z E! [inputs.logfile] Failed to tail file /var/log/syslog with error: open /var/log/syslog: permission denied
2024-04-17T01:04:45Z E! [inputs.logfile] Failed to tail file /var/log/mailman/smtp.log with error: open /var/log/mailman/smtp.log: permission denied

Killing it a second time did:

2024/04/17 01:07:02 I! Config has been translated into YAML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.yaml
2024/04/17 01:07:02 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2024/04/17 01:07:02 I! Valid Json input schema.
2024/04/17 01:07:02 I! Detected runAsUser: cwagent
2024/04/17 01:07:02 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 999:999
2024/04/17 01:07:02 I! Set HOME: /home/cwagent
2024-04-17T01:07:02Z I! Starting AmazonCloudWatchAgent CWAgent/1.300035.0b547 (go1.22.1; linux; amd64) with log file /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log with log tar>
2024-04-17T01:07:02Z I! AWS SDK log level not set
2024-04-17T01:07:02Z I! creating new logs agent
2024-04-17T01:07:02Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"405f80ef7893", Flush Interval:1s
2024-04-17T01:07:02Z I! [logagent] start logs plugin file paths [/var/log/syslog /var/log/mailman/smtp.log /var/log/mailman/bounce.log /var/log/mailman/mailman.log /var/log/mailman/http.log /var/>
2024-04-17T01:07:02Z I! [inputs.logfile] turned on logs plugin
2024-04-17T01:07:02Z I! [logagent] starting
2024-04-17T01:07:02Z I! [logagent] found plugin cloudwatchlogs is a log backend
2024-04-17T01:07:02Z I! [logagent] found plugin logfile is a log collection
2024-04-17T01:07:02Z I! [logagent] start logs plugin file paths [/var/log/syslog /var/log/mailman/smtp.log /var/log/mailman/bounce.log /var/log/mailman/mailman.log /var/log/mailman/http.log /var/>
2024-04-17T01:07:02Z I! [inputs.logfile] turned on logs plugin
2024-04-17T01:07:03Z I! [inputs.logfile] Reading from offset 1120 in /var/log/rescuequeue.log
2024-04-17T01:07:03Z I! [outputs.cloudwatchlogs] Configured middleware on AWS client
2024-04-17T01:07:03Z I! [logagent] piping log from /ecs/mailman/172.31.2.167_i-031718c3e8271da75_syslog(/var/log/syslog) to cloudwatchlogs with retention -1
2024-04-17T01:07:03Z I! [outputs.cloudwatchlogs] Configured middleware on AWS client
2024-04-17T01:07:03Z I! [logagent] piping log from /ecs/mailman/172.31.2.167_i-031718c3e8271da75_smtp(/var/log/mailman/smtp.log) to cloudwatchlogs with retention -1
2024-04-17T01:07:03Z I! [outputs.cloudwatchlogs] Configured middleware on AWS client

Between these attempts I made no changes to the permissions. I conclude there must be a software fault in the CloudWatch agent and recommend further investigation.

In the meantime I'll amend my configuration so the agent runs as root, which is not really a preferred setup.

FYI I happen to work for Amazon, but this project is not related to my Amazon employment. But if AWS tech staff want to contact me, you'll find me in the corporate directory.


jpSimkins commented Apr 26, 2024

Having the same issue

Contributor

This issue was marked stale due to lack of activity.

@github-actions github-actions bot added the Stale label Jul 26, 2024
NigelSwinson (Author) commented

I understand this will be harder to fix than most, given that it is intermittent. But the workaround is to run with elevated permissions, which reduces security, so perhaps this will make the cut for investigation one day?

@github-actions github-actions bot removed the Stale label Aug 8, 2024
chubbard commented

The problem is that the CloudWatch agent doesn't log an error if it can't read a particular log config. It just silently skips it and doesn't write anything out. And exactly as the OP described, all it takes is a simple flip of a group owner on a directory, or a permissions change made by an RPM/DEB, and boom: all of your logs can stop being posted and NOTHING is said about it. It would be a simple fix to add a log statement to the CloudWatch agent that reports this, so we don't spend months pulling our hair out staring at AWS permissions wondering "what permission does this thing NOT have?!"
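
The kind of explicit check asked for here can be sketched in a few lines. This is Python for illustration only and is not the agent's code; `check_readable` is a hypothetical helper that tries to open each configured path and logs the exact error for any that fail:

```python
import logging
from typing import Dict, Iterable

log = logging.getLogger("logfile-check")


def check_readable(paths: Iterable[str]) -> Dict[str, str]:
    """Try to open each path for reading.

    Returns a mapping of path -> error message for every path that
    could not be opened, and logs each failure at ERROR level.
    """
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            with open(path, "rb"):
                pass  # Opening is enough; no data needs to be read.
        except OSError as exc:
            failures[path] = str(exc)
            log.error("cannot read %s: %s", path, exc)
    return failures
```

Run as the same user and from the same process context as the agent, a check like this would report a `permission denied` for each affected file instead of leaving the gap to be discovered later.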
