This repository has been archived by the owner on Apr 22, 2020. It is now read-only.

Scalyr can't keep up with our logs #525

Open
mo-gr opened this issue Nov 14, 2018 · 9 comments

Comments

@mo-gr
Contributor

mo-gr commented Nov 14, 2018

One of our applications is creating a lot of logs. They are mostly access logs, so there isn't much we can do to reduce the volume. We create so many logs that the Scalyr agent is having trouble processing them. We very frequently see lines like

10:53:03.131 skipper /var/log/scalyr-agent-2/agent.log  2018-11-14 09:53:03.131Z WARNING [core] [log_processing.py:1734] [error="skipForTooFarBehind"] Skipped copying 105580980 bytes in '/var/log/application.log' due to: Too far behind end of log.  Num of bytes to end is 105580980. 

As I understand it, Scalyr is frequently dropping many MB of logs. This results in very frequent gaps of 1-2 minutes in the Scalyr logs. Is there anything we (or you) can do about this?
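To put a number on the loss, the warnings can be summed per log file. This is only a minimal sketch: it assumes every relevant line in agent.log follows the format of the warning quoted above (other agent versions may word it differently), and the function name `skipped_bytes_per_file` is hypothetical, not part of the Scalyr agent.

```python
import re

# Matches the warning format quoted above; this pattern is an assumption
# derived from that single sample line, not from the Scalyr agent source.
SKIP_RE = re.compile(
    r'\[error="skipForTooFarBehind"\] Skipped copying (\d+) bytes in \'([^\']+)\''
)


def skipped_bytes_per_file(lines):
    """Sum the bytes the agent reports as skipped, keyed by log file path."""
    totals = {}
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            path = match.group(2)
            totals[path] = totals.get(path, 0) + int(match.group(1))
    return totals


# The warning line from this issue, used as sample input.
sample = (
    '2018-11-14 09:53:03.131Z WARNING [core] [log_processing.py:1734] '
    '[error="skipForTooFarBehind"] Skipped copying 105580980 bytes in '
    "'/var/log/application.log' due to: Too far behind end of log.  "
    'Num of bytes to end is 105580980. '
)

print(skipped_bytes_per_file([sample]))
# prints {'/var/log/application.log': 105580980}
```

Run against the full agent.log (for example by passing an open file object as `lines`), this gives the total skipped volume per file; the single warning above already accounts for roughly 105 MB.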

@mikkeloscar
Member

@mo-gr
Contributor Author

mo-gr commented Nov 14, 2018

We currently observe this on Taupage-AMI-20181101-120344 (ami-0c8c1409048d397a5)

@szuecs
Member

szuecs commented Nov 14, 2018

@mo-gr IMHO you should reduce the log volume, because you create a lot of I/O in a latency-critical application that affects the whole business. What you can do is use https://opensource.zalando.com/skipper/reference/filters/#disableaccesslog to disable logging on some of the routes.

@aryszka

aryszka commented Nov 14, 2018

I think these access logs all have to be collected for compliance or similar reasons. Sampling is not an option in this case.

@szuecs
Member

szuecs commented Nov 14, 2018

@aryszka who said that?
And "I think" is not a good basis for a decision like this ;)

@ChristianLohmann @mrandi do you know whether this is true, or can you find someone responsible who can answer whether team pathfinder has to log all access logs in the main shop HTTP router?
This creates a lot of I/O and costs a lot of money. Additionally, if this is the case, there is now a technical challenge that has to be solved.

@aermakov-zalando
Contributor

@szuecs I'd argue that this is a bug or misconfiguration that should be fixed in the Scalyr agent. What's the point of a logging service that can't keep up with a reasonable logging volume for one of our most important applications? And why do the users now have to babysit something as simple as logging and try to customise it on a per-route basis?

@szuecs
Member

szuecs commented Nov 14, 2018

@aermakov-zalando I'll let team logging answer your question. In the end, technically you want to sample high-volume logs if that is possible; whether compliance allows it is a different matter.

@christianberg
Contributor

I'd suggest not having this discussion in a public GitHub repo. I'll follow up via email.

@mrandi
Member

mrandi commented Nov 19, 2018

@szuecs plz create an internal ticket for this. Thx!


7 participants