Large files are very slow to read locally #219

Open
yogevyuval opened this issue Jan 13, 2021 · 2 comments

@yogevyuval

Reading a 15 MB gzipped file (200 MB uncompressed) with Logstash 7.9.2 shows strange behaviour. I also tried reading plain (uncompressed) files, and it doesn't seem to make a difference.

  1. It takes about a minute to read the entire file (reading from an EC2 machine in the same region).
  2. Reading the same file locally with Python or bash takes about 5 seconds.
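A baseline like the 5-second figure above can be reproduced with a few lines of Python. This is only a sketch of one way to measure it, not the script that was actually used; the function name `time_gzip_read` is made up for illustration.

```python
import gzip
import time


def time_gzip_read(path):
    """Stream a gzip file line by line; return (line_count, elapsed_seconds)."""
    start = time.perf_counter()
    line_count = 0
    # Binary mode avoids spending time on text decoding, so this measures
    # decompression and line splitting only.
    with gzip.open(path, "rb") as handle:
        for _ in handle:
            line_count += 1
    return line_count, time.perf_counter() - start
```

Calling `time_gzip_read("/tmp/logstash/sample1.gz")` on the downloaded file gives a lower bound for how long decompression and line splitting alone should take on that machine.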

The debug logs suggest that the download is fast (about 0.3 seconds) and that the local processing is the slow part (about 50 seconds pass before the next file starts):

[2021-01-12T14:45:50,258][DEBUG][logstash.inputs.s3] Processing {:bucket=>"test", :key=>"logs/sample1.gz"}
[2021-01-12T14:45:50,259][DEBUG][logstash.inputs.s3] Downloading remote file {:remote_key=>"logs/sample1.gz", :local_filename=>"/tmp/logstash/sample1.gz"}
[2021-01-12T14:45:50,572][DEBUG][logstash.inputs.s3] Processing file {:filename=>"/tmp/logstash/sample1.gz"}
[2021-01-12T14:46:40,435][DEBUG][logstash.inputs.s3] Processing {:bucket=>"test", :key=>"logs/sample2.gz"}
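The gaps between these log lines can be computed rather than read by eye. The sketch below parses the four timestamps from the excerpt above; the helper names are illustrative and the timestamp format string is inferred from the log lines themselves.

```python
from datetime import datetime

# The four debug lines from the excerpt, copied verbatim.
LOG_LINES = [
    '[2021-01-12T14:45:50,258][DEBUG][logstash.inputs.s3] Processing {:bucket=>"test", :key=>"logs/sample1.gz"}',
    '[2021-01-12T14:45:50,259][DEBUG][logstash.inputs.s3] Downloading remote file {:remote_key=>"logs/sample1.gz", :local_filename=>"/tmp/logstash/sample1.gz"}',
    '[2021-01-12T14:45:50,572][DEBUG][logstash.inputs.s3] Processing file {:filename=>"/tmp/logstash/sample1.gz"}',
    '[2021-01-12T14:46:40,435][DEBUG][logstash.inputs.s3] Processing {:bucket=>"test", :key=>"logs/sample2.gz"}',
]


def parse_timestamp(line):
    """Parse the leading [YYYY-MM-DDTHH:MM:SS,mmm] field of a log line."""
    raw = line[1:line.index("]")]
    return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S,%f")


def elapsed_seconds(lines):
    """Return the seconds elapsed between each pair of consecutive lines."""
    stamps = [parse_timestamp(line) for line in lines]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


deltas = elapsed_seconds(LOG_LINES)
print(f"download:   {deltas[1]:.3f} s")  # download:   0.313 s
print(f"processing: {deltas[2]:.3f} s")  # processing: 49.863 s
```

The download takes 0.313 seconds, while 49.863 seconds pass between "Processing file" and the start of the next key.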

I looked into the plugin's source code and removed code piece by piece to pinpoint the problem. Eventually I had removed almost every line in the process_local_log function, keeping only the codec decoding (the codec is plain by default) and the queue << event line. That line seems to be where most of the time is spent.

Any idea what could be causing this? It makes the plugin almost unusable for high-volume use cases such as forwarding flow logs.
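One hypothesis worth ruling out, which nothing in this thread confirms: if the queue behind queue << event is bounded and blocks when full, then time spent waiting for the downstream stages (filters and outputs) would be attributed to that single line. The Python sketch below is a generic illustration of that effect using the standard library's `queue.Queue`; it is not Logstash code, and `measure_put_time`, the capacity, and the delays are made-up values for demonstration.

```python
import queue
import threading
import time


def measure_put_time(event_count, consumer_delay, capacity=1):
    """Return the total seconds the producer spends inside put() calls."""
    events = queue.Queue(maxsize=capacity)

    def consume():
        while True:
            item = events.get()
            if item is None:  # sentinel: the producer is done
                return
            time.sleep(consumer_delay)  # stand-in for slow downstream work

    worker = threading.Thread(target=consume)
    worker.start()

    time_in_put = 0.0
    for number in range(event_count):
        start = time.perf_counter()
        events.put(number)  # blocks while the queue is full
        time_in_put += time.perf_counter() - start

    events.put(None)
    worker.join()
    return time_in_put
```

Comparing `measure_put_time(20, 0.0)` with `measure_put_time(20, 0.01)` shows the producer's time in `put()` growing with the consumer's delay, even though the push itself is cheap. If something similar is happening here, profiling the input alone would point at the enqueue line while the real cost sits downstream.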

@kaisecheng
Contributor

Could you share your pipeline config? What is the output plugin?
Do you see the same problem in 7.12?

@yogevyuval
Author

> Could you share your pipeline config? What is the output plugin?
> Do you see the same problem in 7.12?

The output plugin was the file output plugin, and we haven't tried 7.12. Unfortunately I don't have the exact pipeline anymore, but it was very simple: read from S3, write to a file.
