S3 plugin not functioning correctly for GZ files from Firehose #180

Open
apatnaik14 opened this issue Jul 12, 2019 · 9 comments
apatnaik14 commented Jul 12, 2019

I was testing the S3 plugin for a production POC: a Firehose delivery stream delivers CloudWatch logs into an S3 bucket, from which I read them into Logstash with the S3 plugin.

My Logstash config is below:

input {
  s3 {
    bucket => "test"
    region => "us-east-1"
    role_arn => "test"
    interval => 10
    additional_settings => {
      "force_path_style" => true
      "follow_redirects" => false
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    sniffing => false
    index => "s3-logs-%{+YYYY-MM-dd}"
  }
  stdout { codec => rubydebug }
}

As I start up Logstash locally, I can see the data reaching Logstash, but it is not in a proper format, as shown below.

{
          "type" => "s3",
       "message" => "\u001F�\b\u0000\u0000\u0000\u0000\u0000\u0000\u0000͒�n\u00131\u0010�_��\u0015�����x���MC)\u0005D\u0016!**************************************",
      "@version" => "1",
    "@timestamp" => 2019-07-12T15:32:37.328Z
}

I also tried adding codec => "gzip_lines" to the configuration, but then Logstash was not able to process those files at all. The documentation suggests the S3 plugin supports GZ files out of the box. I was hoping someone could point out what I am doing wrong.

Regards,
Arpan

Please find below version and OS information.

  • Version: Logstash 7.1.1 (Plugin logstash-input-s3-3.4.1)
  • Operating System: Ubuntu 17.04
  • Config File (if you have sensitive info, please remove it): Added above
  • Sample Data: N/A
  • Steps to Reproduce: Mentioned above.
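
For what it's worth, the first bytes of the message field above (\u001F, a mangled second byte, then \b) look exactly like the gzip magic number (0x1F 0x8B) followed by the DEFLATE method byte (0x08), i.e. the objects are gzip-compressed but are being read as plain text. A quick check along these lines, with placeholder bucket and key names, can confirm it:

import gzip

import boto3

# Placeholder bucket/key; substitute a real object written by Firehose.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="test", Key="2019/07/12/some-firehose-object")
body = obj["Body"].read()

# 0x1F 0x8B is the gzip magic number; 0x08 is the DEFLATE method byte.
if body[:2] == b"\x1f\x8b":
    print("gzip-compressed; first line:", gzip.decompress(body).splitlines()[0])
else:
    print("not gzip; starts with:", body[:16])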
@apatnaik14 apatnaik14 changed the title [Issue] S3 plugin not functioning correctly for GZ files from Firehose S3 plugin not functioning correctly for GZ files from Firehose Jul 12, 2019
@yaauie yaauie self-assigned this Jul 23, 2019
@Luk3rson

Hi @yaauie,
I am having the same issue. Is there an update on this?
I tried several different codecs, without any results.

Thanks a lot.

@apatnaik14 (Author)

Hi @yaauie!

I was hoping to check: is there a plan to merge the above changes into the plugin?

Regards,
Arpan

@mrudrara

@apatnaik14 I am in a similar boat! Did you have any luck with any other workarounds you tried?

@Luk3rson

Hey @mrudrara,
I created a simple Lambda function which adds the .gz extension to each file uploaded to the S3 bucket.
The Lambda is invoked by the bucket's PUT event notification.
I can share the function if you'd like.
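
A minimal sketch of what such a function can look like (not necessarily Luk3rson's exact code), assuming the standard S3 PUT event notification payload; since S3 has no in-place rename, the object is copied to a key with the .gz suffix and the original is deleted:

import urllib.parse

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # One S3 notification can carry several records.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Skip objects that already have the extension; this also stops the
        # function from re-triggering on the object it just wrote.
        if key.endswith(".gz"):
            continue

        # S3 has no rename: copy to the new key, then delete the original.
        s3.copy_object(
            Bucket=bucket,
            Key=key + ".gz",
            CopySource={"Bucket": bucket, "Key": key},
        )
        s3.delete_object(Bucket=bucket, Key=key)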

@mrudrara

Thanks @Luk3rson, really appreciate it! Did you ever have issues with too many Lambda invocations?

@mrudrara

@Luk3rson, can you share the function, maybe as a gist?

Thanks in advance.

@Luk3rson

Hi @mrudrara,
Apologies for the late reply.
Here is my function: Luk3rson's GZIP Lambda convertor
Regards

@mrudrara

Hi @Luk3rson, really appreciate it. Meanwhile, while working with an AWS Support engineer, they also recommended Firehose's "Data Transformation with Lambda".
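
For reference, a rough sketch of what such a transformation function can look like for CloudWatch Logs flowing through Firehose, assuming the documented transformation contract (batched, base64-encoded records in; {recordId, result, data} records out) and the gzipped JSON envelope that CloudWatch Logs subscriptions deliver. This is illustrative only, not AWS's reference blueprint:

import base64
import gzip
import json

def lambda_handler(event, context):
    # Firehose hands the function a batch of base64-encoded records;
    # CloudWatch Logs subscriptions wrap them in gzipped JSON envelopes.
    output = []
    for record in event["records"]:
        envelope = json.loads(gzip.decompress(base64.b64decode(record["data"])))

        # Re-emit each log event as plain newline-delimited text so the
        # objects Firehose writes to S3 are line-oriented, not gzip blobs.
        lines = "".join(e["message"] + "\n" for e in envelope.get("logEvents", []))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok" if lines else "Dropped",
            "data": base64.b64encode(lines.encode("utf-8")).decode("ascii"),
        })
    return {"records": output}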

@glen-uc

glen-uc commented Apr 12, 2021

Hi @Luk3rson, @mrudrara

If the folder only contains gz logs, then you can set this option on the s3 input (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-s3.html#plugins-inputs-s3-gzip_pattern):

gzip_pattern => ".*?$"

so that the input plugin treats the files as gz without appending a .gz extension via the Lambda.
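
Applied to the input block from the original report, that looks like the sketch below (bucket and role_arn keep the report's placeholder values). Note that this pattern matches every key, so the bucket or prefix should contain nothing but gzipped objects:

input {
  s3 {
    bucket => "test"
    region => "us-east-1"
    role_arn => "test"
    interval => 10
    gzip_pattern => ".*?$"
    additional_settings => {
      "force_path_style" => true
      "follow_redirects" => false
    }
  }
}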
