Empty file check for GZip content-type #55

Open
sawarkarma opened this issue Sep 19, 2024 · 2 comments

@sawarkarma

We are evaluating the GZip content-type to reduce network latency between Logstash and Google Cloud Storage, and we found an issue with empty files: when there is no content to flush within the configured interval, the plugin still creates an empty file in Google Cloud Storage.
I will check whether anyone has a workaround for this. In the meantime, I will propose a change.
Please approve it or suggest a better solution or workaround.
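
One detail that may matter here (an assumption, not confirmed against the plugin source): a GZip stream with no payload is still a non-empty file on disk, because the GZip header and trailer are always written. A plain file-size check would therefore not recognise such a file as empty. A minimal Ruby sketch, where `gzip_payload_empty?` is a hypothetical helper and not part of the plugin:

```ruby
require "zlib"
require "tempfile"

# Hypothetical helper (not part of the plugin): true when a gzip file
# decompresses to zero bytes, even though the file itself is not empty.
def gzip_payload_empty?(path)
  Zlib::GzipReader.open(path) do |gz|
    # read(1) returns nil when the stream holds no payload at all.
    gz.read(1).to_s.empty?
  end
end

# A gzip file that was opened and closed without any event written.
empty = Tempfile.new(["empty", ".gz"])
Zlib::GzipWriter.open(empty.path) { |_gz| }

# A gzip file that received one event.
full = Tempfile.new(["full", ".gz"])
Zlib::GzipWriter.open(full.path) { |gz| gz.write("hello\n") }

puts File.size(empty.path) > 0        # true: header and trailer are still written
puts gzip_payload_empty?(empty.path)  # true
puts gzip_payload_empty?(full.path)   # false
```

Checking the decompressed payload (or counting events as they are written) is one way to detect this case; checking the size of the temp file alone is not enough.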

Logstash information:

Please include the following information:

  1. Logstash version (e.g. bin/logstash --version) 8.14.3
  2. Logstash installation source (e.g. built from source, with a package manager: DEB/RPM, expanded from tar or zip archive, docker) : expanded from tar or zip archive
  3. How is Logstash being run (e.g. as a service/service manager: systemd, upstart, etc. Via command line, docker/kubernetes) : command line
  4. How was the Logstash Plugin installed : bin/logstash-plugin install /path/to/gem/file.gem

JVM (e.g. java -version): temurin-11

If the affected version of Logstash is 7.9 (or earlier), or if it is NOT using the bundled JDK or using the 'no-jdk' version in 7.10 (or higher), please provide the following information:

  1. JVM version (java -version)
  2. JVM installation source (e.g. from the Operating System's package manager, from source, etc).
  3. Value of the JAVA_HOME environment variable if set.

OS version (uname -a if on a Unix-like system): 23.6.0 Darwin

Description of the problem including expected versus actual behavior: The expected behaviour is that once the interval has elapsed and there is no content to flush to the GCS bucket, the plugin skips the upload and rotates the temp file. The actual behaviour is that once the interval has elapsed and the temp GZip file has no content, the plugin still writes the empty file to the GCS bucket.
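
The expected behaviour could be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's actual code: `CountingGzipFile`, `rotate`, and the `uploader` lambda are hypothetical names, and the lambda stands in for the real GCS upload.

```ruby
require "zlib"
require "tmpdir"

# Hypothetical wrapper, not the plugin's actual class: counts the
# uncompressed bytes written so rotation can tell whether any event arrived.
class CountingGzipFile
  attr_reader :path, :bytes_written

  def initialize(path)
    @path = path
    @gz = Zlib::GzipWriter.open(path)
    @bytes_written = 0
  end

  def write(data)
    @bytes_written += data.bytesize
    @gz.write(data)
  end

  def close
    @gz.close
  end
end

# Hypothetical rotation step: upload only when something was written,
# otherwise delete the temp file. `uploader` stands in for the GCS upload.
def rotate(file, uploader)
  file.close
  if file.bytes_written.zero?
    File.delete(file.path)
    :skipped
  else
    uploader.call(file.path)
    :uploaded
  end
end

Dir.mktmpdir do |dir|
  uploaded = []
  uploader = ->(path) { uploaded << File.basename(path) }

  # Interval elapsed with no events: nothing should reach the bucket.
  idle = CountingGzipFile.new(File.join(dir, "idle.log.gz"))
  puts rotate(idle, uploader)   # skipped

  # Interval elapsed with one event: the file is handed to the uploader.
  busy = CountingGzipFile.new(File.join(dir, "busy.log.gz"))
  busy.write("event\n")
  puts rotate(busy, uploader)   # uploaded
  puts uploaded.inspect
end
```

Counting bytes at write time avoids reopening and decompressing the temp file during rotation.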

Steps to reproduce:

Please include a minimal but complete recreation of the problem,
including (e.g.) pipeline definition(s), settings, locale, etc. The easier
you make for us to reproduce it, the more likely that somebody will take the
time to look at it.

  1. Use the output plugin configuration below:
    google_cloud_storage {
      bucket => "bucket-abc"
      temp_directory => "/tmp/"
      log_file_prefix => "AnyPrefix"
      max_file_size_kbytes => 5120
      max_concurrent_uploads => 5
      codec => plain { format => "%{message}" }
      output_format => "json"
      date_pattern => "%Y-%m-%dT%H-%M-00"
      flush_interval_secs => 5
      gzip => true
      gzip_content_encoding => false
      uploader_interval_secs => 60
      include_uuid => true
      include_hostname => true
    }

  2. Install the google_cloud_storage plugin

  3. Start Logstash with the given configuration

  4. Wait for some time; empty GZip files will be created in the GCS bucket

Provide logs (if relevant): none as of now

@sawarkarma
Author

Proposing a change:
#56
Please review it if possible.

@sawarkarma
Author

Please help us resolve this issue.
