Batch writes in the file sink #20784
Comments
It would seem that compressing one event at a time would massively shrink the effective symbol set, leading to very poor compression ratios. Isn't this what the "buffer" keyword implies? My expectation was that the number of events indicated by "buffer" would be put into a queue, then compressed and written once that number was exceeded. (This raises the question of why there isn't a time- or bytes-based buffer flush setting as well.)
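The compression concern above is easy to demonstrate with a small, self-contained Python sketch (this is not Vector code, just an illustration): compressing each simulated event independently versus running all of them through one gzip stream.

```python
import gzip
import json

# Simulated log events: small JSON lines with a repetitive structure,
# as a file sink would typically receive.
events = [
    json.dumps({"ts": i, "level": "info", "msg": "request handled"}).encode() + b"\n"
    for i in range(1000)
]

# One-event-at-a-time compression: each event is gzipped independently,
# so the encoder pays per-stream overhead every time and never sees
# cross-event redundancy.
per_event = sum(len(gzip.compress(e)) for e in events)

# Batched compression: all events go through a single gzip stream,
# which can exploit the repetition across events.
batched = len(gzip.compress(b"".join(events)))

print(per_event, batched)  # the batched size is typically far smaller
```

The gap widens as events share more structure, which is the usual case for structured logs.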
The […]
Thanks for the clarification @jszwedko - I didn't quite understand those differences, and that is useful for other things I'm working on. So what you're suggesting is actually adding "batch" parameters to the "file" sink? Does compression happen after buffering or after batching, and is that consistent across all the sinks?
Yeah, that's correct, I'd like to add […]
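If the file sink adopted the same batch schema that Vector's network sinks already expose (max_bytes, max_events, timeout_secs), the configuration might look like the sketch below. To be clear, a batch table on the file sink does not exist today; the field values and their applicability here are assumptions.

```toml
[sinks.out]
type = "file"
inputs = ["in"]
path = "/var/log/out.log.gz"
compression = "gzip"

# Hypothetical: batch settings mirroring other sinks' schema.
# Events would be queued and written/compressed together when
# any one of these limits is reached.
[sinks.out.batch]
max_events = 1000     # flush after this many events
max_bytes = 1048576   # ...or after this many bytes
timeout_secs = 1      # ...or after this much time
```

A timeout-based flush matters here so that low-volume streams still reach disk promptly rather than waiting indefinitely for a full batch.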
Use Cases

The file sink's performance, especially when compressing, seems to be particularly poor, as evidenced by the investigation in #20739. One likely culprit is that the file sink writes (and compresses) one event at a time. I think throughput would improve by batching writes.

Attempted Solutions
No response
Proposal

Add batch configuration to the file sink and batch writes.

References
Version
v0.39.0