Memory leaks (doubles the input buffer size at regular intervals, with no limit on that) #154
Comments
The growth factor is almost exactly 3x, not 2x as I said before. That is, each time it has consumed X bytes, it flushes its buffer, and X = X * 3 for the next time... The same problem occurs using brotli instead of bzip2 (in case you're wondering).
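If the threshold really triples after every flush, it grows geometrically, which is why it runs away so fast. As a worked example, take an illustrative starting size of X_0 = 65536 bytes (an assumption; the thread does not state the initial value) and ask how many flushes it takes to pass 1 GiB:

```math
\begin{aligned}
X_k &= 3^k X_0 \\
65536 \cdot 3^k \ge 2^{30} &\iff 3^k \ge 2^{14} = 16384 \iff k \ge 9 \quad (3^8 = 6561,\ 3^9 = 19683) \\
X_9 &= 65536 \cdot 19683 = 1\,289\,945\,088 \text{ bytes} \approx 1.2 \text{ GiB}
\end{aligned}
```

So under that assumption, only nine flushes separate a 64 KiB buffer from more than a gigabyte.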
Digging further into this, I notice that a buffer size of 310 bytes or less tends not to cause the problem. With 320 bytes or more, the problem grows rapidly. Strange, but true. That is, this works, albeit annoyingly slowly, owing to the tiny buffer...
Output:-
Memory:-
I spent all day trying to find and fix the bug, with no luck. I attempted to use IPC::Run3 as an alternative, but that has even more bugs and doesn't seem to have any pump() mechanism, which makes it not very useful. If anyone can give me some clues about where or why this buffer-size tripling might be taking place, please do let me know! The code is almost beyond comprehension to me, and I only have 26 years of Perl experience...
OK, so another whole day on this problem, and I discovered this workaround. Basically, if IPC::Run does not consume all of the input you last gave it, you must back off the amount of data you give it next time (to 240 bytes max) until pump() has finally consumed it all; then you can go back to sending larger buffers. 65536 bytes seems to be the most it ever reads in one go, so use that. Working example:-
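A minimal sketch of that back-off rule follows. It is not the original working example, and it does not use IPC::Run: the sub names backoff_chunk_size and feed, and the simulated pump() that accepts at most a fixed number of bytes per call, are hypothetical stand-ins so the policy can be run on its own. Only the 240 and 65536 figures come from the comment above.

```perl
use strict;
use warnings;

# Sizes reported in the comment above; constant and sub names are illustrative.
use constant MAX_CHUNK     => 65536;  # most pump() was seen to read in one go
use constant BACKOFF_CHUNK => 240;    # offered while earlier input is still pending

# Back-off rule: while the child has not taken everything we gave it last
# time, offer only a small chunk; once it has, go back to large chunks.
sub backoff_chunk_size {
    my ($pending_bytes) = @_;
    return $pending_bytes > 0 ? BACKOFF_CHUNK : MAX_CHUNK;
}

# Feed $source through a simulated pump() that accepts at most $per_pump
# bytes per call (a stand-in for a child like bzip2 that reads partially).
# Returns the total bytes delivered and the largest size the input buffer reached.
sub feed {
    my ($source, $chunk_size_for, $per_pump) = @_;
    my ($offset, $in, $total, $peak) = (0, '', 0, 0);
    while ($offset < length $source or length $in) {
        if ($offset < length $source) {
            my $n = $chunk_size_for->(length $in);
            $in .= substr($source, $offset, $n);   # append; never overwrite
            $offset += $n;
        }
        $peak = length $in if length $in > $peak;
        $total += length substr($in, 0, $per_pump, '');   # simulated pump()
    }
    return ($total, $peak);
}

my $source = 'x' x 200_000;
my ($total_b, $peak_b) = feed($source, \&backoff_chunk_size, 30_000);
my ($total_n, $peak_n) = feed($source, sub { MAX_CHUNK }, 30_000);
print "back-off: $total_b bytes, input buffer peaked at $peak_b\n";
print "naive:    $total_n bytes, input buffer peaked at $peak_n\n";
```

With the simulated child taking 30000 bytes per call (an arbitrary figure), both loops deliver all 200000 bytes, but the back-off loop's buffer never exceeds 65536 bytes while the naive loop's climbs to 136608. This models only the caller's side of the workaround; it does not reproduce IPC::Run's internal buffering.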
Two+ more days on this problem, again... The bug appears to be inside Run.pm, in the _select_loop handler: when the pipe is finish()'d, it enters a loop which drains all of the process output into the buffer "while ( $self->pumpable )". That's wrong. For example, a brotli-compressed logfile can easily return tens or hundreds of gigabytes at this point. I'm not sure how to fix it...
The docs show the example code assigning to (i.e. overwriting) the contents of
$in is not necessarily consumed entirely by each pump. That depends on how simple the receiving process is, I expect, but for brotli and bzip2, and in general for any well-behaved program, you can't just overwrite $in. The bug is in the finish() sub in Run.pm:-
It doesn't yield to the caller at any point until it has accumulated the entire decompressed output in RAM, which, for my brotli-compressed log files, is way more RAM and swap than I've got once you finally reach the end of the compressed input.
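To make the point about $in concrete: if each pump takes only part of what is offered, overwriting $in silently drops the unconsumed tail, while appending keeps it queued for the next pump. The sketch below simulates this with a hypothetical simulated_pump() that removes at most 60 bytes from the front of the buffer per call. That helper is an assumption standing in for a child that reads partially; it is not IPC::Run code.

```perl
use strict;
use warnings;

# Simulated pump(): a child such as bzip2 or brotli may take only part of the
# input offered, so this removes at most $max bytes from the front of the
# buffer and leaves the rest in place (hypothetical, not IPC::Run internals).
sub simulated_pump {
    my ($buf_ref, $max) = @_;
    return substr($$buf_ref, 0, $max, '');
}

my @chunks   = ('A' x 100, 'B' x 100, 'C' x 100);
my $expected = join '', @chunks;

# Overwriting the input buffer each round
my ($in_over, $got_over) = ('', '');
for my $chunk (@chunks) {
    $in_over = $chunk;                            # unconsumed bytes are discarded
    $got_over .= simulated_pump(\$in_over, 60);
}
$got_over .= simulated_pump(\$in_over, length $in_over);   # drain the rest

# Appending to the input buffer each round
my ($in_app, $got_app) = ('', '');
for my $chunk (@chunks) {
    $in_app .= $chunk;                            # unconsumed bytes stay queued
    $got_app .= simulated_pump(\$in_app, 60);
}
$got_app .= simulated_pump(\$in_app, length $in_app);

printf "overwrite: %d of %d bytes delivered\n", length $got_over, length $expected;
printf "append:    %d of %d bytes delivered, in order: %s\n",
    length $got_app, length $expected, $got_app eq $expected ? 'yes' : 'no';
```

The overwrite loop delivers 220 of the 300 bytes (40 bytes of A and 40 of B are lost), while the append loop delivers all 300 in their original order.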
Having stared at the various versions of your code for about 15 mins, I am somewhat stumped: all the abbreviated variable names may make perfect sense to you, but I am not going to write myself a lookup chart to constantly refer back to, to keep decoding what e.g. What you say about
I don't understand how there is room for any large amount of data to be pending in the above scenario, for
@mohawk2 Compression programs consume input and spit back expanded output, but it's not a "give one, get one" relationship: you might give it data and get nothing back (yet). When it comes to the end of the file, the decompressor does not yet know that no more input is coming, so it has not necessarily given you back all the expanded output it can. Only when you finally close the input handle does the decompressor know that no more data is coming, and at that point it finally returns the expansion of everything you gave it before. That's where the bug is. It's an architectural error on the part of the module author, not something that can be trivially fixed by changing the code; the entire way the module works needs to change. The bug is that after you close the handle, it accumulates all of the output in memory, because the caller has no way to continue iterating over the (now closed, or half-closed) pump loop after performing the close. I fixed this by writing my own code to replace the flawed IPC::Run version entirely; you can see my approach here: https://github.com/gitcnd/perl-IPC-Open3-example/blob/master/ipc_open3_example.pl
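The streaming approach described above can be sketched with the core modules IPC::Open3 and IO::Select. This is not the code at the linked repository, only a minimal illustration of the same idea. The helper name stream_through, the 65536-byte read size, and the use of cat as a stand-in for bzip2 -dc or brotli -dc are all assumptions. The key points are that the child's stdin and stdout are serviced in one select loop, closing stdin is just another step inside that loop, and each output chunk is handed to a callback as it arrives. Output the decompressor produces after the close is therefore still streamed, not accumulated.

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);
use IO::Handle;
use IO::Select;

# Hypothetical helper: runs @$cmd, copies $input_fh into its stdin, and hands
# each piece of its stdout to $on_chunk as soon as it arrives. Memory use is
# bounded by one pending-input buffer and one read buffer.
sub stream_through {
    my ($cmd, $input_fh, $on_chunk) = @_;
    local $SIG{PIPE} = 'IGNORE';   # a dead child gives EPIPE instead of killing us

    # Child stderr goes straight to our STDERR, so it cannot fill an unread pipe.
    my $pid = open3(my $to_child, my $from_child, '>&STDERR', @$cmd);
    $to_child->blocking(0);        # partial writes instead of blocking

    my $sel_r = IO::Select->new($from_child);
    my $sel_w = IO::Select->new($to_child);
    my ($pending, $input_eof) = ('', 0);

    while ($sel_r->count) {
        my ($readable, $writable) =
            IO::Select->select($sel_r, $sel_w->count ? $sel_w : undef, undef);

        for my $fh (@{ $writable || [] }) {
            if (!length $pending && !$input_eof) {
                my $got = read($input_fh, $pending, 65536);
                die "reading input failed: $!" unless defined $got;
                $input_eof = 1 if $got == 0;
            }
            if (length $pending) {
                my $n = syswrite($fh, $pending);
                if (defined $n) {
                    substr($pending, 0, $n, '');   # drop only what the child accepted
                } elsif (!$!{EAGAIN}) {
                    die "writing to child failed: $!";
                }
            }
            if ($input_eof && !length $pending) {
                # Closing stdin is what tells a decompressor to flush the rest.
                $sel_w->remove($fh);
                close $fh;
            }
        }

        for my $fh (@{ $readable || [] }) {
            my $n = sysread($fh, my $buf, 65536);
            die "reading from child failed: $!" unless defined $n;
            if ($n == 0) {                 # child closed its stdout: done
                $sel_r->remove($fh);
                close $fh;
                next;
            }
            $on_chunk->($buf);             # processed now, never accumulated
        }
    }

    close $to_child if $sel_w->count;  # child quit before taking all input
    waitpid($pid, 0);
    return $? >> 8;                    # child's exit status
}

# Demo with cat as a stand-in for a decompressor.
my $data = join '', map { "line $_\n" } 1 .. 50_000;
open my $src, '<', \$data or die "cannot open in-memory input: $!";
my ($bytes_out, $largest_chunk) = (0, 0);
my $status = stream_through(['cat'], $src, sub {
    my $len = length $_[0];
    $bytes_out += $len;
    $largest_chunk = $len if $len > $largest_chunk;
});
print "exit $status, $bytes_out bytes streamed, largest chunk $largest_chunk\n";
```

For the real case, the call would look something like stream_through(['bzip2', '-dc'], $fh, sub { ... }), with $fh opened in '<:raw' mode on the compressed file. The callback sees at most 65536 bytes at a time no matter how much the decompressor emits after its input is closed.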
It sounds like this module's API needs a
It seems to grow the input buffer indefinitely, doubling it at regular intervals, which eventually starves the system of memory.
Example of problem:-
The code causing the leak:-
Outputs:-
Example input file size:-
Version:-