Parquet: Fix Reader leak by removing useless copy #12079
base: main
The ReadConf copy constructor sets the reader to null instead of carrying over the source's reader, which leaves the reader held by the original unclosed.
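To make the description above concrete, here is a minimal sketch of that leak pattern. It is not the Iceberg code: `TrackedReader`, `Conf`, and the helper methods are hypothetical stand-ins, and the only assumption carried over from the description is that the copy does not keep the source's reader.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CopyLeakSketch {

  /** Hypothetical stand-in for a file reader; OPEN counts readers not yet closed. */
  static final class TrackedReader implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();

    TrackedReader() {
      OPEN.incrementAndGet();
    }

    @Override
    public void close() {
      OPEN.decrementAndGet();
    }
  }

  /** Hypothetical stand-in for a read configuration that owns a reader. */
  static final class Conf implements AutoCloseable {
    private final TrackedReader reader;

    Conf() {
      this.reader = new TrackedReader();
    }

    // Copy constructor: deliberately does not carry the reader over (assumption).
    private Conf(Conf toCopy) {
      this.reader = null;
    }

    Conf copy() {
      return new Conf(this);
    }

    @Override
    public void close() {
      if (reader != null) {
        reader.close();
      }
    }
  }

  /** Closes only the copy; returns how many readers are still open. */
  static int openReadersAfterClosingCopyOnly() {
    TrackedReader.OPEN.set(0);
    Conf original = new Conf();
    Conf copy = original.copy();
    copy.close(); // no-op: the copy never held the reader
    return TrackedReader.OPEN.get();
  }

  /** Closes the copy and the original; returns how many readers are still open. */
  static int openReadersAfterClosingOriginal() {
    TrackedReader.OPEN.set(0);
    Conf original = new Conf();
    Conf copy = original.copy();
    copy.close();
    original.close(); // the original is the only owner of the reader
    return TrackedReader.OPEN.get();
  }

  public static void main(String[] args) {
    System.out.println("copy closed only: " + openReadersAfterClosingCopyOnly());
    System.out.println("original closed:  " + openReadersAfterClosingOriginal());
  }
}
```

In this sketch the reader is released only when whoever holds the original closes it, so the question of who owns the original matters more than the copy itself.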
The leak stack trace looks like this (Iceberg 1.2.x):
I checked the history: this copy has been there since the beginning. I don't think it is needed, since the conf doesn't go outside of the class.
Wait, I think the fix is incorrect.
The readConf does transfer ownership to FileIterator via the return value of init, so it should be closed by the caller.
…On Fri, Jan 24, 2025 at 19:10 Eduard Tudenhoefner ***@***.***> wrote:
***@***.**** approved this pull request.
I think I found the root cause.
Comments inline.

  static SeekableInputStream stream(org.apache.iceberg.io.SeekableInputStream stream) {
    if (stream instanceof DelegatingInputStream) {
      // For some reason, it optimistically tries to steal the underlying stream and creates a new stream by wrapping it.
      // The caller closes the wrapper and the underlying stream, but not this intermediate one.
      InputStream wrapped = ((DelegatingInputStream) stream).getDelegate();
      if (wrapped instanceof FSDataInputStream) {
        return HadoopStreams.wrap((FSDataInputStream) wrapped);
      }
    }
    return new ParquetInputStreamAdapter(stream);
  }

A similar approach is used for the output version. Can we just go through the Adapter path?
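The close-tracking argument in the inline comments can be sketched with plain `java.io` streams. This is only an illustration under assumptions: `TrackedStream` and both helper methods are hypothetical, and the sketch relies on the standard `FilterInputStream.close()` behavior of closing the wrapped stream, not on how the Iceberg or Parquet adapters actually behave.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class WrapperLeakSketch {

  /** Hypothetical wrapper that records whether close() was called on it. */
  static final class TrackedStream extends FilterInputStream {
    boolean closed = false;

    TrackedStream(InputStream in) {
      super(in);
    }

    InputStream delegate() {
      return in; // protected field inherited from FilterInputStream
    }

    @Override
    public void close() throws IOException {
      closed = true;
      super.close(); // FilterInputStream.close() closes the wrapped stream
    }
  }

  /**
   * "Steal" path: unwrap the delegate and build a new wrapper around it, then
   * close only the provided stream. Returns {provided, underlying, intermediate}.
   */
  static boolean[] unwrapThenCloseProvided() {
    TrackedStream underlying = new TrackedStream(new ByteArrayInputStream(new byte[0]));
    TrackedStream provided = new TrackedStream(underlying);
    TrackedStream intermediate = new TrackedStream(provided.delegate());
    try {
      provided.close(); // reaches the underlying stream, never the intermediate
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new boolean[] {provided.closed, underlying.closed, intermediate.closed};
  }

  /**
   * Adapter path: wrap the provided stream itself and close the adapter.
   * Returns {adapter, provided, underlying}.
   */
  static boolean[] adaptThenCloseAdapter() {
    TrackedStream underlying = new TrackedStream(new ByteArrayInputStream(new byte[0]));
    TrackedStream provided = new TrackedStream(underlying);
    TrackedStream adapter = new TrackedStream(provided);
    try {
      adapter.close(); // close propagates through every layer
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new boolean[] {adapter.closed, provided.closed, underlying.closed};
  }

  public static void main(String[] args) {
    boolean[] steal = unwrapThenCloseProvided();
    boolean[] adapt = adaptThenCloseAdapter();
    System.out.println("steal: intermediate closed = " + steal[2]);
    System.out.println("adapt: underlying closed = " + adapt[2]);
  }
}
```

In the sketch, a wrapper built around the unwrapped delegate sits outside the close chain of the provided stream, whereas an adapter around the provided stream keeps a single chain of ownership.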
This reverts commit 6b655c0.
Wrapping required a clear way to close the provided stream without also closing the underlying stream.
@@ -82,22 +82,10 @@ static OutputFile file(org.apache.iceberg.io.OutputFile file, Configuration conf
   }

   static SeekableInputStream stream(org.apache.iceberg.io.SeekableInputStream stream) {
     if (stream instanceof DelegatingInputStream) {
Could you please add a test that tries to reproduce the original problem?
added