GH-3080: HadoopStreams to support ByteBufferPositionedReadable #3096
Rationale for this change
If a stream declares in its StreamCapabilities that it supports ByteBufferPositionedReadable, then use that API for readFully(ByteBuffer).
Adding support for Hadoop ByteBufferPositionedReadable streams may improve performance by pushing retry/recovery logic into the filesystem client library.
This interface is implemented by the HDFS input stream; we are considering adding
it elsewhere.
What changes are included in this PR?
New class H3ByteBufferInputStream.
HadoopStreams selects it if the FSDataInputStream is considered suitable.
Class H3ByteBufferInputStream
The reading is done in a new class, H3ByteBufferInputStream, which subclasses H2ByteBufferInputStream. This reduces the amount of duplicate code, though it makes the class hierarchy a bit unclean.
The purist way to do it would be to create an abstract superclass, HadoopInputStream, to hold all commonality between the three input streams. I'm happy to do this; I just didn't want to do a larger refactoring without (a) showing the core design worked and (b) getting permission to do this. Should I do this?
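As a rough illustration of the read path described above, the sketch below shows one way a readFully(ByteBuffer) call could be served by a single positioned read followed by a seek. It is not the code in this PR: PositionedSource, ArraySource and PositionedReadSketch are hypothetical stand-ins so the example runs without Hadoop on the classpath, and the stand-in's readFully(long, ByteBuffer) is only assumed to behave like the Hadoop positioned read (the stream cursor is left unchanged).

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical stand-in for the parts of a Hadoop input stream used here;
// this is NOT the real Hadoop interface.
interface PositionedSource {
  long getPos() throws IOException;

  void seek(long newPos) throws IOException;

  // Fills buf completely starting at the given offset; the cursor is left unchanged.
  void readFully(long position, ByteBuffer buf) throws IOException;
}

// In-memory source so the sketch runs without a filesystem.
final class ArraySource implements PositionedSource {
  private final byte[] data;
  private long pos;

  ArraySource(byte[] data) {
    this.data = data;
  }

  @Override
  public long getPos() {
    return pos;
  }

  @Override
  public void seek(long newPos) {
    pos = newPos;
  }

  @Override
  public void readFully(long position, ByteBuffer buf) throws IOException {
    int len = buf.remaining();
    if (position < 0 || position + len > data.length) {
      throw new EOFException("Not enough data at offset " + position);
    }
    // put(byte[], int, int) works for both heap and direct buffers.
    buf.put(data, (int) position, len);
  }
}

final class PositionedReadSketch {
  // Sequential readFully built on a positioned read.
  static void readFully(PositionedSource in, ByteBuffer buf) throws IOException {
    int toRead = buf.remaining();
    long start = in.getPos();
    in.readFully(start, buf); // one positioned call; the source owns any retry logic
    in.seek(start + toRead);  // advance the cursor so the caller sees a sequential read
  }
}
```

The explicit seek is there because a positioned read, as assumed here, does not move the cursor, while callers of readFully(ByteBuffer) expect the stream position to advance by the number of bytes read.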
HadoopStreams changes
Selection of the new input stream is done if and only if the stream declares the capability in:preadbytebuffer.
There is no equivalent of isWrappedStreamByteBufferReadable(), which recurses through a chain of wrapped streams looking for the API.
If a stream doesn't declare its support for the API, it won't get picked up.
This is done knowing that the sole production implementation which currently exists,
the HDFS input stream, does declare this capability.
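The selection rule can be sketched as follows. This is an illustration only, not the HadoopStreams code: CapabilityProbe is a hypothetical stand-in for the hasCapability(String) probe a Hadoop stream exposes, and the Wrapper enum is an invented placeholder for "new input stream" versus "whatever was chosen before".

```java
// Hypothetical stand-in for the hasCapability() probe; not the real Hadoop type.
interface CapabilityProbe {
  boolean hasCapability(String capability);
}

final class StreamSelectionSketch {
  // Capability name quoted in the description above.
  static final String PREAD_BYTEBUFFER = "in:preadbytebuffer";

  // Placeholder outcomes; the real code returns input stream wrappers.
  enum Wrapper { POSITIONED_BYTEBUFFER, EXISTING }

  // Only the stream handed in is asked; wrapped streams are never unwrapped,
  // so a stream that does not declare the capability keeps the existing path.
  static Wrapper choose(CapabilityProbe stream) {
    return stream.hasCapability(PREAD_BYTEBUFFER)
        ? Wrapper.POSITIONED_BYTEBUFFER
        : Wrapper.EXISTING;
  }
}
```

Relying on the declared capability alone keeps the check cheap and predictable, at the cost of skipping streams that implement the API without advertising it.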
Are these changes tested?
There is a new test suite, covering the new behavior and verifying that the integration with
HadoopStreams still retains the correct behavior for existing streams.
The suite is parameterized on heap and direct buffers.
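For readers unfamiliar with that kind of parameterization, here is a minimal sketch of running one check against both buffer kinds in plain Java. It is not the suite in this PR and uses no test framework; BufferKindsSketch, fill and runAll are hypothetical names, and fill merely stands in for the code under test.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class BufferKindsSketch {
  // The two allocation strategies: on-heap and off-heap (direct).
  static Map<String, IntFunction<ByteBuffer>> allocators() {
    Map<String, IntFunction<ByteBuffer>> kinds = new LinkedHashMap<>();
    kinds.put("heap", ByteBuffer::allocate);
    kinds.put("direct", ByteBuffer::allocateDirect);
    return kinds;
  }

  // Hypothetical stand-in for the code under test: fills dst from src at offset.
  static void fill(byte[] src, int offset, ByteBuffer dst) {
    dst.put(src, offset, dst.remaining());
  }

  // Runs the same check against every buffer kind; returns how many kinds were checked.
  static int runAll(byte[] src) {
    int checked = 0;
    for (Map.Entry<String, IntFunction<ByteBuffer>> kind : allocators().entrySet()) {
      ByteBuffer buf = kind.getValue().apply(src.length);
      fill(src, 0, buf);
      buf.flip();
      for (int i = 0; i < src.length; i++) {
        if (buf.get(i) != src[i]) {
          throw new AssertionError(kind.getKey() + " buffer mismatch at index " + i);
        }
      }
      checked++;
    }
    return checked;
  }
}
```

Covering both kinds matters because heap buffers expose a backing array while direct buffers do not, so code that quietly depends on array() only fails on the direct case.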
Are there any user-facing changes?
No
Closes GH-3080