Use non-contiguous buffer for codec and transport #1559
Motivation
This PR aims to address performance concerns related to memory copying when transferring large data chunks, as outlined in #1558.
Solution
To minimize memory copying, we've introduced a specialized buffer backed by non-contiguous memory. Taking cues from the C++ gRPC solution's `SliceBuffer`, this new buffer structure aims to optimize memory handling, especially in scenarios such as Arrow Flight, which involves transferring substantial data volumes. Detailed changes and considerations are discussed in #1558.
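To make the idea concrete, here is a minimal sketch of a buffer backed by non-contiguous memory, using only the standard library. It is not the type introduced by this PR: `ChunkedBuffer`, its methods, and `demo` are hypothetical names chosen for illustration, and the real implementation may differ in layout and API. The point it shows is that appending a chunk only moves a reference-counted handle, and bytes are copied only when a reader asks for a contiguous span.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Hypothetical non-contiguous buffer: a queue of reference-counted chunks.
/// Illustrative only; not the type added by this PR.
#[derive(Default)]
struct ChunkedBuffer {
    chunks: VecDeque<Arc<[u8]>>,
    /// Read offset into the front chunk.
    front_offset: usize,
    /// Total unread bytes across all chunks.
    remaining: usize,
}

impl ChunkedBuffer {
    /// Append a chunk by moving its handle; the payload bytes are not copied.
    fn push(&mut self, chunk: Arc<[u8]>) {
        if !chunk.is_empty() {
            self.remaining += chunk.len();
            self.chunks.push_back(chunk);
        }
    }

    /// Number of unread bytes.
    fn remaining(&self) -> usize {
        self.remaining
    }

    /// Unread part of the front chunk (empty if the buffer is empty).
    fn front(&self) -> &[u8] {
        match self.chunks.front() {
            Some(c) => &c[self.front_offset..],
            None => &[],
        }
    }

    /// Consume `n` bytes, dropping chunks that become fully read.
    fn advance(&mut self, mut n: usize) {
        assert!(n <= self.remaining, "advance past end of buffer");
        self.remaining -= n;
        while n > 0 {
            let avail = self.front().len();
            if n < avail {
                self.front_offset += n;
                return;
            }
            n -= avail;
            self.chunks.pop_front();
            self.front_offset = 0;
        }
    }

    /// Copy out exactly `dst.len()` bytes, crossing chunk boundaries as needed.
    fn copy_to_slice(&mut self, dst: &mut [u8]) {
        assert!(dst.len() <= self.remaining, "not enough bytes buffered");
        let mut filled = 0;
        while filled < dst.len() {
            let src = self.front();
            let n = src.len().min(dst.len() - filled);
            dst[filled..filled + n].copy_from_slice(&src[..n]);
            filled += n;
            self.advance(n);
        }
    }
}

/// Example: read a 4-byte span that straddles two chunks.
fn demo() -> ([u8; 4], usize) {
    let mut buf = ChunkedBuffer::default();
    buf.push(Arc::from(vec![1u8, 2, 3]));
    buf.push(Arc::from(vec![4u8, 5]));
    let mut out = [0u8; 4];
    buf.copy_to_slice(&mut out);
    (out, buf.remaining())
}
```

In this sketch a reader that can work chunk by chunk uses `front` and `advance` and never copies, while `copy_to_slice` is the fallback for small spans, such as a header, that happen to straddle a chunk boundary.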
Perf benchmark
I ran the existing decoder benchmarks and also put together an end-to-end benchmark for the Arrow Flight scenario (elaborated in #1558).
Decoder benchmark
Comparative results between the `tonic` master branch and this PR show performance improvements across most scenarios:
Master Branch Results:
This PR Results:
It's notable that, in alignment with `h2`'s implementation, `message_size` always equals `chunk_size` in the benchmark. This PR improves performance across all related scenarios.
Arrow Flight End-to-End Benchmark
Simulated scenarios involve the client invoking the `DoExchange` RPC method, creating a bidi-streaming channel with the server, with multiple `FlightData` objects exchanged.
Scenario: `data_body` size 64KB, 10 chunks exchanged
462 MB/s
600 MB/s
Scenario: `data_body` size 1MB, 10 chunks exchanged
992 MB/s
1375 MB/s
The benchmarks conclusively show that this PR enhances throughput across various scenarios, offering a more efficient solution for data transfer in Arrow Flight use cases and beyond.