Use non-contiguous buffer for codec and transport #1559

Open · wants to merge 1 commit into master
Conversation

Contributor

@xiaoyawei xiaoyawei commented Oct 25, 2023

Motivation

This PR addresses performance concerns related to memory copying when transferring large data chunks, as outlined in #1558.

Solution

To minimize memory copying, this PR introduces a specialized buffer backed by non-contiguous memory. Taking cues from SliceBuffer in the C++ gRPC implementation, the new buffer structure optimizes memory handling, especially in scenarios such as Arrow Flight that involve transferring substantial data volumes.
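To illustrate the idea, here is a minimal sketch of a non-contiguous buffer, not tonic's actual implementation: the name SliceBuffer is borrowed from the C++ gRPC concept, and Vec&lt;u8&gt; stands in for a refcounted chunk type like bytes::Bytes. Incoming chunks are appended as-is instead of being copied into one contiguous allocation, and reads cross chunk boundaries on demand.

```rust
use std::collections::VecDeque;

/// Hypothetical sketch of a buffer backed by non-contiguous chunks.
/// A real implementation would hold refcounted slices (e.g. bytes::Bytes)
/// and could hand whole chunks out zero-copy; this sketch only shows the
/// chunk-list bookkeeping.
struct SliceBuffer {
    chunks: VecDeque<Vec<u8>>,
    remaining: usize,
}

impl SliceBuffer {
    fn new() -> Self {
        SliceBuffer { chunks: VecDeque::new(), remaining: 0 }
    }

    /// Append a chunk without copying it into a contiguous region.
    fn push(&mut self, chunk: Vec<u8>) {
        self.remaining += chunk.len();
        self.chunks.push_back(chunk);
    }

    /// Total bytes available across all chunks.
    fn remaining(&self) -> usize {
        self.remaining
    }

    /// Copy `dst.len()` bytes out, crossing chunk boundaries as needed.
    fn copy_to_slice(&mut self, dst: &mut [u8]) {
        let mut filled = 0;
        while filled < dst.len() {
            let front = self.chunks.front_mut().expect("buffer underflow");
            let n = (dst.len() - filled).min(front.len());
            dst[filled..filled + n].copy_from_slice(&front[..n]);
            front.drain(..n);
            if front.is_empty() {
                self.chunks.pop_front();
            }
            filled += n;
            self.remaining -= n;
        }
    }
}

fn main() {
    let mut buf = SliceBuffer::new();
    buf.push(vec![1, 2, 3]);
    buf.push(vec![4, 5]);
    // A 4-byte read spans the boundary between the two chunks.
    let mut out = [0u8; 4];
    buf.copy_to_slice(&mut out);
    println!("read {:?}, {} byte(s) left", out, buf.remaining());
}
```

The design point is that the copy happens only when a caller actually needs contiguous bytes; until then, chunks arriving from the transport are kept in place.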

Detailed changes and considerations are discussed in #1558.

Perf benchmark

I ran the existing decoder benchmarks and also put together an end-to-end benchmark for the Arrow Flight scenario (elaborated in #1558).

Decoder benchmark

Comparative results between the tonic master branch and this PR show performance improvements across most scenarios:

Master Branch Results:

test chunk_size_100   ... bench:       1,069 ns/iter (+/- 201) = 940 MB/s
test chunk_size_1005  ... bench:         704 ns/iter (+/- 260) = 1427 MB/s
test chunk_size_500   ... bench:         758 ns/iter (+/- 212) = 1325 MB/s
test message_count_1  ... bench:         681 ns/iter (+/- 33) = 741 MB/s
test message_count_10 ... bench:       2,605 ns/iter (+/- 906) = 1938 MB/s
test message_count_20 ... bench:       4,577 ns/iter (+/- 1,339) = 2206 MB/s
test message_size_10k ... bench:       2,463 ns/iter (+/- 423) = 8124 MB/s
test message_size_1k  ... bench:         921 ns/iter (+/- 173) = 2182 MB/s
test message_size_5k  ... bench:       1,345 ns/iter (+/- 293) = 7442 MB/s

This PR Results:

test chunk_size_100   ... bench:       1,174 ns/iter (+/- 55) = 856 MB/s
test chunk_size_1005  ... bench:         580 ns/iter (+/- 37) = 1732 MB/s
test chunk_size_500   ... bench:         808 ns/iter (+/- 89) = 1243 MB/s
test message_count_1  ... bench:         585 ns/iter (+/- 39) = 863 MB/s
test message_count_10 ... bench:       1,303 ns/iter (+/- 149) = 3875 MB/s
test message_count_20 ... bench:       2,097 ns/iter (+/- 195) = 4816 MB/s
test message_size_10k ... bench:       2,080 ns/iter (+/- 657) = 9620 MB/s
test message_size_1k  ... bench:         660 ns/iter (+/- 72) = 3045 MB/s
test message_size_5k  ... bench:       1,393 ns/iter (+/- 48) = 7185 MB/s

Note that, in alignment with h2's implementation, message_size always equals chunk_size in this benchmark. This PR improves performance in most of these scenarios.
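For context on what the decoder benchmark exercises, here is a hedged sketch of gRPC message framing over non-contiguous chunks. The 5-byte prefix (1-byte compressed flag plus a 4-byte big-endian length) is part of the gRPC wire format; the function below is a hypothetical illustration, not tonic's decoder, and flattens the chunks with an iterator where a real decoder would track positions to avoid re-scanning.

```rust
/// Parse one gRPC frame (1-byte compressed flag + 4-byte big-endian length
/// + payload) out of a sequence of non-contiguous chunks, without first
/// concatenating them into one buffer.
fn parse_frame(chunks: &[&[u8]]) -> Option<(bool, Vec<u8>)> {
    // Iterate bytes across chunk boundaries lazily.
    let mut bytes = chunks.iter().flat_map(|c| c.iter().copied());
    let compressed = bytes.next()? != 0;
    let mut len_be = [0u8; 4];
    for b in len_be.iter_mut() {
        *b = bytes.next()?;
    }
    let len = u32::from_be_bytes(len_be) as usize;
    let payload: Vec<u8> = bytes.take(len).collect();
    // Return None if the frame is still incomplete.
    if payload.len() == len { Some((compressed, payload)) } else { None }
}

fn main() {
    // A frame split across three chunks: flag 0, length 3, payload "abc".
    let chunks: [&[u8]; 3] = [&[0, 0, 0], &[0, 3, b'a'], &[b'b', b'c']];
    if let Some((compressed, payload)) = parse_frame(&chunks) {
        println!("compressed: {}, payload: {:?}", compressed, payload);
    }
}
```

The relevant cost difference is upstream of this logic: with a contiguous buffer every incoming chunk must first be copied into one allocation before framing, whereas a chunk-aware decoder can parse in place.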

Arrow Flight End-to-End Benchmark

Simulated scenarios involve the client invoking the DoExchange RPC method to open a bidirectional streaming channel with the server and exchange multiple FlightData objects.

Scenario: data_body size 64KB, 10 chunks exchanged

  • Master Branch: Throughput: 462 MB/s
  • This PR: Throughput: 600 MB/s

Scenario: data_body size 1MB, 10 chunks exchanged

  • Master Branch: Throughput: 992 MB/s
  • This PR: Throughput: 1375 MB/s

The benchmarks show that this PR improves throughput across the tested scenarios, offering a more efficient solution for data transfer in Arrow Flight use cases and beyond.

@xiaoyawei force-pushed the zero_copy_bytes branch 3 times, most recently from 35cfa8e to f481616 on November 16, 2023.