Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: don't allocate in UDP recv path #2184

Merged
merged 4 commits into from
Oct 23, 2024
Merged

feat: don't allocate in UDP recv path #2184

merged 4 commits into from
Oct 23, 2024

Conversation

mxinden
Copy link
Collaborator

@mxinden mxinden commented Oct 17, 2024

This is the final stacked pull request combining the following commits:

feat(common): make Datagram generic over payload type

Previously the payload type of Datagram was Vec<u8>, thus the Datagram always owned the UDP datagram payload.

This change enables Datagram to represent both (a) an owned payload allocation (i.e. Vec<u8>), but also represent (b) a view into an existing payload (e.g. a long lived receive buffer via &[u8]).

The default payload type stays Vec<u8>, thus not breaking existing usage of Datagram.

feat(transport): accept borrowed instead of owned input datagram

Previously process_input (and the like) took a Datagram<Vec<u8>> as input. In other words, it required allocating each UDP datagram payload in a dedicated Vec<u8> before passing it to process_input.

With this patch, process_input accepts a Datagram<&[u8]>. In other words, it accepts a Datagram with a borrowed view into an existing buffer (&[u8]), e.g. a long lived receive buffer.

feat(udp): return borrowed Datagram on receive

Previously recv_inner would return Datagram<Vec<u8>>. In other words, it would allocate a new Vec<u8> for each UDP datagram payload.

Now recv_inner reads into a provided buffer and returns Datagram<&[u8]>, i.e. it returns a view into the provided buffer without allocating.

feat(bin): don't allocate in UDP recv path

Pass a long lived receive buffer to neqo_udp::recv_inner, receiving an iterator of Datagram<&[u8]>s pointing into that buffer, thus no longer allocating in UDP receive path.


Extracted out of #2093.

Part of #1693.

Pull request stack:

  1. (this one) feat: don't allocate in UDP recv path #2184
  2. feat(transport): accept borrowed instead of owned input datagram #2187
  3. feat(udp): return borrowed Datagram on receive #2188
  4. feat(bin): don't allocate in UDP recv path #2189

Copy link

codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 99.45848% with 3 lines in your changes missing coverage. Please review.

Project coverage is 95.39%. Comparing base (9f1c2f3) to head (2b322d7).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
neqo-common/src/datagram.rs 93.18% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2184   +/-   ##
=======================================
  Coverage   95.39%   95.39%           
=======================================
  Files         112      112           
  Lines       36373    36447   +74     
=======================================
+ Hits        34697    34769   +72     
- Misses       1676     1678    +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Oct 17, 2024

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Copy link

github-actions bot commented Oct 21, 2024

Benchmark results

Performance differences relative to b5e0374.

coalesce_acked_from_zero 1+1 entries: Change within noise threshold.
       time:   [98.917 ns 99.193 ns 99.481 ns]
       change: [-0.9459% -0.5340% -0.0999%] (p = 0.02 < 0.05)

Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe

coalesce_acked_from_zero 3+1 entries: Change within noise threshold.
       time:   [117.01 ns 117.37 ns 117.75 ns]
       change: [-0.8819% -0.5415% -0.2079%] (p = 0.00 < 0.05)

Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
10 (10.00%) high severe

coalesce_acked_from_zero 10+1 entries: Change within noise threshold.
       time:   [116.51 ns 116.85 ns 117.29 ns]
       change: [-1.3921% -0.8099% -0.3062%] (p = 0.00 < 0.05)

Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low severe
3 (3.00%) low mild
3 (3.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [96.717 ns 101.78 ns 112.97 ns]
       change: [-2.1810% +0.6453% +6.4132%] (p = 0.83 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [111.61 ms 111.67 ms 111.74 ms]
       change: [-0.3427% -0.2718% -0.1991%] (p = 0.00 < 0.05)

Found 22 outliers among 100 measurements (22.00%)
4 (4.00%) low severe
4 (4.00%) low mild
5 (5.00%) high mild
9 (9.00%) high severe

transfer/pacing-false/varying-seeds: No change in performance detected.
       time:   [25.850 ms 26.759 ms 27.654 ms]
       change: [-7.9363% -3.0643% +2.2316%] (p = 0.24 > 0.05)
transfer/pacing-true/varying-seeds: No change in performance detected.
       time:   [33.889 ms 35.649 ms 37.426 ms]
       change: [-6.8098% -0.1727% +6.8300%] (p = 0.96 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-false/same-seed: No change in performance detected.
       time:   [25.294 ms 25.955 ms 26.623 ms]
       change: [-6.4483% -2.4512% +1.7271%] (p = 0.25 > 0.05)
transfer/pacing-true/same-seed: No change in performance detected.
       time:   [39.871 ms 41.886 ms 43.943 ms]
       change: [-9.8397% -3.6845% +3.0139%] (p = 0.27 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-100mb-resp/mtu-1500 (aka. Download)/client: No change in performance detected.
       time:   [871.83 ms 880.49 ms 889.33 ms]
       thrpt:  [112.44 MiB/s 113.57 MiB/s 114.70 MiB/s]
change:
       time:   [-0.7754% +0.5489% +1.9642%] (p = 0.44 > 0.05)
       thrpt:  [-1.9264% -0.5459% +0.7815%]
1-conn/10_000-parallel-1b-resp/mtu-1500 (aka. RPS)/client: No change in performance detected.
       time:   [316.33 ms 319.33 ms 322.42 ms]
       thrpt:  [31.015 Kelem/s 31.315 Kelem/s 31.613 Kelem/s]
change:
       time:   [-1.3503% -0.0743% +1.3366%] (p = 0.92 > 0.05)
       thrpt:  [-1.3190% +0.0743% +1.3688%]
1-conn/1-1b-resp/mtu-1500 (aka. HPS)/client: No change in performance detected.
       time:   [33.780 ms 34.009 ms 34.259 ms]
       thrpt:  [29.189  elem/s 29.404  elem/s 29.603  elem/s]
change:
       time:   [-1.5083% -0.5383% +0.4334%] (p = 0.29 > 0.05)
       thrpt:  [-0.4315% +0.5412% +1.5314%]

Found 16 outliers among 100 measurements (16.00%)
7 (7.00%) low mild
2 (2.00%) high mild
7 (7.00%) high severe

1-conn/1-100mb-resp/mtu-65536 (aka. Download)/client: 💚 Performance has improved.
       time:   [111.51 ms 111.80 ms 112.08 ms]
       thrpt:  [892.23 MiB/s 894.46 MiB/s 896.77 MiB/s]
change:
       time:   [-2.0616% -1.5830% -1.0883%] (p = 0.00 < 0.05)
       thrpt:  [+1.1003% +1.6085% +2.1050%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

1-conn/10_000-parallel-1b-resp/mtu-65536 (aka. RPS)/client: No change in performance detected.
       time:   [315.47 ms 318.63 ms 321.84 ms]
       thrpt:  [31.072 Kelem/s 31.384 Kelem/s 31.699 Kelem/s]
change:
       time:   [-1.0625% +0.4502% +1.9110%] (p = 0.55 > 0.05)
       thrpt:  [-1.8752% -0.4482% +1.0739%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-1b-resp/mtu-65536 (aka. HPS)/client: No change in performance detected.
       time:   [33.916 ms 34.102 ms 34.308 ms]
       thrpt:  [29.148  elem/s 29.324  elem/s 29.484  elem/s]
change:
       time:   [-0.2857% +0.5354% +1.3188%] (p = 0.21 > 0.05)
       thrpt:  [-1.3016% -0.5325% +0.2865%]

Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
3 (3.00%) high mild
6 (6.00%) high severe

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client Server CC Pacing MTU Mean [ms] Min [ms] Max [ms] Relative
msquic msquic 1504 184.3 ± 99.2 91.1 368.0 1.00
neqo msquic reno on 1504 291.2 ± 85.9 214.7 460.2 1.00
neqo msquic reno 1504 237.9 ± 57.6 208.1 427.0 1.00
neqo msquic cubic on 1504 251.0 ± 54.1 208.3 413.6 1.00
neqo msquic cubic 1504 240.4 ± 54.0 209.8 407.6 1.00
msquic neqo reno on 1504 724.3 ± 39.5 688.5 826.3 1.00
msquic neqo reno 1504 716.5 ± 13.6 686.1 731.3 1.00
msquic neqo cubic on 1504 734.6 ± 85.8 685.8 920.2 1.00
msquic neqo cubic 1504 692.1 ± 15.8 667.2 722.5 1.00
neqo neqo reno on 1504 475.3 ± 99.3 412.7 672.4 1.00
neqo neqo reno 1504 424.9 ± 17.9 401.8 459.0 1.00
neqo neqo cubic on 1504 484.3 ± 112.3 414.9 750.8 1.00
neqo neqo cubic 1504 425.9 ± 19.5 406.6 472.8 1.00
msquic msquic 65536 126.3 ± 38.0 91.8 215.3 1.00
neqo msquic reno on 65536 272.7 ± 90.9 209.7 458.6 1.00
neqo msquic reno 65536 209.5 ± 8.1 201.0 223.6 1.00
neqo msquic cubic on 65536 264.2 ± 82.4 203.1 405.7 1.00
neqo msquic cubic 65536 209.3 ± 8.7 200.4 227.4 1.00
msquic neqo reno on 65536 112.3 ± 60.4 82.2 320.9 1.00
msquic neqo reno 65536 117.7 ± 57.3 82.0 323.2 1.00
msquic neqo cubic on 65536 88.4 ± 9.0 81.9 113.8 1.00
msquic neqo cubic 65536 161.0 ± 98.8 84.7 354.5 1.00
neqo neqo reno on 65536 205.9 ± 100.4 123.0 397.5 1.00
neqo neqo reno 65536 158.4 ± 58.6 115.0 354.4 1.00
neqo neqo cubic on 65536 206.1 ± 98.9 121.8 433.2 1.00
neqo neqo cubic 65536 154.4 ± 27.0 120.3 225.9 1.00

⬇️ Download logs

@mxinden mxinden changed the title feat(common): make Datagram generic over payload type feat: don't allocate in UDP recv path Oct 22, 2024
@mxinden
Copy link
Collaborator Author

mxinden commented Oct 22, 2024

This is the final pull request in the stack. All commits, but the first (08a0003), have been reviewed and merged through the 3 previous stacked pull requests.

@martinthomson and @larseggert can you do one more review?

Copy link
Member

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

neqo-common/src/datagram.rs Outdated Show resolved Hide resolved
neqo-udp/src/lib.rs Show resolved Hide resolved
Previously the payload type of `Datagram` was `Vec<u8>`, thus the `Datagram`
always owned the UDP datagram payload.

This change enables `Datagram` to represent both (a) an owned payload allocation
(i.e. `Vec<u8>`), but also represent (b) a view into an existing payload (e.g. a
long lived receive buffer via `&[u8]`).

The default payload type stays `Vec<u8>`, thus not breaking existing usage of
`Datagram`.
Previously `process_input` (and the like) took a `Datagram<Vec<u8>>` as input.
In other words, it required allocating each UDP datagram payload in a dedicated
`Vec<u8>` before passing it to `process_input`.

With this patch, `process_input` accepts a `Datagram<&[u8]>`. In other words, it
accepts a `Datagram` with a borrowed view into an existing buffer (`&[u8]`),
e.g. a long lived receive buffer.
Previously `recv_inner` would return `Datagram<Vec<u8>>`. In other words, it
would allocate a new `Vec<u8>` for each UDP datagram payload.

Now `recv_inner` reads into a provided buffer and returns `Datagram<&[u8]>`,
i.e. it returns a view into the provided buffer without allocating.
Pass a long lived receive buffer to `neqo_udp::recv_inner`, receiving an
iterator of `Datagram<&[u8]>`s pointing into that buffer, thus no longer
allocating in UDP receive path.
@larseggert
Copy link
Collaborator

@mxinden is this good to go? Nothing more pending?

@mxinden
Copy link
Collaborator Author

mxinden commented Oct 23, 2024

Nothing else pending from my end @larseggert. Please merge in case you have no further comments.

I will propose a new release once merged. Excited to get this into Firefox.

@larseggert larseggert added this pull request to the merge queue Oct 23, 2024
Merged via the queue into main with commit 934b62d Oct 23, 2024
63 of 71 checks passed
@larseggert larseggert deleted the recv-no-alloc-common branch October 23, 2024 10:47
@mxinden
Copy link
Collaborator Author

mxinden commented Oct 24, 2024

Thank you for the help here!

For early CPU profiles of this patch in Firefox see #2198 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants