Optimize VarInt decoding #1117

Matthias247 · 2021-05-09T00:01:59Z

Upfront: This isn't the most elegant change. But given VarInts sit in the
hot path it seems worth it.

The current VarInt decoding performs some duplicate bounds check.
VarInt::decode checks for available bytes in order to return an error.
And bytes::Buf performs additional bounds checks in various helper methods.

By using Buf::chunk and Buf::advance directly we can save some of those.
We could improve even more by only support decoding from contiguous buffers,
but that would change the API contract.

In addition it seemed like the u16 be->le conversion wasn't as free
as it should be. A 2byte LE buffer improves things here.

Overall this reduces CPU time spent in VarInt decoding from about 2% to around 1%.

Upfront: This isn't the most elegant change. But given VarInts sit in the hot path it seems worth it. The current VarInt decoding performs some duplicate bounds check. `VarInt::decode` checks for available bytes in order to return an error. And `bytes::Buf` performs additional bounds checks in various helper methods. By using `Buf::chunk` and `Buf::advance` directly we can save some of those. We could improve even more by only support decoding from contiguous buffers, but that would change the API contract. In addition it seemed like the `u16` be->le conversion wasn't as free as it should be. A 2byte LE buffer improves things here. Overall this reduces CPU time spent in VarInt decoding from about 2% to around 1%.

Matthias247 · 2021-05-09T00:04:47Z

For anyone who is more curious. Here is a baseline flamegraph. "Matched" is the amount of samples in VarInt::decode (2.6%).
Inside the method we can also see 0.68% time spent in swap_bytes.

Here is the changed version, with 1.2% of samples in VarInt::decode:

Ralith

Should we have a microbenchmark for this path?

Ralith · 2021-05-09T00:31:33Z

quinn-proto/src/varint.rs

+        // check. We already performed this one earlier on.
+        buf[0] = r.chunk()[0];
+        r.advance(1);
+


Does this specific change actually help? I wouldn't expect this to have any difference in bounds checking semantics.

There's an assert! in the get_u8, not just a rust array bounds check. I don't think those are removed. And since even without the "le" change this proved to be faster, I guess practical experience shows that it helps.

djc

Uh, left this sitting around for a long time. @Matthias247 what do you think?

djc · 2021-05-17T20:23:55Z

quinn-proto/src/varint.rs

@@ -159,21 +163,35 @@ impl Codec for VarInt {
                if r.remaining() < 1 {


Could we do this without unsafe if we use <&[u8; 1]>::try_from() to extract an array out of the bounds check?

Along these lines:

diff --git a/quinn-proto/src/varint.rs b/quinn-proto/src/varint.rs index 7f57bc3c..8f4b9b7c 100644 --- a/quinn-proto/src/varint.rs +++ b/quinn-proto/src/varint.rs @@ -1,4 +1,4 @@ -use std::{convert::TryInto, fmt}; +use std::{convert::{TryInto, TryFrom}, fmt}; use bytes::{Buf, BufMut}; use thiserror::Error; @@ -146,38 +146,37 @@ pub struct VarIntBoundsExceeded; impl Codec for VarInt { fn decode<B: Buf>(r: &mut B) -> coding::Result<Self> { - if !r.has_remaining() { - return Err(UnexpectedEnd); - } let mut buf = [0; 8]; - buf[0] = r.get_u8(); + let input = r.chunk(); + buf[0] = match input.get(0) { + Some(&b) => b, + None => return Err(UnexpectedEnd), + }; + let tag = buf[0] >> 6; buf[0] &= 0b0011_1111; - let x = match tag { - 0b00 => u64::from(buf[0]), + let (x, len) = match tag { + 0b00 => (u64::from(buf[0]), 1), 0b01 => { - if r.remaining() < 1 { - return Err(UnexpectedEnd); - } - r.copy_to_slice(&mut buf[1..2]); - u64::from(u16::from_be_bytes(buf[..2].try_into().unwrap())) + let arr = [0, buf[0]]; + let read = <&[u8; 1]>::try_from(&input[1..]).map_err(|_| UnexpectedEnd)?; + let arr = [read[0], buf[0]]; + (u64::from(u16::from_le_bytes(arr)), 2) } 0b10 => { - if r.remaining() < 3 { - return Err(UnexpectedEnd); - } - r.copy_to_slice(&mut buf[1..4]); - u64::from(u32::from_be_bytes(buf[..4].try_into().unwrap())) + let read = <&[u8; 3]>::try_from(&input[1..]).map_err(|_| UnexpectedEnd)?; + let arr = [read[2], read[1], read[0], buf[0]]; + (u64::from(u32::from_le_bytes(arr)), 4) } 0b11 => { - if r.remaining() < 7 { - return Err(UnexpectedEnd); - } - r.copy_to_slice(&mut buf[1..8]); - u64::from_be_bytes(buf) + let read = <&[u8; 7]>::try_from(&input[1..]).map_err(|_| UnexpectedEnd)?; + let arr = [read[6], read[5], read[4], read[3], read[2], read[1], read[0], buf[0]]; + (u64::from(u64::from_le_bytes(arr)), 8) } _ => unreachable!(), }; + + r.advance(len); Ok(VarInt(x)) }

Ralith reviewed May 9, 2021

View reviewed changes

Ralith approved these changes May 13, 2021

View reviewed changes

Ralith requested a review from djc May 17, 2021 00:48

djc reviewed Sep 29, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize VarInt decoding #1117

Optimize VarInt decoding #1117

Matthias247 commented May 9, 2021

Matthias247 commented May 9, 2021

Ralith left a comment

Ralith May 9, 2021

Matthias247 May 9, 2021

djc left a comment

djc May 17, 2021

		@@ -159,21 +163,35 @@ impl Codec for VarInt {
		if r.remaining() < 1 {

Optimize VarInt decoding #1117

Are you sure you want to change the base?

Optimize VarInt decoding #1117

Conversation

Matthias247 commented May 9, 2021

Matthias247 commented May 9, 2021

Ralith left a comment

Choose a reason for hiding this comment

Ralith May 9, 2021

Choose a reason for hiding this comment

Matthias247 May 9, 2021

Choose a reason for hiding this comment

djc left a comment

Choose a reason for hiding this comment

djc May 17, 2021

Choose a reason for hiding this comment