feat(bin): don't allocate in server UDP recv path #2202
base: main
Benchmark results
Performance differences relative to 05b4af9. Each bracketed triple is [lower bound, estimate, upper bound].

| Benchmark | Result | Time | Change | p |
|---|---|---|---|---|
| coalesce_acked_from_zero 1+1 entries | No change in performance detected. | [99.529 ns 99.865 ns 100.21 ns] | [-0.1174% +0.3717% +0.8660%] | 0.15 > 0.05 |
| coalesce_acked_from_zero 3+1 entries | No change in performance detected. | [117.76 ns 118.07 ns 118.41 ns] | [-0.6885% +0.0671% +0.7042%] | 0.86 > 0.05 |
| coalesce_acked_from_zero 10+1 entries | No change in performance detected. | [117.04 ns 117.42 ns 117.90 ns] | [-0.0271% +0.7334% +1.6988%] | 0.11 > 0.05 |
| coalesce_acked_from_zero 1000+1 entries | No change in performance detected. | [97.116 ns 97.271 ns 97.452 ns] | [-0.8285% +0.0213% +0.7736%] | 0.96 > 0.05 |
| RxStreamOrderer::inbound_frame() | Change within noise threshold. | [112.09 ms 112.14 ms 112.19 ms] | [+0.2811% +0.3416% +0.4103%] | 0.00 < 0.05 |
| transfer/pacing-false/varying-seeds | No change in performance detected. | [26.443 ms 27.478 ms 28.535 ms] | [-3.4655% +1.9202% +7.4555%] | 0.51 > 0.05 |
| transfer/pacing-true/varying-seeds | No change in performance detected. | [34.793 ms 36.454 ms 38.119 ms] | [-3.5960% +2.6120% +9.8175%] | 0.45 > 0.05 |
| transfer/pacing-false/same-seed | No change in performance detected. | [25.374 ms 26.217 ms 27.080 ms] | [-4.1916% -0.0816% +4.6652%] | 0.97 > 0.05 |
| transfer/pacing-true/same-seed | No change in performance detected. | [40.728 ms 42.847 ms 45.005 ms] | [-6.5312% -0.0175% +7.3814%] | 0.99 > 0.05 |

| Benchmark | Result | Time | Throughput | Time change | Throughput change | p |
|---|---|---|---|---|---|---|
| 1-conn/1-100mb-resp/mtu-1500 (aka. Download)/client | No change in performance detected. | [867.39 ms 877.58 ms 888.07 ms] | [112.60 MiB/s 113.95 MiB/s 115.29 MiB/s] | [-1.6565% -0.1340% +1.4007%] | [-1.3814% +0.1342% +1.6844%] | 0.87 > 0.05 |
| 1-conn/10_000-parallel-1b-resp/mtu-1500 (aka. RPS)/client | Change within noise threshold. | [326.95 ms 330.95 ms 334.98 ms] | [29.853 Kelem/s 30.216 Kelem/s 30.586 Kelem/s] | [+0.2293% +1.7788% +3.2955%] | [-3.1903% -1.7477% -0.2288%] | 0.03 < 0.05 |
| 1-conn/1-1b-resp/mtu-1500 (aka. HPS)/client | Change within noise threshold. | [34.381 ms 34.577 ms 34.787 ms] | [28.746 elem/s 28.921 elem/s 29.085 elem/s] | [+0.3836% +1.2617% +2.1159%] | [-2.0720% -1.2460% -0.3821%] | 0.00 < 0.05 |
| 1-conn/1-100mb-resp/mtu-1500 (aka. Upload)/client | 💚 Performance has improved. | [1.6054 s 1.6208 s 1.6362 s] | [61.118 MiB/s 61.697 MiB/s 62.290 MiB/s] | [-8.9362% -7.7417% -6.5331%] | [+6.9898% +8.3913% +9.8131%] | 0.00 < 0.05 |
| 1-conn/1-100mb-resp/mtu-65536 (aka. Download)/client | 💚 Performance has improved. | [100.83 ms 101.12 ms 101.40 ms] | [986.15 MiB/s 988.95 MiB/s 991.78 MiB/s] | [-12.288% -10.127% -8.8506%] | [+9.7100% +11.269% +14.009%] | 0.00 < 0.05 |
| 1-conn/10_000-parallel-1b-resp/mtu-65536 (aka. RPS)/client | 💔 Performance has regressed. | [320.95 ms 323.75 ms 326.50 ms] | [30.627 Kelem/s 30.888 Kelem/s 31.158 Kelem/s] | [+1.0502% +2.3047% +3.6421%] | [-3.5141% -2.2527% -1.0393%] | 0.00 < 0.05 |
| 1-conn/1-1b-resp/mtu-65536 (aka. HPS)/client | Change within noise threshold. | [34.484 ms 34.661 ms 34.847 ms] | [28.697 elem/s 28.851 elem/s 28.999 elem/s] | [+0.7538% +1.4183% +2.1085%] | [-2.0650% -1.3985% -0.7481%] | 0.00 < 0.05 |
| 1-conn/1-100mb-resp/mtu-65536 (aka. Upload)/client | No change in performance detected. | [254.10 ms 265.28 ms 277.36 ms] | [360.54 MiB/s 376.95 MiB/s 393.55 MiB/s] | [-12.552% -5.3656% +1.3585%] | [-1.3403% +5.6698% +14.354%] | 0.16 > 0.05 |

Client/server transfer results
Transfer of 33554432 bytes over loopback.
Force-pushed from 02fe570 to 32ce4ae.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #2202 +/- ##
==========================================
- Coverage 95.39% 95.36% -0.03%
==========================================
Files 112 112
Lines 36447 36447
==========================================
- Hits 34768 34759 -9
- Misses 1679 1688 +9
Failed Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
neqo-latest as server

Succeeded Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
neqo-latest as server

Unsupported Interop Tests
QUIC Interop Runner, client vs. server
neqo-latest as client
neqo-latest as server
Force-pushed from 32ce4ae to bdb5da9.
Previously the `neqo-bin` server would read a set of datagrams from the socket and allocate them:

```rust
let dgrams: Vec<Datagram> = dgrams.map(|d| d.to_owned()).collect();
```

This was done out of convenience: handling `Datagram<&[u8]>`s, each borrowing from `self.recv_buf`, is hard to get right across multiple `&mut self` functions, here `self.run`, `self.process` and `self.find_socket`.

This commit combines `self.process` and `self.find_socket` and passes a socket index, instead of the read `Datagram`s, from `self.run` to `self.process`. This keeps the Rust borrow checker happy while handling borrowed `Datagram<&[u8]>`s instead of owned `Datagram`s.
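A minimal sketch of the borrowing pattern described above, using toy `Socket` and `Runner` types (hypothetical stand-ins, not the `neqo-bin` types). The received datagrams are slices borrowing from `recv_buf`, and only the socket index crosses a function boundary. Because `process` destructures `self` into its fields, the borrow checker can see that `recv_buf` (borrowed by the datagrams) and `sockets` (used for sending) are disjoint. Had reading been a separate `&mut self` method returning the slices, they would keep all of `self` borrowed and the follow-up send would not compile.

```rust
/// Toy socket (hypothetical): datagrams waiting to be read, plus a byte counter for sends.
struct Socket {
    pending: Vec<Vec<u8>>,
    bytes_sent: usize,
}

impl Socket {
    /// Copies all pending datagrams into `buf` and returns slices borrowing from it,
    /// loosely mimicking a recv call that yields `Datagram<&[u8]>`s.
    fn recv<'a>(&mut self, buf: &'a mut Vec<u8>) -> Vec<&'a [u8]> {
        buf.clear();
        let mut bounds = Vec::new();
        for d in self.pending.drain(..) {
            let start = buf.len();
            buf.extend_from_slice(&d);
            bounds.push((start, buf.len()));
        }
        // Downgrade to a shared borrow so many slices into `buf` can coexist.
        let buf: &'a Vec<u8> = buf;
        bounds.into_iter().map(move |(s, e)| &buf[s..e]).collect()
    }
}

/// Toy runner (hypothetical): owns the receive buffer and all sockets.
struct Runner {
    recv_buf: Vec<u8>,
    sockets: Vec<Socket>,
    bytes_seen: usize,
}

impl Runner {
    /// Handles every datagram read from the socket at `ready_socket_index`.
    /// Only the index is passed in; the borrowed slices never leave this function.
    fn process(&mut self, ready_socket_index: usize) {
        // Disjoint field borrows: `dgrams` keeps `recv_buf` borrowed while
        // `sockets` and `bytes_seen` are mutated below.
        let Self { recv_buf, sockets, bytes_seen } = self;
        let dgrams = sockets[ready_socket_index].recv(recv_buf);
        for d in dgrams {
            *bytes_seen += d.len();
            // "Send" a reply from the first socket (stand-in for `find_socket`).
            sockets[0].bytes_sent += d.len();
        }
    }
}
```

The toy still allocates a small `Vec` of slices per batch; what it avoids is copying each datagram payload into an owned buffer, which is what the per-datagram `d.to_owned()` in the original line did.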
Force-pushed from bdb5da9 to 12bb957.
neqo-bin/src/server/mod.rs
input_dgrams.iter_mut().flatten().next().map_or_else(
    || {
        // Reading from the socket returned no datagrams. Don't try again.
        ready_socket_index = None;
        input_dgrams = None;
        None
    },
    Some,
)
I'm surprised that clippy didn't pick this one up.
Suggested change:
input_dgrams.iter_mut().flatten().next().or_else(
    || {
        // Reading from the socket returned no datagrams. Don't try again.
        ready_socket_index = None;
        input_dgrams = None;
        None
    },
)
> I'm surprised that clippy didn't pick this one up.

And I am surprised that I didn't see this. 🤦 Thank you, Martin!
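The suggestion relies on a general `Option` identity: `opt.map_or_else(f, Some)` returns `Some(x)` unchanged when `opt` is `Some(x)` and calls `f()` otherwise, which is exactly what `opt.or_else(f)` does. A small self-contained check (the function names `long_form` and `short_form` are illustrative only):

```rust
/// Long form, as in the original code: map a present value through `Some`.
fn long_form(opt: Option<u32>, fallback: impl FnOnce() -> Option<u32>) -> Option<u32> {
    opt.map_or_else(fallback, Some)
}

/// Short form suggested in review: same result, same point at which `fallback` runs.
fn short_form(opt: Option<u32>, fallback: impl FnOnce() -> Option<u32>) -> Option<u32> {
    opt.or_else(fallback)
}
```

In both forms the fallback closure runs only when the option is `None`, so side effects such as resetting `ready_socket_index` happen at the same point.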
neqo-bin/src/server/mod.rs
let ((_host, first_socket), rest) = self.sockets.split_first_mut().unwrap();
let socket = rest
    .iter_mut()
    .map(|(_host, socket)| socket)
    .find(|socket| {
        socket
            .local_addr()
            .ok()
            .map_or(false, |socket_addr| socket_addr == dgram.source())
    })
    .unwrap_or(first_socket);
You inlined this, presumably to avoid having to pass `&mut self` to it. It's still a useful thing to have broken out. You can make a new function that takes `&mut self.sockets` and returns `&mut Socket`. Either that or you could go with `Self::send(&mut self.sockets, &dgram).await?` and have that function do the sending part as well.
You are right. Keeping this logic separate is simpler. I re-introduced `find_socket` in 786c616.
loop {
    match self.server.process(dgram.take(), (self.now)()) {
    let input_dgram = if let Some(d) = input_dgrams.iter_mut().flatten().next() {
I found this code to be a little unintuitive. You are taking a mutable iterator over the option, then flattening it. It's not clear that you are mutating the underlying iterator as a result of calling `next()`.
Taking a step back, I think that this code is fairly simple:
You read from the indicated socket, process every datagram that it produces, and stop when the socket stops producing datagrams.
Would this structure work?
async fn whatever(&mut self, ready_socket_index: Option<usize>) -> Res<()> {
    let Some(inx) = ready_socket_index else {
        return Ok(());
    };
    let (host, socket) = self.sockets.get_mut(inx).unwrap();
    while let Some(input_dgrams) = socket.recv(*host, &mut self.recv_buf)? {
        for input in input_dgrams {
            match self.server.process(input, (self.now)()) {
                // see below for a note about sending.
                Output::Datagram(output) => Self::send(&mut self.sockets, &output).await?,
                Output::Callback(t) => {
                    self.timeout = Some(Box::pin(tokio::time::sleep(t)))
                }
                Output::None => {}
            }
        }
    }
    Ok(())
}
I think that's functionally equivalent to what you have, but a lot easier to read, at least for me.
I'm not a borrow checker, so I couldn't say if this works. There are a lot of references being held here. I can't see any obvious overlap.
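The `iter_mut().flatten().next()` idiom questioned in this comment can be demonstrated in isolation. On an `Option<I>` holding an iterator, `iter_mut()` yields `&mut I`, which is itself an iterator, so `flatten().next()` pulls one item and advances the stored iterator in place; a `None` option yields nothing. The sketch uses `u32` as a stand-in for a datagram, and the helper names are hypothetical. `as_mut().and_then(Iterator::next)` is shown as an arguably more explicit equivalent.

```rust
/// Pops the next item from an optional, stored iterator, advancing it in place.
/// `iter_mut()` yields `&mut I`, which implements `Iterator`, so `flatten()`
/// draws items directly from the iterator inside the `Option`.
fn next_input<I: Iterator<Item = u32>>(input: &mut Option<I>) -> Option<u32> {
    input.iter_mut().flatten().next()
}

/// Equivalent form that makes the in-place `next()` call explicit.
fn next_input_explicit<I: Iterator<Item = u32>>(input: &mut Option<I>) -> Option<u32> {
    input.as_mut().and_then(Iterator::next)
}
```

Repeated calls on the same `Option` walk through the batch one item at a time, which is the hidden mutation the comment points out.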
> I found this code to be a little unintuitive.

Agreed. The complexity stems from the fact that `neqo_transport::Server` does not have a `process_multiple_input` function, so the runner has to handle individual output `Datagram`s while still buffering input `Datagram`s.

`neqo_transport::Server` does not have a `process_multiple_input` function because the input `Datagram`s provided might each be for a different `neqo_transport::Connection` and thus each result in an `Output::Datagram`. A `process_multiple_input` would therefore need to return a set of output `Datagram`s, not just one `Datagram`.

If you think it is worth it, I can explore this pathway, i.e. adding `process_multiple_input` to `neqo_transport::Server`, preferably in a follow-up pull request.
Addressing the concrete suggestion above:

> let Some(inx) = ready_socket_index else {
>     return Ok(());
> };

`process` (or `whatever` above) might be called with `ready_socket_index` set to `None`, in which case it is expected to drive the output path only, i.e. not just return early.

> while let Some(input_dgrams) = socket.recv(*host, &mut self.recv_buf)? {
>     for input in input_dgrams {
>         match self.server.process(input, (self.now)()) {
>             // see below for a note about sending.
>             Output::Datagram(output) => Self::send(&mut self.sockets, &output).await?,
>             Output::Callback(t) => {
>                 self.timeout = Some(Box::pin(tokio::time::sleep(t)))
>             }
>             Output::None => {}
>         }
>     }
> }

If `self.server.process` returns `Output::Datagram`, one has to call it again until it returns `Output::Callback` or `Output::None`. In the above suggestion, `self.server.process` is only called if more input datagrams are available.
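The calling convention described in this reply can be sketched with a toy server. All names here (`Output`, `ToyServer`, `drive`) are hypothetical; the enum only mirrors the `Output::Datagram`, `Output::Callback` and `Output::None` cases named in this thread and is not the real `neqo_transport` API. Each buffered input is passed once, and the server is then called with no input until it stops returning a datagram. An empty input batch stands in for "no ready socket" and still drains pending output, which is why `process` cannot simply return early in that case.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Simplified stand-in for the result of one `process` call (hypothetical).
enum Output {
    Datagram(Vec<u8>),
    Callback(Duration),
    None,
}

/// Toy server (hypothetical): each input queues two replies, handed out one per call.
struct ToyServer {
    pending: VecDeque<Vec<u8>>,
    timer: Option<Duration>,
}

impl ToyServer {
    /// Accepts at most one input datagram and returns at most one output.
    fn process(&mut self, input: Option<&[u8]>) -> Output {
        if let Some(d) = input {
            // Pretend one input triggers two replies, e.g. an ACK and a response.
            self.pending.push_back(d.to_vec());
            self.pending.push_back(d.to_vec());
        }
        match (self.pending.pop_front(), self.timer) {
            (Some(d), _) => Output::Datagram(d),
            (None, Some(t)) => Output::Callback(t),
            (None, None) => Output::None,
        }
    }
}

/// Feeds each input once, then keeps calling `process(None)` until the server
/// stops returning datagrams. Returns the datagrams to send and the timer to arm.
fn drive(server: &mut ToyServer, inputs: &[&[u8]]) -> (Vec<Vec<u8>>, Option<Duration>) {
    let mut sent = Vec::new();
    let mut inputs = inputs.iter().copied();
    loop {
        // Next buffered input, or `None` once the batch is exhausted.
        let input = inputs.next();
        let no_more_input = input.is_none();
        match server.process(input) {
            // A datagram means `process` must be called again.
            Output::Datagram(d) => sent.push(d),
            Output::Callback(t) if no_more_input => return (sent, Some(t)),
            Output::None if no_more_input => return (sent, None),
            // More input is buffered, so keep feeding it.
            Output::Callback(_) | Output::None => {}
        }
    }
}
```

Stopping as soon as the inputs run out, as in the quoted suggestion, would leave the second reply per input stuck in `pending` in this toy.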
/// Tries to find a socket, but then just falls back to sending from the first.
fn find_socket(&mut self, addr: SocketAddr) -> &mut crate::udp::Socket {
    let ((_host, first_socket), rest) = self.sockets.split_first_mut().unwrap();
    rest.iter_mut()
        .map(|(_host, socket)| socket)
        .find(|socket| {
            socket
                .local_addr()
                .ok()
                .map_or(false, |socket_addr| socket_addr == addr)
        })
        .unwrap_or(first_socket)
}
Moved outside of `impl ServerRunner {}`, i.e. below, as it no longer takes `&mut self` but instead `socket: &mut [(SocketAddr, crate::udp::Socket)]`.
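For illustration, a self-contained version of the free-function approach, with a toy `Socket` type (hypothetical; the real one is `crate::udp::Socket`) whose `local_addr` mirrors the call used above. Because the function borrows only the socket slice, a caller can pass `&mut self.sockets` while other fields of `self`, such as the receive buffer, remain borrowed.

```rust
use std::io;
use std::net::SocketAddr;

/// Toy socket (hypothetical): just remembers the address it is bound to.
struct Socket {
    addr: SocketAddr,
}

impl Socket {
    fn local_addr(&self) -> io::Result<SocketAddr> {
        Ok(self.addr)
    }
}

/// Tries to find the socket bound to `addr`, falling back to the first socket.
/// Panics if `sockets` is empty, like the `unwrap()` in the original.
fn find_socket(sockets: &mut [(SocketAddr, Socket)], addr: SocketAddr) -> &mut Socket {
    let ((_host, first_socket), rest) = sockets.split_first_mut().unwrap();
    rest.iter_mut()
        .map(|(_host, socket)| socket)
        .find(|socket| socket.local_addr().is_ok_and(|a| a == addr))
        .unwrap_or(first_socket)
}
```

`is_ok_and` (stable since Rust 1.70) is used here in place of `.ok().map_or(false, ...)`; the two are equivalent.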
Follow-up to #2184.
Fixes #2190.
Hopefully speeds up #2199.