Move FramedWrite work to a separate task #1145
Conversation
Just some minor questions
        + 'static,
{
    while let Some(m) = resp_channel_rx.recv().await {
        fw.send(m).await?;
Note to self, this would be a great place for a counter of jobs responded to by a downstairs.
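As a side note, a counter like that could slot straight into this loop. Below is a minimal sketch, not the PR's actual code: the generic `framed_write_loop` signature and the `jobs_responded` atomic are hypothetical stand-ins for however the real `framed_write_task` and its stats plumbing are structured.

```rust
use std::sync::{
    atomic::{AtomicU64, Ordering},
    Arc,
};

use futures::{Sink, SinkExt};
use tokio::sync::mpsc;

// Hypothetical version of the framed-write loop above, generic over the sink,
// with a per-downstairs counter of jobs responded to.
async fn framed_write_loop<W, M>(
    mut resp_channel_rx: mpsc::Receiver<M>,
    mut fw: W,
    jobs_responded: Arc<AtomicU64>,
) -> Result<(), W::Error>
where
    W: Sink<M> + Unpin,
{
    while let Some(m) = resp_channel_rx.recv().await {
        // Serialization + the socket write happen here, on the framed-write
        // task, instead of inline in the work loop.
        fw.send(m).await?;
        // Count each job response successfully pushed to the upstairs.
        jobs_responded.fetch_add(1, Ordering::Relaxed);
    }
    Ok(())
}
```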
@@ -1686,25 +1681,62 @@ where
     * gracefully. By exiting the loop here, we allow the calling
     * function to take over and handle a reconnect or a new upstairs
     * takeover.
     *
     * Tasks are organized as follows, with tasks in `snake_case`
Oh perfect. I thought going into this review that my only comment was going to be to add the picture from the PR to the code somewhere, but you have already done that.
@@ -1717,8 +1749,6 @@ where
    // continually locking the downstairs, cache the result here.
    let lossy = ads.lock().await.lossy;

    tokio::pin!(dw_task);
Curious, why no more `pin!`? Is it because `fw` is no longer a mutex?
The pin was never actually necessary, as far as I can tell; this diff removes it on `main`:
diff --git a/downstairs/src/lib.rs b/downstairs/src/lib.rs
index 427fdbd..76b0d95 100644
--- a/downstairs/src/lib.rs
+++ b/downstairs/src/lib.rs
@@ -1687,7 +1687,7 @@ where
      * function to take over and handle a reconnect or a new upstairs
      * takeover.
      */
-    let dw_task = {
+    let mut dw_task = {
         let adc = ads.clone();
         let fwc = fw.clone();
         tokio::spawn(async move {
@@ -1697,7 +1697,7 @@ where
     let (message_channel_tx, mut message_channel_rx) =
         mpsc::channel(MAX_ACTIVE_COUNT + 50);
-    let pf_task = {
+    let mut pf_task = {
         let adc = ads.clone();
         let tx = job_channel_tx.clone();
         let fwc = fw.clone();
@@ -1717,8 +1717,6 @@ where
     // continually locking the downstairs, cache the result here.
     let lossy = ads.lock().await.lossy;
-    tokio::pin!(dw_task);
-    tokio::pin!(pf_task);
     loop {
         tokio::select! {
             e = &mut dw_task => {
I guess pinning let us borrow the tasks mutably in the `select!`, which we could also do by just... making them mutable?
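To illustrate the distinction with a toy example (not crucible code): a spawned task's `JoinHandle` is `Unpin`, so `&mut handle` can be polled in `select!` directly, while a bare future such as `tokio::time::sleep` is not `Unpin` and still needs `tokio::pin!` before it can be polled by reference across loop iterations.

```rust
use std::time::Duration;

#[tokio::main]
async fn main() {
    // JoinHandles are Unpin, so `&mut task` can be polled in select! directly.
    let mut dw_task = tokio::spawn(async { "dw done" });
    let mut pf_task = tokio::spawn(async { "pf done" });

    // A bare Sleep future is !Unpin, so it still needs tokio::pin! before it
    // can be polled by mutable reference on each pass through the loop.
    let timeout = tokio::time::sleep(Duration::from_secs(1));
    tokio::pin!(timeout);

    loop {
        tokio::select! {
            e = &mut dw_task => {
                println!("dw_task finished: {:?}", e);
                break;
            }
            e = &mut pf_task => {
                println!("pf_task finished: {:?}", e);
                break;
            }
            _ = &mut timeout => {
                println!("timed out waiting for either task");
                break;
            }
        }
    }
}
```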
The downstairs currently has the following tasks (labelled in `snake_case`):

[task diagram omitted]

Sending response data to the `FramedWrite` requires serializing it (using `bincode`) and pushing it into the socket. During read-heavy operations, this is expensive! We can see it as `__writev` and `memcpy` (directly to its right) in the left section of this flamegraph, which together take about 28% of runtime. (interactive version)

`FramedWrite::send` is currently being called from `dw_task` (from `Downstairs::do_work`), which means that we're blocking on this serialization + socket write. This PR adds a new task specifically for performing the `FramedWrite` work, i.e. `dw_task` now passes `Message` objects into the `framed_write_task` and keeps going. Serialization is a standalone operation that can happen elsewhere in the worker thread pool, without holding the `Downstairs` lock.

With these changes, I see a 20% speedup for 1M random reads and a 56% speedup for 4M random reads.

(I'd also be interested in a #1087-style fix later on, to remove the `memcpy` entirely, but this seems worthwhile on its own.)
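For readers skimming the description, here is a condensed sketch of the pattern it outlines, under stated assumptions: the function name `spawn_framed_write_task`, the channel depth, and the `Vec<u8>`/`LengthDelimitedCodec` stand-ins are illustrative only; the real code sends crucible `Message` values through its own bincode-based encoder.

```rust
use bytes::Bytes;
use futures::SinkExt;
use tokio::{net::tcp::OwnedWriteHalf, sync::mpsc, task::JoinHandle};
use tokio_util::codec::{FramedWrite, LengthDelimitedCodec};

// Stand-in for crucible's Message type (the real code serializes Message
// structs with bincode).
type Message = Vec<u8>;

// Spawn a dedicated task that owns the FramedWrite. The work loop keeps the
// returned Sender, queues each response, and moves on; encoding and the
// socket write happen over here, off the hot path and without holding the
// Downstairs lock.
fn spawn_framed_write_task(
    mut fw: FramedWrite<OwnedWriteHalf, LengthDelimitedCodec>,
) -> (mpsc::Sender<Message>, JoinHandle<()>) {
    let (resp_tx, mut resp_rx) = mpsc::channel::<Message>(64);
    let handle = tokio::spawn(async move {
        while let Some(m) = resp_rx.recv().await {
            // Encode and write; on error (e.g. the socket closed), stop and
            // let the caller notice that the task has exited.
            if fw.send(Bytes::from(m)).await.is_err() {
                break;
            }
        }
    });
    (resp_tx, handle)
}
```

With this shape, the work loop just calls `resp_tx.send(msg).await` for each completed job and returns to its queue as soon as the message is buffered in the channel.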