Gracefully shutting down a TLS server sometimes leads to the client not receiving a response #3792
Have you been able to trace what is happening? One thing this sounds like is that the TLS stream perhaps hasn't flushed the response before closing.
But do you know for sure that all the requests have indeed started? Or could the shutdown be triggered just before hyper has been able to see the request bytes?
Seems to be the problem. Most of this comment is how I arrived at that; skip to the end for some potential solutions.

Inlining the accept-loop snippet here for convenience:

```rust
loop {
    tokio::select! {
        _ = &mut shutdown_receiver => {
            // ADDING THIS SLEEP MAKES THE TEST PASS
            tokio::time::sleep(std::time::Duration::from_millis(1)).await;
            drop(shut_down_connections_rx);
            break;
        }
        conn = tcp_listener.accept() => {
            tokio::spawn(
                handle_tcp_conn(
                    conn,
                    wait_for_request_to_complete_rx.clone(),
                    shut_down_connections_tx.clone(),
                    tls_config
                )
            );
        }
    }
}
```

Inlining the per-connection snippet here for convenience:

```rust
tokio::select! {
    result = conn.as_mut() => {
        if let Err(err) = result {
            dbg!(err);
        }
    }
    _ = should_shut_down_connection => {
        // TEST STILL FAILS IF WE SLEEP RIGHT HERE
        conn.as_mut().graceful_shutdown();
        let result = conn.as_mut().await;
        if let Err(err) = result {
            dbg!(err);
        }
    }
};
```

**Key Points**

- The test passes if we sleep for a millisecond before sending on the channel that leads to `graceful_shutdown` being called.
- If I instead move the sleep to just before the `graceful_shutdown` call itself, the test still fails.
- This suggests that the problem occurs when we call `graceful_shutdown` before hyper has been able to observe the request bytes.
- It looks like hyper immediately closes a connection that it considers idle, even if the client's bytes are already in flight.
Yeah, seems like this is the problem.

**Problem**

Currently, if a user opens a TCP connection to a server and the server calls `graceful_shutdown` while the connection is still considered idle, hyper closes the connection immediately. This means that if the client has just begun transmitting packets, but the server has not received them, the client will get an error.

**Potential Solutions**

**Potential Solution - expose `disable_keep_alive`**
```rust
#[cfg(feature = "server")]
pub(crate) fn disable_keep_alive(&mut self) {
    if self.state.is_idle() {
        trace!("disable_keep_alive; closing idle connection");
        self.state.close();
    } else {
        trace!("disable_keep_alive; in-progress connection");
        self.state.disable_keep_alive();
    }
}
```
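To make the branch above concrete, here is a minimal model of the idle-vs-in-progress decision using only the standard library. The `ConnState` enum and free function are illustrative stand-ins, not hyper's internal `State` type:

```rust
/// Illustrative model of the branch in `disable_keep_alive` above;
/// these are NOT hyper's internal types.
#[derive(Debug, PartialEq)]
enum ConnState {
    Idle,
    InProgress,
    Closed,
    KeepAliveDisabled,
}

fn disable_keep_alive(state: ConnState) -> ConnState {
    match state {
        // An idle connection is closed on the spot. This is the branch
        // that races with a client whose first bytes are still in flight.
        ConnState::Idle => ConnState::Closed,
        // A connection with a request in progress keeps running; it just
        // won't accept another request once this one finishes.
        _ => ConnState::KeepAliveDisabled,
    }
}

fn main() {
    assert_eq!(disable_keep_alive(ConnState::Idle), ConnState::Closed);
    assert_eq!(
        disable_keep_alive(ConnState::InProgress),
        ConnState::KeepAliveDisabled
    );
}
```

The race in this issue is exactly the first arm: a connection that has been accepted, but whose request bytes have not yet been decrypted and read, still counts as idle, so shutdown closes it.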
If a user could disable keep-alive themselves then they could:

- disable keep-alive
- poll the connection for up to N seconds (i.e. by using `tokio::time::timeout`)
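If that API existed, the caller-side pattern would be "disable keep-alive, then give the connection a bounded grace window". Since `disable_keep_alive` is not public (that's what this proposal is about), the sketch below illustrates only the bounded-wait part, using the standard library; `wait_with_grace` is a made-up name and a channel stands in for the connection future:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Wait up to `grace` for the "connection" to finish. Stands in for the
/// proposed `tokio::time::timeout(grace, conn).await` step; a channel
/// plays the role of the connection future. Returns true if the
/// connection completed inside the grace window.
fn wait_with_grace(done_rx: &mpsc::Receiver<()>, grace: Duration) -> bool {
    done_rx.recv_timeout(grace).is_ok()
}

fn main() {
    // A connection whose bytes were "in flight": it completes 10ms after
    // shutdown starts, well inside a 2-second grace window, so it is saved.
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        let _ = tx.send(());
    });
    assert!(wait_with_grace(&rx, Duration::from_secs(2)));

    // A connection that never sends anything: once the window expires we
    // give up and close it. Can't wait forever.
    let (tx2, rx2) = mpsc::channel::<()>();
    assert!(!wait_with_grace(&rx2, Duration::from_millis(50)));
    drop(tx2);
}
```

The grace window would come out of whatever overall shutdown budget the server already has, so a connection that never produces bytes still gets closed deterministically.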
**Potential Solution - ...**

I'll add more if I think of any.
This comment was marked as resolved.
I'm coming back to this having caught up on sleep and after some further discussion with @chinedufn. My main concerns were that the benefit was negligible and that the only cases where it mattered were artificial. I disagree with both of those conclusions now. Here's my current take on why I think this should be fixed:
Therefore, I think this issue is worth solving. Sorry about all the noise I've added to the issue; I've hidden the previous comments as "resolved", for posterity.
I do agree with some of the points brought up:
It seems likely that in this case, the time it takes to write/encrypt/read/decrypt is just enough that the tiny timeout used can beat it.
Definitely agree that a timeout wouldn't solve 100% of cases. My thinking is that if a client opens a connection they're either:

1. a malicious attacker
2. a non-malicious user who was about to send request bytes
3. a client that is going to take too long to send its bytes
I'm not concerned about making things easier for the attacker (1). I'd love to make things (slightly) easier for the non-malicious user who was about to give me bytes (2); I can save them a retry by handling the request. But, yes, for case (3), if it takes too long to see your bytes we'll need to close the connection. Can't wait forever.

So, case (2) is what I'm interested in handling. I am operating under the assumption (no data on this currently) that with a timeout of, say, 1-2 seconds, I could handle the overwhelming majority of "I was about to give you bytes but you closed the connection on me!" scenarios. This means that my users see fewer errors, at nearly no expense to my server. (I'm already comfortable taking up to 30 seconds to handle any in-progress requests before closing all connections. This 1-2 seconds would just come from that 30-second budget.)

The only trade-off I can imagine is the added complexity. I'd say that saving the extra retry request for the client is worth it if the cost to hyper is small.

The use case that I imagine is a server that sees heavy, consistent traffic and is gracefully shut down often (say, every hour when the maintainers redeploy a new version of it).

Are you open to solving this "problem" (I recognize that one might argue that it isn't a problem and the client should just be expected to retry)? Do any of the potential solutions that I've left in my previous comment seem reasonable?

I'm happy to do the work required to land a solution to the "decrease the number of connections that get closed when we could have reasonably waited for a bit to get the data" problem. (To repeat: yup, I know we can't wait forever. I just want to reduce the errors that users experience, and I recognize that I cannot fully eliminate them.)
I'm observing errors while testing graceful shutdown of a hyper server.

When gracefully shutting down a TLS connection the client will sometimes get an `IncompleteMessage` error. This only happens for TLS connections. The graceful shutdown process is always successful for non-TLS connections.
Given the following testing steps:
(Since the request handler has a random sleep between 5-10ms we can be reasonably confident
that when we receive a response there are still some other requests that are in-progress.)
When the hyper server is not using TLS, the test passes.

When the hyper server is using TLS, the test fails with an `IncompleteMessage` error.

I've created a repository that reproduces the issue: https://github.com/chinedufn/hyper-tls-graceful-shutdown-issue
Here's a quick snippet of the graceful shutdown code:
Here is the full source code for convenience (also available in the linked repository)