From 963d9aac961143cfb9d93090173afc86427e124e Mon Sep 17 00:00:00 2001
From: David Turner
Date: Mon, 5 Jul 2021 13:38:29 +0100
Subject: [PATCH] Generalize TCP retxn docs to cover remote clusters (#74732)

Today the docs on setting `tcp_retries2` only talk about intra-cluster
connections, but in fact this setting is equally important to the
resilience of remote cluster connections too. This commit rewords these
docs to cover both cases.

Relates #34405
---
 .../setup/sysconfig/tcpretries.asciidoc | 52 +++++++++++--------
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/docs/reference/setup/sysconfig/tcpretries.asciidoc b/docs/reference/setup/sysconfig/tcpretries.asciidoc
index e25c07c388a1a..a7cbe728e212d 100644
--- a/docs/reference/setup/sysconfig/tcpretries.asciidoc
+++ b/docs/reference/setup/sysconfig/tcpretries.asciidoc
@@ -1,32 +1,38 @@
 [[system-config-tcpretries]]
 === TCP retransmission timeout
 
-Each pair of nodes in a cluster communicates via a number of TCP connections
-which <> until one of the nodes shuts down
-or communication between the nodes is disrupted by a failure in the underlying
+Each pair of {es} nodes communicates via a number of TCP connections which
+<> until one of the nodes shuts down or
+communication between the nodes is disrupted by a failure in the underlying
 infrastructure.
 
-TCP provides reliable communication over occasionally-unreliable networks by
+TCP provides reliable communication over occasionally unreliable networks by
 hiding temporary network disruptions from the communicating applications. Your
 operating system will retransmit any lost messages a number of times before
-informing the sender of any problem. Most Linux distributions default to
-retransmitting any lost packets 15 times. Retransmissions back off
-exponentially, so these 15 retransmissions take over 900 seconds to complete.
-This means it takes Linux many minutes to detect a network partition or a
-failed node with this method. Windows defaults to just 5 retransmissions which
-corresponds with a timeout of around 6 seconds.
+informing the sender of any problem. {es} must wait while the retransmissions
+are happening and can only react once the operating system decides to give up.
+Users must therefore also wait for a sequence of retransmissions to complete.
+
+Most Linux distributions default to retransmitting any lost packets 15 times.
+Retransmissions back off exponentially, so these 15 retransmissions take over
+900 seconds to complete. This means it takes Linux many minutes to detect a
+network partition or a failed node with this method. Windows defaults to just 5
+retransmissions which corresponds with a timeout of around 6 seconds.
 
 The Linux default allows for communication over networks that may experience
-very long periods of packet loss, but this default is excessive for production
-networks within a single data centre as is the case for most {es} clusters.
-Highly-available clusters must be able to detect node failures quickly so that
-they can react promptly by reallocating lost shards, rerouting searches and
-perhaps electing a new master node. Linux users should therefore reduce the
-maximum number of TCP retransmissions.
+very long periods of packet loss, but this default is excessive and even harmful
+on the high quality networks used by most {es} installations. When a cluster
+detects a node failure it reacts by reallocating lost shards, rerouting
+searches, and maybe electing a new master node. Highly available clusters must
+be able to detect node failures promptly, which can be achieved by reducing the
+permitted number of retransmissions. Connections to
+<> should also prefer to detect
+failures much more quickly than the Linux default allows. Linux users should
+therefore reduce the maximum number of TCP retransmissions.
 
-You can decrease the maximum number of TCP retransmissions to `5` by running
-the following command as `root`. Five retransmissions corresponds with a
-timeout of around six seconds.
+You can decrease the maximum number of TCP retransmissions to `5` by running the
+following command as `root`. Five retransmissions corresponds with a timeout of
+around six seconds.
 
 [source,sh]
 -------------------------------------
@@ -38,8 +44,8 @@ To set this value permanently, update the `net.ipv4.tcp_retries2` setting in
 `sysctl net.ipv4.tcp_retries2`.
 
 IMPORTANT: This setting applies to all TCP connections and will affect the
-reliability of communication with systems outside your cluster too. If your
-cluster communicates with external systems over an unreliable network then you
+reliability of communication with systems other than {es} clusters too. If your
+clusters communicate with external systems over a low quality network then you
 may need to select a higher value for `net.ipv4.tcp_retries2`. For this reason,
 {es} does not adjust this setting automatically.
 
@@ -54,6 +60,6 @@ related to these application-level health checks.
 You must also ensure your network infrastructure does not interfere with the
 long-lived connections between nodes, <>.
 Devices which drop connections when they reach
-a certain age are a common source of problems to Elasticsearch clusters, and
-must not be used.
+a certain age are a common source of problems to {es} clusters, and must not be
+used.
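
Reviewer note: the hunks above skip over the command inside the `[source,sh]` block, so the following is only a rough sketch of how a node's current value could be compared against the recommended `5`. It assumes a Linux host that exposes the setting at `/proc/sys/net/ipv4/tcp_retries2`. The function names and the `TCP_RETRIES2_FILE` override are hypothetical helpers invented for illustration, and the suggested `sysctl -w` invocation is standard `sysctl` syntax rather than a quote from the patched file.

```sh
#!/bin/sh
# Sketch only: report whether net.ipv4.tcp_retries2 exceeds a target value.
# Function names and TCP_RETRIES2_FILE are illustrative assumptions, not part
# of the patch or of Elasticsearch.

# Where Linux exposes the setting; overridable so the logic can be exercised
# on a machine without /proc.
TCP_RETRIES2_FILE="${TCP_RETRIES2_FILE:-/proc/sys/net/ipv4/tcp_retries2}"

# Print the current value, or return 1 if the file cannot be read.
current_tcp_retries2() {
    [ -r "$TCP_RETRIES2_FILE" ] || return 1
    cat "$TCP_RETRIES2_FILE"
}

# Exit status 0 when $1 is strictly greater than $2.
exceeds_target() {
    [ "$1" -gt "$2" ]
}

# Print a one-line report for the given target (default 5).
# Returns 0 if the value is fine, 1 if it exceeds the target, 2 if unreadable.
check_tcp_retries2() {
    target="${1:-5}"
    if ! current="$(current_tcp_retries2)"; then
        echo "cannot read $TCP_RETRIES2_FILE"
        return 2
    fi
    if exceeds_target "$current" "$target"; then
        echo "tcp_retries2=$current exceeds $target; as root: sysctl -w net.ipv4.tcp_retries2=$target"
        return 1
    fi
    echo "tcp_retries2=$current is at or below $target"
}
```

Calling `check_tcp_retries2 5` on a node only prints a recommendation and changes nothing. That matches the patch's caution that the setting affects every TCP connection on the host, so the decision to lower it stays with the operator.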