[BUG] Stats transport actions based on TransportNodeActions sends large payload of Discovery Nodes to all nodes #14713

Pranshu-S · 2024-07-11T04:13:15Z

Describe the bug

In the current implementation, every transport action extending TransportNodesAction includes all discovery nodes in the transport request sent to each node in the cluster. This approach leads to performance bottlenecks in large clusters due to redundant data transmission. Specifically:

Increased Network Traffic: Every per-node request carries the same list of n discovery nodes, so about n^2 discovery node entries are written per action (where n is the number of nodes), causing unnecessary network traffic and increased IO.
Write/Read Latency: The excessive data transmission contributes to higher overall latency for both write and read operations.
NIO Buffer Bottleneck: When using a transport plugin such as Netty for inter-node communication, the buffer fills with redundant discovery node information. This increases the size of each request and correspondingly reduces the number of requests that can fit in the Netty buffer.

Related component

Other

To Reproduce

If node IDs are passed in a TransportNodesAction request, we resolve them into DiscoveryNodes. The individual requests that go to each node each hold a copy of this request, so serializing them ends up writing the DiscoveryNodes objects as well.

Essentially, for a 200-node cluster, we write 200 DiscoveryNode objects for each per-node request, which means about 200 x 200 = 40,000 objects over the entire send path. This grows quadratically with the number of nodes.
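
A minimal sketch of the arithmetic above, not OpenSearch code. `FanOutSketch`, `NodesRequest`, `makeNodes`, and `totalEntriesWritten` are hypothetical stand-ins invented for illustration. The sketch only counts how many node entries would be serialized during a fan-out, with and without dropping the resolved node list before each per-node request is sent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not OpenSearch classes.
final class FanOutSketch {

    /** Simplified top-level request that holds the resolved node list. */
    static final class NodesRequest {
        final List<String> concreteNodes; // null once the list has been cleared

        NodesRequest(List<String> concreteNodes) {
            this.concreteNodes = concreteNodes;
        }

        /** Number of node entries that would be written when this request is serialized. */
        int nodeEntriesWritten() {
            return concreteNodes == null ? 0 : concreteNodes.size();
        }
    }

    /** Builds a list of n placeholder node names. */
    static List<String> makeNodes(int n) {
        List<String> nodes = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            nodes.add("node-" + i);
        }
        return nodes;
    }

    /** Total node entries written across the whole fan-out for one action. */
    static long totalEntriesWritten(int clusterSize, boolean clearBeforeSend) {
        List<String> nodes = makeNodes(clusterSize);
        long total = 0;
        for (String target : nodes) {
            // One per-node request is created for every target node.
            NodesRequest perNode = new NodesRequest(clearBeforeSend ? null : nodes);
            total += perNode.nodeEntriesWritten();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalEntriesWritten(200, false)); // prints 40000
        System.out.println(totalEntriesWritten(200, true));  // prints 0
    }
}
```

Under these assumptions, clearing the resolved list before the per-node send removes the n x n term entirely, since the receiving node does not need the list to answer for itself.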

Expected behavior

The request path should send only the information that is required on the receive path.


Pranshu-S added the bug and untriaged labels Jul 11, 2024

github-actions bot added the Other label Jul 11, 2024

rwali-aws added Cluster Manager and removed Other labels Jul 11, 2024

github-project-automation bot added this to Cluster Manager Project Board Jul 11, 2024

github-project-automation bot moved this to 🆕 New in Cluster Manager Project Board Jul 11, 2024

rwali-aws added the v2.16.0 label and removed the untriaged label Jul 11, 2024

Pranshu-S mentioned this issue Jul 15, 2024

Optimise TransportNodesAction to not send DiscoveryNodes for NodeStat… #14749

Merged

3 tasks

shwetathareja closed this as completed in #14749 Jul 22, 2024

github-project-automation bot moved this from 🆕 New to ✅ Done in Cluster Manager Project Board Jul 22, 2024

SwethaGuptha mentioned this issue Aug 6, 2024

Reset discovery nodes in all transport node actions request. #15131

Merged

3 tasks
