
Optimise node to node ser/de with caching static attributes #13662

Open
Bukhtawar opened this issue May 14, 2024 · 1 comment
Labels
Cluster Manager, enhancement, ShardManagement:Performance

@Bukhtawar
Collaborator

Bukhtawar commented May 14, 2024

Is your feature request related to a problem? Please describe

[Screenshot: 2024-05-06, 11:33:22 AM]

A significant amount of compute and memory goes into ser/de of DiscoveryNode during node-to-node communication. Each DiscoveryNode carries a number of node properties and attributes that are largely static and do not need to be passed around in most node-to-node messages.
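To make the per-message cost concrete, here is a minimal, hypothetical model. `NaiveNodeDemo` and its `Node` record are illustrative stand-ins, not OpenSearch's `DiscoveryNode`, and `java.io.DataOutputStream` is used in place of OpenSearch's `StreamOutput`. What it shows: when `writeTo` encodes every static attribute string on each call, the same bytes are produced again for every message that carries the node.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for a node descriptor (NOT OpenSearch's DiscoveryNode).
 * Every call to writeTo re-encodes all of the static fields and attributes from scratch.
 */
public class NaiveNodeDemo {

    public record Node(String id, String name, String address, Map<String, String> attributes) {

        public void writeTo(DataOutputStream out) throws IOException {
            out.writeUTF(id);      // 2-byte length prefix + UTF-8 bytes
            out.writeUTF(name);
            out.writeUTF(address);
            out.writeInt(attributes.size());
            for (Map.Entry<String, String> entry : attributes.entrySet()) {
                out.writeUTF(entry.getKey());   // re-encoded on every message
                out.writeUTF(entry.getValue()); // ...and so is every value
            }
        }
    }

    /** Serializes the node once and returns the number of bytes produced. */
    public static int serializedSize(Node node) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            node.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Node node = new Node("n1", "node-1", "10.0.0.1:9300",
                Map.of("zone", "us-east-1a", "rack", "r1"));
        int perMessage = serializedSize(node);
        int messages = 1_000;
        // The identical static bytes are rebuilt for each message that carries this node.
        System.out.println(perMessage + " bytes per message, "
                + (perMessage * messages) + " bytes for " + messages + " messages");
    }
}
```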

100.1% (5s out of 5s) cpu usage by thread 'opensearch[d7a36f3accb52285ed475b5d2dcbecba][transport_worker][T#16]'
     3/10 snapshots sharing following 102 elements
       app//org.opensearch.core.common.io.stream.StreamOutput.writeBytes(StreamOutput.java:179)
       app//org.opensearch.core.common.io.stream.StreamOutput.writeString(StreamOutput.java:468)
       app//org.opensearch.cluster.node.DiscoveryNode.writeTo(DiscoveryNode.java:393)
       app//org.opensearch.core.common.io.stream.StreamOutput.lambda$writeOptionalArray$28(StreamOutput.java:968)
       app//org.opensearch.core.common.io.stream.StreamOutput$$Lambda$7381/0x0000000601e40240.write(Unknown Source)
       app//org.opensearch.core.common.io.stream.StreamOutput.writeArray(StreamOutput.java:937)
       app//org.opensearch.core.common.io.stream.StreamOutput.writeOptionalArray(StreamOutput.java:950)
       app//org.opensearch.core.common.io.stream.StreamOutput.writeOptionalArray(StreamOutput.java:968)
       app//org.opensearch.action.support.nodes.BaseNodesRequest.writeTo(BaseNodesRequest.java:131)
       org.opensearch.ml.common.transport.sync.MLSyncUpNodesRequest.writeTo(MLSyncUpNodesRequest.java:43)
       org.opensearch.ml.common.transport.sync.MLSyncUpNodeRequest.writeTo(MLSyncUpNodeRequest.java:36)
       app//org.opensearch.transport.OutboundMessage.writeMessage(OutboundMessage.java:104)
       app//org.opensearch.transport.OutboundMessage.serialize(OutboundMessage.java:81)
       app//org.opensearch.transport.OutboundHandler$MessageSerializer.get(OutboundHandler.java:235)
       app//org.opensearch.transport.OutboundHandler$MessageSerializer.get(OutboundHandler.java:221)
       app//org.opensearch.transport.OutboundHandler$SendContext.get(OutboundHandler.java:275)
       app//org.opensearch.transport.OutboundHandler.internalSend(OutboundHandler.java:197)
       app//org.opensearch.transport.OutboundHandler.sendMessage(OutboundHandler.java:192)
       app//org.opensearch.transport.OutboundHandler.sendRequest(OutboundHandler.java:129)
       app//org.opensearch.transport.TcpTransport$NodeChannels.sendRequest(TcpTransport.java:320)
       app//org.opensearch.transport.TransportService.sendRequestInternal(TransportService.java:989)
       app//org.opensearch.transport.TransportService$$Lambda$4346/0x00000006014ed410.sendRequest(Unknown Source)      
       org.opensearch.security.transport.SecurityInterceptor.sendRequestDecorate(SecurityInterceptor.java:288)
       org.opensearch.security.OpenSearchSecurityPlugin$6$2.sendRequest(OpenSearchSecurityPlugin.java:871)
      
       app//org.opensearch.transport.TransportService.sendRequestAsync(TransportService.java:1750)
       app//org.opensearch.transport.TransportService.sendRequest(TransportService.java:885)
       app//org.opensearch.transport.TransportService.sendRequest(TransportService.java:862)
       app//org.opensearch.action.support.nodes.TransportNodesAction$AsyncAction.start(TransportNodesAction.java:264)
       app//org.opensearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:153)
       app//org.opensearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:70)
[2024-07-18T10:11:44,767][DEBUG][o.o.a.a.c.n.i.TransportNodesInfoAction] [d7a36f3accb52285ed475b5d2dcbecba] #[org.opensearch.transport.ReceiveTimeoutTransportException]#failed to execute on node [wlttuem8RGatNMytXYSs5g]
ReceiveTimeoutTransportException[[0cb5686526983c7c67119286cbf29b45][10.212.26.46:9300][cluster:monitor/nodes/info[n]] request_id [2574070] timed out after [30011ms]]
        at org.opensearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1409)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
[2024-07-18T10:11:46,093][DEBUG][o.o.a.a.c.n.i.TransportNodesInfoAction] [d7a36f3accb52285ed475b5d2dcbecba] #[org.opensearch.transport.ReceiveTimeoutTransportException]#failed to execute on node [3DNVyiDxSsyXJ845PWkvsw]
ReceiveTimeoutTransportException[[1d6387ea1cff9e2b0687c8906f5e0fb1][10.212.48.209:9300][cluster:monitor/nodes/info[n]] request_id [2575917] timed out after [30010ms]]
        at org.opensearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1409)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
[2024-07-18T10:11:46,774][DEBUG][o.o.a.a.c.n.i.TransportNodesInfoAction] [d7a36f3accb52285ed475b5d2dcbecba] #[org.opensearch.transport.ReceiveTimeoutTransportException]#failed to execute on node [BAG9M3sCTmCp5bNbp_tvYw]
ReceiveTimeoutTransportException[[b595c83d0b3fef1c033beb8162953e88][10.212.24.55:9300][cluster:monitor/nodes/info[n]] request_id [2576545] timed out after [30042ms]]
        at org.opensearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1409)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
[2024-07-18T10:11:46,935][WARN ][o.o.c.InternalClusterInfoService] [d7a36f3accb52285ed475b5d2dcbecba] Failed to update shard information for ClusterInfoUpdateJob within 10s timeout
[2024-07-18T10:11:47,243][DEBUG][o.o.a.a.c.n.i.TransportNodesInfoAction] [d7a36f3accb52285ed475b5d2dcbecba] #[org.opensearch.transport.ReceiveTimeoutTransportException]#failed to execute on node [Q0MgL7n6RMOT4nz0OmhN0w]
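Reading the hot-threads trace bottom-up, each per-node request (`MLSyncUpNodeRequest.writeTo`) writes its parent nodes-level request (`MLSyncUpNodesRequest.writeTo`), which calls `BaseNodesRequest.writeTo`, which in turn writes an optional array of `DiscoveryNode`s. If that array holds every target node, a fan-out to N nodes serializes N × N node descriptors. Below is a hypothetical counting model of that pattern; `FanOutCostDemo`, `NodesRequest`, and `NodeRequest` are illustrative names, not OpenSearch APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a per-node request that re-serializes its parent nodes-level request. */
public class FanOutCostDemo {

    /** Counts how many node descriptors get serialized. */
    static final class Counter {
        long nodeWrites;
    }

    /** Stand-in for a nodes-level request carrying the full array of target nodes. */
    record NodesRequest(List<String> concreteNodes) {
        void writeTo(Counter counter) {
            // One node-descriptor write per entry in the array.
            counter.nodeWrites += concreteNodes.size();
        }
    }

    /** Stand-in for a per-node request that wraps (and therefore re-serializes) its parent. */
    record NodeRequest(NodesRequest parent) {
        void writeTo(Counter counter) {
            parent.writeTo(counter);
        }
    }

    /** Sends one per-node request to every target and returns the total node writes. */
    public static long nodeWritesForFanOut(int clusterSize) {
        List<String> nodes = new ArrayList<>();
        for (int i = 0; i < clusterSize; i++) {
            nodes.add("node-" + i);
        }
        NodesRequest parent = new NodesRequest(List.copyOf(nodes));
        Counter counter = new Counter();
        // One outbound message per target node, each carrying the whole parent request.
        for (int i = 0; i < nodes.size(); i++) {
            new NodeRequest(parent).writeTo(counter);
        }
        return counter.nodeWrites;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 200}) {
            System.out.println(n + " nodes -> " + nodeWritesForFanOut(n)
                    + " node serializations per fan-out");
        }
    }
}
```

In this model, 100 nodes means 10,000 node serializations for a single fan-out, so the per-node cost from the static attributes grows quadratically with cluster size.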

Describe the solution you'd like

Cache the static node properties and attributes so that they are not serialized and deserialized again for every node-to-node message.
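One way to read this proposal, sketched under assumptions: if a node's properties and attributes really are immutable for the node's lifetime, their encoded form can be computed once and reused by every later write. `CachedNodeDemo` is hypothetical and is not derived from #14749 or from OpenSearch code; it uses `java.io.DataOutputStream` for brevity and lazy, double-checked initialization for the cache.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: encode a node's static part once, then reuse the cached bytes. */
public final class CachedNodeDemo {

    public static final class Node {
        private final String id;
        private final Map<String, String> attributes; // assumed immutable for the node's lifetime
        private volatile byte[] cachedStaticBytes;    // lazily built encoding of the static part
        private int encodeCount;                      // demonstration only; guarded by "this"

        public Node(String id, Map<String, String> attributes) {
            this.id = id;
            this.attributes = Map.copyOf(attributes);
        }

        /** Encodes the static part on first use; later calls return the cached bytes. */
        private byte[] staticBytes() {
            byte[] bytes = cachedStaticBytes;
            if (bytes == null) {
                synchronized (this) {
                    bytes = cachedStaticBytes;
                    if (bytes == null) {
                        bytes = encode();
                        cachedStaticBytes = bytes;
                    }
                }
            }
            return bytes;
        }

        private byte[] encode() {
            encodeCount++;
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                out.writeUTF(id);
                // Sorted keys keep the cached encoding deterministic.
                Map<String, String> sorted = new TreeMap<>(attributes);
                out.writeInt(sorted.size());
                for (Map.Entry<String, String> entry : sorted.entrySet()) {
                    out.writeUTF(entry.getKey());
                    out.writeUTF(entry.getValue());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return buf.toByteArray();
        }

        /** Writes a length prefix followed by the cached bytes; no per-message string encoding. */
        public void writeTo(DataOutputStream out) throws IOException {
            byte[] bytes = staticBytes();
            out.writeInt(bytes.length);
            out.write(bytes);
        }

        public synchronized int encodeCount() {
            return encodeCount;
        }
    }

    /** Serializes the same node {@code times} times into one buffer and returns the total size. */
    public static int writeMany(Node node, int times) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            for (int i = 0; i < times; i++) {
                node.writeTo(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }

    public static void main(String[] args) {
        Node node = new Node("n1", Map.of("zone", "us-east-1a", "rack", "r1"));
        int total = writeMany(node, 1_000);
        System.out.println(total + " bytes written, static part encoded "
                + node.encodeCount() + " time(s)");
    }
}
```

Caching like this only removes the repeated encoding work; the cached bytes are still copied and sent with every message. Not sending the static part at all (for example, sending only a node identifier that the receiver resolves from state it already holds) would also save network and deserialization cost, at the price of both sides having to agree on that state.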

Related component

ShardManagement:Performance

Describe alternatives you've considered

No response

Additional context

No response

Bukhtawar added the enhancement and untriaged labels on May 14, 2024
Bukhtawar changed the title from "[Feature Request] Optimise node to node ser/de with caching static attributes" to "Optimise node to node ser/de with caching static attributes" on Jul 18, 2024
@Bukhtawar
Collaborator Author

#14749

Projects
Status: 🆕 New
Development

No branches or pull requests

1 participant