
[Remote Store] Cluster State Applier thread blocked on remote store operations #12026

Open

gbbafna opened this issue Jan 26, 2024 · 2 comments

Labels: bug (Something isn't working), Cluster Manager, Storage:Durability (Issues and PRs related to the durability framework)

Comments

gbbafna (Collaborator) commented Jan 26, 2024

Describe the bug

On remote store clusters, we can see the cluster state applier thread blocked on remote store calls. When those calls take a long time, the node is unable to apply the cluster state, and the LagDetector on the cluster manager kicks it out of the cluster.

[2024-01-24T20:56:24,412][WARN ][o.o.i.c.IndicesClusterStateService] [a] [.index][5] marking and sending shard failed due to [failed to create shard]
java.io.IOException: java.io.IOException: Exception when listing blobs by prefix [x/y/z/metadata]
    at org.opensearch.index.store.RemoteDirectory.listFilesByPrefixInLexicographicOrder(RemoteDirectory.java:138)
    at org.opensearch.index.store.RemoteSegmentStoreDirectory.readLatestMetadataFile(RemoteSegmentStoreDirectory.java:191)
    at org.opensearch.index.store.RemoteSegmentStoreDirectory.init(RemoteSegmentStoreDirectory.java:145)
    at org.opensearch.index.store.RemoteSegmentStoreDirectory.<init>(RemoteSegmentStoreDirectory.java:132)
    at org.opensearch.index.store.RemoteSegmentStoreDirectoryFactory.newDirectory(RemoteSegmentStoreDirectoryFactory.java:74)
    at org.opensearch.index.store.RemoteSegmentStoreDirectoryFactory.newDirectory(RemoteSegmentStoreDirectoryFactory.java:49)
    at org.opensearch.index.IndexService.createShard(IndexService.java:488)
    at org.opensearch.indices.IndicesService.createShard(IndicesService.java:1036)
    at org.opensearch.indices.IndicesService.createShard(IndicesService.java:212)
    at org.opensearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:673)
    at org.opensearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:650)
    at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:295)
    at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:606)
    at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:593)
    at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:561)
    at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:484)
    at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:186)
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:858)
    at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282)
    at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: Exception when listing blobs by prefix [x/y/z/metadata]
    at org.opensearch.repositories.s3.S3BlobContainer.listBlobsByPrefixInSortedOrder(S3BlobContainer.java:455)
    at org.opensearch.common.blobstore.BlobContainer.listBlobsByPrefixInSortedOrder(BlobContainer.java:234)
    at org.opensearch.common.blobstore.EncryptedBlobContainer.listBlobsByPrefixInSortedOrder(EncryptedBlobContainer.java:207)
    at org.opensearch.index.store.RemoteDirectory.listFilesByPrefixInLexicographicOrder(RemoteDirectory.java:127)
    ... 22 more
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
    at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)

On cluster manager node :

[2024-01-24T20:56:24,339][WARN ][o.o.c.c.LagDetector      ] [0f2] node [{a}{b}{c}{{dir}] is lagging at cluster state version [29651], although publication of cluster state version [29652] completed [1.5m] ago

[2024-01-24T20:56:25,192][INFO ][o.o.c.s.MasterService    ] [0f2] node-left [{a}{b}{c}{{dir}] reason: lagging], term: 14, version: 29656, delta: removed {[{a}{b}{c}{{dir}]}

Related component

Storage:Durability

To Reproduce

We see this when there is a high number of relocations in the cluster.

Expected behavior

The cluster state applier should have a dedicated threadpool for these interactions so that it does not get blocked on any shared resource, be it a threadpool, a connection pool, etc.
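A minimal sketch of the dedicated-pool idea, using plain `java.util.concurrent` rather than OpenSearch's actual `ThreadPool` wiring; the executor, the `listRemoteMetadataFiles` stand-in, and the timeout are hypothetical and only illustrate keeping the applier's wait bounded:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RemoteStoreOffloadSketch {

    // Hypothetical dedicated pool for remote store metadata calls; not the actual
    // OpenSearch threadpool wiring, just an illustration of the proposed separation.
    private static final ExecutorService REMOTE_STORE_EXECUTOR =
        Executors.newFixedThreadPool(4);

    // Stand-in for RemoteDirectory.listFilesByPrefixInLexicographicOrder(...),
    // i.e. the repository (S3) call seen in the stack trace above.
    static List<String> listRemoteMetadataFiles(String prefix) {
        return List.of(prefix + "/metadata__1");
    }

    // Called from the cluster state applier path (e.g. during shard creation).
    // Instead of blocking indefinitely on the repository call, wait with a bound
    // and fail the shard creation fast if the remote store is slow.
    static List<String> listWithBoundedWait(String prefix, long timeoutMillis) throws Exception {
        Future<List<String>> future =
            REMOTE_STORE_EXECUTOR.submit(() -> listRemoteMetadataFiles(prefix));
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // don't keep the applier thread waiting
            throw new java.io.IOException("Timed out listing remote metadata for " + prefix, e);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(listWithBoundedWait("x/y/z", 5_000));
        REMOTE_STORE_EXECUTOR.shutdown();
    }
}
```

With something along these lines, a slow repository call fails the shard creation quickly instead of holding the applier thread long enough for the LagDetector to evict the node.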

Additional Details

Plugins
repository-s3

Host/Environment (please complete the following information):

  • Amazon Linux 2


gbbafna added the bug (Something isn't working) and untriaged labels on Jan 26, 2024
github-actions bot added the Storage:Durability (Issues and PRs related to the durability framework) label on Jan 26, 2024
gbbafna changed the title from "[Remote Store] Cluster State Applier thread blocked on remote uploads" to "[Remote Store] Cluster State Applier thread blocked on remote store operations" on Jan 26, 2024
gbbafna removed the untriaged label on Jan 26, 2024
Bukhtawar (Collaborator) commented:

Thanks Gaurav. Ideally the cluster state applier thread should not perform any blocking or networking operation. But given the flow with remote store, we might need a dedicated, prioritized threadpool in remote store for the cluster state applier interactions, while still keeping the blocking behaviour of the cluster state applier thread itself.
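To illustrate the prioritized pool described here, a rough sketch with a `ThreadPoolExecutor` backed by a `PriorityBlockingQueue` (plain JDK classes, not OpenSearch's `PrioritizedOpenSearchThreadPoolExecutor` from the stack trace); the task names and priority values are made up:

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PrioritizedRemoteStorePoolSketch {

    // A runnable with an explicit priority; lower value runs first.
    static class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
        final int priority;
        final Runnable delegate;

        PrioritizedTask(int priority, Runnable delegate) {
            this.priority = priority;
            this.delegate = delegate;
        }

        @Override
        public void run() {
            delegate.run();
        }

        @Override
        public int compareTo(PrioritizedTask other) {
            return Integer.compare(this.priority, other.priority);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Dedicated pool for remote store calls; work queued on behalf of the cluster
        // state applier jumps ahead of routine remote store work.
        ThreadPoolExecutor remoteStorePool = new ThreadPoolExecutor(
            1, 1, 60, TimeUnit.SECONDS, new PriorityBlockingQueue<>());

        // Routine work (e.g. background metadata refresh) gets a low priority.
        remoteStorePool.execute(new PrioritizedTask(10, () -> System.out.println("background refresh 1")));
        remoteStorePool.execute(new PrioritizedTask(20, () -> System.out.println("background refresh 2")));

        // Applier-initiated work (e.g. directory init during shard creation) gets the
        // highest priority, so among the queued tasks it is picked up first.
        remoteStorePool.execute(new PrioritizedTask(0, () -> System.out.println("applier-initiated init")));

        remoteStorePool.shutdown();
        remoteStorePool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```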

shwetathareja (Member) commented:

We shouldn't perform expensive operations in the cluster state applier thread. If you can offload this work to a dedicated thread pool, that would be preferred.

Ideally, appliers are expected to finish before any listeners can be triggered, but we can evaluate whether appliers can execute the part of their task that doesn't depend on the cluster state in parallel, in the background.
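As one possible reading of running the remote-store-dependent work in the background, a hedged sketch where the applier only triggers an async init and returns; `readLatestRemoteMetadata`, the executor, and the callback handling are hypothetical stand-ins for the `RemoteSegmentStoreDirectory.init` path seen in the stack trace:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncShardInitSketch {

    private static final ExecutorService REMOTE_STORE_EXECUTOR = Executors.newFixedThreadPool(2);

    // Stand-in for the remote-store-dependent part of shard creation
    // (RemoteSegmentStoreDirectory.init -> readLatestMetadataFile).
    static List<String> readLatestRemoteMetadata(String shardPath) {
        return List.of(shardPath + "/metadata__latest");
    }

    // Called from applyClusterState: trigger the remote call in the background and
    // return immediately so the applier can move on to the next shard / next state.
    static void createShardAsync(String shardPath) {
        CompletableFuture
            .supplyAsync(() -> readLatestRemoteMetadata(shardPath), REMOTE_STORE_EXECUTOR)
            .whenComplete((metadata, failure) -> {
                if (failure != null) {
                    // Equivalent of "marking and sending shard failed", off the applier thread.
                    System.err.println("shard init failed for " + shardPath + ": " + failure);
                } else {
                    System.out.println("shard initialized with " + metadata);
                }
            });
    }

    public static void main(String[] args) throws InterruptedException {
        createShardAsync("x/y/z/5");
        Thread.sleep(500); // give the background task time to finish in this demo
        REMOTE_STORE_EXECUTOR.shutdown();
    }
}
```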
