Describe the bug
We have a 3-node OpenSearch cluster. When I take a snapshot of the OpenSearch and OpenDistro system indices and check the snapshot, it reports no failed shards:
{"snapshots":[{"snapshot":"2","uuid":"vCkI0dAMTxqmOPE3yF_Rfw","version_id":136357827,"version":"2.14.0","remote_store_index_shallow_copy":false,"indices":[".opendistro-alerting-alert-history-2024.06.11-1",".opensearch-notifications-config",".opendistro-alerting-alerts",".kibana_1",".opendistro-reports-instances",".opendistro-alerting-config",".opendistro-reports-definitions"],"data_streams":[],"include_global_state":false,"state":"SUCCESS","start_time":"2024-06-11T07:33:00.126Z","start_time_in_millis":1718091180126,"end_time":"2024-06-11T07:33:00.326Z","end_time_in_millis":1718091180326,"duration_in_millis":200,"failures":[],"shards":{"total":7,"failed":0,"successful":7}}]}
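For reference, the snapshot was created and checked with calls roughly like the following (host, flags, and the index pattern are illustrative, not our exact invocation):

# Create snapshot "2" of the system indices, without global state
curl -X PUT "localhost:9200/_snapshot/nsp-opensearch-repository/2?wait_for_completion=true" -H 'Content-Type: application/json' -d '{"indices": ".opendistro-*,.opensearch-*,.kibana_1", "include_global_state": false}'

# Check the snapshot for failed shards
curl -X GET "localhost:9200/_snapshot/nsp-opensearch-repository/2"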
However, when I try to restore the same snapshot on the same 3-node cluster, I see that some shards have failed:
{"snapshot":{"snapshot":"2","indices":[".opendistro-alerting-alert-history-2024.06.11-1",".opendistro-alerting-alerts",".opendistro-reports-definitions",".kibana_1",".opendistro-reports-instances",".opensearch-notifications-config",".opendistro-alerting-config"],"shards":{"total":7,"failed":3,"successful":4}}}
The OpenSearch logs say that the snapshot is missing:
[2024-06-12T09:20:00,612][WARN ][o.o.s.InternalSnapshotsInfoService] [opensearch-cluster-master-0] failed to retrieve shard size for [snapshot=nsp-opensearch-repository:2/vCkI0dAMTxqmOPE3yF_Rfw, index=[.opendistro-alerting-alerts/FLxs5PqsSTOF7NqmUF-0NA], shard=[.opendistro-alerting-alerts][0]]
org.opensearch.snapshots.SnapshotMissingException: [nsp-opensearch-repository:2] is missing
at org.opensearch.repositories.blobstore.BlobStoreRepository.loadShardSnapshot(BlobStoreRepository.java:3556) ~[opensearch-2.14.0.jar:2.14.0]
at org.opensearch.repositories.blobstore.BlobStoreRepository.getShardSnapshotStatus(BlobStoreRepository.java:3356) ~[opensearch-2.14.0.jar:2.14.0]
at org.opensearch.snapshots.InternalSnapshotsInfoService$FetchingSnapshotShardSizeRunnable.doRun(InternalSnapshotsInfoService.java:241) [opensearch-2.14.0.jar:2.14.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913) [opensearch-2.14.0.jar:2.14.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.14.0.jar:2.14.0]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
However, I have manually copied the backup files to the pods at the path registered in the repository. We don't have a shared file system, but we make sure to manually copy the snapshot files to all the nodes.
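For context, the repository is registered as a filesystem ("fs") repository, roughly as below; the location is a placeholder for the actual path on our pods:

# Register the snapshot repository on the cluster
# (the location must also be listed under path.repo in opensearch.yml on every node)
curl -X PUT "localhost:9200/_snapshot/nsp-opensearch-repository" -H 'Content-Type: application/json' -d '{"type": "fs", "settings": {"location": "/usr/share/opensearch/snapshots"}}'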
I wanted to clarify two things:
Is it advisable to back up and restore these system indices?
How does the OpenSearch restore process handle indices that have multiple primary and replica shards?
We have the same issue with another set of application-specific indices that have multiple primary and replica shards.
Related component
Storage:Snapshots
To Reproduce
Take a snapshot of an OpenSearch cluster consisting of multiple nodes.
Restore the snapshot on the same cluster.
Expected behavior
The snapshot should get restored successfully without any shard failures.
Additional Details
No response