Describe the bug
When S3 is used as the backing store for remote cluster state, the multipart upload of remote state files fails with the error below.
[2024-07-16T12:53:45,471][ERROR][o.o.g.r.RemoteClusterStateService] [7c09ef8cc274078bab152a013b5cbb55] Exception during transfer of Metadata Fragment to Remote nodes
org.opensearch.gateway.remote.RemoteStateTransferException: nodes, failed entity:org.opensearch.gateway.remote.model.RemoteDiscoveryNodes@1c832901
at org.opensearch.gateway.remote.RemoteClusterStateAttributesManager.lambda$getActionListener$2(RemoteClusterStateAttributesManager.java:106)
at org.opensearch.core.action.ActionListener$1.onFailure(ActionListener.java:90)
at org.opensearch.repositories.s3.S3BlobContainer.lambda$createFileCompletableFuture$7(S3BlobContainer.java:320)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Failed to send multipart upload requests.
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
at org.opensearch.repositories.s3.async.AsyncTransferManager.handleException(AsyncTransferManager.java:326)
... 61 more
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Request content was only 177910 bytes, but the specified content-length was 5288374 bytes.
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:223)
at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:218)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:182)
... 24 more
Caused by: java.lang.IllegalStateException: Request content was only 177910 bytes, but the specified content-length was 5288374 bytes.
at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor$StreamedRequest$1.onComplete(NettyRequestExecutor.java:479)
at software.amazon.awssdk.utils.async.SimplePublisher.doProcessQueue(SimplePublisher.java:275)
at software.amazon.awssdk.utils.async.SimplePublisher.processEventQueue(SimplePublisher.java:224)
When the S3 async upload is invoked, it is passed an IndexInput created from the serialized bytes (code ref). Inside the S3 plugin, when the parts are initialized for the multipart upload, each part sets the file pointer to the offset in the IndexInput where it should start reading. Because every part is backed by the same IndexInput instance, the file pointer ends up at the offset of the last part. When the S3 client then starts reading, only one part is able to read any bytes, and it reads fewer bytes than its declared content length, which produces the content-length mismatch shown above. The remaining parts cannot read a single byte, because the shared file pointer has already been moved to the last location in the IndexInput.
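The failure mode described above can be reproduced in miniature. The sketch below is only an illustration: `Input` is a hypothetical stand-in for an IndexInput (a byte array with a single mutable file pointer), not the Lucene or S3 plugin API, the parts are read sequentially rather than asynchronously, and the sizes are toy values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedPointerDemo {

    /** Hypothetical stand-in for an IndexInput: a byte array plus one mutable file pointer. */
    static final class Input {
        private final byte[] data;
        private int pointer;

        Input(byte[] data) {
            this.data = data;
        }

        void seek(int position) {
            this.pointer = position;
        }

        /** Reads up to length bytes from the current pointer and returns how many were read. */
        int read(int length) {
            int n = Math.max(0, Math.min(length, data.length - pointer));
            pointer += n;
            return n;
        }
    }

    /** One upload part: it seeks its input at initialization time and reads later. */
    static final class Part {
        private final Input input;
        private final int length;

        Part(Input input, int offset, int length) {
            this.input = input;
            this.length = length;
            input.seek(offset); // moves the pointer of the shared instance
        }

        int readAll() {
            return input.read(length);
        }
    }

    /** Initializes all parts against ONE shared Input, then reads them in order. */
    static int[] uploadWithSharedInput(int totalSize, int partSize) {
        Input shared = new Input(new byte[totalSize]);
        List<Part> parts = new ArrayList<>();
        for (int offset = 0; offset < totalSize; offset += partSize) {
            parts.add(new Part(shared, offset, Math.min(partSize, totalSize - offset)));
        }
        int[] bytesRead = new int[parts.size()];
        for (int i = 0; i < parts.size(); i++) {
            bytesRead[i] = parts.get(i).readAll();
        }
        return bytesRead;
    }

    public static void main(String[] args) {
        // Expected part lengths are 4, 4 and 2, but the shared pointer sits at offset 8.
        System.out.println(Arrays.toString(uploadWithSharedInput(10, 4))); // prints [2, 0, 0]
    }
}
```

In this toy run the first part reads 2 bytes against a declared length of 4, which mirrors the content-length mismatch in the stack trace, and the other parts read nothing.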
Related component
Cluster Manager
To Reproduce
1. Create a cluster with remote state publication enabled.
2. Keep adding nodes to the cluster until the size of DiscoveryNodes in the cluster state exceeds 5 MB.
3. Once the size reaches 5 MB, the S3 plugin attempts a multipart upload.
4. Check the logs for the upload failure exception.
Expected behavior
Multipart upload of remote state files should work correctly.
Solution
When the S3 async upload is invoked, a new instance of IndexInput should be created in the stream supplier function, so that each part reads through its own file pointer.
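A minimal sketch of this idea follows, under stated assumptions: `Input` is a hypothetical stand-in for an IndexInput, and the `Supplier` plays the role of the stream supplier function. It is not the actual S3 plugin or Lucene API, and the real upload path is asynchronous.

```java
import java.util.Arrays;
import java.util.function.Supplier;

public class FreshInputPerPartDemo {

    /** Hypothetical stand-in for an IndexInput: a byte array plus one mutable file pointer. */
    static final class Input {
        private final byte[] data;
        private int pointer;

        Input(byte[] data) {
            this.data = data;
        }

        void seek(int position) {
            this.pointer = position;
        }

        /** Reads up to length bytes from the current pointer and returns how many were read. */
        int read(int length) {
            int n = Math.max(0, Math.min(length, data.length - pointer));
            pointer += n;
            return n;
        }
    }

    /** Each part asks the supplier for its own Input over the same serialized bytes. */
    static int[] uploadWithFreshInputs(byte[] serialized, int partSize) {
        // The supplier builds a new Input on every call, so pointers are never shared.
        Supplier<Input> inputSupplier = () -> new Input(serialized);

        int partCount = (serialized.length + partSize - 1) / partSize;
        Input[] inputs = new Input[partCount];
        int[] lengths = new int[partCount];

        // Initialize every part first, as the multipart upload does.
        for (int i = 0; i < partCount; i++) {
            int offset = i * partSize;
            lengths[i] = Math.min(partSize, serialized.length - offset);
            inputs[i] = inputSupplier.get();
            inputs[i].seek(offset); // only this part's pointer moves
        }

        // Read afterwards; every part now starts from its own offset.
        int[] bytesRead = new int[partCount];
        for (int i = 0; i < partCount; i++) {
            bytesRead[i] = inputs[i].read(lengths[i]);
        }
        return bytesRead;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(uploadWithFreshInputs(new byte[10], 4))); // prints [4, 4, 2]
    }
}
```

Because the underlying byte array is only read, sharing it between instances is safe in this sketch; only the file pointer needs to be per part.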
Additional Details
Plugins
s3 plugin
Host/Environment (please complete the following information):
OS 2.15