We get the following error while uploading translog files:
[2024-02-27T10:53:26,255][ERROR][o.o.i.t.t.BlobStoreTransferService] [8c4f026d5ef9b5702a7d65da4c517316] Failed to upload blob translog-1147.ckp
java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:209)
at org.opensearch.index.translog.transfer.BlobStoreTransferService.uploadBlob(BlobStoreTransferService.java:130)
at org.opensearch.index.translog.transfer.BlobStoreTransferService.lambda$uploadBlobs$2(BlobStoreTransferService.java:99)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at org.opensearch.index.translog.transfer.BlobStoreTransferService.uploadBlobs(BlobStoreTransferService.java:94)
at org.opensearch.index.translog.transfer.TranslogTransferManager.transferSnapshot(TranslogTransferManager.java:154)
at org.opensearch.index.translog.RemoteFsTranslog.upload(RemoteFsTranslog.java:348)
at org.opensearch.index.translog.RemoteFsTranslog.prepareAndUpload(RemoteFsTranslog.java:326)
at org.opensearch.index.translog.RemoteFsTranslog.sync(RemoteFsTranslog.java:375)
at org.opensearch.index.translog.InternalTranslogManager.syncTranslog(InternalTranslogManager.java:197)
at org.opensearch.index.engine.InternalEngine.syncTranslog(InternalEngine.java:610)
at org.opensearch.index.shard.IndexShard.sync(IndexShard.java:4412)
at org.opensearch.index.IndexService.maybeFSyncTranslogs(IndexService.java:1008)
at org.opensearch.index.IndexService$AsyncTranslogFSync.runInternal(IndexService.java:1143)
at org.opensearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:159)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:858)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Why were translog uploads failing due to NPE?
Tracing back from the stack trace, we found that the checksum can be null for files that are already present on local disk when RemoteFsTranslog is initialized.
This normally works because the RemoteFsTranslog constructor deletes all local files and downloads them from remote.
So, in the ideal flow, after RemoteFsTranslog initialization, the tlog and ckp files have been downloaded from remote to local and the file tracker has been updated accordingly. All good!
The issue arises when there are tlog files on local disk that are not part of the file tracker. Because they are not in the file tracker, we try to upload them, and the upload fails with an NPE.
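The failure mode can be sketched in isolation. This is a minimal, hypothetical model and not the actual OpenSearch code: `FileTrackerSketch`, `track`, and `upload` are names invented for illustration. It assumes, as described above, that the upload path looks up a checksum recorded for tracked files and passes it through `Objects.requireNonNull`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the file tracker; not the OpenSearch class.
class FileTrackerSketch {
    // file name -> checksum recorded when the file was tracked
    private final Map<String, Long> checksums = new HashMap<>();

    void track(String fileName, long checksum) {
        checksums.put(fileName, checksum);
    }

    // Mimics an upload path that insists on a non-null checksum.
    String upload(String fileName) {
        Long checksum = checksums.get(fileName); // null for untracked files
        Objects.requireNonNull(checksum);        // throws NullPointerException when null
        return fileName + ":" + checksum;
    }

    public static void main(String[] args) {
        FileTrackerSketch tracker = new FileTrackerSketch();
        tracker.track("translog-1.tlog", 42L);
        System.out.println(tracker.upload("translog-1.tlog"));
        try {
            tracker.upload("translog-1147.ckp"); // present on disk, never tracked
        } catch (NullPointerException e) {
            System.out.println("upload failed with NPE");
        }
    }
}
```

In this sketch the tracked file uploads normally, while the untracked file triggers the same `requireNonNull` failure seen at the top of the stack trace.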
Why were translog files present on local and not in the file tracker?
In the translog download flow, we fetch the translog metadata file, delete the existing local files, and download the files from remote.
If we don't find a translog metadata file, we also skip deleting the existing local files.
Because we don't download any tlog files in that case, the file tracker is not updated.
This only happens when there are files on local disk but no translog metadata file in remote.
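The download flow described above can be modelled roughly as follows. This is a sketch under stated assumptions: `DownloadFlowSketch` and its members are hypothetical names, and a `null` argument stands in for "no translog metadata file found in remote".

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the download flow; names are illustrative only.
class DownloadFlowSketch {
    final Set<String> localFiles = new HashSet<>();
    final Set<String> trackedFiles = new HashSet<>();

    // remoteMetadata lists the files named by the remote metadata file,
    // or is null when no metadata file exists in the remote store.
    void download(List<String> remoteMetadata) {
        if (remoteMetadata == null) {
            return; // nothing deleted, nothing downloaded, tracker untouched
        }
        localFiles.clear(); // delete existing local files
        for (String file : remoteMetadata) {
            localFiles.add(file);   // "download" the file
            trackedFiles.add(file); // record it in the file tracker
        }
    }

    // Local files that a later sync would try to upload without a tracker entry.
    Set<String> untrackedLocalFiles() {
        Set<String> untracked = new HashSet<>(localFiles);
        untracked.removeAll(trackedFiles);
        return untracked;
    }
}
```

With a metadata file present, stale local files are removed and everything on disk is tracked. With no metadata file, pre-existing local files survive and remain untracked, which is the state that leads to the NPE.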
Why was the translog metadata file missing from remote?
This can happen if we upload the translog files and the process crashes (for any reason) before the translog metadata is uploaded.
In this case, the process is restarted and the same node gets the same primary shard (because the index has 0 replicas). The node starts accepting writes, but they are not acknowledged because the translog file upload keeps failing and the metadata is never uploaded.
Meanwhile, the translog file on local disk retains the dirty writes and keeps growing.
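The crash window can be illustrated with a small model of a two-step upload. This is a hypothetical sketch, not OpenSearch code: `TwoStepUploadSketch` and the `crashBeforeMetadata` flag are invented for illustration, and the exception merely simulates the process dying between the two steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "upload data files, then upload metadata".
class TwoStepUploadSketch {
    final List<String> remoteDataFiles = new ArrayList<>();
    String remoteMetadata = null; // null means no metadata file in remote

    void sync(List<String> files, boolean crashBeforeMetadata) {
        remoteDataFiles.addAll(files); // step 1: data files reach remote
        if (crashBeforeMetadata) {
            throw new IllegalStateException("simulated crash");
        }
        remoteMetadata = "metadata for " + files; // step 2: metadata reaches remote
    }
}
```

After a simulated crash, the remote store holds data files but no metadata file, so a restart on the same node finds local files that the download flow neither deletes nor tracks.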
Expected behavior
Translog upload should not fail due to NPE.
Additional Details
No response
Related component
Storage:Remote