From 144b1989a1fd9580be683a97515d1fa9939af3e2 Mon Sep 17 00:00:00 2001 From: Andrew Ross Date: Tue, 10 Oct 2023 19:54:15 -0500 Subject: [PATCH] Replace multipart download with parallel file download (#10519) There are a few open issues with the multi-stream download approach: - Recovery stats are not being reported correctly - It is incompatible (short of reopening and re-reading the entire file) with the existing Lucene checksum validation logic - There are some issues with integrating it with the pending client side encryption work Given this, I attempted an experiment where I replaced with multi-stream-within-a-single-file approach with simply parallelizing downloads across files (this is how snapshot restore works). I actually got better results with this approach: recovering a ~52GiB shard took about 4.7 minutes with the multi-stream code versus 3.9 minutes with the parallel file approach (r7g.4xlarge EC2 instance, 500MiB/s EBS volume, S3 as remote repository). I think this is the right approach as it leverages the more battle-tested code path and addresses the three issues listed above. The multi-stream approach still has promise as it will allow us to download very large files faster (whereas this approach they can be the long poll on the transfer operation). However, given that 5GB segments (made up of multiple files in practice) are the norm, we generally aren't dealing with huge files. Signed-off-by: Andrew Ross --- server/src/main/java/org/opensearch/index/shard/IndexShard.java | 1 - 1 file changed, 1 deletion(-) diff --git a/server/src/main/java/org/opensearch/index/shard/IndexShard.java b/server/src/main/java/org/opensearch/index/shard/IndexShard.java index 981c4bdf1aea6..f990a3b56e856 100644 --- a/server/src/main/java/org/opensearch/index/shard/IndexShard.java +++ b/server/src/main/java/org/opensearch/index/shard/IndexShard.java @@ -159,7 +159,6 @@ import org.opensearch.index.seqno.SequenceNumbers; import org.opensearch.index.shard.PrimaryReplicaSyncer.ResyncTask; import org.opensearch.index.similarity.SimilarityService; -import org.opensearch.index.store.DirectoryFileTransferTracker; import org.opensearch.index.store.RemoteSegmentStoreDirectory; import org.opensearch.index.store.RemoteStoreFileDownloader; import org.opensearch.index.store.Store;