From 114142fd1ef472930aef2e4b947c818b8d4a6910 Mon Sep 17 00:00:00 2001
From: Andrew Ross
Date: Tue, 2 Apr 2024 11:42:05 -0500
Subject: [PATCH] Add detail to searchable snapshot limits section (#6828)

* Add detail to searchable snapshot limits section

Signed-off-by: Andrew Ross

* Update _tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Andrew Ross

* Reword ratio sentence

Signed-off-by: Andrew Ross

---------

Signed-off-by: Andrew Ross
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 .../availability-and-recovery/snapshots/searchable_snapshot.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
index 6a4486d966..ed2fae5cc4 100644
--- a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
+++ b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
@@ -96,3 +96,5 @@ The following are known limitations of the searchable snapshots feature:
 - Accessing data from a remote repository is slower than local disk reads, so higher latencies on search queries are expected.
 - Many remote object stores charge on a per-request basis for retrieval, so users should closely monitor any costs incurred.
 - Searching remote data can impact the performance of other queries running on the same node. We recommend that users provision dedicated nodes with the `search` role for performance-critical applications.
+- For better search performance, consider [force merging]({{site.url}}{{site.baseurl}}/api-reference/index-apis/force-merge/) indexes into a smaller number of segments before taking a snapshot. For the best performance, at the cost of using compute resources prior to snapshotting, force merge your index into one segment.
+- We recommend configuring a maximum ratio of remote data to local disk cache size using the `cluster.filecache.remote_data_ratio` setting. A ratio of 5 is a good starting point for most workloads to ensure good query performance. If the ratio is too large, then there may not be sufficient disk space to handle the search workload. See issue [#11676](https://github.com/opensearch-project/OpenSearch/issues/11676) for a known bug related to this scenario.
\ No newline at end of file
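
Reviewer note: the ratio described in the last added bullet can be illustrated with a little arithmetic. The sketch below is not part of the patch and does not call any OpenSearch API. It assumes only what the bullet states, namely that `cluster.filecache.remote_data_ratio` caps remote data at a multiple of the local file cache size. The function names, the example sizes, and the choice to handle only a positive ratio are hypothetical.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def max_remote_data_bytes(file_cache_bytes: int, remote_data_ratio: float) -> float:
    """Upper bound on remote data implied by a cache size and a ratio.

    Hypothetical helper: it models the ratio as remote data divided by local
    file cache size, as described in the added bullet. Only a positive ratio
    is handled here (an assumption of this sketch).
    """
    if file_cache_bytes < 0:
        raise ValueError("file_cache_bytes must be non-negative")
    if remote_data_ratio <= 0:
        raise ValueError("this sketch only handles a positive ratio")
    return file_cache_bytes * remote_data_ratio


def fits_within_ratio(
    current_remote_bytes: int,
    new_index_bytes: int,
    file_cache_bytes: int,
    remote_data_ratio: float,
) -> bool:
    """Return True if adding new_index_bytes keeps remote data within the cap."""
    cap = max_remote_data_bytes(file_cache_bytes, remote_data_ratio)
    return current_remote_bytes + new_index_bytes <= cap


if __name__ == "__main__":
    cache = 100 * GIB  # assumed local file cache size
    ratio = 5          # the starting point suggested in the bullet
    print(max_remote_data_bytes(cache, ratio) / GIB)            # cap in GiB
    print(fits_within_ratio(450 * GIB, 40 * GIB, cache, ratio))  # within the cap
    print(fits_within_ratio(450 * GIB, 60 * GIB, cache, ratio))  # exceeds the cap
```

With these assumed numbers, a 100 GiB cache and a ratio of 5 correspond to a cap of 500 GiB of remote data. A smaller ratio leaves more cache per byte of remote data, which is the trade-off the bullet describes.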