From 6e8b5c77f3aa90bc76778c6d3047d58eb2959ee6 Mon Sep 17 00:00:00 2001
From: Andrew Ross
Date: Tue, 2 Apr 2024 09:17:07 -0700
Subject: [PATCH 1/3] Add detail to searchable snapshot limits section

Signed-off-by: Andrew Ross
---
 .../availability-and-recovery/snapshots/searchable_snapshot.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
index 6a4486d966..f8f11fc7ee 100644
--- a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
+++ b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
@@ -96,3 +96,5 @@ The following are known limitations of the searchable snapshots feature:
 - Accessing data from a remote repository is slower than local disk reads, so higher latencies on search queries are expected.
 - Many remote object stores charge on a per-request basis for retrieval, so users should closely monitor any costs incurred.
 - Searching remote data can impact the performance of other queries running on the same node. We recommend that users provision dedicated nodes with the `search` role for performance-critical applications.
+- Consider [force merging]({{site.url}}{{site.baseurl}}/api-reference/index-apis/force-merge/) indexes to a smaller number of segments before snapshotting for best search performance. For best performance, at the cost of using compute resources prior to snapshotting, force merge your index to one segment.
+- It is recommended to configure a maximum remote data to local disk cache ratio with the `cluster.filecache.remote_data_ratio` setting. A ratio of 5 is a good starting point for most workloads to ensure good query performance. If the ratio is too large, then there may not be sufficient disk space to handle the search workload. See issue [#11676](https://github.com/opensearch-project/OpenSearch/issues/11676) for a known bug related to this scenario.
\ No newline at end of file

From ce72cdca9b0ad45e456189ebe5f31584d0a25d19 Mon Sep 17 00:00:00 2001
From: Andrew Ross
Date: Tue, 2 Apr 2024 09:38:57 -0700
Subject: [PATCH 2/3] Update _tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Andrew Ross
---
 .../availability-and-recovery/snapshots/searchable_snapshot.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
index f8f11fc7ee..bfd5a56ec2 100644
--- a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
+++ b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
@@ -96,5 +96,5 @@ The following are known limitations of the searchable snapshots feature:
 - Accessing data from a remote repository is slower than local disk reads, so higher latencies on search queries are expected.
 - Many remote object stores charge on a per-request basis for retrieval, so users should closely monitor any costs incurred.
 - Searching remote data can impact the performance of other queries running on the same node. We recommend that users provision dedicated nodes with the `search` role for performance-critical applications.
-- Consider [force merging]({{site.url}}{{site.baseurl}}/api-reference/index-apis/force-merge/) indexes to a smaller number of segments before snapshotting for best search performance. For best performance, at the cost of using compute resources prior to snapshotting, force merge your index to one segment.
+- For better search performance, consider [force merging]({{site.url}}{{site.baseurl}}/api-reference/index-apis/force-merge/) indexes into a smaller number of segments before taking a snapshot. For the best performance, at the cost of using compute resources prior to snapshotting, force merge your index into one segment.
 - It is recommended to configure a maximum remote data to local disk cache ratio with the `cluster.filecache.remote_data_ratio` setting. A ratio of 5 is a good starting point for most workloads to ensure good query performance. If the ratio is too large, then there may not be sufficient disk space to handle the search workload. See issue [#11676](https://github.com/opensearch-project/OpenSearch/issues/11676) for a known bug related to this scenario.
\ No newline at end of file

From 547bde8dcef483928a8c051652db0e59645f221c Mon Sep 17 00:00:00 2001
From: Andrew Ross
Date: Tue, 2 Apr 2024 09:40:25 -0700
Subject: [PATCH 3/3] Reword ratio sentence

Signed-off-by: Andrew Ross
---
 .../availability-and-recovery/snapshots/searchable_snapshot.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
index bfd5a56ec2..ed2fae5cc4 100644
--- a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
+++ b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
@@ -97,4 +97,4 @@ The following are known limitations of the searchable snapshots feature:
 - Many remote object stores charge on a per-request basis for retrieval, so users should closely monitor any costs incurred.
 - Searching remote data can impact the performance of other queries running on the same node. We recommend that users provision dedicated nodes with the `search` role for performance-critical applications.
 - For better search performance, consider [force merging]({{site.url}}{{site.baseurl}}/api-reference/index-apis/force-merge/) indexes into a smaller number of segments before taking a snapshot. For the best performance, at the cost of using compute resources prior to snapshotting, force merge your index into one segment.
-- It is recommended to configure a maximum remote data to local disk cache ratio with the `cluster.filecache.remote_data_ratio` setting. A ratio of 5 is a good starting point for most workloads to ensure good query performance. If the ratio is too large, then there may not be sufficient disk space to handle the search workload. See issue [#11676](https://github.com/opensearch-project/OpenSearch/issues/11676) for a known bug related to this scenario.
\ No newline at end of file
+- We recommend configuring a maximum ratio of remote data to local disk cache size using the `cluster.filecache.remote_data_ratio` setting. A ratio of 5 is a good starting point for most workloads to ensure good query performance. If the ratio is too large, then there may not be sufficient disk space to handle the search workload. See issue [#11676](https://github.com/opensearch-project/OpenSearch/issues/11676) for a known bug related to this scenario.
\ No newline at end of file
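
Note for readers of this series: the two recommendations it adds (force merging to one segment before snapshotting, and capping the ratio of remote data to local disk cache size at 5) might be applied roughly as sketched below. This is a minimal illustration, not part of the patch. The helper names, the index name `my-index`, and the base URL are hypothetical, and the sketch assumes that `cluster.filecache.remote_data_ratio` can be updated through the cluster settings API. The arithmetic helper simply reads the setting as "remote data may be at most `ratio` times the file cache size", following the wording in PATCH 3/3.

```python
import json
from urllib import request


def force_merge_request(index: str, max_num_segments: int = 1) -> dict:
    """Describe the force merge call to run before taking the snapshot."""
    if max_num_segments < 1:
        raise ValueError("max_num_segments must be at least 1")
    return {
        "method": "POST",
        "path": f"/{index}/_forcemerge?max_num_segments={max_num_segments}",
        "body": None,
    }


def remote_data_ratio_request(ratio: float = 5) -> dict:
    """Describe the cluster settings update for the remote data ratio.

    Assumption: the setting is accepted as a persistent cluster setting.
    """
    return {
        "method": "PUT",
        "path": "/_cluster/settings",
        "body": {"persistent": {"cluster.filecache.remote_data_ratio": ratio}},
    }


def max_remote_data_bytes(file_cache_bytes: int, ratio: float) -> float:
    """Upper bound on remote data implied by the ratio (ratio x cache size)."""
    return file_cache_bytes * ratio


def send(base_url: str, spec: dict) -> bytes:
    """Send one described request; base_url is a hypothetical cluster address."""
    data = None if spec["body"] is None else json.dumps(spec["body"]).encode()
    req = request.Request(
        base_url + spec["path"],
        data=data,
        method=spec["method"],
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read()
```

For example, `send("http://localhost:9200", force_merge_request("my-index"))` would issue the force merge against a local test cluster before the snapshot is taken. With a 100 GiB file cache and a ratio of 5, `max_remote_data_bytes` gives a limit of 500 GiB of remote data for that node.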