Update known limitations for kNN based indexes (#8137)
* Update known limitations for kNN based indexes

Signed-off-by: Kunal Kotwani <[email protected]>

* Update _tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md

Signed-off-by: kolchfa-aws <[email protected]>

---------

Signed-off-by: Kunal Kotwani <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
kotwanikunal and kolchfa-aws authored Sep 3, 2024
1 parent 0427252 commit e3576fb
Showing 1 changed file with 2 additions and 1 deletion.
@@ -108,4 +108,5 @@ The following are known limitations of the searchable snapshots feature:
- Many remote object stores charge on a per-request basis for retrieval, so users should closely monitor any costs incurred.
- Searching remote data can impact the performance of other queries running on the same node. We recommend that users provision dedicated nodes with the `search` role for performance-critical applications.
- For better search performance, consider [force merging]({{site.url}}{{site.baseurl}}/api-reference/index-apis/force-merge/) indexes into a smaller number of segments before taking a snapshot. For the best performance, at the cost of using compute resources prior to snapshotting, force merge your index into one segment.
- We recommend configuring a maximum ratio of remote data to local disk cache size using the `cluster.filecache.remote_data_ratio` setting. A ratio of 5 is a good starting point for most workloads and helps maintain query performance. If the ratio is too large, there may not be sufficient disk space to handle the search workload. For more details on the maximum ratio of remote data, see issue [#11676](https://github.com/opensearch-project/OpenSearch/issues/11676).
- k-NN indexes that use a native engine (`faiss` or `nmslib`) are incompatible with searchable snapshots.
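The `cluster.filecache.remote_data_ratio` recommendation above reduces to simple arithmetic: the maximum amount of remote data a node should reference is the ratio multiplied by its local file cache size. The following is a minimal sketch of that calculation, assuming the ratio is interpreted exactly as stated (remote data divided by local disk cache size). The helpers `max_remote_data_gb` and `fits_ratio` are hypothetical and are not part of OpenSearch.

```python
# Hypothetical helpers (not part of OpenSearch): estimate how much remote
# searchable-snapshot data a node's local file cache can back under a given
# cluster.filecache.remote_data_ratio, treating the ratio as
# (remote data size) / (local file cache size).


def max_remote_data_gb(file_cache_gb: float, remote_data_ratio: float) -> float:
    """Return the largest amount of remote data (GB) the ratio allows."""
    if file_cache_gb < 0 or remote_data_ratio <= 0:
        raise ValueError("cache size must be >= 0 and ratio must be > 0")
    return file_cache_gb * remote_data_ratio


def fits_ratio(snapshot_sizes_gb: list[float], file_cache_gb: float,
               remote_data_ratio: float = 5.0) -> bool:
    """Check whether the total remote data stays within the ratio limit."""
    total_remote = sum(snapshot_sizes_gb)
    return total_remote <= max_remote_data_gb(file_cache_gb, remote_data_ratio)


if __name__ == "__main__":
    # With a 100 GB file cache and the suggested starting ratio of 5,
    # up to 500 GB of remote data stays within the limit.
    print(max_remote_data_gb(100, 5))   # 500
    print(fits_ratio([200, 250], 100))  # True  (450 GB <= 500 GB)
    print(fits_ratio([300, 250], 100))  # False (550 GB > 500 GB)
```

Lowering the ratio trades capacity for headroom: each gigabyte of remote data then has more local cache behind it, which is the disk-space concern the limitation describes.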
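Before backing an index with a searchable snapshot, it can help to check its mapping for `knn_vector` fields that use a native engine. The following is a minimal sketch, assuming the mapping is available as a Python dictionary, such as one parsed from a get-mapping response. It walks the `properties` tree and flags fields whose `method.engine` is `faiss` or `nmslib`. The function `native_engine_fields` is hypothetical. It only flags engines declared explicitly, so fields that omit `method.engine` or are built from a trained model (`model_id`) are not flagged, even though they may still resolve to a native engine depending on the k-NN plugin version.

```python
# Hypothetical helper (not an OpenSearch API): walk an index mapping and
# return the paths of knn_vector fields that explicitly declare a native
# engine (faiss or nmslib). Fields without an explicit engine are NOT flagged.
from typing import Any

NATIVE_ENGINES = {"faiss", "nmslib"}


def native_engine_fields(properties: dict[str, Any], prefix: str = "") -> list[str]:
    found: list[str] = []
    for name, field in properties.items():
        path = f"{prefix}{name}"
        if field.get("type") == "knn_vector":
            engine = (field.get("method") or {}).get("engine")
            if engine in NATIVE_ENGINES:
                found.append(path)
        # Object and nested fields keep their subfields under "properties".
        if "properties" in field:
            found.extend(native_engine_fields(field["properties"], prefix=f"{path}."))
    return found


if __name__ == "__main__":
    mapping = {
        "properties": {
            "title": {"type": "text"},
            "embedding": {
                "type": "knn_vector",
                "dimension": 4,
                "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
            },
            "meta": {
                "properties": {
                    "thumb_vec": {
                        "type": "knn_vector",
                        "dimension": 4,
                        "method": {"name": "hnsw", "engine": "lucene", "space_type": "l2"},
                    }
                }
            },
        }
    }
    # Only the faiss-backed field is flagged; the lucene-backed one is not.
    print(native_engine_fields(mapping["properties"]))  # ['embedding']
```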
