From 4f1bd63c028b29aa22f7aac095476ab2ee675c1f Mon Sep 17 00:00:00 2001 From: John Mazanec Date: Mon, 8 Apr 2024 17:30:12 -0700 Subject: [PATCH 01/37] Adds section on product quantization for docs Adds section in vector quantization docs for product quantization. In it, it contains tips for using it as well as memory estimations. Along with this, changed some formatting to make docs easier to write. Signed-off-by: John Mazanec --- _search-plugins/knn/knn-index.md | 4 +- .../knn/knn-vector-quantization.md | 123 +++++++++++++++--- 2 files changed, 110 insertions(+), 17 deletions(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index 01b82b425b..3b2c794df9 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -204,7 +204,7 @@ Encoder name | Requires training | Description :--- | :--- | :--- `flat` (Default) | false | Encode vectors as floating-point arrays. This encoding does not reduce memory footprint. `pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388). -`sq` | false | An abbreviation for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQFP16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on x86 architecture or Neon on ARM64 architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization). +`sq` | false | An abbreviation for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQFP16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on x86 architecture or Neon on ARM64 architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-16-bit-scalar-quantization). #### PQ parameters @@ -322,7 +322,7 @@ If you want to use less memory and index faster than HNSW, while maintaining sim If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop. -You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) in order to reduce the memory footprint by a factor of 4. 
To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). +You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) in order to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). ### Memory estimation diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 3373f104c2..f0221364d2 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -10,22 +10,42 @@ has_math: true # k-NN vector quantization -By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines). To reduce the memory footprint, you can use vector quantization. +By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the +vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be +expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines +). To reduce the memory footprint, you can use vector quantization. + +In OpenSearch, there are many varieties of quantization supported. In general, the level of quantization +will provide a tradeoff between the accuracy of the nearest neighbor search and the size of the memory footprint the +vector search system will consume. The supported types include: Byte vectors, 16-bit scalar quantization, and +Product Quantization (PQ). ## Lucene byte vector -Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). +Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount +of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch +index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). -## Faiss scalar quantization +## Faiss 16-bit scalar quantization -Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. 
At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput. - -SIMD optimization is not supported on Windows. Using Faiss scalar quantization on Windows can lead to a significant drop in performance, including decreased indexing throughput and increased search latencies. +Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within +OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit +vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into +16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the +vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the +memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector +values are large compared to the error introduced by eliminating their two least significant bits. When used with +[SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing +throughput. + +SIMD optimization is not supported on Windows. Using Faiss scalar quantization on Windows can lead to a significant drop +in performance, including decreased indexing throughput and increased search latencies. {: .warning} ### Using Faiss scalar quantization -To use Faiss scalar quantization, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a k-NN index: +To use Faiss scalar quantization, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a +k-NN index: ```json PUT /test-index @@ -60,14 +80,22 @@ PUT /test-index ``` {% include copy-curl.html %} -Optionally, you can specify the parameters in `method.parameters.encoder`. For more information about `encoder` object parameters, see [SQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#sq-parameters). +Optionally, you can specify the parameters in `method.parameters.encoder`. For more information about `encoder` object +parameters, see [SQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#sq-parameters). -The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must be in the [-65504.0, 65504.0] range. To define how to handle out-of-range values, the preceding request specifies the `clip` parameter. By default, this parameter is `false`, and any vectors containing out-of-range values are rejected. 
When `clip` is set to `true` (as in the preceding request), out-of-range vector values are rounded up or down so that they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will be indexed as a 16-bit vector `[65504.0, -65504.0]`. +The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must +be in the [-65504.0, 65504.0] range. To define how to handle out-of-range values, the preceding request specifies the +`clip` parameter. By default, this parameter is `false`, and any vectors containing out-of-range values are rejected. +When `clip` is set to `true` (as in the preceding request), out-of-range vector values are rounded up or down so that +they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will +be indexed as a 16-bit vector `[65504.0, -65504.0]`. -We recommend setting `clip` to `true` only if very few elements lie outside of the supported range. Rounding the values may cause a drop in recall. +We recommend setting `clip` to `true` only if very few elements lie outside of the supported range. Rounding the values +may cause a drop in recall. {: .note} -The following example method definition specifies the Faiss SQfp16 encoder, which rejects any indexing request that contains out-of-range vector values (because the `clip` parameter is `false` by default): +The following example method definition specifies the Faiss SQfp16 encoder, which rejects any indexing request that +contains out-of-range vector values (because the `clip` parameter is `false` by default): ```json PUT /test-index @@ -133,15 +161,17 @@ GET test-index/_search ``` {% include copy-curl.html %} -## Memory estimation +### Memory estimation -In the best-case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit vectors require. +In the best-case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit +vectors require. #### HNSW memory estimation The memory required for HNSW is estimated to be `1.1 * (2 * dimension + 8 * M)` bytes/vector. -As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: +As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be +estimated as follows: ```bash 1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB @@ -151,9 +181,72 @@ As an example, assume that you have 1 million vectors with a dimension of 256 an The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_vectors) + (4 * nlist * d))` bytes/vector. -As an example, assume that you have 1 million vectors with a dimension of 256 and `nlist` of 128. The memory requirement can be estimated as follows: +As an example, assume that you have 1 million vectors with a dimension of 256 and `nlist` of 128. The memory requirement +can be estimated as follows: ```bash 1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 0.525 GB ``` +## Faiss product quantization + +Product quantization is a technique that allows users to represent a vector in a configurable amount of bits. In +general, it can be used to achieve a higher level of compression compared to byte and scalar quantization. Product +quantization works by breaking up vectors into _m_ subvectors, and encoding each subvector with _code_size_ bits. 
Thus, +the total amount of memory for the vector ends up being `m*code_size` bits, plus overhead. For more details about the +parameters of product quantization, see +[PQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#pq-parameters). Product quantization is only +supported for the _Faiss_ engine and can be used with either the _HNSW_ or the _IVF_ ANN algorithms. + +### Using Faiss product quantization + +In order to minimize the loss in accuracy, product quantization requires a _training_ step that builds a model based on +the distribution of the data that will be searched over. + +Under the hood, the product quantizer is trained by running k-Means clustering on a set of training vectors for each +sub-vector space and extracts the centroids to be used for the encoding. The training vectors can either be a subset +of the vectors to be ingested, or vectors that have the same distribution and dimension as the vectors to be ingested. +In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend +on which ANN algorithm will be used and how much data will go into the index. For IVF-based indices, a good number of +training vectors to use is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indices, a good number is +`2^code_size*1000` training vectors. See [Faiss's documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) +for more details on how these numbers are arrived at. + +For product quantization, the two parameters that need to be selected are _m_ and _code_size_. _m_ determines how many +sub-vectors the vectors should be broken up into to encode separately - consequently, the _dimension_ needs to be +divisible by _m_. _code_size_ determines how many bits each sub-vector will be encoded with. In general, a good place to +start is setting `code_size = 8` and then tuning _m_ to get the desired tradeoff between memory footprint and recall. + +For an example of setting up an index with product quantization, see [this tutorial]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model). + +### Memory Estimation + +While product quantization is meant to represent individual vectors with `m*code_size` bits, in reality the indices +take up more space than this. This is mainly due to the overhead of storing certain code tables and auxilary data +structures. + +Some of the memory formulas depend on the number of segments present. Typically, this is not known beforehand but a good +default value is 300. +{: .note} + +#### HNSW memory estimation + +The memory required for HNSW with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24 + 8 * hnsw_m) * num_vectors + num_segments * (2^pq_code_size * 4 * d))` bytes. + +As an example, assume that you have 1 million vectors with a dimension of 256, `hnsw_m` of 16, `pq_m` of 32, +`pq_code_size` of 8 and 100 segments. The memory requirement can be estimated as follows: + +```bash +1.1*((8 / 8 * 32 + 24 + 8 * 16) * 1000000 + 100 * (2^8 * 4 * 256)) ~= 0.215 GB +``` + +#### IVF memory estimation + +The memory required for IVF with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24) * num_vectors + num_segments * (2^code_size * 4 * d + 4 * ivf_nlist * d))` bytes. + +As an example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 32, +`pq_code_size` of 8 and 100 segments. 
The memory requirement can be estimated as follows: + +```bash +1.1*((8 / 8 * 64 + 24) * 1000000 + 100 * (2^8 * 4 * 256 + 4 * 512 * 256)) ~= 0.171 GB +``` From 0370b851aff221a0a708c5a1f9996b89488f2576 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 10 Apr 2024 12:11:40 -0600 Subject: [PATCH 02/37] Update knn-vector-quantization.md Fix formatting Signed-off-by: Melissa Vagi --- .../knn/knn-vector-quantization.md | 104 +++++------------- 1 file changed, 29 insertions(+), 75 deletions(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index f0221364d2..7d2118467b 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -10,42 +10,25 @@ has_math: true # k-NN vector quantization -By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the -vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be -expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines -). To reduce the memory footprint, you can use vector quantization. +By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines). To reduce the memory footprint, you can use vector quantization. -In OpenSearch, there are many varieties of quantization supported. In general, the level of quantization -will provide a tradeoff between the accuracy of the nearest neighbor search and the size of the memory footprint the -vector search system will consume. The supported types include: Byte vectors, 16-bit scalar quantization, and -Product Quantization (PQ). +In OpenSearch, there are many varieties of quantization supported. In general, the level of quantization will provide a tradeoff between the accuracy of the nearest neighbor search and the size of the memory footprint the vector search system will consume. The supported types include byte vectors, 16-bit scalar quantization, and product quantization (PQ). ## Lucene byte vector -Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount -of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch -index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). +Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). ## Faiss 16-bit scalar quantization -Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within -OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit -vectors. 
At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into -16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the -vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the -memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector -values are large compared to the error introduced by eliminating their two least significant bits. When used with -[SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing -throughput. - -SIMD optimization is not supported on Windows. Using Faiss scalar quantization on Windows can lead to a significant drop -in performance, including decreased indexing throughput and increased search latencies. +Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector +values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput. + +SIMD optimization is not supported on Windows. Using Faiss scalar quantization on Windows can lead to a significant drop in performance, including decreased indexing throughput and increased search latencies. {: .warning} ### Using Faiss scalar quantization -To use Faiss scalar quantization, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a -k-NN index: +To use Faiss scalar quantization, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a k-NN index: ```json PUT /test-index @@ -80,22 +63,16 @@ PUT /test-index ``` {% include copy-curl.html %} -Optionally, you can specify the parameters in `method.parameters.encoder`. For more information about `encoder` object -parameters, see [SQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#sq-parameters). +Optionally, you can specify the parameters in `method.parameters.encoder`. For more information about `encoder` object parameters, see [SQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#sq-parameters). + +The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must be in the [-65504.0, 65504.0] range. To define how to handle out-of-range values, the preceding request specifies the `clip` parameter. By default, this parameter is `false`, and any vectors containing out-of-range values are rejected. -The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. 
For this encoder type, the vector values must -be in the [-65504.0, 65504.0] range. To define how to handle out-of-range values, the preceding request specifies the -`clip` parameter. By default, this parameter is `false`, and any vectors containing out-of-range values are rejected. -When `clip` is set to `true` (as in the preceding request), out-of-range vector values are rounded up or down so that -they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will -be indexed as a 16-bit vector `[65504.0, -65504.0]`. +When `clip` is set to `true` (as in the preceding request), out-of-range vector values are rounded up or down so that they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will be indexed as a 16-bit vector `[65504.0, -65504.0]`. -We recommend setting `clip` to `true` only if very few elements lie outside of the supported range. Rounding the values -may cause a drop in recall. +We recommend setting `clip` to `true` only if very few elements lie outside of the supported range. Rounding the values may cause a drop in recall. {: .note} -The following example method definition specifies the Faiss SQfp16 encoder, which rejects any indexing request that -contains out-of-range vector values (because the `clip` parameter is `false` by default): +The following example method definition specifies the Faiss SQfp16 encoder, which rejects any indexing request that contains out-of-range vector values (because the `clip` parameter is `false` by default). ```json PUT /test-index @@ -133,7 +110,7 @@ PUT /test-index ``` {% include copy-curl.html %} -During ingestion, make sure each dimension of the vector is in the supported range ([-65504.0, 65504.0]): +During ingestion, make sure each dimension of the vector is in the supported range ([-65504.0, 65504.0]). ```json PUT test-index/_doc/1 @@ -143,7 +120,7 @@ PUT test-index/_doc/1 ``` {% include copy-curl.html %} -During querying, there is no range limitation for the query vector: +During querying, there is no range limitation for the query vector. ```json GET test-index/_search @@ -163,15 +140,13 @@ GET test-index/_search ### Memory estimation -In the best-case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit -vectors require. +In the best-case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit vectors require. #### HNSW memory estimation The memory required for HNSW is estimated to be `1.1 * (2 * dimension + 8 * M)` bytes/vector. -As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be -estimated as follows: +As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: ```bash 1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB @@ -181,8 +156,7 @@ estimated as follows: The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_vectors) + (4 * nlist * d))` bytes/vector. -As an example, assume that you have 1 million vectors with a dimension of 256 and `nlist` of 128. The memory requirement -can be estimated as follows: +As an example, assume that you have 1 million vectors with a dimension of 256 and `nlist` of 128. 
The memory requirement can be estimated as follows: ```bash 1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 0.525 GB @@ -190,51 +164,32 @@ can be estimated as follows: ## Faiss product quantization -Product quantization is a technique that allows users to represent a vector in a configurable amount of bits. In -general, it can be used to achieve a higher level of compression compared to byte and scalar quantization. Product -quantization works by breaking up vectors into _m_ subvectors, and encoding each subvector with _code_size_ bits. Thus, -the total amount of memory for the vector ends up being `m*code_size` bits, plus overhead. For more details about the -parameters of product quantization, see -[PQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#pq-parameters). Product quantization is only -supported for the _Faiss_ engine and can be used with either the _HNSW_ or the _IVF_ ANN algorithms. +Product quantization is a technique that allows users to represent a vector in a configurable amount of bits. In general, it can be used to achieve a higher level of compression compared to byte and scalar quantization. Product quantization works by breaking up vectors into _m_ subvectors, and encoding each subvector with _code_size_ bits. Thus, the total amount of memory for the vector ends up being `m*code_size` bits, plus overhead. For more details about the parameters of product quantization, see [PQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#pq-parameters). Product quantization is only supported for the _Faiss_ engine and can be used with either the _HNSW_ or the _IVF_ ANN algorithms. ### Using Faiss product quantization -In order to minimize the loss in accuracy, product quantization requires a _training_ step that builds a model based on -the distribution of the data that will be searched over. +In order to minimize the loss in accuracy, product quantization requires a _training_ step that builds a model based on the distribution of the data that will be searched over. + +Under the hood, the product quantizer is trained by running k-Means clustering on a set of training vectors for each sub-vector space and extracts the centroids to be used for the encoding. The training vectors can either be a subset of the vectors to be ingested, or vectors that have the same distribution and dimension as the vectors to be ingested. -Under the hood, the product quantizer is trained by running k-Means clustering on a set of training vectors for each -sub-vector space and extracts the centroids to be used for the encoding. The training vectors can either be a subset -of the vectors to be ingested, or vectors that have the same distribution and dimension as the vectors to be ingested. -In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend -on which ANN algorithm will be used and how much data will go into the index. For IVF-based indices, a good number of -training vectors to use is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indices, a good number is -`2^code_size*1000` training vectors. See [Faiss's documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) -for more details on how these numbers are arrived at. +In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend on which ANN algorithm will be used and how much data will go into the index. 
For IVF-based indices, a good number of training vectors to use is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indices, a good number is `2^code_size*1000` training vectors. See [Faiss's documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) for more details on how these numbers are arrived at. -For product quantization, the two parameters that need to be selected are _m_ and _code_size_. _m_ determines how many -sub-vectors the vectors should be broken up into to encode separately - consequently, the _dimension_ needs to be -divisible by _m_. _code_size_ determines how many bits each sub-vector will be encoded with. In general, a good place to -start is setting `code_size = 8` and then tuning _m_ to get the desired tradeoff between memory footprint and recall. +For product quantization, the two parameters that need to be selected are _m_ and _code_size_. _m_ determines how many sub-vectors the vectors should be broken up into to encode separately - consequently, the _dimension_ needs to be divisible by _m_. _code_size_ determines how many bits each sub-vector will be encoded with. In general, a good place to start is setting `code_size = 8` and then tuning _m_ to get the desired tradeoff between memory footprint and recall. For an example of setting up an index with product quantization, see [this tutorial]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model). ### Memory Estimation -While product quantization is meant to represent individual vectors with `m*code_size` bits, in reality the indices -take up more space than this. This is mainly due to the overhead of storing certain code tables and auxilary data -structures. +While product quantization is meant to represent individual vectors with `m*code_size` bits, in reality the indices take up more space than this. This is mainly due to the overhead of storing certain code tables and auxilary data structures. -Some of the memory formulas depend on the number of segments present. Typically, this is not known beforehand but a good -default value is 300. +Some of the memory formulas depend on the number of segments present. Typically, this is not known beforehand but a good default value is 300. {: .note} #### HNSW memory estimation The memory required for HNSW with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24 + 8 * hnsw_m) * num_vectors + num_segments * (2^pq_code_size * 4 * d))` bytes. -As an example, assume that you have 1 million vectors with a dimension of 256, `hnsw_m` of 16, `pq_m` of 32, -`pq_code_size` of 8 and 100 segments. The memory requirement can be estimated as follows: +As an example, assume that you have 1 million vectors with a dimension of 256, `hnsw_m` of 16, `pq_m` of 32, `pq_code_size` of 8 and 100 segments. The memory requirement can be estimated as follows: ```bash 1.1*((8 / 8 * 32 + 24 + 8 * 16) * 1000000 + 100 * (2^8 * 4 * 256)) ~= 0.215 GB @@ -244,8 +199,7 @@ As an example, assume that you have 1 million vectors with a dimension of 256, ` The memory required for IVF with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24) * num_vectors + num_segments * (2^code_size * 4 * d + 4 * ivf_nlist * d))` bytes. -As an example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 32, -`pq_code_size` of 8 and 100 segments. 
The memory requirement can be estimated as follows: +As an example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 32, `pq_code_size` of 8 and 100 segments. The memory requirement can be estimated as follows: ```bash 1.1*((8 / 8 * 64 + 24) * 1000000 + 100 * (2^8 * 4 * 256 + 4 * 512 * 256)) ~= 0.171 GB From 548805b1ff03992da681e526a3146160d3c60c60 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 10 Apr 2024 12:15:47 -0600 Subject: [PATCH 03/37] Update knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 7d2118467b..bab42fda56 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -20,8 +20,9 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce ## Faiss 16-bit scalar quantization -Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector -values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput. +Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. + +At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput. SIMD optimization is not supported on Windows. Using Faiss scalar quantization on Windows can lead to a significant drop in performance, including decreased indexing throughput and increased search latencies. 
{: .warning} From 167cb960dbf325008da28ce375c19a20aa045427 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 10 Apr 2024 13:36:45 -0600 Subject: [PATCH 04/37] Update knn-vector-quantization.md Define abbreviation on first mention Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index bab42fda56..3853616b38 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -145,7 +145,7 @@ In the best-case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer #### HNSW memory estimation -The memory required for HNSW is estimated to be `1.1 * (2 * dimension + 8 * M)` bytes/vector. +The memory required for Hierarchical Navigable Small Worlds (HNSW) is estimated to be `1.1 * (2 * dimension + 8 * M)` bytes/vector. As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows: From be1c83603163bd86343e45ee50c66cc5e65295ab Mon Sep 17 00:00:00 2001 From: John Mazanec Date: Wed, 10 Apr 2024 14:13:58 -0700 Subject: [PATCH 05/37] Update _search-plugins/knn/knn-vector-quantization.md Co-authored-by: Melissa Vagi Signed-off-by: John Mazanec --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 3853616b38..4724e1078b 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -200,7 +200,7 @@ As an example, assume that you have 1 million vectors with a dimension of 256, ` The memory required for IVF with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24) * num_vectors + num_segments * (2^code_size * 4 * d + 4 * ivf_nlist * d))` bytes. -As an example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 32, `pq_code_size` of 8 and 100 segments. The memory requirement can be estimated as follows: +For example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 32, `pq_code_size` of 8 and 100 segments. The memory requirement can be estimated as follows: ```bash 1.1*((8 / 8 * 64 + 24) * 1000000 + 100 * (2^8 * 4 * 256 + 4 * 512 * 256)) ~= 0.171 GB From 255a25bcc937a8155c72b775b3d9919cc060546b Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 12 Apr 2024 15:35:54 -0600 Subject: [PATCH 06/37] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 4724e1078b..a20e2a0c95 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -12,7 +12,7 @@ has_math: true By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines). 
To reduce the memory footprint, you can use vector quantization. -In OpenSearch, there are many varieties of quantization supported. In general, the level of quantization will provide a tradeoff between the accuracy of the nearest neighbor search and the size of the memory footprint the vector search system will consume. The supported types include byte vectors, 16-bit scalar quantization, and product quantization (PQ). +In OpenSearch, many varieties of quantization are supported. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint the vector search system will consume. The supported types include byte vectors, 16-bit scalar quantization, and product quantization (PQ). ## Lucene byte vector From 08194cbcfbee0ac2774cf87df04065874e9208be Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 12 Apr 2024 15:36:05 -0600 Subject: [PATCH 07/37] Update _search-plugins/knn/knn-index.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index 3b2c794df9..3cb99bad40 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -322,7 +322,7 @@ If you want to use less memory and index faster than HNSW, while maintaining sim If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop. -You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) in order to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). +You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) is recommended in order to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). 
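For illustration only, the following is a minimal sketch of an HNSW method definition with the `sq` encoder's type set to `fp16`; the space type and the omitted encoder parameters (such as `clip`) are assumptions that should be adjusted for your data:

```json
"method": {
  "name": "hnsw",
  "engine": "faiss",
  "space_type": "l2",
  "parameters": {
    "encoder": {
      "name": "sq",
      "parameters": {
        "type": "fp16"
      }
    }
  }
}
```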
### Memory estimation From 0a0e10515015f9e082620635618614af51e46ef5 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 12 Apr 2024 15:37:10 -0600 Subject: [PATCH 08/37] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index a20e2a0c95..2e893ef6e3 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -169,7 +169,7 @@ Product quantization is a technique that allows users to represent a vector in a ### Using Faiss product quantization -In order to minimize the loss in accuracy, product quantization requires a _training_ step that builds a model based on the distribution of the data that will be searched over. +To minimize the loss in accuracy, PQ requires a _training_ step that builds a model based on the distribution of the data that will be searched over. Under the hood, the product quantizer is trained by running k-Means clustering on a set of training vectors for each sub-vector space and extracts the centroids to be used for the encoding. The training vectors can either be a subset of the vectors to be ingested, or vectors that have the same distribution and dimension as the vectors to be ingested. From 83503a8d4b8bd5413c2f16c5c8e71306494a649a Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 12 Apr 2024 15:37:26 -0600 Subject: [PATCH 09/37] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 2e893ef6e3..426efa9c7a 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -177,7 +177,7 @@ In OpenSearch, the training vectors need to be present in an index. In general, For product quantization, the two parameters that need to be selected are _m_ and _code_size_. _m_ determines how many sub-vectors the vectors should be broken up into to encode separately - consequently, the _dimension_ needs to be divisible by _m_. _code_size_ determines how many bits each sub-vector will be encoded with. In general, a good place to start is setting `code_size = 8` and then tuning _m_ to get the desired tradeoff between memory footprint and recall. -For an example of setting up an index with product quantization, see [this tutorial]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model). +For an example of setting up an index with PQ, see the [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model) tutorial. 
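As a minimal sketch of that workflow (the model, index, and field names below are placeholders, and the parameter values are illustrative rather than recommendations), a training request that pairs IVF with a PQ encoder might look like the following:

```json
POST /_plugins/_knn/models/my-pq-model/_train
{
  "training_index": "train-index",
  "training_field": "train-field",
  "dimension": 256,
  "description": "Example IVF model with a PQ encoder",
  "method": {
    "name": "ivf",
    "engine": "faiss",
    "space_type": "l2",
    "parameters": {
      "nlist": 128,
      "encoder": {
        "name": "pq",
        "parameters": {
          "m": 8,
          "code_size": 8
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

After training completes, the resulting model can be referenced by its model ID in a `knn_vector` field mapping, as described in the linked tutorial.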
### Memory Estimation From 1e413bcb5a45bd0361568b8c2dac3186b8b24ed0 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 12 Apr 2024 15:37:38 -0600 Subject: [PATCH 10/37] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 426efa9c7a..c23f482b7a 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -181,7 +181,7 @@ For an example of setting up an index with PQ, see the [Building a k-NN index fr ### Memory Estimation -While product quantization is meant to represent individual vectors with `m*code_size` bits, in reality the indices take up more space than this. This is mainly due to the overhead of storing certain code tables and auxilary data structures. +While PQ is meant to represent individual vectors with `m*code_size` bits, in reality the indexes take up more space. This is mainly due to the overhead of storing certain code tables and auxiliary data structures. Some of the memory formulas depend on the number of segments present. Typically, this is not known beforehand but a good default value is 300. {: .note} From 050064e6a8e897721da54479180e0a7963b59e64 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 12 Apr 2024 15:37:49 -0600 Subject: [PATCH 11/37] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index c23f482b7a..816712cc90 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -190,7 +190,7 @@ Some of the memory formulas depend on the number of segments present. Typically, The memory required for HNSW with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24 + 8 * hnsw_m) * num_vectors + num_segments * (2^pq_code_size * 4 * d))` bytes. -As an example, assume that you have 1 million vectors with a dimension of 256, `hnsw_m` of 16, `pq_m` of 32, `pq_code_size` of 8 and 100 segments. The memory requirement can be estimated as follows: +As an example, assume that you have 1 million vectors with a dimension of 256, `hnsw_m` of 16, `pq_m` of 32, `pq_code_size` of 8, and 100 segments. The memory requirement can be estimated as follows: ```bash 1.1*((8 / 8 * 32 + 24 + 8 * 16) * 1000000 + 100 * (2^8 * 4 * 256)) ~= 0.215 GB From f2f42ee894dc83119d581a2f6492b63e9af57803 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 12 Apr 2024 15:38:11 -0600 Subject: [PATCH 12/37] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 816712cc90..49427e98fc 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -121,7 +121,7 @@ PUT test-index/_doc/1 ``` {% include copy-curl.html %} -During querying, there is no range limitation for the query vector. +During querying, the query vector has no range limitation. 
```json GET test-index/_search From 10c35ebada001a0f80e394e426e04818bb4804b3 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 12 Apr 2024 15:38:27 -0600 Subject: [PATCH 13/37] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 49427e98fc..b050257f8a 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -165,7 +165,7 @@ As an example, assume that you have 1 million vectors with a dimension of 256 an ## Faiss product quantization -Product quantization is a technique that allows users to represent a vector in a configurable amount of bits. In general, it can be used to achieve a higher level of compression compared to byte and scalar quantization. Product quantization works by breaking up vectors into _m_ subvectors, and encoding each subvector with _code_size_ bits. Thus, the total amount of memory for the vector ends up being `m*code_size` bits, plus overhead. For more details about the parameters of product quantization, see [PQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#pq-parameters). Product quantization is only supported for the _Faiss_ engine and can be used with either the _HNSW_ or the _IVF_ ANN algorithms. +PQ is a technique used to represent a vector in a configurable amount of bits. In general, it can be used to achieve a higher level of compression compared to byte and scalar quantization. PQ works by breaking up vectors into _m_ subvectors and encoding each subvector with _code_size_ bits. Thus, the total amount of memory for the vector ends up being `m*code_size` bits, plus overhead. For details about the parameters, see [PQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#pq-parameters). PQ is only supported for the _Faiss_ engine and can be used with either the _HNSW_ or the _IVF_ ANN (Approximate Nearest Neighbor) algorithms. ### Using Faiss product quantization From 22508e29d3ed37cd7f6425cec51414019eac84d4 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 12 Apr 2024 15:42:12 -0600 Subject: [PATCH 14/37] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index b050257f8a..da67ccbc59 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -171,7 +171,7 @@ PQ is a technique used to represent a vector in a configurable amount of bits. I To minimize the loss in accuracy, PQ requires a _training_ step that builds a model based on the distribution of the data that will be searched over. -Under the hood, the product quantizer is trained by running k-Means clustering on a set of training vectors for each sub-vector space and extracts the centroids to be used for the encoding. The training vectors can either be a subset of the vectors to be ingested, or vectors that have the same distribution and dimension as the vectors to be ingested. +The product quantizer is trained by running k-means clustering on a set of training vectors for each sub-vector space and extracts the centroids to be used for the encoding. 
The training vectors can be either a subset of the vectors to be ingested or vectors that have the same distribution and dimension as the vectors to be ingested. In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend on which ANN algorithm will be used and how much data will go into the index. For IVF-based indices, a good number of training vectors to use is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indices, a good number is `2^code_size*1000` training vectors. See [Faiss's documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) for more details on how these numbers are arrived at. From 6d8e9d1183a0fd71c79c268118d6386f7739e45a Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 12 Apr 2024 15:42:42 -0600 Subject: [PATCH 15/37] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index da67ccbc59..d5593f1153 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -173,7 +173,7 @@ To minimize the loss in accuracy, PQ requires a _training_ step that builds a mo The product quantizer is trained by running k-means clustering on a set of training vectors for each sub-vector space and extracts the centroids to be used for the encoding. The training vectors can be either a subset of the vectors to be ingested or vectors that have the same distribution and dimension as the vectors to be ingested. -In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend on which ANN algorithm will be used and how much data will go into the index. For IVF-based indices, a good number of training vectors to use is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indices, a good number is `2^code_size*1000` training vectors. See [Faiss's documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) for more details on how these numbers are arrived at. +In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend on which ANN algorithm will be used and how much data will go into the index. For IVF-based indices, a good number of training vectors to use is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indexes, a good number is `2^code_size*1000` training vectors. See [Faiss's documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) for more details about the methodology behind calculating these figures. For product quantization, the two parameters that need to be selected are _m_ and _code_size_. _m_ determines how many sub-vectors the vectors should be broken up into to encode separately - consequently, the _dimension_ needs to be divisible by _m_. _code_size_ determines how many bits each sub-vector will be encoded with. In general, a good place to start is setting `code_size = 8` and then tuning _m_ to get the desired tradeoff between memory footprint and recall. 
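As a quick worked example of the resulting memory target (the values are chosen only for illustration), encoding 256-dimensional vectors with `m` of 16 and `code_size` of 8 stores each vector in:

```bash
m * code_size = 16 * 8 = 128 bits = 16 bytes per vector (plus overhead)
```

compared to `256 * 4 = 1,024` bytes for the original 32-bit floating-point vector.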
From bda3d4f7c24556a2ee1917a6cb7a1bb48498aa25 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 12 Apr 2024 15:43:28 -0600 Subject: [PATCH 16/37] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index d5593f1153..819515ded6 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -175,7 +175,7 @@ The product quantizer is trained by running k-means clustering on a set of train In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend on which ANN algorithm will be used and how much data will go into the index. For IVF-based indices, a good number of training vectors to use is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indexes, a good number is `2^code_size*1000` training vectors. See [Faiss's documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) for more details about the methodology behind calculating these figures. -For product quantization, the two parameters that need to be selected are _m_ and _code_size_. _m_ determines how many sub-vectors the vectors should be broken up into to encode separately - consequently, the _dimension_ needs to be divisible by _m_. _code_size_ determines how many bits each sub-vector will be encoded with. In general, a good place to start is setting `code_size = 8` and then tuning _m_ to get the desired tradeoff between memory footprint and recall. +For PQ, the two parameters that need to be selected are _m_ and _code_size_. _m_ determines how many sub-vectors the vectors should be split to encode separately. Consequently, the _dimension_ needs to be divisible by _m_. _code_size_ determines how many bits each sub-vector will be encoded with. In general, a good place to start is setting `code_size = 8` and then tuning _m_ to get the desired trade-off between memory footprint and recall. For an example of setting up an index with PQ, see the [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model) tutorial. From 7e4e95693477868318a8401c676d1b75bdbc31ca Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 12 Apr 2024 15:50:46 -0600 Subject: [PATCH 17/37] Update knn-index.md Formatting and copyedits Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-index.md | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index 3cb99bad40..9bf75dbf9f 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -44,7 +44,7 @@ PUT /test-index ## Lucene byte vector -Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of storage space needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). +Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine to reduce the amount of storage space needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector). 
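Because byte vectors must hold whole-number values in the [-128, 127] range, the quantization itself happens client side before ingestion. The following sketch shows one possible scheme (max-absolute-value scaling); it is only an illustration, and the right scheme depends on how your embeddings are distributed:

```python
# One possible client-side quantization into the [-128, 127] byte range.
# The scaling scheme is illustrative; pick one that suits your embedding model.
import numpy as np

def to_byte_vector(vector):
    vector = np.asarray(vector, dtype=np.float32)
    scale = float(np.max(np.abs(vector))) or 1.0
    quantized = np.round(vector / scale * 127.0)
    return np.clip(quantized, -128, 127).astype(np.int8)

print(to_byte_vector([0.12, -0.5, 0.33, 0.91]).tolist())  # e.g. [17, -70, 46, 127]
```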
## SIMD optimization for the Faiss engine @@ -137,10 +137,7 @@ For more information about setting these parameters, refer to the [Faiss documen #### IVF training requirements -The IVF algorithm requires a training step. To create an index that uses IVF, you need to train a model with the -[Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model), passing the IVF method definition. IVF requires that, at a minimum, there should be `nlist` training -data points, but it is [recommended that you use more](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index#how-big-is-the-dataset). -Training data can be composed of either the same data that is going to be ingested or a separate dataset. +The IVF algorithm requires a training step. To create an index that uses IVF, you need to train a model with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model), passing the IVF method definition. IVF requires that, at a minimum, there should be `nlist` training data points, but it is [recommended that you use more](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index#how-big-is-the-dataset). Training data can be composed of either the same data that is going to be ingested or a separate dataset. ### Supported Lucene methods @@ -175,8 +172,7 @@ An index created in OpenSearch version 2.11 or earlier will still use the old `e ### Supported Faiss encoders -You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. The k-NN plugin currently supports the -`flat`, `pq`, and `sq` encoders in the Faiss library. +You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. The k-NN plugin currently supports the `flat`, `pq`, and `sq` encoders in the Faiss library. The following example method definition specifies the `hnsw` method and a `pq` encoder: @@ -314,11 +310,11 @@ The following example uses the `ivf` method with an `sq` encoder of type `fp16`: ### Choosing the right method -There are a lot of options to choose from when building your `knn_vector` field. To determine the correct methods and parameters to choose, you should first understand what requirements you have for your workload and what trade-offs you are willing to make. Factors to consider are (1) query latency, (2) query quality, (3) memory limits, (4) indexing latency. +There are several options to choose from when building your `knn_vector` field. To determine the correct methods and parameters to choose, you should first understand what requirements you have for your workload and what trade-offs you are willing to make. Factors to consider are (1) query latency, (2) query quality, (3) memory limits, and (4) indexing latency. -If memory is not a concern, HNSW offers a very strong query latency/query quality tradeoff. +If memory is not a concern, HNSW offers a strong query latency/query quality trade-off. -If you want to use less memory and index faster than HNSW, while maintaining similar query quality, you should evaluate IVF. +If you want to use less memory and index faster than HNSW while maintaining similar query quality, you should evaluate IVF. If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop. 
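To put rough numbers on these trade-offs, the raw per-vector storage is about 4 bytes per dimension for full-precision floats, 2 bytes per dimension with the fp16 encoder, 1 byte per dimension with Lucene byte vectors, and `m*code_size` bits with PQ. The sketch below (a hypothetical helper that ignores graph and list overhead) makes the comparison concrete:

```python
# Raw per-vector storage for the options discussed here. Hypothetical helper;
# real indexes add graph/posting-list overhead on top of these figures.
def raw_vector_bytes(d, mode, m=None, code_size=None):
    if mode == "float32":
        return 4 * d
    if mode == "fp16":
        return 2 * d
    if mode == "byte":
        return d
    if mode == "pq":
        return m * code_size // 8
    raise ValueError(mode)

for mode in ("float32", "fp16", "byte"):
    print(mode, raw_vector_bytes(256, mode))
print("pq", raw_vector_bytes(256, "pq", m=32, code_size=8))  # 32 bytes per vector
```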
@@ -326,9 +322,7 @@ You can reduce the memory footprint by a factor of 2, with a minimal loss in sea ### Memory estimation -In a typical OpenSearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates -native library indexes to a portion of the remaining RAM. This portion's size is determined by -the `circuit_breaker_limit` cluster setting. By default, the limit is set at 50%. +In a typical OpenSearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates native library indexes to a portion of the remaining RAM. This portion's size is determined by the `circuit_breaker_limit` cluster setting. By default, the limit is set at 50%. Having a replica doubles the total number of vectors. {: .note } From 3cf22dd7130bc2cf9c881172ca8973b466ae0b86 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 15 Apr 2024 14:53:56 -0600 Subject: [PATCH 18/37] Update knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 819515ded6..544737f823 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -165,7 +165,7 @@ As an example, assume that you have 1 million vectors with a dimension of 256 an ## Faiss product quantization -PQ is a technique used to represent a vector in a configurable amount of bits. In general, it can be used to achieve a higher level of compression compared to byte and scalar quantization. PQ works by breaking up vectors into _m_ subvectors and encoding each subvector with _code_size_ bits. Thus, the total amount of memory for the vector ends up being `m*code_size` bits, plus overhead. For details about the parameters, see [PQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#pq-parameters). PQ is only supported for the _Faiss_ engine and can be used with either the _HNSW_ or the _IVF_ ANN (Approximate Nearest Neighbor) algorithms. +PQ is a technique used to represent a vector in a configurable amount of bits. In general, it can be used to achieve a higher level of compression compared to byte and scalar quantization. PQ works by breaking up vectors into _m_ subvectors and encoding each subvector with _code_size_ bits. Thus, the total amount of memory for the vector ends up being `m*code_size` bits, plus overhead. For details about the parameters, see [PQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#pq-parameters). PQ is only supported for the _Faiss_ engine and can be used with either the _HNSW_ or the _IVF_ ANN (Approximate Nearest Neighbor) algorithms. ### Using Faiss product quantization From 5231b87d9147049858ccfb51abf2af4a9d1f1178 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:47:06 -0600 Subject: [PATCH 19/37] Update _search-plugins/knn/knn-index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index 9bf75dbf9f..d14911b68a 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -137,7 +137,7 @@ For more information about setting these parameters, refer to the [Faiss documen #### IVF training requirements -The IVF algorithm requires a training step. 
To create an index that uses IVF, you need to train a model with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model), passing the IVF method definition. IVF requires that, at a minimum, there should be `nlist` training data points, but it is [recommended that you use more](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index#how-big-is-the-dataset). Training data can be composed of either the same data that is going to be ingested or a separate dataset. +The IVF algorithm requires a training step. To create an index that uses IVF, you need to train a model with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model), passing the IVF method definition. IVF requires that, at a minimum, there are `nlist` training data points, but it is [recommended that you use more than this](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index#how-big-is-the-dataset). Training data can be composed of either the same data that is going to be ingested or a separate dataset. ### Supported Lucene methods From be1e447b68508d1610994d41b173299233fef1c2 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:47:26 -0600 Subject: [PATCH 20/37] Update _search-plugins/knn/knn-index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index d14911b68a..eaef112ab4 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -310,7 +310,7 @@ The following example uses the `ivf` method with an `sq` encoder of type `fp16`: ### Choosing the right method -There are several options to choose from when building your `knn_vector` field. To determine the correct methods and parameters to choose, you should first understand what requirements you have for your workload and what trade-offs you are willing to make. Factors to consider are (1) query latency, (2) query quality, (3) memory limits, and (4) indexing latency. +There are several options to choose from when building your `knn_vector` field. To determine the correct methods and parameters, you should first understand the requirements of your workload and what trade-offs you are willing to make. Factors to consider are (1) query latency, (2) query quality, (3) memory limits, and (4) indexing latency. If memory is not a concern, HNSW offers a strong query latency/query quality trade-off. From c78bc7590819d1f07804d07dc6eaddcd82b8bf36 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:47:43 -0600 Subject: [PATCH 21/37] Update _search-plugins/knn/knn-index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index eaef112ab4..25c838dd2e 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -314,7 +314,7 @@ There are several options to choose from when building your `knn_vector` field. If memory is not a concern, HNSW offers a strong query latency/query quality trade-off. -If you want to use less memory and index faster than HNSW while maintaining similar query quality, you should evaluate IVF. +If you want to use less memory and increase indexing speed as compared to HNSW while maintaining similar query quality, you should evaluate IVF. 
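Because IVF needs a trained model before any vectors can be indexed, a typical workflow is to index the training vectors first and then call the Train API referenced above. The request below is a sketch assembled from that API's documentation; the host, index names, model ID, and parameter values are placeholders:

```python
# Sketch of training an IVF model with the Train API; all names and values
# below are placeholders -- check the Train API documentation for the
# authoritative request shape.
import requests

train_request = {
    "training_index": "train-index",    # index that holds the training vectors
    "training_field": "train-field",    # knn_vector field containing them
    "dimension": 256,
    "description": "IVF model trained on a sample of the corpus",
    "method": {
        "name": "ivf",
        "engine": "faiss",
        "space_type": "l2",
        "parameters": {"nlist": 512},
    },
}

response = requests.post(
    "http://localhost:9200/_plugins/_knn/models/my-ivf-model/_train",
    json=train_request,
)
print(response.json())
```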
If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop. From 036db271236f8a9c676a9bf0c50beecc61897c51 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:48:02 -0600 Subject: [PATCH 22/37] Update _search-plugins/knn/knn-index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index 25c838dd2e..8e8b4f44cf 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -318,7 +318,7 @@ If you want to use less memory and increase indexing speed as compared to HNSW w If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop. -You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) is recommended in order to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). +You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). ### Memory estimation From 08a985776793061bdf4a2a50b19c736ffa251d0e Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:48:18 -0600 Subject: [PATCH 23/37] Update _search-plugins/knn/knn-index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md index 8e8b4f44cf..9165ba3338 100644 --- a/_search-plugins/knn/knn-index.md +++ b/_search-plugins/knn/knn-index.md @@ -322,7 +322,7 @@ You can reduce the memory footprint by a factor of 2, with a minimal loss in sea ### Memory estimation -In a typical OpenSearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates native library indexes to a portion of the remaining RAM. This portion's size is determined by the `circuit_breaker_limit` cluster setting. By default, the limit is set at 50%. +In a typical OpenSearch cluster, a certain portion of RAM is reserved for the JVM heap. The k-NN plugin allocates native library indexes to a portion of the remaining RAM. This portion's size is determined by the `circuit_breaker_limit` cluster setting. By default, the limit is set to 50%. Having a replica doubles the total number of vectors. 
{: .note } From 55bf87c0e9459f57ff4fc8449c1a602c07accee3 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:48:44 -0600 Subject: [PATCH 24/37] Update _search-plugins/knn/knn-vector-quantization.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 544737f823..dd09a754ec 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -12,7 +12,7 @@ has_math: true By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines). To reduce the memory footprint, you can use vector quantization. -In OpenSearch, many varieties of quantization are supported. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint the vector search system will consume. The supported types include byte vectors, 16-bit scalar quantization, and product quantization (PQ). +OpenSearch supports many varieties of quantization. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. The supported types include byte vectors, 16-bit scalar quantization, and product quantization (PQ). ## Lucene byte vector From 38aca8db68ba9306525569688e9419c040a170d0 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:49:14 -0600 Subject: [PATCH 25/37] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index dd09a754ec..279003b291 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -73,7 +73,7 @@ When `clip` is set to `true` (as in the preceding request), out-of-range vector We recommend setting `clip` to `true` only if very few elements lie outside of the supported range. Rounding the values may cause a drop in recall. {: .note} -The following example method definition specifies the Faiss SQfp16 encoder, which rejects any indexing request that contains out-of-range vector values (because the `clip` parameter is `false` by default). 
+The following example method definition specifies the Faiss SQfp16 encoder, which rejects any indexing request that contains out-of-range vector values (because the `clip` parameter is `false` by default): ```json PUT /test-index From 5591d8871896739ecd7bd5ae1119590e0a551ae3 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:49:26 -0600 Subject: [PATCH 26/37] Update _search-plugins/knn/knn-vector-quantization.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 279003b291..478e9182f0 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -111,7 +111,7 @@ PUT /test-index ``` {% include copy-curl.html %} -During ingestion, make sure each dimension of the vector is in the supported range ([-65504.0, 65504.0]). +During ingestion, make sure each vector dimension is in the supported range ([-65504.0, 65504.0]). ```json PUT test-index/_doc/1 From 2b51288f6e8287ecf7843d4bafbd27df6785f93a Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:49:45 -0600 Subject: [PATCH 27/37] Update _search-plugins/knn/knn-vector-quantization.md Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 478e9182f0..e371006e85 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -121,7 +121,7 @@ PUT test-index/_doc/1 ``` {% include copy-curl.html %} -During querying, the query vector has no range limitation. +During querying, the query vector has no range limitation: ```json GET test-index/_search From 2c389a35a0513c20621f2b7edd2eaeb496945c77 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:50:12 -0600 Subject: [PATCH 28/37] Update _search-plugins/knn/knn-vector-quantization.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index e371006e85..f9b750d9b4 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -165,7 +165,7 @@ As an example, assume that you have 1 million vectors with a dimension of 256 an ## Faiss product quantization -PQ is a technique used to represent a vector in a configurable amount of bits. In general, it can be used to achieve a higher level of compression compared to byte and scalar quantization. PQ works by breaking up vectors into _m_ subvectors and encoding each subvector with _code_size_ bits. Thus, the total amount of memory for the vector ends up being `m*code_size` bits, plus overhead. For details about the parameters, see [PQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#pq-parameters). PQ is only supported for the _Faiss_ engine and can be used with either the _HNSW_ or the _IVF_ ANN (Approximate Nearest Neighbor) algorithms. +PQ is a technique used to represent a vector in a configurable amount of bits. 
In general, it can be used to achieve a higher level of compression as compared to byte or scalar quantization. PQ works by separating vectors into _m_ subvectors and encoding each subvector with _code_size_ bits. Thus, the total amount of memory for the vector is `m*code_size` bits, plus overhead. For details about the parameters, see [PQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#pq-parameters). PQ is only supported for the _Faiss_ engine and can be used with either the _HNSW_ or _IVF_ approximate nearest neighbor (ANN) algorithms. ### Using Faiss product quantization From 5b0f258789a63b259e607175da46d37c6a069718 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:50:22 -0600 Subject: [PATCH 29/37] Update _search-plugins/knn/knn-vector-quantization.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index f9b750d9b4..3989c30620 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -171,7 +171,7 @@ PQ is a technique used to represent a vector in a configurable amount of bits. I To minimize the loss in accuracy, PQ requires a _training_ step that builds a model based on the distribution of the data that will be searched over. -The product quantizer is trained by running k-means clustering on a set of training vectors for each sub-vector space and extracts the centroids to be used for the encoding. The training vectors can be either a subset of the vectors to be ingested or vectors that have the same distribution and dimension as the vectors to be ingested. +The product quantizer is trained by running k-means clustering on a set of training vectors for each subvector space and extracts the centroids to be used for encoding. The training vectors can be either a subset of the vectors to be ingested or vectors that have the same distribution and dimension as the vectors to be ingested. In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend on which ANN algorithm will be used and how much data will go into the index. For IVF-based indices, a good number of training vectors to use is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indexes, a good number is `2^code_size*1000` training vectors. See [Faiss's documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) for more details about the methodology behind calculating these figures. 
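The clustering-and-encoding mechanism described above can be demonstrated with a toy product quantizer. The sketch below uses scikit-learn's k-means purely to illustrate the idea of one codebook per subvector space; it is not the Faiss implementation the k-NN plugin uses:

```python
# Toy product quantizer: split vectors into m subvectors, learn 2**code_size
# centroids per subvector space with k-means, and store only the centroid ids.
# Illustrative only -- not the Faiss implementation used by the k-NN plugin.
import numpy as np
from sklearn.cluster import KMeans

def train_pq(train_vectors, m, code_size):
    d = train_vectors.shape[1]
    assert d % m == 0, "dimension must be divisible by m"
    sub_dim = d // m
    codebooks = []
    for i in range(m):
        sub = train_vectors[:, i * sub_dim:(i + 1) * sub_dim]
        codebooks.append(KMeans(n_clusters=2 ** code_size, n_init=10).fit(sub).cluster_centers_)
    return codebooks

def encode(vector, codebooks):
    sub_dim = len(vector) // len(codebooks)
    return [int(np.argmin(np.linalg.norm(centroids - vector[i * sub_dim:(i + 1) * sub_dim], axis=1)))
            for i, centroids in enumerate(codebooks)]

# 8-dimensional vectors, m=4 subvectors, code_size=2 (4 centroids per codebook).
train = np.random.rand(1000, 8).astype("float32")
books = train_pq(train, m=4, code_size=2)
print(encode(train[0], books))  # e.g. [3, 0, 2, 1] -- 4 small codes instead of 8 floats
```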
From d054f3fdc318cbd1ff9cc5700d79fcc94979e196 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:50:46 -0600 Subject: [PATCH 30/37] Update _search-plugins/knn/knn-vector-quantization.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 3989c30620..d59c59906c 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -173,7 +173,7 @@ To minimize the loss in accuracy, PQ requires a _training_ step that builds a mo The product quantizer is trained by running k-means clustering on a set of training vectors for each subvector space and extracts the centroids to be used for encoding. The training vectors can be either a subset of the vectors to be ingested or vectors that have the same distribution and dimension as the vectors to be ingested. -In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend on which ANN algorithm will be used and how much data will go into the index. For IVF-based indices, a good number of training vectors to use is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indexes, a good number is `2^code_size*1000` training vectors. See [Faiss's documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) for more details about the methodology behind calculating these figures. +In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend on which ANN algorithm is used and how much data will be stored in the index. For IVF-based indexes, a recommended number of training vectors is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indexes, a recommended number is `2^code_size*1000`. See the [Faiss documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) for more information about the methodology used to calculate these figures. For PQ, the two parameters that need to be selected are _m_ and _code_size_. _m_ determines how many sub-vectors the vectors should be split to encode separately. Consequently, the _dimension_ needs to be divisible by _m_. _code_size_ determines how many bits each sub-vector will be encoded with. In general, a good place to start is setting `code_size = 8` and then tuning _m_ to get the desired trade-off between memory footprint and recall. From 9cc412ff8cc18fca269e3756782c223f1b8a4d39 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:51:01 -0600 Subject: [PATCH 31/37] Update _search-plugins/knn/knn-vector-quantization.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index d59c59906c..fa21886702 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -175,7 +175,7 @@ The product quantizer is trained by running k-means clustering on a set of train In OpenSearch, the training vectors need to be present in an index. 
In general, the amount of training data will depend on which ANN algorithm is used and how much data will be stored in the index. For IVF-based indexes, a recommended number of training vectors is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indexes, a recommended number is `2^code_size*1000`. See the [Faiss documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) for more information about the methodology used to calculate these figures. -For PQ, the two parameters that need to be selected are _m_ and _code_size_. _m_ determines how many sub-vectors the vectors should be split to encode separately. Consequently, the _dimension_ needs to be divisible by _m_. _code_size_ determines how many bits each sub-vector will be encoded with. In general, a good place to start is setting `code_size = 8` and then tuning _m_ to get the desired trade-off between memory footprint and recall. +For PQ, both _m_ and _code_size_ need to be selected. _m_ determines how many subvectors the vectors should be split to encode separately. Consequently, the _dimension_ needs to be divisible by _m_. _code_size_ determines how many bits each sub-vector will be encoded with. In general, we recommend a setting of `code_size = 8` and then tuning _m_ to get the desired trade-off between memory footprint and recall. For an example of setting up an index with PQ, see the [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model) tutorial. From ff0bebc6cbd4df73fc93f8c982588388dd23649d Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:51:08 -0600 Subject: [PATCH 32/37] Update _search-plugins/knn/knn-vector-quantization.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index fa21886702..39ea1fb7f5 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -181,7 +181,7 @@ For an example of setting up an index with PQ, see the [Building a k-NN index fr ### Memory Estimation -While PQ is meant to represent individual vectors with `m*code_size` bits, in reality the indexes take up more space. This is mainly due to the overhead of storing certain code tables and auxiliary data structures. +While PQ is meant to represent individual vectors with `m*code_size` bits, in reality, the indexes consume more space. This is mainly due to the overhead of storing certain code tables and auxiliary data structures. Some of the memory formulas depend on the number of segments present. Typically, this is not known beforehand but a good default value is 300. 
{: .note} From 5b6872177f5c12be4e012a2bbe2a2097f5333b92 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:51:17 -0600 Subject: [PATCH 33/37] Update _search-plugins/knn/knn-vector-quantization.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 39ea1fb7f5..b87b52f61e 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -183,7 +183,7 @@ For an example of setting up an index with PQ, see the [Building a k-NN index fr While PQ is meant to represent individual vectors with `m*code_size` bits, in reality, the indexes consume more space. This is mainly due to the overhead of storing certain code tables and auxiliary data structures. -Some of the memory formulas depend on the number of segments present. Typically, this is not known beforehand but a good default value is 300. +Some of the memory formulas depend on the number of segments present. This is not typically known beforehand, but a recommended default value is 300. {: .note} #### HNSW memory estimation From b15ac62778b72f3a0fdf0453d7910813ce110bfb Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:51:29 -0600 Subject: [PATCH 34/37] Update _search-plugins/knn/knn-vector-quantization.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index b87b52f61e..483e5b95b8 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -200,7 +200,7 @@ As an example, assume that you have 1 million vectors with a dimension of 256, ` The memory required for IVF with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24) * num_vectors + num_segments * (2^code_size * 4 * d + 4 * ivf_nlist * d))` bytes. -For example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 32, `pq_code_size` of 8 and 100 segments. The memory requirement can be estimated as follows: +For example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 32, `pq_code_size` of 8, and 100 segments. The memory requirement can be estimated as follows: ```bash 1.1*((8 / 8 * 64 + 24) * 1000000 + 100 * (2^8 * 4 * 256 + 4 * 512 * 256)) ~= 0.171 GB From 072f1e6709aaf0976bb9aee4633cb97991cf7c06 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:51:37 -0600 Subject: [PATCH 35/37] Update _search-plugins/knn/knn-vector-quantization.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 483e5b95b8..733102f0c4 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -179,7 +179,7 @@ For PQ, both _m_ and _code_size_ need to be selected. 
_m_ determines how many su For an example of setting up an index with PQ, see the [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model) tutorial. -### Memory Estimation +### Memory estimation While PQ is meant to represent individual vectors with `m*code_size` bits, in reality, the indexes consume more space. This is mainly due to the overhead of storing certain code tables and auxiliary data structures. From ead048e0745f4db1518e20129275bd30862cab85 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:51:46 -0600 Subject: [PATCH 36/37] Update _search-plugins/knn/knn-vector-quantization.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 733102f0c4..05cab2ff2e 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -169,7 +169,7 @@ PQ is a technique used to represent a vector in a configurable amount of bits. I ### Using Faiss product quantization -To minimize the loss in accuracy, PQ requires a _training_ step that builds a model based on the distribution of the data that will be searched over. +To minimize loss in accuracy, PQ requires a _training_ step that builds a model based on the distribution of the data that will be searched. The product quantizer is trained by running k-means clustering on a set of training vectors for each subvector space and extracts the centroids to be used for encoding. The training vectors can be either a subset of the vectors to be ingested or vectors that have the same distribution and dimension as the vectors to be ingested. From 2721679a6ae5e0e91b8b06640eab16b7f04508b6 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 16 Apr 2024 08:58:07 -0600 Subject: [PATCH 37/37] Update knn-vector-quantization.md Address editorial feedback Signed-off-by: Melissa Vagi --- _search-plugins/knn/knn-vector-quantization.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 05cab2ff2e..96db75b3eb 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -175,7 +175,7 @@ The product quantizer is trained by running k-means clustering on a set of train In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data will depend on which ANN algorithm is used and how much data will be stored in the index. For IVF-based indexes, a recommended number of training vectors is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indexes, a recommended number is `2^code_size*1000`. See the [Faiss documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) for more information about the methodology used to calculate these figures. -For PQ, both _m_ and _code_size_ need to be selected. _m_ determines how many subvectors the vectors should be split to encode separately. Consequently, the _dimension_ needs to be divisible by _m_. _code_size_ determines how many bits each sub-vector will be encoded with. 
In general, we recommend a setting of `code_size = 8` and then tuning _m_ to get the desired trade-off between memory footprint and recall. +For PQ, both _m_ and _code_size_ need to be selected. _m_ determines the number of subvectors into which vectors should be split for separate encoding. Consequently, the _dimension_ needs to be divisible by _m_. _code_size_ determines the number of bits used to encode each subvector. In general, we recommend a setting of `code_size = 8` and then tuning _m_ to get the desired trade-off between memory footprint and recall. For an example of setting up an index with PQ, see the [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model) tutorial.
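Finally, the IVF-with-PQ memory formula quoted earlier in this series can be wrapped in a small helper for quick what-if checks when tuning _m_ and _code_size_. The function name and defaults below are illustrative only:

```python
# Hypothetical helper implementing the IVF-with-PQ estimate quoted above:
# 1.1*(((pq_code_size / 8) * pq_m + 24) * num_vectors
#      + num_segments * (2^code_size * 4 * d + 4 * ivf_nlist * d)) bytes.
def ivf_pq_memory_gb(num_vectors, d, pq_m, pq_code_size, ivf_nlist, num_segments=300):
    if d % pq_m != 0:
        raise ValueError("dimension must be divisible by pq_m")
    bytes_needed = 1.1 * (
        ((pq_code_size / 8) * pq_m + 24) * num_vectors
        + num_segments * (2 ** pq_code_size * 4 * d + 4 * ivf_nlist * d)
    )
    return bytes_needed / 1024 ** 3

# 1 million 256-dimensional vectors, nlist=512, m=64, code_size=8, 100 segments.
print(round(ivf_pq_memory_gb(1_000_000, 256, 64, 8, 512, 100), 3))  # ~0.171
```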