Commit 9b6853d

Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

kolchfa-aws and natebower authored Nov 25, 2024
1 parent 3f019b2 commit 9b6853d
Showing 1 changed file with 18 additions and 18 deletions: _posts/2024-11-22-faiss-byte-vector.md

---
layout: post
title: Introducing byte vector support for Faiss in the OpenSearch vector engine
authors:
- naveen
- navneev
---
Using byte vectors instead of float vectors for vector search provides significant improvements in memory efficiency and performance. This is especially beneficial for large-scale vector databases or environments with limited resources. Faiss byte vectors enable you to store quantized embeddings, significantly reducing memory consumption and lowering costs. This approach typically results in only minimal recall loss compared to using full-precision (float) vectors.


## How to use a Faiss byte vector

A byte vector is a compact vector representation in which each dimension is a signed 8-bit integer ranging from -128 to 127. To use byte vectors, you must convert your input vectors, typically in `float` format, into the `byte` type before ingestion. This process requires quantization techniques, which compress float vectors while maintaining essential data characteristics. For more information, see [Quantization techniques](https://opensearch.org/docs/latest/field-types/supported-field-types/knn-vector#quantization-techniques).
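For illustration, one simple way to produce byte vectors from floats is symmetric scaling into the signed 8-bit range. The following is a minimal sketch only (the function is hypothetical; real pipelines should use a quantization technique suited to the data, as described in the linked documentation):

```c
#include <math.h>
#include <stdint.h>

// Sketch of symmetric scalar quantization: scale each component by
// 127 / max_abs so that every value lands in [-127, 127].
void quantize_to_bytes(const float *in, int8_t *out, int dim) {
    float max_abs = 0.0f;
    for (int i = 0; i < dim; i++) {
        float a = fabsf(in[i]);
        if (a > max_abs) max_abs = a;
    }
    float scale = (max_abs > 0.0f) ? 127.0f / max_abs : 0.0f;
    for (int i = 0; i < dim; i++) {
        out[i] = (int8_t)lroundf(in[i] * scale);
    }
}
```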


To use a `byte` vector, set the `data_type` parameter to `byte` when creating a k-NN index (the default value of the `data_type` parameter is `float`). The following mapping is representative; the field name and dimension are illustrative:

```json
PUT test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 3,
        "data_type": "byte",
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2"
        }
      }
    }
  }
}
```

During ingestion, make sure that each dimension of the vector is within the supported [-128, 127] range:

```json
PUT test-index/_doc/1
{
  "my_vector": [-126, 28, 127]
}

PUT test-index/_doc/2
{
  "my_vector": [100, -128, 42]
}
```

During querying, make sure that the query vector is also within the byte range:

```json
GET test-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [26, -120, 99],
        "k": 2
      }
    }
  }
}
```

**Note**: When using `byte` vectors, expect some loss of recall precision as compared to using `float` vectors. Byte vectors are useful for large-scale applications and use cases that prioritize reducing memory usage in exchange for a minimal loss in recall.


## Benchmarking results

We used OpenSearch Benchmark to run benchmarking tests on popular datasets to compare recall, indexing, and search performance between float vectors and byte vectors using Faiss HNSW.

**Note**: Without SIMD optimization (such as AVX2 or NEON) or when AVX2 is disabled (on x86 architectures), the quantization process introduces additional latency. For more information about AVX2-compatible processors, see [CPUs with AVX2](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX2). In an AWS environment, all community Amazon Machine Images (AMIs) with [HVM](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/virtualization_types.html) support AVX2 optimization.

These tests were conducted on a single-node cluster, except for the cohere-10m dataset, which used two `r5.2xlarge` instances.

### Configuration

The following table lists the cluster configuration for the benchmarking tests.

|`m` |`ef_construction` |`ef_search` |Replicas |Primary shards |Indexing clients |
|--- |--- |--- |--- |--- |--- |
|16 |100 |100 |0 |8 |16 |

The following table lists the dataset configuration for the benchmarking tests.

|Dataset ID |Dataset |Vector dimension |Data size |Number of queries |Training data range |Query data range |Space type |
|--- |--- |--- |--- |--- |--- |--- |--- |
|**Dataset 1** |gist-960-euclidean |960 |1,000,000 |1,000 |[0.0, 1.48] |[0.0, 0.729] |L2 |
|**Dataset 2** |cohere-ip-10m |768 |10,000,000 |10,000 |[-4.142334, 5.5211477] |[-4.109505, 5.4809895] |innerproduct |

### Recall, memory, and indexing results

|Dataset ID |Faiss HNSW recall@100 |Faiss HNSW byte recall@100 |% Reduction in recall |Faiss HNSW memory usage (GB) |Faiss HNSW byte memory usage (GB) |% Reduction in memory |Faiss HNSW mean indexing throughput (docs/sec) |Faiss HNSW byte mean indexing throughput (docs/sec) |% Gain in indexing throughput |
|--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |
|**Dataset 1** |0.91 |0.89 |2.20 |3.72 |1.04 |72.00 |4673 |9686 |107.28 |
|**Dataset 2** |0.91 |0.83 |8.79 |30.03 |8.57 |71.46 |4911 |10207 |107.84 |

### Key findings

The following are the key findings derived from comparing the benchmarking results:

- **Memory savings**: Byte vectors reduced memory usage by up to **72%**, with higher-dimensional vectors achieving greater reductions (see the estimate sketch after this list).
- **Indexing performance**: The mean indexing throughput for byte vectors was roughly **2x** that of float vectors (up to **107.84%** higher), especially for larger vector dimensions.
- **Search performance**: Search latencies were similar, with byte vectors occasionally performing better.
- **Recall**: For byte vectors, there was a slight (up to **8.8%**) reduction in recall as compared to float vectors, depending on the dataset and the quantization technique used.
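To see why the observed savings approach, but stay below, the theoretical 75% for 8-bit storage, the following back-of-the-envelope sketch estimates per-vector memory using the approximate HNSW formulas from the OpenSearch k-NN documentation (an estimate, not a measurement):

```c
#include <stdio.h>

// Approximate per-vector HNSW memory, per the OpenSearch k-NN docs:
//   float vectors: ~1.1 * (4 * d + 8 * m) bytes
//   byte vectors:  ~1.1 * (d + 8 * m) bytes
// where d is the vector dimension and m is the HNSW graph parameter.
int main(void) {
    int d = 960, m = 16; // Dataset 1 (gist-960-euclidean) with the benchmark's m
    double float_bytes = 1.1 * (4.0 * d + 8.0 * m);
    double byte_bytes = 1.1 * (d + 8.0 * m);
    printf("float: %.0f B/vector, byte: %.0f B/vector, savings: %.1f%%\n",
           float_bytes, byte_bytes, 100.0 * (1.0 - byte_bytes / float_bytes));
    // Prints a savings of about 72.6%, in line with the ~72% observed for
    // Dataset 1; the 8 * m graph overhead keeps savings below the theoretical 75%.
    return 0;
}
```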

## How does Faiss work with byte vectors internally?

Faiss doesn't directly support the `byte` data type for vector storage. To achieve this, OpenSearch uses a [`QT_8bit_direct_signed` scalar quantizer](https://faiss.ai/cpp_api/struct/structfaiss_1_1ScalarQuantizer.html). This quantizer accepts float vectors within the signed 8-bit value range and encodes them as unsigned 8-bit integer vectors. During indexing and search, these encoded unsigned 8-bit integer vectors are decoded back into signed 8-bit original vectors for distance computation.
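In code, the round trip amounts to a fixed +128 offset, as in the following minimal sketch (the helper names are illustrative, not the Faiss API):

```c
#include <stdint.h>

// Minimal sketch of the QT_8bit_direct_signed round trip: a signed component
// is stored as an unsigned 8-bit code by adding 128 and is shifted back
// before distance computation.
static uint8_t encode_component(int8_t x) { return (uint8_t)(x + 128); }
static int8_t decode_component(uint8_t code) { return (int8_t)(code - 128); }
```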

This quantization approach reduces the memory footprint by a factor of four. However, encoding and decoding during scalar quantization introduce additional latency. To mitigate this, you can use [SIMD optimization](https://opensearch.org/docs/latest/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine) with the `QT_8bit_direct_signed` quantizer to reduce search latencies and improve indexing throughput.

### Example

The following example shows how an input vector is encoded and decoded using the `QT_8bit_direct_signed` scalar quantizer:

```c
// Input vector (signed 8-bit values; components shown are illustrative):
// [-126, 28, 127]

// Encoded vector: each component is stored as an unsigned 8-bit code (x + 128):
// [2, 156, 255]

// Decoded vector: each code is shifted back (code - 128) for distance
// computation, recovering the original signed values:
// [-126, 28, 127]
```

## Conclusion

OpenSearch 2.17 introduced support for Faiss byte vectors, allowing you to efficiently store quantized byte vector embeddings. This reduces memory consumption by up to 75%, lowers costs, and maintains high performance. These advantages make byte vectors an excellent choice for large-scale similarity search applications, especially when memory resources are limited, and for applications that handle large volumes of data within the signed byte value range.

## Future enhancements

In future versions, we plan to enhance this feature by adding an `on_disk` mode with a `4x` Faiss compression level. This mode will accept `fp32` vectors as input, perform online training, and quantize the data into byte-sized vectors, eliminating the need to perform external quantization.

