diff --git a/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md b/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md
index 9014c585c8..b78d3d641e 100644
--- a/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md
+++ b/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md
@@ -20,13 +20,13 @@ The `NeuralSparseSearchTool` performs sparse vector retrieval. For more informat
OpenSearch supports several pretrained sparse encoding models. You can either use one of those models or your own custom model. For a list of supported pretrained models, see [Sparse encoding models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models). For more information, see [OpenSearch-provided pretrained models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/) and [Custom local models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/).
-In this example, you'll use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` pretrained model for both ingestion and search. To register and deploy the model to OpenSearch, send the following request:
+In this example, you'll use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` pretrained model for both ingestion and search. To register the model and deploy it to OpenSearch, send the following request:
```json
POST /_plugins/_ml/models/_register?deploy=true
{
- "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1",
- "version": "1.0.1",
+ "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill",
+ "version": "1.0.0",
"model_format": "TORCH_SCRIPT"
}
```
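+
+Registering with `deploy=true` is asynchronous, and the call returns a `task_id`. As a minimal sketch (the task ID below is hypothetical), you can check the task to confirm deployment and obtain the `model_id`:
+
+```json
+GET /_plugins/_ml/tasks/aFeif4oB5Vm0Tdw8yoN7
+```
+{% include copy-curl.html %}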
diff --git a/_ml-commons-plugin/api/model-apis/register-model.md b/_ml-commons-plugin/api/model-apis/register-model.md
index 2a0e9706e9..7d8f6d8cc6 100644
--- a/_ml-commons-plugin/api/model-apis/register-model.md
+++ b/_ml-commons-plugin/api/model-apis/register-model.md
@@ -95,8 +95,8 @@ Field | Data type | Required/Optional | Description
```json
POST /_plugins/_ml/models/_register
{
- "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1",
- "version": "1.0.1",
+ "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill",
+ "version": "1.0.0",
"model_group_id": "Z1eQf4oB5Vm0Tdw8EIP2",
"model_format": "TORCH_SCRIPT"
}
diff --git a/_ml-commons-plugin/pretrained-models.md b/_ml-commons-plugin/pretrained-models.md
index 154b8b530f..1b0c726c33 100644
--- a/_ml-commons-plugin/pretrained-models.md
+++ b/_ml-commons-plugin/pretrained-models.md
@@ -48,8 +48,8 @@ Sparse encoding models transfer text into a sparse vector and convert the vector
We recommend the following combinations for optimal performance:
-- Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` model during both ingestion and search.
-- Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` model during ingestion and the
+- Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` model during both ingestion and search.
+- Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill` model during ingestion and the
`amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer during search.
For more information about the preceding options for running neural sparse search, see [Generating sparse vector embeddings within OpenSearch]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/).
@@ -58,8 +58,11 @@ The following table provides a list of sparse encoding models and artifact links
| Model name | Version | Auto-truncation | TorchScript artifact | Description |
|:---|:---|:---|:---|:---|
-| `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip)<br>- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indexes of non-zero elements in the vector, and then converts the vector into `<token: weight>` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [HuggingFace documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-v1). |
-| `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-doc-v1-1.0.1-torch_script.zip)<br>- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.1/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indexes of non-zero elements in the vector, and then converts the vector into `<token: weight>` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [HuggingFace documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v1). |
+| `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip)<br>- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indices of non-zero elements in the vector, and then converts the vector into `<token: weight>` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [Hugging Face documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-v1). |
+| `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` | 1.0.0 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill/1.0.0/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v2-distill-1.0.0-torch_script.zip)<br>- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill/1.0.0/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indices of non-zero elements in the vector, and then converts the vector into `<token: weight>` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [Hugging Face documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-v2-distill). |
+| `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-doc-v1-1.0.1-torch_script.zip)<br>- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1/1.0.1/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indices of non-zero elements in the vector, and then converts the vector into `<token: weight>` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [Hugging Face documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v1). |
+| `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill` | 1.0.0 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill/1.0.0/torch_script/neural-sparse_opensearch-neural-sparse-encoding-doc-v2-distill-1.0.0-torch_script.zip)<br>- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill/1.0.0/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indices of non-zero elements in the vector, and then converts the vector into `<token: weight>` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [Hugging Face documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill). |
+| `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-mini` | 1.0.0 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-mini/1.0.0/torch_script/neural-sparse_opensearch-neural-sparse-encoding-doc-v2-mini-1.0.0-torch_script.zip)<br>- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-mini/1.0.0/torch_script/config.json) | A neural sparse encoding model. The model transforms text into a sparse vector, identifies the indices of non-zero elements in the vector, and then converts the vector into `<token: weight>` pairs, where each entry corresponds to a non-zero element index. To experiment with this model using transformers and the PyTorch API, see the [Hugging Face documentation](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini). |
| `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 1.0.1 | Yes | - [model_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-tokenizer-v1-1.0.1-torch_script.zip)<br>- [config_url](https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1/1.0.1/torch_script/config.json) | A neural sparse tokenizer. The tokenizer splits text into tokens and assigns each token a predefined weight, which is the token's inverse document frequency (IDF). If the IDF file is not provided, the weight defaults to 1. For more information, see [Preparing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/#preparing-a-model). |
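+
+For illustration, the `<token: weight>` pairs produced by a sparse encoding model are stored in a `rank_features` field. A hypothetical embedding for the text "hi world" might look like the following (sparse encoding models expand text to semantically related tokens, such as `planet`; all tokens and weights here are illustrative only):
+
+```json
+"passage_embedding": {
+  "hi": 1.38,
+  "world": 1.12,
+  "planet": 0.54
+}
+```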
### Cross-encoder models
diff --git a/_search-plugins/neural-sparse-with-pipelines.md b/_search-plugins/neural-sparse-with-pipelines.md
index fea2f0d795..ef7044494a 100644
--- a/_search-plugins/neural-sparse-with-pipelines.md
+++ b/_search-plugins/neural-sparse-with-pipelines.md
@@ -16,9 +16,9 @@ At ingestion time, neural sparse search uses a sparse encoding model to generate
At query time, neural sparse search operates in one of two search modes:
-- **Bi-encoder mode** (requires a sparse encoding model): A sparse encoding model generates sparse vector embeddings from query text. This approach provides better search relevance at the cost of a slight increase in latency.
+- **Bi-encoder mode** (requires a sparse encoding model): A sparse encoding model generates sparse vector embeddings from both documents and query text. This approach provides better search relevance at the cost of an increase in latency.
-- **Doc-only mode** (requires a sparse encoding model and a tokenizer): A sparse encoding model generates sparse vector embeddings from query text. In this mode, neural sparse search tokenizes query text using a tokenizer and obtains the token weights from a lookup table. This approach provides faster retrieval at the cost of a slight decrease in search relevance. The tokenizer is deployed and invoked using the [Model API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/index/) for a uniform neural sparse search experience.
+- **Doc-only mode** (requires a sparse encoding model and a tokenizer): A sparse encoding model generates sparse vector embeddings from documents. In this mode, neural sparse search tokenizes query text using a tokenizer and obtains the token weights from a lookup table. This approach provides faster retrieval at the cost of a slight decrease in search relevance. The tokenizer is deployed and invoked using the [Model API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/index/) for a uniform neural sparse search experience.
For more information about choosing the neural sparse search mode that best suits your workload, see [Choose the search mode](#step-1a-choose-the-search-mode).
@@ -48,32 +48,35 @@ Both the bi-encoder and doc-only search modes require you to configure a sparse
Choose the search mode and the appropriate model/tokenizer combination:
-- **Bi-encoder**: Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` model during both ingestion and search.
+- **Bi-encoder**: Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` model during both ingestion and search.
-- **Doc-only**: Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` model during ingestion and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer during search.
+- **Doc-only**: Use the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill` model during ingestion and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer during search.
-The following table provides a search relevance comparison for the two search modes so that you can choose the best mode for your use case.
+The following table provides a search relevance comparison for all available combinations of the two search modes so that you can choose the best combination for your use case.
| Mode | Ingestion model | Search model | Avg search relevance on BEIR | Model parameters |
|-----------|---------------------------------------------------------------|---------------------------------------------------------------|------------------------------|------------------|
| Doc-only | `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` | `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 0.49 | 133M |
+| Doc-only | `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill` | `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 0.504 | 67M |
+| Doc-only | `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-mini` | `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` | 0.497 | 23M |
| Bi-encoder| `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` | 0.524 | 133M |
+| Bi-encoder| `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` | `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` | 0.528 | 67M |
-### Step 1(b): Register the model/tokenizer
+### Step 1(b): Register the model/tokenizer
When you register a model/tokenizer, OpenSearch creates a model group for the model/tokenizer. You can also explicitly create a model group before registering models. For more information, see [Model access control]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control/).
#### Bi-encoder mode
-When using bi-encoder mode, you only need to register the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v1` model.
+When using bi-encoder mode, you only need to register the `amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill` model.
Register the sparse encoding model:
```json
POST /_plugins/_ml/models/_register?deploy=true
{
- "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1",
- "version": "1.0.1",
+ "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill",
+ "version": "1.0.0",
"model_format": "TORCH_SCRIPT"
}
```
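+
+Registration is asynchronous, so the request returns a task rather than a model. A representative response (the task ID is hypothetical) is shown below; you can then query the Tasks API (`GET /_plugins/_ml/tasks/<task ID>`) to obtain the `model_id`:
+
+```json
+{
+  "task_id": "aFeif4oB5Vm0Tdw8yoN7",
+  "status": "CREATED"
+}
+```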
@@ -116,15 +119,15 @@ Note the `model_id` of the model you've created; you'll need it for the followin
#### Doc-only mode
-When using doc-only mode, you need to register the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1` model, which you'll use at ingestion time, and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer, which you'll use at search time.
+When using doc-only mode, you need to register the `amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill` model, which you'll use at ingestion time, and the `amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1` tokenizer, which you'll use at search time.
Register the sparse encoding model:
```json
POST /_plugins/_ml/models/_register?deploy=true
{
- "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1",
- "version": "1.0.1",
+ "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill",
+ "version": "1.0.0",
"model_format": "TORCH_SCRIPT"
}
```
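+
+In doc-only mode, you also register the search-time tokenizer through the same API. As a sketch, the request follows the same pattern, using the tokenizer version from the pretrained model list:
+
+```json
+POST /_plugins/_ml/models/_register?deploy=true
+{
+  "name": "amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1",
+  "version": "1.0.1",
+  "model_format": "TORCH_SCRIPT"
+}
+```
+{% include copy-curl.html %}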
@@ -276,7 +279,7 @@ PUT /my-nlp-index
"default_pipeline": "nlp-ingest-pipeline-sparse"
},
"mappings": {
- "_source": {
+ "_source": {
"excludes": [
"passage_embedding"
]
@@ -421,6 +424,28 @@ The response contains the matching documents:
}
```
+To minimize disk and network I/O latency related to sparse embedding sources, you can exclude the embedding vector from the returned document source as follows:
+
+```json
+GET my-nlp-index/_search
+{
+ "_source": {
+ "excludes": [
+ "passage_embedding"
+ ]
+ },
+ "query": {
+ "neural_sparse": {
+ "passage_embedding": {
+ "query_text": "Hi world",
+ "model_id": ""
+ }
+ }
+ }
+}
+```
+{% include copy-curl.html %}
+
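+With the embedding source excluded, each returned hit contains only the original fields. A representative (hypothetical) hit:
+
+```json
+{
+  "_index": "my-nlp-index",
+  "_id": "1",
+  "_score": 30.0029,
+  "_source": {
+    "text": "Hi planet"
+  }
+}
+```
+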
## Accelerating neural sparse search
To learn more about improving retrieval time for neural sparse search, see [Accelerating neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/#accelerating-neural-sparse-search).