Add doc for neural-sparse-query-two-phase-processor. (#7306)
* Add doc for neural-sparse-query-two-phase-processor.

Signed-off-by: conggguan <[email protected]>

* Make some edits for the comments.

Signed-off-by: conggguan <[email protected]>

* Fix some typo and style-job.

Signed-off-by: conggguan <[email protected]>

* Update neural-sparse-query-two-phase-processor.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md

Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: conggguan <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Naarcha-AWS <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
3 people authored Jun 14, 2024
1 parent 5e91768 commit 42e3fba
Showing 2 changed files with 183 additions and 0 deletions.
33 changes: 33 additions & 0 deletions _search-plugins/neural-sparse-search.md
@@ -31,6 +31,7 @@ To use neural sparse search, follow these steps:
1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).
1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index).
1. [Search the index using neural search](#step-4-search-the-index-using-neural-sparse-search).
1. _Optional_: [Create and enable the two-phase processor](#step-5-create-and-enable-the-two-phase-processor-optional).

## Step 1: Create an ingest pipeline

@@ -262,6 +263,38 @@ GET my-nlp-index/_search
}
}
```

## Step 5: Create and enable the two-phase processor (Optional)

The `neural_sparse_two_phase_processor` is a new feature introduced in OpenSearch 2.15. Using the two-phase processor can significantly improve the performance of neural sparse queries.

To quickly launch a search pipeline with neural sparse search, use the following example pipeline:

```json
PUT /_search/pipeline/two_phase_search_pipeline
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse",
        "description": "Creates a two-phase processor for neural sparse search."
      }
    }
  ]
}
```
{% include copy-curl.html %}

Then choose the index you want to configure with the search pipeline and set its `index.search.default_pipeline` setting to the pipeline name, as shown in the following example:

```json
PUT /index-name/_settings
{
"index.search.default_pipeline" : "two_phase_search_pipeline"
}
```
{% include copy-curl.html %}

## Setting a default model on an index or field

150 changes: 150 additions & 0 deletions _search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md
@@ -0,0 +1,150 @@
---
layout: default
title: Neural sparse query two-phase processor
nav_order: 13
parent: Search processors
grand_parent: Search pipelines
---

# Neural sparse query two-phase processor
Introduced 2.15
{: .label .label-purple }

The `neural_sparse_two_phase_processor` search processor is designed to provide faster search pipelines for [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). It accelerates the neural sparse query by dividing the original method of scoring all documents with all tokens into two steps:

1. High-weight tokens score the documents and filter out the top documents.
2. Low-weight tokens rescore the top documents.

## Request fields

The following table lists all available request fields.

Field | Data type | Description
:--- | :--- | :---
`enabled` | Boolean | Controls whether the two-phase processor is enabled. Default is `true`.
`two_phase_parameter` | Object | A map of key-value pairs representing the two-phase parameters and their associated values. You can specify the value of `prune_ratio`, `expansion_rate`, `max_window_size`, or any combination of these three parameters. Optional.
`two_phase_parameter.prune_ratio` | Float | A ratio that determines how tokens are split into high-weight and low-weight groups. The pruning threshold is the query's maximum token score multiplied by the `prune_ratio`. Valid range is [0, 1]. Default is `0.4`. See the worked example following this table.
`two_phase_parameter.expansion_rate` | Float | The rate that determines how many documents are rescored during the second phase. The number of second-phase documents equals the query size (default is 10) multiplied by the expansion rate. Valid range is greater than 1.0. Default is `5.0`.
`two_phase_parameter.max_window_size` | Integer | The maximum number of documents that can be processed using the two-phase processor. Valid range is greater than 50. Default is `10000`.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
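
As a worked example based on the defaults in the preceding table, if the highest token weight in a query is 5.0, then tokens with weights below 5.0 × 0.4 = 2.0 are deferred to the second phase. With a query size of 10 and an `expansion_rate` of 5.0, the top 10 × 5.0 = 50 documents from the first phase are rescored.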

## Example

The following example creates a search pipeline with a `neural_sparse_two_phase_processor` search request processor.

### Create search pipeline

The following request creates a search pipeline in which the `neural_sparse_two_phase_processor` is enabled and the two-phase parameters are set to custom values (shown as placeholders):

```json
PUT /_search/pipeline/two_phase_search_pipeline
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse",
        "description": "Creates a two-phase processor for neural sparse search.",
        "enabled": true,
        "two_phase_parameter": {
          "prune_ratio": <custom_prune_ratio>,
          "expansion_rate": <custom_expansion_rate>,
          "max_window_size": <custom_max_window_size>
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}
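
To confirm that the pipeline was created with the expected parameters, you can retrieve its configuration by name:

```json
GET /_search/pipeline/two_phase_search_pipeline
```
{% include copy-curl.html %}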

### Set search pipeline

After the two-phase pipeline is created, set the `index.search.default_pipeline` setting to the name of the pipeline for the index on which you want to use the two-phase pipeline:

```json
PUT /index-name/_settings
{
"index.search.default_pipeline" : "two_phase_search_pipeline"
}
```
{% include copy-curl.html %}
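
Because the processor's `enabled` field defaults to `true`, you can temporarily turn off two-phase processing without deleting the pipeline by updating it with `enabled` set to `false`. The following is a minimal sketch of such an update:

```json
PUT /_search/pipeline/two_phase_search_pipeline
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse",
        "description": "Creates a two-phase processor for neural sparse search.",
        "enabled": false
      }
    }
  ]
}
```
{% include copy-curl.html %}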

## Limitations

The `neural_sparse_two_phase_processor` has the following limitations.

### Version support

The `neural_sparse_two_phase_processor` can only be used with OpenSearch 2.15 or later.

### Compound query support

As of OpenSearch 2.15, only the Boolean [compound query]({{site.url}}{{site.baseurl}}/query-dsl/compound/index/) is supported.

Neural sparse queries and Boolean queries with a boost parameter (not boosting queries) are also supported.

## Examples

The following examples show neural sparse queries with the supported query types.

### Single neural sparse query

```json
GET /my-nlp-index/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_text": "Hi world",
        "model_id": <model-id>
      }
    }
  }
}
```
{% include copy-curl.html %}

### Neural sparse query nested in a Boolean query

```json
GET /my-nlp-index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "neural_sparse": {
            "passage_embedding": {
              "query_text": "Hi world",
              "model_id": <model-id>
            },
            "boost": 2.0
          }
        }
      ]
    }
  }
}
```
{% include copy-curl.html %}

## P99 latency metrics

Using an OpenSearch cluster set up on three m5.4xlarge Amazon Elastic Compute Cloud (Amazon EC2) instances, OpenSearch conducts neural sparse query P99 latency tests on indexes corresponding to more than 10 datasets.

### Doc-only mode latency metric

In doc-only mode, the two-phase processor can significantly decrease query latency, as shown by the following latency metrics:

- Average latency without the two-phase processor: 53.56 ms
- Average latency with the two-phase processor: 38.61 ms

This results in an overall latency reduction of approximately 27.92%. Most indexes show a significant latency reduction when using the two-phase processor, with reductions ranging from 5.14% to 84.6%. The specific latency optimization values depend on the data distribution within the indexes.

### Bi-encoder mode latency metric

In bi-encoder mode, the two-phase processor can significantly decrease query latency, as shown by the following latency metrics:

- Average latency without the two-phase processor: 300.79 ms
- Average latency with the two-phase processor: 121.64 ms

This results in an overall latency reduction of approximately 59.56%. Most indexes show a significant latency reduction when using the two-phase processor, with reductions ranging from 1.56% to 82.84%. The specific latency optimization values depend on the data distribution within the indexes.
