[DOCS] Add Elastic Rerank usage docs (elastic#117625)
leemthompo committed Nov 28, 2024
1 parent 10fdfbb commit a69d4ef
Showing 3 changed files with 121 additions and 23 deletions.
41 changes: 34 additions & 7 deletions docs/reference/inference/service-elasticsearch.asciidoc
@@ -69,15 +69,15 @@ include::inference-shared.asciidoc[tag=service-settings]
These settings are specific to the `elasticsearch` service.
--

`adaptive_allocations`:::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]

`deployment_id`:::
(Optional, string)
The `deployment_id` of an existing trained model deployment.
When `deployment_id` is used, the `model_id` is optional.

`adaptive_allocations`:::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]

`enabled`::::
(Optional, Boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]
@@ -119,7 +119,6 @@ include::inference-shared.asciidoc[tag=task-settings]
Returns the document instead of only the index. Defaults to `true`.
=====
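
To see what this `rerank` task setting controls in practice, you can call a deployed `rerank` endpoint directly with the perform {infer} API.
The request below is only an illustration: the endpoint name is hypothetical, and with the setting left at its default of `true` each result in the response carries the input text alongside its score and its position in the `input` array.

[source,console]
------------------------------------------------------------
POST _inference/rerank/my-rerank-endpoint
{
  "query": "What is the capital of France?",
  "input": [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany."
  ]
}
------------------------------------------------------------
// TEST[skip:hypothetical endpoint]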


[discrete]
[[inference-example-elasticsearch-elser]]
==== ELSER via the `elasticsearch` service
@@ -137,7 +136,7 @@ PUT _inference/sparse_embedding/my-elser-model
"adaptive_allocations": { <1>
"enabled": true,
"min_number_of_allocations": 1,
"max_number_of_allocations": 10
"max_number_of_allocations": 4
},
"num_threads": 1,
"model_id": ".elser_model_2" <2>
@@ -150,6 +149,34 @@ PUT _inference/sparse_embedding/my-elser-model
Valid values are `.elser_model_2` and `.elser_model_2_linux-x86_64`.
For further details, refer to the {ml-docs}/ml-nlp-elser.html[ELSER model documentation].

[discrete]
[[inference-example-elastic-reranker]]
==== Elastic Rerank via the `elasticsearch` service

The following example shows how to create an {infer} endpoint called `my-elastic-rerank` to perform a `rerank` task type using the built-in Elastic Rerank cross-encoder model.

The API request below automatically downloads the Elastic Rerank model if it isn't already downloaded, and then deploys the model.
Once deployed, the model can be used for semantic re-ranking with a <<text-similarity-reranker-retriever-example-elastic-rerank,`text_similarity_reranker` retriever>>.

[source,console]
------------------------------------------------------------
PUT _inference/rerank/my-elastic-rerank
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".rerank-v1", <1>
    "num_threads": 1,
    "adaptive_allocations": { <2>
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 4
    }
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
<1> The `model_id` must be the ID of the built-in Elastic Rerank model: `.rerank-v1`.
<2> {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[Adaptive allocations] will be enabled with a minimum of 1 and a maximum of 4 allocations.
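
Once the request has been accepted, you can confirm the new endpoint exists and inspect its configuration with the get {infer} API.
This is a quick check added here for illustration; it is not required for the endpoint to work:

[source,console]
------------------------------------------------------------
GET _inference/rerank/my-elastic-rerank
------------------------------------------------------------
// TEST[skip:TBD]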

[discrete]
[[inference-example-elasticsearch]]
Expand Down Expand Up @@ -186,7 +213,7 @@ If using the Python client, you can set the `timeout` parameter to a higher valu

[discrete]
[[inference-example-eland]]
==== Models uploaded by Eland via the elasticsearch service
==== Models uploaded by Eland via the `elasticsearch` service

The following example shows how to create an {infer} endpoint called
`my-msmarco-minilm-model` to perform a `text_embedding` task type.
20 changes: 11 additions & 9 deletions docs/reference/reranking/semantic-reranking.asciidoc
@@ -85,14 +85,16 @@ In {es}, semantic re-rankers are implemented using the {es} <<inference-apis,Inference APIs>>.

To use semantic re-ranking in {es}, you need to:

. *Choose a re-ranking model*.
Currently you can:

** Integrate directly with the <<infer-service-cohere,Cohere Rerank inference endpoint>> using the `rerank` task type
** Integrate directly with the <<infer-service-google-vertex-ai,Google Vertex AI inference endpoint>> using the `rerank` task type
** Upload a model to {es} from Hugging Face with {eland-docs}/machine-learning.html#ml-nlp-pytorch[Eland]. You'll need to use the `text_similarity` NLP task type when loading the model using Eland. Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third party text similarity models supported by {es} for semantic re-ranking.
*** Then set up an <<inference-example-eland,{es} service inference endpoint>> with the `rerank` task type
. *Create a `rerank` task using the <<put-inference-api,{es} Inference API>>*.
. *Select and configure a re-ranking model*.
You have the following options:
.. Use the <<inference-example-elastic-reranker,Elastic Rerank>> cross-encoder model via the inference API's {es} service.
.. Use the <<infer-service-cohere,Cohere Rerank inference endpoint>> to create a `rerank` endpoint.
.. Use the <<infer-service-google-vertex-ai,Google Vertex AI inference endpoint>> to create a `rerank` endpoint.
.. Upload a model to {es} from Hugging Face with {eland-docs}/machine-learning.html#ml-nlp-pytorch[Eland]. You'll need to use the `text_similarity` NLP task type when loading the model using Eland. Then set up an <<inference-example-eland,{es} service inference endpoint>> with the `rerank` task type.
+
Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third party text similarity models supported by {es} for semantic re-ranking.

. *Create a `rerank` endpoint using the <<put-inference-api,{es} Inference API>>*.
The Inference API creates an inference endpoint and configures your chosen machine learning model to perform the re-ranking task.
. *Define a `text_similarity_reranker` retriever in your search request*.
The retriever syntax makes it simple to configure both the retrieval and re-ranking of search results in a single API call.
Expand All @@ -117,7 +119,7 @@ POST _search
}
},
"field": "text",
"inference_id": "my-cohere-rerank-model",
"inference_id": "elastic-rerank",
"inference_text": "How often does the moon hide the sun?",
"rank_window_size": 100,
"min_score": 0.5
83 changes: 76 additions & 7 deletions docs/reference/search/retriever.asciidoc
@@ -11,6 +11,7 @@ This allows for complex behavior to be depicted in a tree-like structure, called the retriever tree.
[TIP]
====
Refer to <<retrievers-overview>> for a high level overview of the retrievers abstraction.
Refer to <<retrievers-examples, Retrievers examples>> for additional examples.
====

The following retrievers are available:
@@ -382,16 +383,17 @@ Refer to <<semantic-reranking>> for a high level overview of semantic re-ranking

===== Prerequisites

To use `text_similarity_reranker` you must first set up a `rerank` task using the <<put-inference-api, Create {infer} API>>.
The `rerank` task should be set up with a machine learning model that can compute text similarity.
To use `text_similarity_reranker` you must first set up an inference endpoint for the `rerank` task using the <<put-inference-api, Create {infer} API>>.
The endpoint should be set up with a machine learning model that can compute text similarity.
Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third-party text similarity models supported by {es}.

Currently you can:
You have the following options:

* Integrate directly with the <<infer-service-cohere,Cohere Rerank inference endpoint>> using the `rerank` task type
* Integrate directly with the <<infer-service-google-vertex-ai,Google Vertex AI inference endpoint>> using the `rerank` task type
* Use the built-in <<inference-example-elastic-reranker,Elastic Rerank>> cross-encoder model via the inference API's {es} service.
* Use the <<infer-service-cohere,Cohere Rerank inference endpoint>> with the `rerank` task type.
* Use the <<infer-service-google-vertex-ai,Google Vertex AI inference endpoint>> with the `rerank` task type.
* Upload a model to {es} with {eland-docs}/machine-learning.html#ml-nlp-pytorch[Eland] using the `text_similarity` NLP task type.
** Then set up an <<inference-example-eland,{es} service inference endpoint>> with the `rerank` task type
** Then set up an <<inference-example-eland,{es} service inference endpoint>> with the `rerank` task type.
** Refer to the <<text-similarity-reranker-retriever-example-eland,example>> on this page for a step-by-step guide.

===== Parameters
@@ -436,13 +438,70 @@ Note that score calculations vary depending on the model used.
Applies the specified <<query-dsl-bool-query, boolean query filter>> to the child <<retriever, retriever>>.
If the child retriever already specifies any filters, then this top-level filter is applied in conjunction with the filter defined in the child retriever.
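
For illustration, a sketch of a `text_similarity_reranker` that only re-ranks documents matching a structured filter might look like the following; the `category` field and its value are hypothetical, and `my-elastic-rerank` is the endpoint created in the example below:

[source,console]
----
POST _search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match": {
              "text": "How often does the moon hide the sun?"
            }
          }
        }
      },
      "filter": {
        "term": {
          "category": "astronomy"
        }
      },
      "field": "text",
      "inference_id": "my-elastic-rerank",
      "inference_text": "How often does the moon hide the sun?"
    }
  }
}
----
// TEST[skip:uses ML]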

[discrete]
[[text-similarity-reranker-retriever-example-elastic-rerank]]
==== Example: Elastic Rerank

This example demonstrates how to deploy the Elastic Rerank model and use it to re-rank search results via the `text_similarity_reranker` retriever.

Follow these steps:

. Create an inference endpoint for the `rerank` task using the <<put-inference-api, Create {infer} API>>.
+
[source,console]
----
PUT _inference/rerank/my-elastic-rerank
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".rerank-v1",
    "num_threads": 1,
    "adaptive_allocations": { <1>
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 10
    }
  }
}
----
// TEST[skip:uses ML]
<1> {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[Adaptive allocations] will be enabled with a minimum of 1 and a maximum of 10 allocations.
+
. Define a `text_similarity_reranker` retriever:
+
[source,console]
----
POST _search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match": {
              "text": "How often does the moon hide the sun?"
            }
          }
        }
      },
      "field": "text",
      "inference_id": "my-elastic-rerank",
      "inference_text": "How often does the moon hide the sun?",
      "rank_window_size": 100,
      "min_score": 0.5
    }
  }
}
----
// TEST[skip:uses ML]

[discrete]
[[text-similarity-reranker-retriever-example-cohere]]
==== Example: Cohere Rerank

This example enables out-of-the-box semantic search by re-ranking top documents using the Cohere Rerank API.
This approach eliminates the need to generate and store embeddings for all indexed documents.
This requires a <<infer-service-cohere,Cohere Rerank inference endpoint>> using the `rerank` task type.
This requires a <<infer-service-cohere,Cohere Rerank inference endpoint>> that is set up for the `rerank` task type.

[source,console]
----
@@ -680,6 +739,12 @@ GET movies/_search
<1> The `rule` retriever is the outermost retriever, applying rules to the search results that were previously reranked using the `rrf` retriever.
<2> The `rrf` retriever returns results from all of its sub-retrievers, and the output of the `rrf` retriever is used as input to the `rule` retriever.

[discrete]
[[retriever-common-parameters]]
=== Common usage guidelines

[discrete]
[[retriever-size-pagination]]
==== Using `from` and `size` with a retriever tree

The <<search-from-param, `from`>> and <<search-size-param, `size`>>
@@ -688,12 +753,16 @@ parameters are provided globally as part of the general
They are applied to all retrievers in a retriever tree, unless a specific retriever overrides the `size` parameter using a different parameter such as `rank_window_size`.
However, the final search hits are always limited to `size`.
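
As a sketch (the index and field names are hypothetical), the request below asks the `rrf` retriever to rank a window of 50 candidates, while `from` and `size` select which ten of those hits are returned:

[source,console]
----
GET /_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        { "standard": { "query": { "match": { "title": "moon" } } } },
        { "standard": { "query": { "match": { "plot": "eclipse" } } } }
      ],
      "rank_window_size": 50
    }
  },
  "from": 10,
  "size": 10
}
----
// TEST[skip:hypothetical index]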

[discrete]
[[retriever-aggregations]]
==== Using aggregations with a retriever tree

<<search-aggregations, Aggregations>> are globally specified as part of a search request.
The query used for an aggregation is the combination of all leaf retrievers as `should`
clauses in a <<query-dsl-bool-query, boolean query>>.
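
As an illustration (again with hypothetical index and field names), the `terms` aggregation below is therefore computed over the documents matching either leaf query, not only over the top-ranked RRF results:

[source,console]
----
GET /_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        { "standard": { "query": { "match": { "title": "moon" } } } },
        { "standard": { "query": { "match": { "plot": "eclipse" } } } }
      ]
    }
  },
  "aggs": {
    "genres": {
      "terms": { "field": "genre" }
    }
  }
}
----
// TEST[skip:hypothetical index]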

[discrete]
[[retriever-restrictions]]
==== Restrictions on search parameters when specifying a retriever

When a retriever is specified as part of a search, the following elements are not allowed at the top-level.
