diff --git a/docs/reference/inference/inference-apis.asciidoc b/docs/reference/inference/inference-apis.asciidoc index 8d5ee1b7d6ba5..97dac0f844487 100644 --- a/docs/reference/inference/inference-apis.asciidoc +++ b/docs/reference/inference/inference-apis.asciidoc @@ -143,6 +143,7 @@ include::service-elser.asciidoc[] include::service-google-ai-studio.asciidoc[] include::service-google-vertex-ai.asciidoc[] include::service-hugging-face.asciidoc[] +include::service-jinaai.asciidoc[] include::service-mistral.asciidoc[] include::service-openai.asciidoc[] include::service-watsonx-ai.asciidoc[] diff --git a/docs/reference/inference/put-inference.asciidoc b/docs/reference/inference/put-inference.asciidoc index 4f82889f562d8..d6abe90e48956 100644 --- a/docs/reference/inference/put-inference.asciidoc +++ b/docs/reference/inference/put-inference.asciidoc @@ -72,6 +72,7 @@ Click the links to review the configuration details of the services: * <> (`text_embedding`) * <> (`completion`, `text_embedding`) * <> (`text_embedding`) +* <> (`text_embedding`, `rerank`) The {es} and ELSER services run on a {ml} node in your {es} cluster. The rest of the services connect to external providers. @@ -87,4 +88,4 @@ When adaptive allocations are enabled: - The number of allocations scales up automatically when the load increases. - Allocations scale down to a minimum of 0 when the load decreases, saving resources. -For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation. \ No newline at end of file +For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation. diff --git a/docs/reference/inference/service-jinaai.asciidoc b/docs/reference/inference/service-jinaai.asciidoc new file mode 100644 index 0000000000000..7c5aebe5bcf8e --- /dev/null +++ b/docs/reference/inference/service-jinaai.asciidoc @@ -0,0 +1,255 @@ +[[infer-service-jinaai]] +=== JinaAI {infer} service + +Creates an {infer} endpoint to perform an {infer} task with the `jinaai` service. + + +[discrete] +[[infer-service-jinaai-api-request]] +==== {api-request-title} + +`PUT /_inference//` + +[discrete] +[[infer-service-jinaai-api-path-params]] +==== {api-path-parms-title} + +``:: +(Required, string) +include::inference-shared.asciidoc[tag=inference-id] + +``:: +(Required, string) +include::inference-shared.asciidoc[tag=task-type] ++ +-- +Available task types: + +* `text_embedding`, +* `rerank`. +-- + +[discrete] +[[infer-service-jinaai-api-request-body]] +==== {api-request-body-title} + +`chunking_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=chunking-settings] + +`max_chunking_size`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size] + +`overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-overlap] + +`sentence_overlap`::: +(Optional, integer) +include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap] + +`strategy`::: +(Optional, string) +include::inference-shared.asciidoc[tag=chunking-settings-strategy] + +`service`:: +(Required, string) +The type of service supported for the specified task type. In this case, +`jinaai`. + +`service_settings`:: +(Required, object) +include::inference-shared.asciidoc[tag=service-settings] ++ +-- +These settings are specific to the `jinaai` service. +-- + +`api_key`::: +(Required, string) +A valid API key for your JinaAI account. +You can find it at https://jina.ai/embeddings/. ++ +-- +include::inference-shared.asciidoc[tag=api-key-admonition] +-- + +`rate_limit`::: +(Optional, object) +The default rate limit for the `jinaai` service is 2000 requests per minute for all task types. +You can modify this using the `requests_per_minute` setting in your service settings: ++ +-- +include::inference-shared.asciidoc[tag=request-per-minute-example] + +More information about JinaAI's rate limits can be found in https://jina.ai/contact-sales/#rate-limit. +-- ++ +.`service_settings` for the `rerank` task type +[%collapsible%closed] +===== +`model_id`:: +(Required, string) +The name of the model to use for the {infer} task. +To review the available `rerank` compatible models, refer to https://jina.ai/reranker. +===== ++ +.`service_settings` for the `text_embedding` task type +[%collapsible%closed] +===== +`model_id`::: +(Optional, string) +The name of the model to use for the {infer} task. +To review the available `text_embedding` models, refer to the +https://jina.ai/embeddings/. + +`similarity`::: +(Optional, string) +Similarity measure. One of `cosine`, `dot_product`, `l2_norm`. +Defaults based on the `embedding_type` (`float` -> `dot_product`, `int8/byte` -> `cosine`). +===== + + + +`task_settings`:: +(Optional, object) +include::inference-shared.asciidoc[tag=task-settings] ++ +.`task_settings` for the `rerank` task type +[%collapsible%closed] +===== +`return_documents`:: +(Optional, boolean) +Specify whether to return doc text within the results. + +`top_n`:: +(Optional, integer) +The number of most relevant documents to return, defaults to the number of the documents. +If this {infer} endpoint is used in a `text_similarity_reranker` retriever query and `top_n` is set, it must be greater than or equal to `rank_window_size` in the query. +===== ++ +.`task_settings` for the `text_embedding` task type +[%collapsible%closed] +===== +`task`::: +(Optional, string) +Specifies the task passed to the model. +Valid values are: +* `classification`: use it for embeddings passed through a text classifier. +* `clustering`: use it for the embeddings run through a clustering algorithm. +* `ingest`: use it for storing document embeddings in a vector database. +* `search`: use it for storing embeddings of search queries run against a vector database to find relevant documents. +===== + + +[discrete] +[[inference-example-jinaai]] +==== JinaAI service examples + +The following examples demonstrate how to create {infer} endpoints for `text_embeddings` and `rerank` tasks using the JinaAI service and use them in search requests. + +First, we create the `embeddings` service: + +[source,console] +------------------------------------------------------------ +PUT _inference/text_embedding/jinaai-embeddings +{ + "service": "jinaai", + "service_settings": { + "model_id": "jina-embeddings-v3", + "api_key": "" + } +} +------------------------------------------------------------ +// TEST[skip:uses ML] + +Then, we create the `rerank` service: +[source,console] +------------------------------------------------------------ +PUT _inference/rerank/jinaai-rerank +{ + "service": "jinaai", + "service_settings": { + "api_key": "", + "model_id": "jina-reranker-v2-base-multilingual" + }, + "task_settings": { + "top_n": 10, + "return_documents": true + } +} +------------------------------------------------------------ +// TEST[skip:uses ML] + +Now we can create an index that will use `jinaai-embeddings` service to index the documents. + +[source,console] +------------------------------------------------------------ +PUT jinaai-index +{ + "mappings": { + "properties": { + "content": { + "type": "semantic_text", + "inference_id": "jinaai-embeddings" + } + } + } +} +------------------------------------------------------------ +// TEST[skip:uses ML] + +[source,console] +------------------------------------------------------------ +PUT jinaai-index/_bulk +{ "index" : { "_index" : "jinaai-index", "_id" : "1" } } +{"content": "Sarah Johnson is a talented marine biologist working at the Oceanographic Institute. Her groundbreaking research on coral reef ecosystems has garnered international attention and numerous accolades."} +{ "index" : { "_index" : "jinaai-index", "_id" : "2" } } +{"content": "She spends months at a time diving in remote locations, meticulously documenting the intricate relationships between various marine species. "} +{ "index" : { "_index" : "jinaai-index", "_id" : "3" } } +{"content": "Her dedication to preserving these delicate underwater environments has inspired a new generation of conservationists."} +------------------------------------------------------------ +// TEST[skip:uses ML] + +Now, with the index created, we can search with and without the reranker service. + +[source,console] +------------------------------------------------------------ +GET jinaai-index/_search +{ + "query": { + "semantic": { + "field": "content", + "query": "who inspired taking care of the sea?" + } + } +} +------------------------------------------------------------ +// TEST[skip:uses ML] + +[source,console] +------------------------------------------------------------ +POST jinaai-index/_search +{ + "retriever": { + "text_similarity_reranker": { + "retriever": { + "standard": { + "query": { + "semantic": { + "field": "content", + "query": "who inspired taking care of the sea?" + } + } + } + }, + "field": "content", + "rank_window_size": 100, + "inference_id": "jinaai-rerank", + "inference_text": "who inspired taking care of the sea?" + } + } +} +------------------------------------------------------------ +// TEST[skip:uses ML] \ No newline at end of file