
[docs] Add documentation for JinaAI service #118782

Open · wants to merge 10 commits into base: main
1 change: 1 addition & 0 deletions docs/reference/inference/inference-apis.asciidoc
@@ -143,6 +143,7 @@ include::service-elser.asciidoc[]
include::service-google-ai-studio.asciidoc[]
include::service-google-vertex-ai.asciidoc[]
include::service-hugging-face.asciidoc[]
include::service-jinaai.asciidoc[]
include::service-mistral.asciidoc[]
include::service-openai.asciidoc[]
include::service-watsonx-ai.asciidoc[]
3 changes: 2 additions & 1 deletion docs/reference/inference/put-inference.asciidoc
@@ -72,6 +72,7 @@ Click the links to review the configuration details of the services:
* <<infer-service-mistral,Mistral>> (`text_embedding`)
* <<infer-service-openai,OpenAI>> (`completion`, `text_embedding`)
* <<infer-service-watsonx-ai>> (`text_embedding`)
* <<infer-service-jinaai,JinaAI>> (`text_embedding`, `rerank`)

The {es} and ELSER services run on a {ml} node in your {es} cluster. The rest of
the services connect to external providers.
@@ -87,4 +88,4 @@ When adaptive allocations are enabled:
- The number of allocations scales up automatically when the load increases.
- Allocations scale down to a minimum of 0 when the load decreases, saving resources.

For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation.
253 changes: 253 additions & 0 deletions docs/reference/inference/service-jinaai.asciidoc
@@ -0,0 +1,253 @@
[[infer-service-jinaai]]
=== JinaAI {infer} service

Creates an {infer} endpoint to perform an {infer} task with the `jinaai` service.


[discrete]
[[infer-service-jinaai-api-request]]
==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-jinaai-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `text_embedding`
* `rerank`
--

[discrete]
[[infer-service-jinaai-api-request-body]]
==== {api-request-body-title}

`chunking_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=chunking-settings]

`max_chunk_size`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]

`overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-overlap]

`sentence_overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]

`strategy`:::
(Optional, string)
include::inference-shared.asciidoc[tag=chunking-settings-strategy]

`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`jinaai`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `jinaai` service.
--

`api_key`:::
(Required, string)
A valid API key for your JinaAI account.
You can find it at https://jina.ai/embeddings/.
+
--
include::inference-shared.asciidoc[tag=api-key-admonition]
--

`rate_limit`:::
(Optional, object)
By default, the `jinaai` service sets the number of requests allowed per minute to `2000`.
This value is the same for all task types.
To modify this, set the `requests_per_minute` setting of this object in your service settings:
+
--
include::inference-shared.asciidoc[tag=request-per-minute-example]

More information about JinaAI's rate limits can be found at https://jina.ai/contact-sales/#rate-limit.
--
+
.`service_settings` for the `rerank` task type
[%collapsible%closed]
=====
`model_id`:::
(Optional, string)
The name of the model to use for the {infer} task.
To review the available `rerank` models, refer to https://jina.ai/reranker.
=====
+
.`service_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
`model_id`:::
(Optional, string)
The name of the model to use for the {infer} task.
To review the available `text_embedding` models, refer to https://jina.ai/embeddings/.

`similarity`:::
(Optional, string)
Similarity measure. One of `cosine`, `dot_product`, `l2_norm`.
Defaults based on the `embedding_type`: `float` defaults to `dot_product`; `int8`/`byte` defaults to `cosine`.
=====



`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `rerank` task type
[%collapsible%closed]
=====
`return_documents`:::
(Optional, boolean)
Specifies whether to return the document text in the results.

`top_n`:::
(Optional, integer)
The number of most relevant documents to return. Defaults to the number of input documents.
If this {infer} endpoint is used in a `text_similarity_reranker` retriever query and `top_n` is set, it must be greater than or equal to `rank_window_size` in the query.
=====
+
.`task_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
`task`:::
(Optional, string)
Specifies the task passed to the model.
Valid values are:
+
--
* `classification`: use it for embeddings passed through a text classifier.
* `clustering`: use it for embeddings run through a clustering algorithm.
* `ingest`: use it for storing document embeddings in a vector database.
* `search`: use it for storing embeddings of search queries run against a vector database to find relevant documents.
--
=====


[discrete]
[[inference-example-jinaai]]
==== JinaAI service examples

The following example shows how to create {infer} endpoints to perform `text_embedding` and `rerank` tasks and how to use them in a search application.

First, we create the `embeddings` service:

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/jinaai-embeddings
{
    "service": "jinaai",
    "service_settings": {
        "model_id": "jina-embeddings-v3",
        "api_key": "<api_key>"
    }
}
------------------------------------------------------------
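The embeddings endpoint above relies on the service defaults. Optional settings such as `chunking_settings` and the `task` task setting can also be supplied at creation time. The following sketch is illustrative: the endpoint name `jinaai-embeddings-ingest` and the specific values shown are examples, not requirements:

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/jinaai-embeddings-ingest
{
    "service": "jinaai",
    "service_settings": {
        "model_id": "jina-embeddings-v3",
        "api_key": "<api_key>"
    },
    "task_settings": {
        "task": "ingest"
    },
    "chunking_settings": {
        "strategy": "sentence",
        "max_chunk_size": 250,
        "sentence_overlap": 1
    }
}
------------------------------------------------------------

Setting `task` to `ingest` tells the model the embeddings will be stored in a vector database, while the chunking settings control how long documents are split before embedding.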

Then, we create the `rerank` service:

[source,console]
------------------------------------------------------------
PUT _inference/rerank/jinaai-rerank
{
    "service": "jinaai",
    "service_settings": {
        "api_key": "<api_key>",
        "model_id": "jina-reranker-v2-base-multilingual"
    },
    "task_settings": {
        "top_n": 10,
        "return_documents": true
    }
}
------------------------------------------------------------

Now we can create an index that uses the `jinaai-embeddings` service to index documents.

[source,console]
------------------------------------------------------------
PUT jinaai-index
{
    "mappings": {
        "properties": {
            "content": {
                "type": "semantic_text",
                "inference_id": "jinaai-embeddings"
            }
        }
    }
}
------------------------------------------------------------

[source,console]
------------------------------------------------------------
PUT jinaai-index/_bulk
{ "index" : { "_index" : "jinaai-index", "_id" : "1" } }
{"content": "Sarah Johnson is a talented marine biologist working at the Oceanographic Institute. Her groundbreaking research on coral reef ecosystems has garnered international attention and numerous accolades."}
{ "index" : { "_index" : "jinaai-index", "_id" : "2" } }
{"content": "She spends months at a time diving in remote locations, meticulously documenting the intricate relationships between various marine species. "}
{ "index" : { "_index" : "jinaai-index", "_id" : "3" } }
{"content": "Her dedication to preserving these delicate underwater environments has inspired a new generation of conservationists."}
------------------------------------------------------------

Now, with the index created, we can search with and without the reranker service.

[source,console]
------------------------------------------------------------
GET jinaai-index/_search
{
    "query": {
        "semantic": {
            "field": "content",
            "query": "who inspired taking care of the sea?"
        }
    }
}
------------------------------------------------------------

[source,console]
------------------------------------------------------------
POST jinaai-index/_search
{
    "retriever": {
        "text_similarity_reranker": {
            "retriever": {
                "standard": {
                    "query": {
                        "semantic": {
                            "field": "content",
                            "query": "who inspired taking care of the sea?"
                        }
                    }
                }
            },
            "field": "content",
            "rank_window_size": 100,
            "inference_id": "jinaai-rerank",
            "inference_text": "who inspired taking care of the sea?"
        }
    }
}
------------------------------------------------------------
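
The `rerank` endpoint can also be called directly through the {infer} API, which is a convenient way to sanity-check it outside of a search request. A minimal sketch, reusing the `jinaai-rerank` endpoint created above with two of the indexed passages as input:

[source,console]
------------------------------------------------------------
POST _inference/rerank/jinaai-rerank
{
    "query": "who inspired taking care of the sea?",
    "input": [
        "Sarah Johnson is a talented marine biologist working at the Oceanographic Institute.",
        "Her dedication to preserving these delicate underwater environments has inspired a new generation of conservationists."
    ]
}
------------------------------------------------------------

The response lists the inputs ranked by relevance to the query, with a relevance score for each.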