
[docs] Add documentation for JinaAI service #118782

Open · wants to merge 10 commits into base: main
1 change: 1 addition & 0 deletions docs/reference/inference/inference-apis.asciidoc
@@ -143,6 +143,7 @@ include::service-elser.asciidoc[]
include::service-google-ai-studio.asciidoc[]
include::service-google-vertex-ai.asciidoc[]
include::service-hugging-face.asciidoc[]
include::service-jinaai.asciidoc[]
include::service-mistral.asciidoc[]
include::service-openai.asciidoc[]
include::service-watsonx-ai.asciidoc[]
3 changes: 2 additions & 1 deletion docs/reference/inference/put-inference.asciidoc
@@ -72,6 +72,7 @@ Click the links to review the configuration details of the services:
* <<infer-service-mistral,Mistral>> (`text_embedding`)
* <<infer-service-openai,OpenAI>> (`completion`, `text_embedding`)
* <<infer-service-watsonx-ai>> (`text_embedding`)
* <<infer-service-jinaai,JinaAI>> (`text_embedding`, `rerank`)

The {es} and ELSER services run on a {ml} node in your {es} cluster. The rest of
the services connect to external providers.
@@ -87,4 +88,4 @@ When adaptive allocations are enabled:
- The number of allocations scales up automatically when the load increases.
- Allocations scale down to a minimum of 0 when the load decreases, saving resources.

For more information about adaptive allocations and resources, refer to the {ml-docs}/ml-nlp-auto-scale.html[trained model autoscaling] documentation.
253 changes: 253 additions & 0 deletions docs/reference/inference/service-jinaai.asciidoc
@@ -0,0 +1,253 @@
[[infer-service-jinaai]]
=== JinaAI {infer} service

Creates an {infer} endpoint to perform an {infer} task with the `jinaai` service.


[discrete]
[[infer-service-jinaai-api-request]]
==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-jinaai-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `text_embedding`
* `rerank`
--

[discrete]
[[infer-service-jinaai-api-request-body]]
==== {api-request-body-title}

`chunking_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=chunking-settings]

`max_chunk_size`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-max-chunking-size]

`overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-overlap]

`sentence_overlap`:::
(Optional, integer)
include::inference-shared.asciidoc[tag=chunking-settings-sentence-overlap]

`strategy`:::
(Optional, string)
include::inference-shared.asciidoc[tag=chunking-settings-strategy]

`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`jinaai`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `jinaai` service.
--

`api_key`:::
(Required, string)
A valid API key for your JinaAI account.
You can find it at https://jina.ai/embeddings/.
+
--
include::inference-shared.asciidoc[tag=api-key-admonition]
--

`rate_limit`:::
(Optional, object)
By default, the `jinaai` service sets the number of requests allowed per minute to `2000`.
This value is the same for all task types.
To modify this, set the `requests_per_minute` setting of this object in your service settings:
+
--
include::inference-shared.asciidoc[tag=request-per-minute-example]

More information about JinaAI's rate limits can be found at https://jina.ai/contact-sales/#rate-limit.
--
+
.`service_settings` for the `rerank` task type
[%collapsible%closed]
=====
`model_id`:::
(Optional, string)
The name of the model to use for the {infer} task.
To review the available `rerank` models, refer to https://jina.ai/reranker.
=====
+
.`service_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
`model_id`:::
(Optional, string)
The name of the model to use for the {infer} task.
To review the available `text_embedding` models, refer to https://jina.ai/embeddings/.

`similarity`:::
(Optional, string)
Similarity measure. One of `cosine`, `dot_product`, `l2_norm`.
Defaults based on the `embedding_type`: `float` defaults to `dot_product`; `int8`/`byte` defaults to `cosine`.
=====



`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `rerank` task type
[%collapsible%closed]
=====
`return_documents`:::
(Optional, boolean)
Specifies whether to return the document text in the results.

`top_n`:::
(Optional, integer)
The number of most relevant documents to return. Defaults to the number of input documents.
If this {infer} endpoint is used in a `text_similarity_reranker` retriever query and `top_n` is set, it must be greater than or equal to `rank_window_size` in the query.
=====
+
.`task_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
`task`:::
(Optional, string)
Specifies the task passed to the model.
Valid values are:
+
--
* `classification`: use it for embeddings passed through a text classifier.
* `clustering`: use it for embeddings run through a clustering algorithm.
* `ingest`: use it for storing document embeddings in a vector database.
* `search`: use it for storing embeddings of search queries run against a vector database to find relevant documents.
--
=====


[discrete]
[[inference-example-jinaai]]
==== JinaAI service examples

The following example shows how to create {infer} endpoints to perform `text_embedding` and `rerank` tasks and how to use them in a search application.

First, we create the `embeddings` service:

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/jinaai-embeddings
{
    "service": "jinaai",
    "service_settings": {
        "model_id": "jina-embeddings-v3",
        "api_key": "<api_key>"
    }
}
------------------------------------------------------------
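The embeddings endpoint above relies on the service defaults. Optional settings such as `chunking_settings` and the `task` task setting can also be supplied at creation time. The following sketch is illustrative: the endpoint name `jinaai-embeddings-ingest` and the specific values shown are examples, not requirements:

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/jinaai-embeddings-ingest
{
    "service": "jinaai",
    "service_settings": {
        "model_id": "jina-embeddings-v3",
        "api_key": "<api_key>"
    },
    "task_settings": {
        "task": "ingest"
    },
    "chunking_settings": {
        "strategy": "sentence",
        "max_chunk_size": 250,
        "sentence_overlap": 1
    }
}
------------------------------------------------------------

Setting `task` to `ingest` tells the model the embeddings will be stored in a vector database, while the chunking settings control how long documents are split before embedding.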

Then, we create the `rerank` service:

[source,console]
------------------------------------------------------------
PUT _inference/rerank/jinaai-rerank
{
    "service": "jinaai",
    "service_settings": {
        "api_key": "<api_key>",
        "model_id": "jina-reranker-v2-base-multilingual"
    },
    "task_settings": {
        "top_n": 10,
        "return_documents": true
    }
}
------------------------------------------------------------

Now we can create an index that uses the `jinaai-embeddings` service to index documents.

[source,console]
------------------------------------------------------------
PUT jinaai-index
{
    "mappings": {
        "properties": {
            "content": {
                "type": "semantic_text",
                "inference_id": "jinaai-embeddings"
            }
        }
    }
}
------------------------------------------------------------

[source,console]
------------------------------------------------------------
PUT jinaai-index/_bulk
{ "index" : { "_index" : "jinaai-index", "_id" : "1" } }
{"content": "Sarah Johnson is a talented marine biologist working at the Oceanographic Institute. Her groundbreaking research on coral reef ecosystems has garnered international attention and numerous accolades."}
{ "index" : { "_index" : "jinaai-index", "_id" : "2" } }
{"content": "She spends months at a time diving in remote locations, meticulously documenting the intricate relationships between various marine species. "}
{ "index" : { "_index" : "jinaai-index", "_id" : "3" } }
{"content": "Her dedication to preserving these delicate underwater environments has inspired a new generation of conservationists."}
------------------------------------------------------------

Now, with the index created, we can search with and without the reranker service.

[source,console]
------------------------------------------------------------
GET jinaai-index/_search
{
    "query": {
        "semantic": {
            "field": "content",
            "query": "who inspired taking care of the sea?"
        }
    }
}
------------------------------------------------------------

[source,console]
------------------------------------------------------------
POST jinaai-index/_search
{
    "retriever": {
        "text_similarity_reranker": {
            "retriever": {
                "standard": {
                    "query": {
                        "semantic": {
                            "field": "content",
                            "query": "who inspired taking care of the sea?"
                        }
                    }
                }
            },
            "field": "content",
            "rank_window_size": 100,
            "inference_id": "jinaai-rerank",
            "inference_text": "who inspired taking care of the sea?"
        }
    }
}
------------------------------------------------------------
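
The `rerank` endpoint can also be called directly through the {infer} API, which is a convenient way to sanity-check it outside of a search request. A minimal sketch, reusing the `jinaai-rerank` endpoint created above with two of the indexed passages as input:

[source,console]
------------------------------------------------------------
POST _inference/rerank/jinaai-rerank
{
    "query": "who inspired taking care of the sea?",
    "input": [
        "Sarah Johnson is a talented marine biologist working at the Oceanographic Institute.",
        "Her dedication to preserving these delicate underwater environments has inspired a new generation of conservationists."
    ]
}
------------------------------------------------------------

The response lists the inputs ranked by relevance to the query, with a relevance score for each.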