Commit

Updates chunk settings documentation (elastic#116719) (elastic#116722)
(cherry picked from commit bada2a6)
kosabogi authored Nov 13, 2024
1 parent fa541d2 commit e1af8cc
Showing 1 changed file with 2 additions and 3 deletions.
5 changes: 2 additions & 3 deletions docs/reference/mapping/types/semantic-text.asciidoc
@@ -87,16 +87,15 @@ Trying to <<delete-inference-api,delete an {infer} endpoint>> that is used on a

[discrete]
[[auto-text-chunking]]
-==== Automatic text chunking
+==== Text chunking

{infer-cap} endpoints have a limit on the amount of text they can process.
To allow for large amounts of text to be used in semantic search, `semantic_text` automatically generates smaller passages if needed, called _chunks_.

Each chunk will include the text subpassage and the corresponding embedding generated from it.
When querying, the individual passages will be automatically searched for each document, and the most relevant passage will be used to compute a score.

-Documents are split into 250-word sections with a 100-word overlap so that each section shares 100 words with the previous section.
-This overlap ensures continuity and prevents vital contextual information in the input text from being lost by a hard break.
+For more details on chunking and how to configure chunking settings, see <<infer-chunking-config, Configuring chunking>> in the Inference API documentation.


[discrete]
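
For reference, the word-window scheme mentioned in this hunk (250-word sections, each sharing 100 words with the previous one) can be sketched as follows. This is only an illustration under stated assumptions: `chunk_words` and its parameters are hypothetical names, words are split on whitespace, and none of this is taken from the Elasticsearch implementation, which may pick boundaries differently.

```python
def chunk_words(text, max_words=250, overlap=100):
    """Split text into word windows that overlap with their predecessor.

    Hypothetical helper for illustration; not Elasticsearch's chunker.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")

    words = text.split()          # assumption: whitespace-delimited words
    step = max_words - overlap    # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

With the defaults, a 400-word input yields two chunks, and the last 100 words of the first chunk are the first 100 words of the second.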