Updating retriever-examples documentation to run validation tests on the provided snippets #116643

Merged
merged 19 commits into from
Nov 29, 2024
88 changes: 87 additions & 1 deletion docs/reference/search/rrf.asciidoc
@@ -105,7 +105,7 @@ The `rrf` retriever does not currently support:
* <<rescore, rescore>>

Using unsupported features as part of a search with an `rrf` retriever results in an exception.
+

IMPORTANT: It is best to avoid providing a <<search-api-pit, point in time>> as part of the request, as
RRF creates one internally that is shared by all sub-retrievers to ensure consistent results.

@@ -703,3 +703,89 @@ So for the same params as above, we would now have:

* `from=0, size=2` would return [`1`, `5`] with ranks `[1, 2]`
* `from=2, size=2` would return an empty result set as it would fall outside the available `rank_window_size` results.

==== Aggregations in RRF

Using the `rrf` retriever, we can also compute aggregations over the results of all its sub-retrievers. The aggregations
are not limited by the specified `rank_window_size`: they are computed over the union of the result sets of all sub-retrievers,
i.e. over all matching documents and not just the top `rank_window_size`.

So for example, assuming that we have the following documents:
[source,python]
----
# doc  | termA    | termB
_id: 1 | foo      |
_id: 2 | foo      | bar
_id: 3 | aardvark | bar
_id: 4 | foo      | bar
----
// NOTCONSOLE

And the following `rrf` query with a term aggregation specified on field `termA`:
[source,js]
----
{
    "retriever": {
        "rrf": {
            "retrievers": [
                {
                    "standard": {
                        "query": {
                            "term": {
                                "termB": "bar"
                            }
                        }
                    }
                },
                {
                    "standard": {
                        "query": {
                            "match_all": { }
                        }
                    }
                }
            ],
            "rank_window_size": 1
        }
    },
    "size": 1,
    "aggs": {
        "termA_agg": {
            "terms": {
                "field": "termA"
            }
        }
    }
}
----
// NOTCONSOLE

Even though `rank_window_size` is set to `1`, the aggregations are computed against **all** documents matched by the nested sub-retrievers.
So, the expected aggs would be:
[source, js]
----
{
    "foo": 3,
    "aardvark": 1
}
----
// NOTCONSOLE
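
To make the counts concrete, here is a minimal Python sketch that mimics this behaviour in memory. It does not call Elasticsearch: the `DOCS` table and the functions `term_retriever`, `match_all_retriever` and `terms_agg` are hypothetical stand-ins for the example documents, the two sub-retrievers and the `terms` aggregation.

```python
from collections import Counter

# Hypothetical in-memory copy of the four example documents.
DOCS = {
    1: {"termA": "foo"},
    2: {"termA": "foo", "termB": "bar"},
    3: {"termA": "aardvark", "termB": "bar"},
    4: {"termA": "foo", "termB": "bar"},
}


def term_retriever(docs, field, value):
    """Stand-in for a `standard` retriever with a `term` query."""
    return {doc_id for doc_id, src in docs.items() if src.get(field) == value}


def match_all_retriever(docs):
    """Stand-in for a `standard` retriever with a `match_all` query."""
    return set(docs)


def terms_agg(docs, doc_ids, field):
    """Count field values over the given documents, like a `terms` aggregation."""
    return Counter(docs[i][field] for i in doc_ids if field in docs[i])


# Aggregations see the union of all sub-retriever matches,
# regardless of how small `rank_window_size` is.
matched = term_retriever(DOCS, "termB", "bar") | match_all_retriever(DOCS)
termA_agg = dict(terms_agg(DOCS, matched, "termA"))
print(termA_agg)
```

The union covers all four documents, so the counts for `termA` agree with the expected aggregation above even though only one hit would be returned.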

==== Highlighting in RRF

Similarly to above, we can also add <<highlighting, highlight snippets>> to the `rrf` retriever's results. Highlighted snippets are computed based
on the matching text queries defined on the sub-retrievers.

NOTE: Highlighting on vector fields, using either the `knn` retriever or a `knn` query, is not currently supported.

A more specific example of highlighting in RRF can also be found in the <<retrievers-examples-highlighting-retriever-results, retrievers examples>> page.
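
As a rough illustration, a request body that combines an `rrf` retriever with highlighting could be assembled as follows. This is only a sketch under assumptions: the field name `text`, the query strings and the helper `build_rrf_highlight_request` are hypothetical, and the body is printed rather than sent to a cluster.

```python
import json


def build_rrf_highlight_request(field, queries, rank_window_size=10):
    """Build a search body with one `standard` sub-retriever per text query."""
    retrievers = [
        {"standard": {"query": {"match": {field: q}}}} for q in queries
    ]
    return {
        "retriever": {
            "rrf": {
                "retrievers": retrievers,
                "rank_window_size": rank_window_size,
            }
        },
        # Highlighting is requested once, at the top level of the request,
        # and applies to the text queries of the sub-retrievers.
        "highlight": {"fields": {field: {}}},
    }


body = build_rrf_highlight_request("text", ["rank fusion", "hybrid search"])
print(json.dumps(body, indent=2))
```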

==== Inner hits in RRF

Computing <<inner-hits, inner hits>> is also supported for RRF. We can specify inner hits as part of a nested sub-retriever, and they
will be propagated to the top-level parent retriever. The actual inner hit computation takes place only at the end of the `rrf` retriever's evaluation, on the top matching documents,
and not as part of the query execution of the nested sub-retrievers.

IMPORTANT: If more than one `inner_hit` section is defined across the sub-retrievers of an `rrf` retriever, then each `inner_hit`
must be given a custom name that is unique across all sub-retrievers for the search request.
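
The uniqueness requirement can be checked on the client side before sending a request. The sketch below is an assumption-laden illustration, not Elasticsearch's own validation: it assumes each section appears under an `inner_hits` key with an optional `name`, and the helpers `collect_inner_hit_names`, `inner_hit_name_problems` and `nested_retriever`, as well as the `comments` path and field, are hypothetical.

```python
from collections import Counter


def collect_inner_hit_names(node):
    """Recursively yield the `name` of every `inner_hits` section (None if unnamed)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "inner_hits" and isinstance(value, dict):
                yield value.get("name")
            else:
                yield from collect_inner_hit_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_inner_hit_names(item)


def inner_hit_name_problems(retriever):
    """Return a list of problems; an empty list means the names look unique."""
    names = list(collect_inner_hit_names(retriever))
    if len(names) <= 1:
        return []  # a single section cannot clash with anything
    problems = []
    if any(name is None for name in names):
        problems.append("every inner_hits section needs an explicit name")
    counts = Counter(name for name in names if name is not None)
    problems.extend(
        f"duplicate inner_hits name: {name}" for name, n in counts.items() if n > 1
    )
    return problems


def nested_retriever(path, field, value, inner_hits):
    """Hypothetical helper: a `standard` retriever wrapping a `nested` query."""
    return {"standard": {"query": {"nested": {
        "path": path,
        "query": {"match": {field: value}},
        "inner_hits": inner_hits,
    }}}}


rrf = {"rrf": {"retrievers": [
    nested_retriever("comments", "comments.text", "foo", {"name": "comments_foo"}),
    nested_retriever("comments", "comments.text", "bar", {"name": "comments_bar"}),
]}}
print(inner_hit_name_problems(rrf))
```

Giving both sections the same name, or leaving them unnamed, makes the helper report a problem instead of returning an empty list.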