Refactor of the neural sparse search tutorial #7922
Conversation
Signed-off-by: zhichao-aws <[email protected]>
Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged. Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer. When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.
Signed-off-by: zhichao-aws <[email protected]>
This PR is ready for review, thanks!
Signed-off-by: Fanit Kolchina <[email protected]>
@kolchfa-aws @zhichao-aws Please see my comments and changes and let me know if you have any questions. Thanks!
Before using neural sparse search, make sure to set up a [pretrained sparse embedding model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models) or your own sparse embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
{: .note}
- Generate vector embeddings within OpenSearch: Configure an ingest pipeline to generate and store sparse vector embeddings from document text at ingestion time. At query time, input plain text, which will be automatically converted into vector embeddings for search. For complete setup steps, see [Configuring ingest pipelines for neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/).
- Ingest raw sparse vectors and search using them directly. For complete setup steps, see [Ingesting and searching raw vectors]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-raw-vectors/).
"search using them directly" => "use them to search directly"?
Reworded.
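For context, the ingest pipeline option discussed above is configured with the `sparse_encoding` processor. The following is a minimal sketch assuming a sparse encoding model has already been registered and deployed; the pipeline name, model ID, and field names are placeholders:

```json
PUT /_ingest/pipeline/nlp-ingest-pipeline-sparse
{
  "description": "Sketch: generate sparse embeddings from passage_text at ingestion time",
  "processors": [
    {
      "sparse_encoding": {
        "model_id": "<your sparse encoding model ID>",
        "field_map": {
          "passage_text": "passage_embedding"
        }
      }
    }
  ]
}
```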
For this tutorial, you’ll use neural sparse search with OpenSearch’s built-in ML model hosting and ingest pipelines. Because the transformation of text to embeddings is performed within OpenSearch, you'll use text when ingesting and searching documents.
At ingestion time, neural sparse search uses a sparse encoding to generate sparse vector embeddings from text fields during ingestion. |
Should "model" following "encoding"? It looks like we don't need "during ingestion" because we already have "At ingestion time".
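As background for this sentence: the sparse vector embeddings that the model generates are token-to-weight maps. An ingested document's generated field might look like the following (field names and weights are illustrative only):

```json
{
  "passage_text": "Hello world",
  "passage_embedding": {
    "hello": 1.42,
    "world": 1.17,
    "greeting": 0.53
  }
}
```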
This tutorial consists of the following steps:
1. [**Configure a sparse encoding model/tokenizer**](#step-1-configure-a-sparse-encoding-modeltokenizer). |
It looks like the periods aren't necessary in this list.
This tutorial consists of the following steps:
1. [**Ingest sparse vectors**](#step-1-ingest-sparse-vectors). |
No periods necessary in this list.
1. [**Ingest sparse vectors**](#step-1-ingest-sparse-vectors).
1. [Create an index](#step-1a-create-an-index).
1. [Ingest documents into the index](#step-1b-ingest-documents-into-the-index).
1. [**Search the data using raw sparse vector**](#step-2-search-the-data-using-a-sparse-vector). |
1. [**Search the data using raw sparse vector**](#step-2-search-the-data-using-a-sparse-vector).
1. [**Search the data using a sparse vector**](#step-2-search-the-data-using-a-sparse-vector)
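For reference, the search step in this list uses the `neural_sparse` query with the `query_tokens` parameter to pass a raw sparse vector directly (supported in recent OpenSearch versions). The index name, field name, and token weights below are illustrative:

```json
GET /my-nlp-index/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_tokens": {
          "hello": 1.1,
          "world": 0.9
        }
      }
    }
  }
}
```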
## Step 1: Ingest sparse vectors
Once you have generated sparse vector embeddings, you can ingest them into OpenSearch directly. |
"directly ingest"?
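To make the ingestion step concrete: a precomputed sparse vector is ingested with a plain index request in which the vector field holds a token-to-weight map. The index name, field names, and values here are illustrative:

```json
PUT /my-nlp-index/_doc/1
{
  "passage_text": "Hello world",
  "passage_embedding": {
    "hello": 1.42,
    "world": 1.17
  }
}
```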
### Step 1(a): Create an index
In order to ingest documents of raw sparse vectors, create a rank features index: |
Is "of" the right word here, or do we mean something like "containing" or "with"?
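For context on this step, a minimal rank features index mapping might look like the following; the index and field names are placeholders:

```json
PUT /my-nlp-index
{
  "mappings": {
    "properties": {
      "passage_embedding": {
        "type": "rank_features"
      },
      "passage_text": {
        "type": "text"
      }
    }
  }
}
```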
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
Thank you, @zhichao-aws!
* refactor — Signed-off-by: zhichao-aws <[email protected]>
* fix — Signed-off-by: zhichao-aws <[email protected]>
* Doc review — Signed-off-by: Fanit Kolchina <[email protected]>
* Link fix — Signed-off-by: Fanit Kolchina <[email protected]>
* Apply suggestions from code review — Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
---------
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Fanit Kolchina <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
(cherry picked from commit ecd2232)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Description
We recently received feedback that it's hard to find information about using neural sparse search with raw sparse vectors in the existing documentation. In addition, the neural sparse search tutorial is not as well structured as the neural search tutorial.
This PR refactors the neural sparse search tutorial, mainly addressing the following points:
Adds a Set up an ML sparse encoding model section. It takes a reference from https://opensearch.org/docs/latest/search-plugins/neural-search-tutorial/#step-1-set-up-an-ml-language-model but includes more content about the two working modes of neural sparse search, plus a table showing the model combinations we offer in OpenSearch (we'll release v2 models soon, and presenting the combinations in a table will be clearer).
Issues Resolved
List any issues this PR will resolve, e.g. Closes [...].
Version
2.15, 2.16
Frontend features
If you're submitting documentation for an OpenSearch Dashboards feature, add a video that shows how a user will interact with the UI step by step. A voiceover is optional.
Checklist
For more information on following Developer Certificate of Origin and signing off your commits, please check here.