
Refactor of the neural sparse search tutorial #7922

Merged

Conversation

zhichao-aws
Member

Description

We recently received feedback that it's hard to find information about using neural sparse search with raw sparse vectors in the existing documentation. In addition, the neural sparse search tutorial is not as well structured as the neural search tutorial.

This PR refactors the neural sparse search tutorial, mainly in these ways:

  1. Use separate sections for neural sparse search with built-in pipelines and neural sparse search with raw sparse vectors.
  2. Add a separate subsection about setting up an ML sparse encoding model. It borrows from https://opensearch.org/docs/latest/search-plugins/neural-search-tutorial/#step-1-set-up-an-ml-language-model but adds more content about the two working modes of neural sparse search, along with a table showing the model combinations offered in OpenSearch (we'll release v2 models soon, and presenting the combinations in a table will make them clearer).
  3. Refactor the section/subsection structure to provide a clearer overview.

Issues Resolved

List any issues this PR will resolve, e.g. Closes [...].

Version

2.15, 2.16

Frontend features

If you're submitting documentation for an OpenSearch Dashboards feature, add a video that shows how a user will interact with the UI step by step. A voiceover is optional.

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: zhichao-aws <[email protected]>

github-actions bot commented Aug 7, 2024

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

Signed-off-by: zhichao-aws <[email protected]>
@zhichao-aws
Member Author

This PR is ready for review, thanks!

Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Collaborator

@natebower natebower left a comment


@kolchfa-aws @zhichao-aws Please see my comments and changes and let me know if you have any questions. Thanks!

Before using neural sparse search, make sure to set up a [pretrained sparse embedding model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sparse-encoding-models) or your own sparse embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
{: .note}
- Generate vector embeddings within OpenSearch: Configure an ingest pipeline to generate and store sparse vector embeddings from document text at ingestion time. At query time, input plain text, which will be automatically converted into vector embeddings for search. For complete setup steps, see [Configuring ingest pipelines for neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-pipelines/).
- Ingest raw sparse vectors and search using them directly. For complete setup steps, see [Ingesting and searching raw vectors]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-with-raw-vectors/).
Collaborator


"search using them directly" => "use them to search directly"?

Collaborator


Reworded.
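As a quick illustration of the first option described in the excerpt above (generating sparse embeddings within OpenSearch), an ingest pipeline with a `sparse_encoding` processor looks roughly like the following sketch. The pipeline name, field names, and model ID are illustrative placeholders rather than values from the tutorial.

```json
PUT /_ingest/pipeline/nlp-sparse-pipeline
{
  "description": "Generates sparse vector embeddings from text at ingestion time",
  "processors": [
    {
      "sparse_encoding": {
        "model_id": "<sparse encoding model ID>",
        "field_map": {
          "passage_text": "passage_embedding"
        }
      }
    }
  ]
}
```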

_search-plugins/neural-sparse-with-pipelines.md (outdated; resolved)

For this tutorial, you’ll use neural sparse search with OpenSearch’s built-in ML model hosting and ingest pipelines. Because the transformation of text to embeddings is performed within OpenSearch, you'll use text when ingesting and searching documents.

At ingestion time, neural sparse search uses a sparse encoding to generate sparse vector embeddings from text fields during ingestion.
Collaborator


Should "model" follow "encoding"? It looks like we don't need "during ingestion" because we already have "At ingestion time".
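To make the model-hosted mode in the excerpt above concrete, here is a minimal sketch of a `neural_sparse` query that passes plain text along with a model ID so that the query embedding is generated at search time. The index name, field name, and query text are placeholders.

```json
GET /my-nlp-index/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_text": "wild animals in the rainforest",
        "model_id": "<sparse encoding model ID>"
      }
    }
  }
}
```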


This tutorial consists of the following steps:

1. [**Configure a sparse encoding model/tokenizer**](#step-1-configure-a-sparse-encoding-modeltokenizer).
Collaborator


It looks like the periods aren't necessary in this list.

_search-plugins/neural-sparse-with-pipelines.md (outdated; resolved)

This tutorial consists of the following steps:

1. [**Ingest sparse vectors**](#step-1-ingest-sparse-vectors).
Collaborator


No periods necessary in this list.

1. [**Ingest sparse vectors**](#step-1-ingest-sparse-vectors).
1. [Create an index](#step-1a-create-an-index).
1. [Ingest documents into the index](#step-1b-ingest-documents-into-the-index).
1. [**Search the data using raw sparse vector**](#step-2-search-the-data-using-a-sparse-vector).
Collaborator

@natebower natebower Aug 13, 2024


Suggested change
1. [**Search the data using raw sparse vector**](#step-2-search-the-data-using-a-sparse-vector).
1. [**Search the data using a sparse vector**](#step-2-search-the-data-using-a-sparse-vector)
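For context on the step being renamed above, searching with a raw sparse vector can be sketched as a `neural_sparse` query that supplies the token–weight map directly via `query_tokens` (assumed to be supported in the versions this PR targets; verify against the published documentation). The index name, field name, tokens, and weights are placeholders.

```json
GET /my-nlp-index/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_tokens": {
          "rainforest": 2.3,
          "animals": 1.1,
          "wild": 0.8
        }
      }
    }
  }
}
```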


## Step 1: Ingest sparse vectors

Once you have generated sparse vector embeddings, you can ingest them into OpenSearch directly.
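A minimal sketch of this direct ingestion, assuming an index whose embedding field is mapped as `rank_features` (see the mapping sketch later in this review); the document ID, field names, tokens, and weights are illustrative.

```json
PUT /my-nlp-index/_doc/1
{
  "passage_text": "Hello world",
  "passage_embedding": {
    "hello": 2.5,
    "world": 1.8,
    "greeting": 0.6
  }
}
```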
Collaborator


"directly ingest"?


### Step 1(a): Create an index

In order to ingest documents of raw sparse vectors, create a rank features index:
Collaborator


Is "of" the right word here, or do we mean something like "containing" or "with"?
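Regardless of the final wording, a rank features index for raw sparse vectors can be sketched as follows, with the embedding field mapped as `rank_features` and the source text as `text`; the index and field names are placeholders.

```json
PUT /my-nlp-index
{
  "mappings": {
    "properties": {
      "passage_text": {
        "type": "text"
      },
      "passage_embedding": {
        "type": "rank_features"
      }
    }
  }
}
```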

_search-plugins/neural-sparse-with-raw-vectors.md (outdated; resolved)
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Collaborator

@kolchfa-aws kolchfa-aws left a comment


Thank you, @zhichao-aws!

@kolchfa-aws kolchfa-aws merged commit ecd2232 into opensearch-project:main Aug 13, 2024
5 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 13, 2024
* refactor

Signed-off-by: zhichao-aws <[email protected]>

* fix

Signed-off-by: zhichao-aws <[email protected]>

* Doc review

Signed-off-by: Fanit Kolchina <[email protected]>

* Link fix

Signed-off-by: Fanit Kolchina <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Fanit Kolchina <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
(cherry picked from commit ecd2232)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>