Add ML tutorials #7180

Merged: 19 commits into main on Jun 3, 2024

Conversation

kolchfa-aws (Collaborator):

Closes #6673

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following the Developer Certificate of Origin and signing off on your commits, please check here.

Signed-off-by: Fanit Kolchina <[email protected]>
@kolchfa-aws kolchfa-aws self-assigned this May 16, 2024
kolchfa-aws and others added 6 commits May 29, 2024 18:04
@vagimeli (Contributor) left a comment:

@kolchfa-aws Doc review complete. Minimal copyedits. Tutorial is crisp and clear. Well done!

_ml-commons-plugin/tutorials/build-chatbot.md (4 resolved threads, outdated)
_ml-commons-plugin/tutorials/reranking-cohere.md (1 resolved thread, outdated)
nav_order: 10
---

# Semantic search using byte quantized vectors
Contributor:

Should the titles match, or is it shortened for the navigation menu?

Collaborator Author:

Shortened for left nav :)

_ml-commons-plugin/tutorials/index.md (resolved thread, outdated)
@natebower (Collaborator) left a comment:

@kolchfa-aws Great job on this 😄. Please see my comments and changes and let me know if you have any questions. Thanks!

_ml-commons-plugin/agents-tools/index.md (resolved thread, outdated)
_ml-commons-plugin/tutorials/build-chatbot.md (resolved thread, outdated)

## Prerequisite

Log in to the OpenSearch Dashboards homepage, select **Add sample data**, and add **Sample eCommerce orders** data.
Collaborator:

Should "the" precede Sample eCommerce orders?


## Step 1: Configure a knowledge base

Follow Prerequisite and Step 1 of the [RAG with a conversational flow agent tutorial]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/rag-conversational-agent/) to configure the `test_population_data` knowledge base index, which contains US city population data.
Collaborator:

Is "Prerequisite and Step 1" the name of a section, or should it read something like "Meet the prerequisite and follow step 1 of the..."?
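
For readers who skip ahead, that setup boils down to an ingest pipeline with a `text_embedding` processor feeding a k-NN index. The following is only a minimal sketch under assumed names: the pipeline name, the `population_description`/`population_description_embedding` fields, and the model ID placeholder are illustrative; the linked tutorial defines the actual values.

```
# Illustrative sketch: ingest pipeline that embeds population descriptions for the knowledge base
PUT /_ingest/pipeline/test_population_data_pipeline
{
  "description": "Embed population descriptions for the knowledge base (illustrative)",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<your embedding model ID>",
        "field_map": {
          "population_description": "population_description_embedding"
        }
      }
    }
  ]
}
```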

- `llm`: Defines the LLM configuration:
- `"max_iteration": 5`: The agent runs the LLM a maximum of five times.
- `"response_filter": "$.completion"`: Needed to retrieve the LLM answer from the Bedrock Claude model response.
- `"message_history_limit": 5`: The agent retrieves a maximum of the five most recent history messages and adds them to the LLM context. Set this parameter to `0` to omit message history in the context.
Collaborator:

Is "history" necessary between "recent" and "messages"? It reads a bit awkwardly.
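
For context, these settings live in the `llm` section of an agent registration request. The following is a rough sketch only, with assumed names: the agent name, the Bedrock Claude and embedding model ID placeholders, and the `VectorDBTool` field names are illustrative rather than the tutorial's exact request.

```
# Illustrative sketch: registering a conversational agent that uses these LLM parameters
POST /_plugins/_ml/agents/_register
{
  "name": "population_data_conversational_agent",
  "type": "conversational",
  "description": "Answers questions using the test_population_data knowledge base",
  "llm": {
    "model_id": "<Bedrock Claude model ID>",
    "parameters": {
      "max_iteration": 5,
      "response_filter": "$.completion",
      "message_history_limit": 5
    }
  },
  "memory": {
    "type": "conversation_index"
  },
  "tools": [
    {
      "type": "VectorDBTool",
      "parameters": {
        "model_id": "<embedding model ID>",
        "index": "test_population_data",
        "embedding_field": "population_description_embedding",
        "source_field": ["population_description"],
        "input": "${parameters.question}"
      }
    }
  ]
}
```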

```
{% include copy-curl.html %}

For compatibility with the Neural Search plugin, the `data_type` (output in the `inference_results.output.data_type` field of the response) must be set to `FLOAT32` in the post-processing function, even though the actual embedding type will be `INT8`.
Collaborator:

Suggested change
For compatibility with the Neural Search plugin, the `data_type` (output in the `inference_results.output.data_type` field of the response) must be set to `FLOAT32` in the post-processing function, even though the actual embedding type will be `INT8`.
To ensure compatibility with the Neural Search plugin, the `data_type` (output in the `inference_results.output.data_type` field of the response) must be set to `FLOAT32` in the post-processing function, even though the actual embedding type will be `INT8`.
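
To make the distinction concrete, a Predict API response from such a connector reports `FLOAT32` even though the values themselves are 8-bit integers. The fragment below is illustrative only; the shape and numbers are made up.

```
{
  "inference_results": [
    {
      "output": [
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [4],
          "data": [-26, 31, 7, -113]
        }
      ]
    }
  ]
}
```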

```
{% include copy-curl.html %}

Note the model ID in the response; you'll use it in the following steps.
Collaborator:

Minor consistency nit: At this point in the tutorials, we sometimes say "next steps" and sometimes say "following steps".
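
For reference, the model ID discussed here comes from registering (and deploying) a remote model on top of the connector; a sketch might look like the following, with an illustrative model name and a placeholder connector ID.

```
# Illustrative sketch: register and deploy a remote model backed by the connector
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "Byte-quantized embedding model",
  "function_name": "remote",
  "description": "Remote embedding model used in this tutorial (illustrative)",
  "connector_id": "<connector ID from the previous step>"
}
```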

```
{% include copy-curl.html %}

Next, create a k-NN index and set the `data_type` on the `passage_embedding` field to `byte` so it can hold byte-quantized vectors:
@natebower (Collaborator), Jun 3, 2024:

Suggested change
Next, create a k-NN index and set the `data_type` on the `passage_embedding` field to `byte` so it can hold byte-quantized vectors:
Next, create a k-NN index and set the `data_type` for the `passage_embedding` field to `byte` so that it can hold byte-quantized vectors:
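
As a concrete sketch of that step, an index mapping along the following lines would work; the index name, field names, dimension, and HNSW/Lucene method settings are assumptions, and the tutorial's own mapping is authoritative.

```
# Illustrative sketch: k-NN index whose passage_embedding field stores byte vectors
PUT /my-byte-quantized-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "passage_text": { "type": "text" },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "data_type": "byte",
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene"
        }
      }
    }
  }
}
```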

```
{% include copy-curl.html %}

Next, create a k-NN index and set the `data_type` on the `passage_embedding` field to `byte` so it can hold byte-quantized vectors:
Collaborator:

"on" => "for"?


## Step 3: Configure semantic search

Create a connector to an embedding model that has the `search_query` input type:
Collaborator:

"that has" => "containing" or "with"?


Note the agent ID; you'll use it in the next step.

## Step 4: Execute the agent
Collaborator Author:

Suggested change
## Step 4: Execute the agent
## Step 4: Run the agent
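
For illustration, running a registered agent uses the Execute Agent API; the agent ID and question below are placeholders.

```
# Illustrative sketch: run the agent (replace the agent ID and question)
POST /_plugins/_ml/agents/<agent_id>/_execute
{
  "parameters": {
    "question": "What is the population increase of Seattle from 2021 to 2023?"
  }
}
```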

```
{% include copy-curl.html %}

The response contains the LLM answer:
Collaborator Author:

Suggested change
The response contains the LLM answer:
The response contains the answer generated by the LLM:

```
{% include copy-curl.html %}

The response contains the LLM answer:
Collaborator Author:

Suggested change
The response contains the LLM answer:
The response contains the answer generated by the LLM:

```
{% include copy-curl.html %}

The agent will run the tools sequentially in the new order defined in `selected_tools`.
Collaborator Author:

Suggested change
The agent will run the tools sequentially in the new order defined in `selected_tools`.
The agent will run the tools one by one in the new order defined in `selected_tools`.
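
A sketch of such a request, with hypothetical tool names in the new order, might look like the following; whether `selected_tools` is passed exactly this way depends on the tutorial's agent definition.

```
# Illustrative sketch: reorder tool execution at request time (tool names are hypothetical)
POST /_plugins/_ml/agents/<agent_id>/_execute
{
  "parameters": {
    "question": "What is the population increase of Seattle from 2021 to 2023?",
    "selected_tools": ["VectorDBTool", "MLModelTool"]
  }
}
```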


### Natural language query

The `PPLTool` can translate a natural language query (NLQ) to [PPL]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) and execute the generated PPL query.
Collaborator Author:

Suggested change
The `PPLTool` can translate a natural language query (NLQ) to [PPL]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) and execute the generated PPL query.
The `PPLTool` can translate a natural language query (NLQ) to [Piped Processing Language (PPL)]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) and execute the generated PPL query.
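
A minimal sketch of a flow agent wrapping the `PPLTool` might look like the following; the agent name is illustrative, and the `model_id` and `model_type` values depend on the LLM you use.

```
# Illustrative sketch: flow agent that translates NLQ to PPL and runs the generated query
POST /_plugins/_ml/agents/_register
{
  "name": "nlq_to_ppl_agent",
  "type": "flow",
  "description": "Translates natural language questions into PPL and executes them (illustrative)",
  "tools": [
    {
      "type": "PPLTool",
      "parameters": {
        "model_id": "<LLM model ID>",
        "model_type": "CLAUDE",
        "execute": true
      }
    }
  ]
}
```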

```
{% include copy-curl.html %}

### Step 2: Execute the agent with an NLQ
Collaborator Author:

Suggested change
### Step 2: Execute the agent with an NLQ
### Step 2: Run the agent with an NLQ
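
A sketch of such a request against the sample eCommerce index might look like this; the question is illustrative, and passing the target index as a parameter is an assumption based on how the `PPLTool` is typically invoked.

```
# Illustrative sketch: ask the agent a natural language question about an index
POST /_plugins/_ml/agents/<agent_id>/_execute
{
  "parameters": {
    "question": "How many orders were placed in the last week?",
    "index": "opensearch_dashboards_sample_data_ecommerce"
  }
}
```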

```
{% include copy-curl.html %}

The response contains the LLM answer:
Collaborator Author:

Suggested change
The response contains the LLM answer:
The response contains the answer generated by the LLM:


## Step 3: Configure semantic search

Create a connector to an embedding model that has the `search_query` input type:
Collaborator Author:

Suggested change
Create a connector to an embedding model that has the `search_query` input type:
Create a connector to an embedding model with the `search_query` input type:
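
As an illustrative sketch only: the provider, endpoint, model name, and request body below assume a Cohere Embed-style API; any embedding API that accepts an `input_type` of `search_query` follows the same pattern.

```
# Illustrative sketch: connector whose requests embed text with input_type = search_query
POST /_plugins/_ml/connectors/_create
{
  "name": "Embedding connector for query-time embeddings (illustrative)",
  "description": "Sends texts to an embedding API with input_type set to search_query",
  "version": "1",
  "protocol": "http",
  "parameters": {
    "model": "embed-english-v3.0",
    "input_type": "search_query"
  },
  "credential": {
    "cohere_key": "<your API key>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://api.cohere.ai/v1/embed",
      "headers": {
        "Authorization": "Bearer ${credential.cohere_key}"
      },
      "request_body": "{ \"texts\": ${parameters.texts}, \"model\": \"${parameters.model}\", \"input_type\": \"${parameters.input_type}\" }"
    }
  ]
}
```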

- `llm`: Defines the LLM configuration:
- `"max_iteration": 5`: The agent runs the LLM a maximum of five times.
- `"response_filter": "$.completion"`: Needed to retrieve the LLM answer from the Bedrock Claude model response.
- `"message_history_limit": 5`: The agent retrieves a maximum of the five most recent history messages and adds them to the LLM context. Set this parameter to `0` to omit message history in the context.
Collaborator Author:

Suggested change
- `"message_history_limit": 5`: The agent retrieves a maximum of the five most recent history messages and adds them to the LLM context. Set this parameter to `0` to omit message history in the context.
- `"message_history_limit": 5`: The agent retrieves a maximum of the five most recent historical messages and adds them to the LLM context. Set this parameter to `0` to omit message history in the context.

kolchfa-aws and others added 2 commits June 3, 2024 13:56
@kolchfa-aws kolchfa-aws added backport 2.14 PR: Backport label for 2.14 labels Jun 3, 2024
@kolchfa-aws kolchfa-aws merged commit d01e74f into main Jun 3, 2024
6 checks passed
@github-actions github-actions bot deleted the ml-tutorials branch June 3, 2024 19:19
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jun 3, 2024
* Add ML tutorials
* Writing
* Conversational search
* Add rag chatbot
* Writing
* Add RAG chatbot and convo agent
* Add reranking cohere tutorial
* Add semantic search tutorial
* Add generating embeddings
* Add generate embeddings to index
* Rewriting
* Rewriting
* Rewriting
* Reword
* Apply suggestions from code review (Co-authored-by: Melissa Vagi <[email protected]>)
* Apply suggestions from code review (Co-authored-by: Nathan Bower <[email protected]>)
* Consistency
* Add tutorials section to index page

---------

Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Melissa Vagi <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
(cherry picked from commit d01e74f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
github-actions bot pushed a commit that referenced this pull request Jun 3, 2024
Labels: backport 2.14 (PR: Backport label for 2.14)

Linked issue: [DOC] Port/link ml-commons Github tutorials

3 participants