Skip to content

Commit

Permalink
Add ML tutorials (#7180)
Browse files Browse the repository at this point in the history
* Add ML tutorials

Signed-off-by: Fanit Kolchina <[email protected]>

* Writing

Signed-off-by: Fanit Kolchina <[email protected]>

* Conversational search

Signed-off-by: Fanit Kolchina <[email protected]>

* Add rag chatbot

Signed-off-by: Fanit Kolchina <[email protected]>

* Writing

Signed-off-by: Fanit Kolchina <[email protected]>

* Add RAG chatbot and convo agent

Signed-off-by: Fanit Kolchina <[email protected]>

* Add reranking cohere tutorial

Signed-off-by: Fanit Kolchina <[email protected]>

* Add semantic search tutorial

Signed-off-by: Fanit Kolchina <[email protected]>

* Add generating embeddings

Signed-off-by: Fanit Kolchina <[email protected]>

* Add generate embeddings to index

Signed-off-by: Fanit Kolchina <[email protected]>

* Rewriting

Signed-off-by: Fanit Kolchina <[email protected]>

* Rewriting

Signed-off-by: Fanit Kolchina <[email protected]>

* Rewriting

Signed-off-by: Fanit Kolchina <[email protected]>

* Reword

Signed-off-by: Fanit Kolchina <[email protected]>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

* Consistency

Signed-off-by: Fanit Kolchina <[email protected]>

* Add tutorials section to index page

Signed-off-by: Fanit Kolchina <[email protected]>

---------

Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Melissa Vagi <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
  • Loading branch information
3 people authored Jun 3, 2024
1 parent e20114b commit d01e74f
Show file tree
Hide file tree
Showing 14 changed files with 3,350 additions and 2 deletions.
1 change: 1 addition & 0 deletions .github/vale/styles/Vocab/OpenSearch/Products/accept.txt
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ Amazon SageMaker
Ansible
Auditbeat
AWS Cloud
Cohere Command
Cognito
Dashboards Query Language
Data Prepper
Expand Down
1 change: 1 addition & 0 deletions _dashboards/dashboards-assistant/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -122,3 +122,4 @@ The following screenshot shows a saved conversation, along with actions you can

- [Getting started guide for OpenSearch Assistant in OpenSearch Dashboards](https://github.com/opensearch-project/dashboards-assistant/blob/main/GETTING_STARTED_GUIDE.md)
- [OpenSearch Assistant configuration through the REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/opensearch-assistant/)
- [Build your own chatbot]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/build-chatbot/)
2 changes: 1 addition & 1 deletion _ml-commons-plugin/agents-tools/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ An _agent_ is a coordinator that uses a large language model (LLM) to solve a pr

- [_Flow agent_](#flow-agents): Runs tools sequentially, in the order specified in its configuration. The workflow of a flow agent is fixed. Useful for retrieval-augmented generation (RAG).
- [_Conversational flow agent_](#conversational-flow-agents): Runs tools sequentially, in the order specified in its configuration. The workflow of a conversational flow agent is fixed. Stores conversation history so that users can ask follow-up questions. Useful for creating a chatbot.
- [_Conversational agent_](#conversational-agents): Reasons in order to provide a response based on the available knowledge, including the LLM knowledge base and a set of tools provided to the LLM. Stores conversation history so that users can ask follow-up questions. The workflow of a conversational agent is variable, based on follow-up questions. For specific questions, uses the Chain-of-Thought (CoT) process to select the best tool from the configured tools for providing a response to the question. Useful for creating a chatbot that employs RAG.
- [_Conversational agent_](#conversational-agents): Reasons in order to provide a response based on the available knowledge, including the LLM knowledge base and a set of tools provided to the LLM. The LLM reasons iteratively to decide what action to take until it obtains the final answer or reaches the iteration limit. Stores conversation history so that users can ask follow-up questions. The workflow of a conversational agent is variable, based on follow-up questions. For specific questions, uses the Chain-of-Thought (CoT) process to select the best tool from the configured tools for providing a response to the question. Useful for creating a chatbot that employs RAG.

### Flow agents

Expand Down
6 changes: 5 additions & 1 deletion _ml-commons-plugin/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,4 +30,8 @@ ML Commons supports various algorithms to help train ML models and make predicti

## ML Commons API

ML Commons provides its own set of REST APIs. For more information, see [ML Commons API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/index/).
ML Commons provides its own set of REST APIs. For more information, see [ML Commons API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/index/).

## Tutorials

Using the OpenSearch ML framework, you can build various applications, from implementing conversational search to building your own chatbot. For more information, see [Tutorials]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/index/).
910 changes: 910 additions & 0 deletions _ml-commons-plugin/tutorials/build-chatbot.md

Large diffs are not rendered by default.

228 changes: 228 additions & 0 deletions _ml-commons-plugin/tutorials/conversational-search-cohere.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,228 @@
---
layout: default
title: Conversational search with Cohere Command
parent: Tutorials
nav_order: 20
---

# Conversational search using the Cohere Command model

This tutorial illustrates how to configure conversational search using the Cohere Command model. For more information, see [Conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/).

Replace the placeholders beginning with the prefix `your_` with your own values.
{: .note}

Alternatively, you can build a RAG/conversational search using agents and tools. For more information, see [Retrieval-augmented generation chatbot]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/rag-conversational-agent/).

## Prerequisite

Ingest test data:

```json
POST _bulk
{"index": {"_index": "qa_demo", "_id": "1"}}
{"text": "Chart and table of population level and growth rate for the Ogden-Layton metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\nThe current metro area population of Ogden-Layton in 2023 is 750,000, a 1.63% increase from 2022.\nThe metro area population of Ogden-Layton in 2022 was 738,000, a 1.79% increase from 2021.\nThe metro area population of Ogden-Layton in 2021 was 725,000, a 1.97% increase from 2020.\nThe metro area population of Ogden-Layton in 2020 was 711,000, a 2.16% increase from 2019."}
{"index": {"_index": "qa_demo", "_id": "2"}}
{"text": "Chart and table of population level and growth rate for the New York City metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\\nThe current metro area population of New York City in 2023 is 18,937,000, a 0.37% increase from 2022.\\nThe metro area population of New York City in 2022 was 18,867,000, a 0.23% increase from 2021.\\nThe metro area population of New York City in 2021 was 18,823,000, a 0.1% increase from 2020.\\nThe metro area population of New York City in 2020 was 18,804,000, a 0.01% decline from 2019."}
{"index": {"_index": "qa_demo", "_id": "3"}}
{"text": "Chart and table of population level and growth rate for the Chicago metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\\nThe current metro area population of Chicago in 2023 is 8,937,000, a 0.4% increase from 2022.\\nThe metro area population of Chicago in 2022 was 8,901,000, a 0.27% increase from 2021.\\nThe metro area population of Chicago in 2021 was 8,877,000, a 0.14% increase from 2020.\\nThe metro area population of Chicago in 2020 was 8,865,000, a 0.03% increase from 2019."}
{"index": {"_index": "qa_demo", "_id": "4"}}
{"text": "Chart and table of population level and growth rate for the Miami metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\\nThe current metro area population of Miami in 2023 is 6,265,000, a 0.8% increase from 2022.\\nThe metro area population of Miami in 2022 was 6,215,000, a 0.78% increase from 2021.\\nThe metro area population of Miami in 2021 was 6,167,000, a 0.74% increase from 2020.\\nThe metro area population of Miami in 2020 was 6,122,000, a 0.71% increase from 2019."}
{"index": {"_index": "qa_demo", "_id": "5"}}
{"text": "Chart and table of population level and growth rate for the Austin metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\\nThe current metro area population of Austin in 2023 is 2,228,000, a 2.39% increase from 2022.\\nThe metro area population of Austin in 2022 was 2,176,000, a 2.79% increase from 2021.\\nThe metro area population of Austin in 2021 was 2,117,000, a 3.12% increase from 2020.\\nThe metro area population of Austin in 2020 was 2,053,000, a 3.43% increase from 2019."}
{"index": {"_index": "qa_demo", "_id": "6"}}
{"text": "Chart and table of population level and growth rate for the Seattle metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\\nThe current metro area population of Seattle in 2023 is 3,519,000, a 0.86% increase from 2022.\\nThe metro area population of Seattle in 2022 was 3,489,000, a 0.81% increase from 2021.\\nThe metro area population of Seattle in 2021 was 3,461,000, a 0.82% increase from 2020.\\nThe metro area population of Seattle in 2020 was 3,433,000, a 0.79% increase from 2019."}
```
{% include copy-curl.html %}

## Step 1: Create a connector and register a model

Conversational search only supports the [OpenAI](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/open_ai_connector_chat_blueprint.md)
and [Amazon Bedrock Claude](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/bedrock_connector_anthropic_claude_blueprint.md) input/output styles.
{: .important}

This tutorial follows the Amazon Bedrock Claude model input/output style by:
- Mapping the Cohere Command `message` input parameter to the `inputs` parameter in order to match the Cohere Claude model input style.
- Using a post-processing function to convert the Cohere Command model output to the Claude model output style.

Create a connector for the Cohere Command model:

```json
POST _plugins/_ml/connectors/_create
{
"name": "Cohere Chat Model",
"description": "The connector to Cohere's public chat API",
"version": "1",
"protocol": "http",
"credential": {
"cohere_key": "your_cohere_api_key"
},
"parameters": {
"model": "command"
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"url": "https://api.cohere.ai/v1/chat",
"headers": {
"Authorization": "Bearer ${credential.cohere_key}",
"Request-Source": "unspecified:opensearch"
},
"request_body": "{ \"message\": \"${parameters.inputs}\", \"model\": \"${parameters.model}\" }",
"post_process_function": "\n String escape(def input) { \n if (input.contains(\"\\\\\")) {\n input = input.replace(\"\\\\\", \"\\\\\\\\\");\n }\n if (input.contains(\"\\\"\")) {\n input = input.replace(\"\\\"\", \"\\\\\\\"\");\n }\n if (input.contains('\r')) {\n input = input = input.replace('\r', '\\\\r');\n }\n if (input.contains(\"\\\\t\")) {\n input = input.replace(\"\\\\t\", \"\\\\\\\\\\\\t\");\n }\n if (input.contains('\n')) {\n input = input.replace('\n', '\\\\n');\n }\n if (input.contains('\b')) {\n input = input.replace('\b', '\\\\b');\n }\n if (input.contains('\f')) {\n input = input.replace('\f', '\\\\f');\n }\n return input;\n }\n def name = 'response';\n def result = params.text;\n def json = '{ \"name\": \"' + name + '\",' +\n '\"dataAsMap\": { \"completion\": \"' + escape(result) +\n '\"}}';\n return json;\n \n "
}
]
}
```
{% include copy-curl.html %}

Starting in OpenSearch 2.12, you can use the default `escape` function directly in the `post_process_function`:

```json
"post_process_function": " \n def name = 'response';\n def result = params.text;\n def json = '{ \"name\": \"' + name + '\",' +\n '\"dataAsMap\": { \"completion\": \"' + escape(result) +\n '\"}}';\n return json;"
```
{% include copy-curl.html %}

Note the connector ID; you'll use it to register the model.

Register the Cohere Command model:

```json
POST /_plugins/_ml/models/_register?deploy=true
{
"name": "Cohere command model",
"function_name": "remote",
"description": "Cohere command model",
"connector_id": "your_connector_id"
}
```
{% include copy-curl.html %}

Note the model ID; you'll use it in the following steps.

Test the model:

```json
POST /_plugins/_ml/models/your_model_id/_predict
{
"parameters": {
"inputs": "What is the weather like in Seattle?"
}
}
```
{% include copy-curl.html %}

The response contains the LLM completion:

```json
{
"inference_results": [
{
"output": [
{
"name": "response",
"dataAsMap": {
"completion": """It is difficult to provide a comprehensive answer without a specific location or time frame in mind.

As an AI language model, I have no access to real-time data or the ability to provide live weather reports. Instead, I can offer some general information about Seattle's weather, which is known for its mild, wet climate.

Located in the Pacific Northwest region of the United States, Seattle experiences a maritime climate with cool, dry summers and mild, wet winters. While it is best known for its rainy days, Seattle's annual rainfall is actually less than New York City and Boston.

Would you like me to provide more details on Seattle's weather? Or, if you have a specific date or location in mind, I can try to retrieve real-time or historical weather information for you."""
}
}
],
"status_code": 200
}
]
}
```

## Step 2: Configure conversational search

Create a search pipeline containing a RAG processor:

```json
PUT /_search/pipeline/my-conversation-search-pipeline-cohere
{
"response_processors": [
{
"retrieval_augmented_generation": {
"tag": "Demo pipeline",
"description": "Demo pipeline Using Cohere",
"model_id": "your_model_id_created_in_step1",
"context_field_list": [
"text"
],
"system_prompt": "You are a helpful assistant",
"user_instructions": "Generate a concise and informative answer in less than 100 words for the given question"
}
}
]
}
```
{% include copy-curl.html %}

To run a conversational search, specify its parameters in the `generative_qa_parameters` object:

```json
GET /qa_demo/_search?search_pipeline=my-conversation-search-pipeline-cohere
{
"query": {
"match": {
"text": "What's the population increase of New York City from 2021 to 2023?"
}
},
"size": 1,
"_source": [
"text"
],
"ext": {
"generative_qa_parameters": {
"llm_model": "bedrock/claude",
"llm_question": "What's the population increase of New York City from 2021 to 2023?",
"context_size": 5,
"timeout": 15
}
}
}
```
{% include copy-curl.html %}

The response contains the model's answer:

```json
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 6,
"relation": "eq"
},
"max_score": 9.042081,
"hits": [
{
"_index": "qa_demo",
"_id": "2",
"_score": 9.042081,
"_source": {
"text": """Chart and table of population level and growth rate for the New York City metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\nThe current metro area population of New York City in 2023 is 18,937,000, a 0.37% increase from 2022.\nThe metro area population of New York City in 2022 was 18,867,000, a 0.23% increase from 2021.\nThe metro area population of New York City in 2021 was 18,823,000, a 0.1% increase from 2020.\nThe metro area population of New York City in 2020 was 18,804,000, a 0.01% decline from 2019."""
}
}
]
},
"ext": {
"retrieval_augmented_generation": {
"answer": "The population of the New York City metro area increased by about 210,000 from 2021 to 2023. The 2021 population was 18,823,000, and in 2023 it was 18,937,000. The average growth rate is 0.23% yearly."
}
}
}
```
Loading

0 comments on commit d01e74f

Please sign in to comment.