-
Notifications
You must be signed in to change notification settings - Fork 3
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
afb2d96
commit aee3b68
Showing
1 changed file
with
351 additions
and
0 deletions.
There are no files selected for viewing
351 changes: 351 additions & 0 deletions
351
tutorials/notebooks/GenAI/notebooks/llm_query_csv.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,351 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"### Query embeddings from structured data" | ||
], | ||
"metadata": { | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"### 1) Install dependencies" | ||
], | ||
"metadata": { | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"Use Python3 (ipykernel) kernel" | ||
], | ||
"metadata": { | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"pip install langchain openai" | ||
], | ||
"outputs": [], | ||
"execution_count": null, | ||
"metadata": { | ||
"jupyter": { | ||
"source_hidden": false, | ||
"outputs_hidden": false | ||
}, | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
}, | ||
"gather": { | ||
"logged": 1707424158923 | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"### 2) Import libraries" | ||
], | ||
"metadata": { | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"import os\n", | ||
"import pandas as pd\n", | ||
"from openai import AzureOpenAI\n" | ||
], | ||
"outputs": [], | ||
"execution_count": null, | ||
"metadata": { | ||
"jupyter": { | ||
"source_hidden": false, | ||
"outputs_hidden": false | ||
}, | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
}, | ||
"gather": { | ||
"logged": 1707412445314 | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"### 3) Connect to the index\n", | ||
"This is the index you created via [these instructions](https://github.com/STRIDES/NIHCloudLabAzure/blob/main/docs/create_index_from_csv.md).\n", | ||
"Look [here](https://learn.microsoft.com/en-us/azure/search/search-create-service-portal#name-the-service) for your endpoint name, and [here](https://learn.microsoft.com/en-us/azure/search/search-security-api-keys?tabs=portal-use%2Cportal-find%2Cportal-query#find-existing-keys) for your index key." | ||
], | ||
"metadata": { | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"endpoint=\"<Your AI Search Endpoint>\"\n", | ||
"index_name=\"<Your Index Name>\"\n", | ||
"index_key='<Your Index Key>'" | ||
], | ||
"outputs": [], | ||
"execution_count": null, | ||
"metadata": { | ||
"jupyter": { | ||
"source_hidden": false, | ||
"outputs_hidden": false | ||
}, | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
}, | ||
"gather": { | ||
"logged": 1707412411658 | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"#connect to vector store \n", | ||
"from azure.search.documents import SearchClient\n", | ||
"from azure.core.credentials import AzureKeyCredential\n", | ||
"\n", | ||
"search_client = SearchClient(endpoint, index_name, AzureKeyCredential(index_key))" | ||
], | ||
"outputs": [], | ||
"execution_count": null, | ||
"metadata": { | ||
"jupyter": { | ||
"source_hidden": false, | ||
"outputs_hidden": false | ||
}, | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"### 4) Connect to your model\n", | ||
"First, make sure you have a [model deployed](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-openai), and if not, deploy a model.\n", | ||
"To get your endpoint, key, and version number, just go to the Chat Playground and click **View Code** at the top." | ||
], | ||
"metadata": { | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"#connect to model\n", | ||
"os.environ[\"AZURE_OPENAI_ENDPOINT\"] = \"<Azure AI Studio Endpoint>\"\n", | ||
"os.environ[\"AZURE_OPENAI_API_KEY\"] = \"<Azure AI Studio API Key\"\n", | ||
"\n", | ||
"client = AzureOpenAI(\n", | ||
" api_key=os.getenv(\"AZURE_OPENAI_KEY\"), \n", | ||
" api_version=\"2023-08-01-preview\",\n", | ||
" azure_endpoint = os.getenv(\"AZURE_OPENAI_ENDPOINT\")\n", | ||
" )" | ||
], | ||
"outputs": [], | ||
"execution_count": null, | ||
"metadata": { | ||
"jupyter": { | ||
"source_hidden": false, | ||
"outputs_hidden": false | ||
}, | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
}, | ||
"gather": { | ||
"logged": 1707412412208 | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"### 5) Query the Vector Store" | ||
], | ||
"metadata": { | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"First, enter your question. Feel free to experiment with different variations or prompts" | ||
], | ||
"metadata": { | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"query = \" \\\n", | ||
" Your input data is a list of grants. \\\n", | ||
" Based on only the 'Project_Title' \\\n", | ||
" list the 'Project_Number' and 'Total_Cost' \\\n", | ||
" of all grants related to breast cancer \\\n", | ||
"\"" | ||
], | ||
"outputs": [], | ||
"execution_count": null, | ||
"metadata": { | ||
"jupyter": { | ||
"source_hidden": false, | ||
"outputs_hidden": false | ||
}, | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"Now we feed the query and the input embeddings to our LLM and return the results " | ||
], | ||
"metadata": { | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"#run query output on model\n", | ||
"search_results = str(list(search_client.search(query)))\n", | ||
"response = client.chat.completions.create(\n", | ||
" model=\"gpt-4\",\n", | ||
" messages=[\n", | ||
" {\"role\": \"system\", \"content\": \"You are an NIH Program Officer\"},\n", | ||
" {\"role\": \"user\", \"content\": \"Context: \"+ search_results + \"\\n\\n Query: \" + query}\n", | ||
" ],\n", | ||
")\n", | ||
"#view model output\n", | ||
"response.choices[0].message.content.strip()" | ||
], | ||
"outputs": [], | ||
"execution_count": null, | ||
"metadata": { | ||
"jupyter": { | ||
"source_hidden": false, | ||
"outputs_hidden": false | ||
}, | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"And that is it! You successfully created a simple chat bot that runs queries against structured data! This is a complex problem and there are a lot of good blogs out there that describe more complex architectures. We recommend you do some investigation and see if you can come up with an even better solution for your use case! " | ||
], | ||
"metadata": { | ||
"nteract": { | ||
"transient": { | ||
"deleting": false | ||
} | ||
} | ||
} | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"name": "python38-azureml", | ||
"language": "python", | ||
"display_name": "Python 3.8 - AzureML" | ||
}, | ||
"language_info": { | ||
"name": "python", | ||
"version": "3.8.5", | ||
"mimetype": "text/x-python", | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"pygments_lexer": "ipython3", | ||
"nbconvert_exporter": "python", | ||
"file_extension": ".py" | ||
}, | ||
"microsoft": { | ||
"ms_spell_check": { | ||
"ms_spell_check_language": "en" | ||
}, | ||
"host": { | ||
"AzureML": { | ||
"notebookHasBeenCompleted": true | ||
} | ||
} | ||
}, | ||
"kernel_info": { | ||
"name": "python38-azureml" | ||
}, | ||
"nteract": { | ||
"version": "[email protected]" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |