added llm csv query notebook

STRIDES · Feb 9, 2024 · aee3b68 · aee3b68
1 parent afb2d96
commit aee3b68
Showing 1 changed file with 351 additions and 0 deletions.
diff --git a/tutorials/notebooks/GenAI/notebooks/llm_query_csv.ipynb b/tutorials/notebooks/GenAI/notebooks/llm_query_csv.ipynb
@@ -0,0 +1,351 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Query embeddings from structured data"
+      ],
+      "metadata": {
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 1) Install dependencies"
+      ],
+      "metadata": {
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Use Python3 (ipykernel) kernel"
+      ],
+      "metadata": {
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "pip install langchain openai"
+      ],
+      "outputs": [],
+      "execution_count": null,
+      "metadata": {
+        "jupyter": {
+          "source_hidden": false,
+          "outputs_hidden": false
+        },
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        },
+        "gather": {
+          "logged": 1707424158923
+        }
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 2) Import libraries"
+      ],
+      "metadata": {
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import os\n",
+        "import pandas as pd\n",
+        "from openai import AzureOpenAI\n"
+      ],
+      "outputs": [],
+      "execution_count": null,
+      "metadata": {
+        "jupyter": {
+          "source_hidden": false,
+          "outputs_hidden": false
+        },
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        },
+        "gather": {
+          "logged": 1707412445314
+        }
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 3) Connect to the index\n",
+        "This is the index you created via [these instructions](https://github.com/STRIDES/NIHCloudLabAzure/blob/main/docs/create_index_from_csv.md).\n",
+        "Look [here](https://learn.microsoft.com/en-us/azure/search/search-create-service-portal#name-the-service) for your endpoint name, and [here](https://learn.microsoft.com/en-us/azure/search/search-security-api-keys?tabs=portal-use%2Cportal-find%2Cportal-query#find-existing-keys) for your index key."
+      ],
+      "metadata": {
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "endpoint=\"<Your AI Search Endpoint>\"\n",
+        "index_name=\"<Your Index Name>\"\n",
+        "index_key='<Your Index Key>'"
+      ],
+      "outputs": [],
+      "execution_count": null,
+      "metadata": {
+        "jupyter": {
+          "source_hidden": false,
+          "outputs_hidden": false
+        },
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        },
+        "gather": {
+          "logged": 1707412411658
+        }
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#connect to vector store   \n",
+        "from azure.search.documents import SearchClient\n",
+        "from azure.core.credentials import AzureKeyCredential\n",
+        "\n",
+        "search_client = SearchClient(endpoint, index_name, AzureKeyCredential(index_key))"
+      ],
+      "outputs": [],
+      "execution_count": null,
+      "metadata": {
+        "jupyter": {
+          "source_hidden": false,
+          "outputs_hidden": false
+        },
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 4) Connect to your model\n",
+        "First, make sure you have a [model deployed](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-openai), and if not, deploy a model.\n",
+        "To get your endpoint, key, and version number, just go to the Chat Playground and click **View Code** at the top."
+      ],
+      "metadata": {
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#connect to model\n",
+        "os.environ[\"AZURE_OPENAI_ENDPOINT\"] = \"<Azure AI Studio Endpoint>\"\n",
+        "os.environ[\"AZURE_OPENAI_API_KEY\"] = \"<Azure AI Studio API Key\"\n",
+        "\n",
+        "client = AzureOpenAI(\n",
+        "    api_key=os.getenv(\"AZURE_OPENAI_KEY\"),  \n",
+        "    api_version=\"2023-08-01-preview\",\n",
+        "    azure_endpoint = os.getenv(\"AZURE_OPENAI_ENDPOINT\")\n",
+        "    )"
+      ],
+      "outputs": [],
+      "execution_count": null,
+      "metadata": {
+        "jupyter": {
+          "source_hidden": false,
+          "outputs_hidden": false
+        },
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        },
+        "gather": {
+          "logged": 1707412412208
+        }
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### 5) Query the Vector Store"
+      ],
+      "metadata": {
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "First, enter your question. Feel free to experiment with different variations or prompts"
+      ],
+      "metadata": {
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "query = \" \\\n",
+        "    Your input data is a list of grants. \\\n",
+        "    Based on only the 'Project_Title' \\\n",
+        "    list the 'Project_Number' and 'Total_Cost' \\\n",
+        "    of all grants related to breast cancer \\\n",
+        "\""
+      ],
+      "outputs": [],
+      "execution_count": null,
+      "metadata": {
+        "jupyter": {
+          "source_hidden": false,
+          "outputs_hidden": false
+        },
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Now we feed the query and the input embeddings to our LLM and return the results "
+      ],
+      "metadata": {
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#run query output on model\n",
+        "search_results = str(list(search_client.search(query)))\n",
+        "response = client.chat.completions.create(\n",
+        "    model=\"gpt-4\",\n",
+        "    messages=[\n",
+        "        {\"role\": \"system\", \"content\": \"You are an NIH Program Officer\"},\n",
+        "        {\"role\": \"user\", \"content\": \"Context: \"+ search_results + \"\\n\\n Query: \" + query}\n",
+        "    ],\n",
+        ")\n",
+        "#view model output\n",
+        "response.choices[0].message.content.strip()"
+      ],
+      "outputs": [],
+      "execution_count": null,
+      "metadata": {
+        "jupyter": {
+          "source_hidden": false,
+          "outputs_hidden": false
+        },
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "And that is it! You successfully created a simple chat bot that runs queries against structured data! This is a complex problem and there are a lot of good blogs out there that describe more complex architectures. We recommend you do some investigation and see if you can come up with an even better solution for your use case! "
+      ],
+      "metadata": {
+        "nteract": {
+          "transient": {
+            "deleting": false
+          }
+        }
+      }
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "name": "python38-azureml",
+      "language": "python",
+      "display_name": "Python 3.8 - AzureML"
+    },
+    "language_info": {
+      "name": "python",
+      "version": "3.8.5",
+      "mimetype": "text/x-python",
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "pygments_lexer": "ipython3",
+      "nbconvert_exporter": "python",
+      "file_extension": ".py"
+    },
+    "microsoft": {
+      "ms_spell_check": {
+        "ms_spell_check_language": "en"
+      },
+      "host": {
+        "AzureML": {
+          "notebookHasBeenCompleted": true
+        }
+      }
+    },
+    "kernel_info": {
+      "name": "python38-azureml"
+    },
+    "nteract": {
+      "version": "[email protected]"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 2
+}