diff --git a/README.md b/README.md index 710d34f..e4b4561 100644 --- a/README.md +++ b/README.md @@ -54,7 +54,8 @@ Deployment: 6 mins | docs\_index\_id | The ID of the docs index | | documentai\_processor\_id | The full Document AI processor path ID | | firestore\_database\_name | The name of the Firestore database created | -| neos\_walkthrough\_url | The URL to launch the in-console tutorial for the Generative AI Knowledge Base solution | +| neos\_tutorial\_url | The URL to launch the in-console tutorial for the Generative AI Knowledge Base solution | +| predictions\_notebook\_url | The URL to open the notebook for model predictions in Colab | | unique\_id | The unique ID for this deployment | diff --git a/metadata.yaml b/metadata.yaml index f6a44d4..ba2876e 100644 --- a/metadata.yaml +++ b/metadata.yaml @@ -51,7 +51,7 @@ spec: label: Cloud Storage content: architecture: - diagramUrl: architecture-diagram.url + diagramUrl: assets/architecture_diagram.svg description: - Uploading a new document triggers the webhook Cloud Function. - Document AI extracts the text from the document file. @@ -110,8 +110,10 @@ spec: description: The full Document AI processor path ID - name: firestore_database_name description: The name of the Firestore database created - - name: neos_walkthrough_url + - name: neos_tutorial_url description: The URL to launch the in-console tutorial for the Generative AI Knowledge Base solution + - name: predictions_notebook_url + description: The URL to open the notebook for model predictions in Colab - name: unique_id description: The unique ID for this deployment requirements: diff --git a/notebooks/model-predictions.ipynb b/notebooks/model-predictions.ipynb new file mode 100644 index 0000000..fb7ea08 --- /dev/null +++ b/notebooks/model-predictions.ipynb @@ -0,0 +1,487 @@ +{ + "cells": [ + { + "cell_type": "code", + "source": [ + "# Copyright 2024 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ], + "metadata": { + "id": "l6nGHoRo3mym" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# Generative AI Knowledge Base model predictions\n", + "\n", + "Before you begin, make sure all the dependencies are installed." 
+ ], + "metadata": { + "id": "PQFrKlY5Yi2w" + } + }, + { + "cell_type": "code", + "source": [ + "!pip install google-cloud-aiplatform google-cloud-firestore" + ], + "metadata": { + "id": "W9C3mHjIiZn1" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Overview\n", + "\n", + "A **Large Language Model (LLM)** can be very good at answering general questions.\n", + "But it might not do as well to answer questions from your documents on its own.\n", + "\n", + "The LLM will answer only from what it learned from its _training dataset_.\n", + "Your documents might include information or words that weren't on that dataset.\n", + "Or they might be used in a different or more specialized context.\n", + "\n", + "This is where **Vector Search** comes into place.\n", + "Each time you upload a document, the Cloud Function webhook processes it.\n", + "When a document is processed, each individual page is _indexed_.\n", + "This allows us to not only find documents, but the specific pages.\n", + "\n", + "The relevant pages can then be used as _context_ for the LLM to answer the question.\n", + "This _grounds_ the model to answer questions based on the documents only.\n", + "Without this, the model might give wrong answers, or _hallucinations_." + ], + "metadata": { + "id": "tXeqwSesfIjO" + } + }, + { + "cell_type": "markdown", + "source": [ + "## My Google Cloud resources\n", + "\n", + "Fill in your project ID, the\n", + "[Google Cloud location](https://cloud.google.com/about/locations)\n", + "you want to use, and your\n", + "[Vector Search index endpoint ID](https://console.cloud.google.com/vertex-ai/matching-engine/index-endpoints).\n", + "\n", + "> 💡 The Vector Search index endpoint ID looks like a number, like `1234567890123456789`.\n", + "\n", + "Run the following cell to set up your resources and authenticate to your account." + ], + "metadata": { + "id": "nZeNBhYcknZK" + } + }, + { + "cell_type": "code", + "source": [ + "# @title\n", + "from google.colab import auth\n", + "\n", + "project_id = \"\" # @param {type:\"string\"}\n", + "location = \"us-central1\" # @param {type:\"string\"}\n", + "index_endpoint_id = \"\" # @param {type:\"string\"}\n", + "deployed_index_id = \"deployed_index\" # @param {type:\"string\"}\n", + "\n", + "auth.authenticate_user(project_id=project_id)" + ], + "metadata": { + "cellView": "form", + "id": "4EctJVdOj0MY" + }, + "execution_count": 1, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "The first step is to initialize the Vertex AI client library using the location of your choice." + ], + "metadata": { + "id": "1P7apRRQabq8" + } + }, + { + "cell_type": "code", + "source": [ + "from google.cloud import aiplatform\n", + "\n", + "aiplatform.init(location=location)" + ], + "metadata": { + "id": "nkPB50oClSD6" + }, + "execution_count": 2, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Get text embeddings\n", + "\n", + "You can use the Gecko model to get embeddings from text.\n", + "For more information, see the\n", + "[Get text embeddings](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings)\n", + "page." 
+ ], + "metadata": { + "id": "5rDc4RataxgE" + } + }, + { + "cell_type": "code", + "source": [ + "from vertexai.language_models import TextEmbeddingModel\n", + "\n", + "def get_text_embedding(text: str) -> list[float]:\n", + " model = TextEmbeddingModel.from_pretrained(\"textembedding-gecko@003\")\n", + " return model.get_embeddings([text])[0].values\n", + "\n", + "\n", + "# Convert the question into an embedding.\n", + "question = \"What are LFs and why are they useful?\"\n", + "question_embedding = get_text_embedding(question)\n", + "print(f\"Embedding dimensions: {len(question_embedding)}\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "fQ97FaoBdO_8", + "outputId": "9409c1fa-5096-4575-aea6-b361b9518640" + }, + "execution_count": 3, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Embedding dimensions: 768\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Find document context\n", + "\n", + "All the documents you have processed have been indexed into your Vector Search index.\n", + "You can query for the closest embeddings to a given embedding from your Vector Search index endpoint.\n", + "\n", + "> 💡 If you haven't processed any documents yet, you won't get any results." + ], + "metadata": { + "id": "vnJfXPXAb-1Y" + } + }, + { + "cell_type": "code", + "source": [ + "from itertools import groupby\n", + "\n", + "def find_document(question: str, index_endpoint_id: str, deployed_index_id: str) -> tuple[str, int]:\n", + " # Get embeddings for the question.\n", + " embedding = get_text_embedding(question)\n", + "\n", + " # Find the closest point from the Vector Search index endpoint.\n", + " endpoint = aiplatform.MatchingEngineIndexEndpoint(index_endpoint_id)\n", + " point = endpoint.find_neighbors(\n", + " deployed_index_id=deployed_index_id,\n", + " queries=[embedding],\n", + " num_neighbors=1,\n", + " )[0][0]\n", + "\n", + " # Get the document name and page number from the point ID.\n", + " (filename, page_number) = point.id.split(':', 1)\n", + " return (filename, int(page_number))\n", + "\n", + "# Query the Vector Search index for the most relevant page.\n", + "(filename, page_number) = find_document(question, index_endpoint_id, deployed_index_id)\n", + "print(f\"{filename=} {page_number=}\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "YxLfbjSLeaIh", + "outputId": "c7aed4f8-27a6-437e-82ef-f4dfd8aa5aa4" + }, + "execution_count": 4, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "filename='9410009v1.pdf' page_number=3\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Get document text\n", + "\n", + "When documents were processed, their text was stored in Firestore as well.\n", + "The Vector Search query returned the relevant documents with their page numbers.\n", + "With this you can download the document's pages and give only the most relevant page to the model." 
+ ], + "metadata": { + "id": "BzRC13xdeK5m" + } + }, + { + "cell_type": "code", + "source": [ + "from google.cloud import firestore\n", + "\n", + "def get_document_text(filename: str, page_number: int) -> str:\n", + " db = firestore.Client(database='knowledge-base')\n", + " doc = db.collection(\"documents\").document(filename)\n", + " return doc.get().get('pages')[page_number]\n", + "\n", + "# Download the document's page text from Firestore.\n", + "context = get_document_text(filename, page_number)\n", + "print(f\"{context[:1000]}\\n...\\n...\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nTJqJg1dfRY5", + "outputId": "09d25013-6e12-40ed-b52f-693735678657" + }, + "execution_count": 5, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "EN SEM IND\n", + "FR SEM IND\n", + "VAR\n", + "REST {Magn( 1 )}\n", + "VAR\n", + "REST {Magn(\n", + "The interlingual status of the lexical function is\n", + "self-evident. Any occurrence of Magn will be left\n", + "intact during transfer and it will be the generation\n", + "component that ultimately assigns a monolingual\n", + "lexical entry to the LF.6\n", + "3.2 Problems\n", + "Lexical Functions abstract away from certain nu-\n", + "ances in meaning and from different syntactic re-\n", + "alizations. We discuss some of the problems raised\n", + "by this abstraction in this section.\n", + "Overgenerality An important problem stems\n", + "from the interpretation of LFs implied by their\n", + "use as an interlingua namely that the mean-\n", + "ing of the collocate in some ways reduces to the\n", + "meaning implied by the lexical function. This in-\n", + "terpretation is trouble-free if we assume that LFs\n", + "always deliver unique values; unfortunately cases\n", + "to the contrary can be readily observed. An exam-\n", + "ple attested from our corpus was the range of ad-\n", + "verbial constructions possible with the verbal head\n", + "oppose: adamantly, bitterly\n", + "...\n", + "...\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Ask a foundational model\n", + "\n", + "With the relevant context ready, you can now make a _prompt_ that includes both the context and the question." + ], + "metadata": { + "id": "5NB2BO0tSBFu" + } + }, + { + "cell_type": "code", + "source": [ + "PROMPT = \"\"\"\\\n", + "CONTEXT:\n", + "{context}\n", + "\n", + "QUESTION:\n", + "{question}\n", + "\"\"\"" + ], + "metadata": { + "id": "yawvx7yGhG7i" + }, + "execution_count": 6, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "This is the `text-bison`'s response." + ], + "metadata": { + "id": "qwmpl00eUohZ" + } + }, + { + "cell_type": "code", + "source": [ + "from vertexai.language_models import TextGenerationModel\n", + "\n", + "# Ask the foundational model.\n", + "model = TextGenerationModel.from_pretrained('text-bison')\n", + "response = model.predict(PROMPT.format(context=context, question=question))\n", + "\n", + "print(f\"{question}\\n\")\n", + "response.text.strip()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 157 + }, + "id": "pdUwysFwgqWU", + "outputId": "b83a0981-c86e-4af7-947f-0b59f96d3ea7" + }, + "execution_count": 7, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "What are LFs and why are they useful?\n", + "\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'Lexical Functions (LFs) are abstract representations of the meaning of words and phrases. 
They are useful because they allow us to represent the meaning of words and phrases in a way that is independent of any particular language. This makes it possible to translate words and phrases between languages without having to worry about the specific grammatical rules of each language.\\n\\nFor example, the LF for the word \"dog\" might be something like \"a four-legged mammal that barks\". This LF can be used to translate the word \"dog\" into any other language, regardless of the specific grammatical rules of that language.\\n\\nLFs are also useful for representing'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 7 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## (Optional) Ask your tuned model\n", + "\n", + "If you tuned a model, provide your tuned model ID.\n", + "You can find it in the [Vertex AI Model Registry](https://console.cloud.google.com/vertex-ai/models) by clicking on your tuned model and then navigating to the \"Version details\".\n", + "\n", + "> 💡 The Model ID looks like a number, like `1234567890123456789`." + ], + "metadata": { + "id": "XyLNJ6fvXl1G" + } + }, + { + "cell_type": "code", + "source": [ + "# @title\n", + "tuned_model_id = \"\" # @param {type:\"string\"}" + ], + "metadata": { + "cellView": "form", + "id": "OtnNmYxKXX8-" + }, + "execution_count": 8, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "This model was tuned with questions and answers generated with the [Gemini API](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/overview), so the responses will have a different tone." + ], + "metadata": { + "id": "LbKK0Gd8chGH" + } + }, + { + "cell_type": "code", + "source": [ + "# Ask the tuned model.\n", + "model = TextGenerationModel.get_tuned_model(tuned_model_id)\n", + "response = model.predict(PROMPT.format(context=context, question=question))\n", + "\n", + "print(f\"{question}\\n\")\n", + "response.text.strip()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 87 + }, + "id": "BRJEUb8Ag_Cp", + "outputId": "e147f8d5-a148-4699-aa4c-4f1a2c664ae2" + }, + "execution_count": 9, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "What are LFs and why are they useful?\n", + "\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'LFs (Lexical Functions) are abstract representations of the meaning of words and phrases. They are useful because they allow for a consistent and interlingual approach to the translation of collocations and other lexical phenomena.'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 9 + } + ] + } + ], + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/outputs.tf b/outputs.tf index 189f34b..c77acfb 100644 --- a/outputs.tf +++ b/outputs.tf @@ -14,11 +14,16 @@ * limitations under the License. 
*/ -output "neos_walkthrough_url" { - value = "https://console.cloud.google.com/products/solutions/deployments?walkthrough_id=panels--sic--document-knowledge-base-tour" +output "neos_tutorial_url" { + value = "https://console.cloud.google.com/products/solutions/deployments?walkthrough_id=panels--sic--generative-ai-knowledge-base_toc" description = "The URL to launch the in-console tutorial for the Generative AI Knowledge Base solution" } +output "predictions_notebook_url" { + value = "https://colab.research.google.com/github/GoogleCloudPlatform/terraform-genai-knowledge-base/blob/main/notebooks/model-predictions.ipynb" + description = "The URL to open the notebook for model predictions in Colab" +} + output "unique_id" { value = random_id.unique_id.hex description = "The unique ID for this deployment" diff --git a/webhook/main.py b/webhook/main.py index 22146ce..1e2d51a 100644 --- a/webhook/main.py +++ b/webhook/main.py @@ -47,18 +47,11 @@ """ MODEL_INPUT_PROMPT = """\ -DOCUMENT: -{text} ----- - -Please answer the following question given the provided document. - -Explain in simple terms. +CONTEXT: +{context} QUESTION: {question} - -ANSWER: """ # Initialize Vertex AI client libraries. @@ -326,7 +319,7 @@ def write_tuning_dataset(db: firestore.Client, output_bucket: str) -> int: entry = doc.to_dict() or {} line = { "input_text": MODEL_INPUT_PROMPT.format( - text=doc_pages[entry["filename"]][entry["page_number"]], + context=doc_pages[entry["filename"]][entry["page_number"]], question=entry["question"], ), "output_text": entry["answer"],