From 5fd617acc0aacaf7cef981ce56caada105b23e3a Mon Sep 17 00:00:00 2001
From: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Date: Sun, 19 May 2024 22:51:46 -0700
Subject: [PATCH] Update link to E2E notebook in LLaMA-2 blog (#20724)

### Description
This PR updates a reference link in the LLaMA-2 blog post and fixes a word formatting issue.

### Motivation and Context
With these changes, the link to the example E2E notebook works again.

---
 src/routes/blogs/accelerating-llama-2/+page.svelte | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/routes/blogs/accelerating-llama-2/+page.svelte b/src/routes/blogs/accelerating-llama-2/+page.svelte
index 9fd316c14c555..9e0999aff80dc 100644
--- a/src/routes/blogs/accelerating-llama-2/+page.svelte
+++ b/src/routes/blogs/accelerating-llama-2/+page.svelte
@@ -102,7 +102,7 @@
     >batch size * (prompt length + token generation length) / wall-clock latency where
     wall-clock latency = the latency from running end-to-end and token generation length = 256
     generated tokens. The E2E throughput is 2.4X more (13B) and 1.8X more (7B) when compared to
-    PyTorch compile. For higher batch size, sequence length like 16, 2048 pytorch eager times out,
+    PyTorch compile. For higher batch size, sequence length pairs such as (16, 2048), PyTorch eager times out,
     while ORT shows better performance than compile mode.
     More details on these metrics can be found here.
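
The blog text changed in this patch defines E2E throughput as batch size * (prompt length + token generation length) / wall-clock latency. A minimal sketch of that arithmetic is below; the helper name `e2e_throughput` and the sample latency value are hypothetical, not part of the blog or repository:

```python
def e2e_throughput(batch_size: int,
                   prompt_length: int,
                   generation_length: int,
                   wall_clock_latency_s: float) -> float:
    """Tokens processed per second across the whole batch, end to end,
    per the formula in the blog: batch * (prompt + generated) / latency."""
    total_tokens = batch_size * (prompt_length + generation_length)
    return total_tokens / wall_clock_latency_s

# Example using the (batch, sequence length) pair (16, 2048) from the patch,
# 256 generated tokens as stated in the blog, and an assumed 30 s latency:
print(e2e_throughput(16, 2048, 256, 30.0))  # 16 * 2304 / 30 = 1228.8
```

With this definition, a run that times out (as PyTorch eager does at (16, 2048)) simply yields no throughput number, which is why the blog reports only the ORT result at that configuration.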