From 5fd617acc0aacaf7cef981ce56caada105b23e3a Mon Sep 17 00:00:00 2001
From: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Date: Sun, 19 May 2024 22:51:46 -0700
Subject: [PATCH] Update link to E2E notebook in LLaMA-2 blog (#20724)

### Description

This PR updates a reference link in the LLaMA-2 blog post and fixes a word formatting issue.

### Motivation and Context

With these changes, the link to the example E2E notebook works again.

---
 src/routes/blogs/accelerating-llama-2/+page.svelte | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/routes/blogs/accelerating-llama-2/+page.svelte b/src/routes/blogs/accelerating-llama-2/+page.svelte
index 9fd316c14c555..9e0999aff80dc 100644
--- a/src/routes/blogs/accelerating-llama-2/+page.svelte
+++ b/src/routes/blogs/accelerating-llama-2/+page.svelte
@@ -102,7 +102,7 @@
 >batch size * (prompt length + token generation length) / wall-clock latency
 where wall-clock latency = the latency from running end-to-end and token
 generation length = 256 generated tokens. The E2E
 throughput is 2.4X more (13B) and 1.8X more (7B)
 when compared to
-    PyTorch compile. For higher batch size, sequence length like 16, 2048 pytorch eager times out,
+    PyTorch compile. For higher batch size, sequence length pairs such as (16, 2048), PyTorch eager times out,
     while ORT shows better performance than compile mode.

@@ -151,7 +151,7 @@

More details on these metrics can be found here.
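
The hunk context above quotes the blog's E2E throughput definition: `batch size * (prompt length + token generation length) / wall-clock latency`. As a minimal sketch (not part of the patch; the function name and the sample numbers below are illustrative assumptions), this is how that metric would be computed:

```python
# Illustrative sketch of the E2E throughput metric quoted in the diff context.
# Function name and sample values are hypothetical, not from the patch.

def e2e_throughput(batch_size: int,
                   prompt_length: int,
                   token_generation_length: int,
                   wall_clock_latency_s: float) -> float:
    """Tokens processed per second across the whole batch, per the blog's
    definition: batch size * (prompt length + generation length) / latency."""
    return batch_size * (prompt_length + token_generation_length) / wall_clock_latency_s

# Hypothetical run matching the (batch size, sequence length) pair (16, 2048)
# mentioned in the diff, with the blog's 256 generated tokens and an assumed
# 30-second wall-clock latency.
print(e2e_throughput(16, 2048, 256, 30.0))  # tokens per second
```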