From 5fd617acc0aacaf7cef981ce56caada105b23e3a Mon Sep 17 00:00:00 2001
From: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Date: Sun, 19 May 2024 22:51:46 -0700
Subject: [PATCH] Update link to E2E notebook in LLaMA-2 blog (#20724)

### Description
This PR updates a reference link in the LLaMA-2 blog post and fixes a word formatting issue.

### Motivation and Context
With these changes, the link to the example E2E notebook works again.

---
 src/routes/blogs/accelerating-llama-2/+page.svelte | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/routes/blogs/accelerating-llama-2/+page.svelte b/src/routes/blogs/accelerating-llama-2/+page.svelte
index 9fd316c14c555..9e0999aff80dc 100644
--- a/src/routes/blogs/accelerating-llama-2/+page.svelte
+++ b/src/routes/blogs/accelerating-llama-2/+page.svelte
@@ -102,7 +102,7 @@
     >batch size * (prompt length + token generation length) / wall-clock latency where
     wall-clock latency = the latency from running end-to-end and token generation length = 256
     generated tokens. The E2E throughput is 2.4X more (13B) and 1.8X more (7B) when compared to
-    PyTorch compile. For higher batch size, sequence length like 16, 2048 pytorch eager times out,
+    PyTorch compile. For higher batch size, sequence length pairs such as (16, 2048), PyTorch eager times out,
     while ORT shows better performance than compile mode.
     More details on these metrics can be found here.
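
The blog text changed in this patch defines E2E throughput as batch size * (prompt length + token generation length) / wall-clock latency. A minimal sketch of that arithmetic is below; the helper name `e2e_throughput` and the sample latency value are hypothetical, not part of the blog or repository:

```python
def e2e_throughput(batch_size: int,
                   prompt_length: int,
                   generation_length: int,
                   wall_clock_latency_s: float) -> float:
    """Tokens processed per second across the whole batch, end to end,
    per the formula in the blog: batch * (prompt + generated) / latency."""
    total_tokens = batch_size * (prompt_length + generation_length)
    return total_tokens / wall_clock_latency_s

# Example using the (batch, sequence length) pair (16, 2048) from the patch,
# 256 generated tokens as stated in the blog, and an assumed 30 s latency:
print(e2e_throughput(16, 2048, 256, 30.0))  # 16 * 2304 / 30 = 1228.8
```

With this definition, a run that times out (as PyTorch eager does at (16, 2048)) simply yields no throughput number, which is why the blog reports only the ORT result at that configuration.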