diff --git a/src/routes/blogs/accelerating-llama-2/+page.svelte b/src/routes/blogs/accelerating-llama-2/+page.svelte
index 9fd316c14c555..9e0999aff80dc 100644
--- a/src/routes/blogs/accelerating-llama-2/+page.svelte
+++ b/src/routes/blogs/accelerating-llama-2/+page.svelte
@@ -102,7 +102,7 @@
 >batch size * (prompt length + token generation length) / wall-clock latency where wall-clock
 latency = the latency from running end-to-end and token generation length = 256 generated
 tokens. The E2E throughput is 2.4X more (13B) and 1.8X more (7B) when compared to
-PyTorch compile. For higher batch size, sequence length like 16, 2048 pytorch eager times out,
+PyTorch compile. For higher (batch size, sequence length) pairs such as (16, 2048), PyTorch eager times out,
 while ORT shows better performance than compile mode.
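To make the E2E throughput formula in the hunk above concrete, here is a minimal Python sketch of the computation; the function name and the 30-second wall-clock latency are illustrative assumptions, not values from the post.

def e2e_throughput(batch_size, prompt_length, token_generation_length, wall_clock_latency_s):
    # E2E throughput = batch size * (prompt length + token generation length) / wall-clock latency,
    # where token generation length = 256 generated tokens in the post's measurements.
    return batch_size * (prompt_length + token_generation_length) / wall_clock_latency_s

# Example: batch size 16, prompt length 2048, 256 generated tokens,
# with a hypothetical 30 s end-to-end latency (placeholder, not a measured value).
print(e2e_throughput(16, 2048, 256, 30.0))  # 1228.8 tokens/second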

@@ -151,7 +151,7 @@
 More details on these metrics can be found here.