diff --git a/src/routes/blogs/accelerating-llama-2/+page.svelte b/src/routes/blogs/accelerating-llama-2/+page.svelte
index 9fd316c14c555..9e0999aff80dc 100644
--- a/src/routes/blogs/accelerating-llama-2/+page.svelte
+++ b/src/routes/blogs/accelerating-llama-2/+page.svelte
@@ -102,7 +102,7 @@
 >batch size * (prompt length + token generation length) / wall-clock latency, where wall-clock latency = the latency from running end-to-end and token generation length = 256 generated tokens. The E2E throughput is 2.4X more (13B) and 1.8X more (7B) when compared to
-PyTorch compile. For higher batch size, sequence length like 16, 2048 pytorch eager times out,
+PyTorch compile. For larger (batch size, sequence length) configurations such as (16, 2048), PyTorch eager times out,
 while ORT shows better performance than compile mode.
 More details on these metrics can be found here.
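For reference, the E2E throughput definition quoted in the hunk above can be computed directly. A minimal TypeScript sketch of that formula follows; the function name and the latency value in the usage example are hypothetical placeholders, not measured results from the blog.

```ts
// E2E throughput (tokens/sec) =
//   batch size * (prompt length + token generation length) / wall-clock latency,
// where wall-clock latency is the end-to-end run time in seconds and
// token generation length = 256 generated tokens in the blog's setup.
function e2eThroughput(
  batchSize: number,
  promptLength: number,
  tokenGenLength: number,
  wallClockLatencySec: number
): number {
  return (batchSize * (promptLength + tokenGenLength)) / wallClockLatencySec;
}

// Example using the (batch size, sequence length) pair mentioned above;
// the 30-second latency is a made-up placeholder, not a benchmark number.
console.log(e2eThroughput(16, 2048, 256, 30)); // ≈ 1228.8 tokens/sec
```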