diff --git a/src/images/blogs/accelerating-llama-2/Figure1-LLaMA-2-13B-E2E-Throughput.png b/src/images/blogs/accelerating-llama-2/Figure1-LLaMA-2-13B-E2E-Throughput.png
index c1da7555eaded..6588146b16ad3 100644
Binary files a/src/images/blogs/accelerating-llama-2/Figure1-LLaMA-2-13B-E2E-Throughput.png and b/src/images/blogs/accelerating-llama-2/Figure1-LLaMA-2-13B-E2E-Throughput.png differ
diff --git a/src/images/blogs/accelerating-llama-2/Figure1-LLaMA-2-7B-E2E-Throughput.png b/src/images/blogs/accelerating-llama-2/Figure1-LLaMA-2-7B-E2E-Throughput.png
index e777913981c60..79c2efbe634db 100644
Binary files a/src/images/blogs/accelerating-llama-2/Figure1-LLaMA-2-7B-E2E-Throughput.png and b/src/images/blogs/accelerating-llama-2/Figure1-LLaMA-2-7B-E2E-Throughput.png differ
diff --git a/src/images/blogs/accelerating-llama-2/Figure2-LLaMA-2-13B-Prompt-Latency.png b/src/images/blogs/accelerating-llama-2/Figure2-LLaMA-2-13B-Prompt-Latency.png
index 9b46047d211bb..3bbc358d1486f 100644
Binary files a/src/images/blogs/accelerating-llama-2/Figure2-LLaMA-2-13B-Prompt-Latency.png and b/src/images/blogs/accelerating-llama-2/Figure2-LLaMA-2-13B-Prompt-Latency.png differ
diff --git a/src/images/blogs/accelerating-llama-2/Figure2-LLaMA-2-7B-Prompt-Latency.png b/src/images/blogs/accelerating-llama-2/Figure2-LLaMA-2-7B-Prompt-Latency.png
index b950cff87f885..996796482ba2f 100644
Binary files a/src/images/blogs/accelerating-llama-2/Figure2-LLaMA-2-7B-Prompt-Latency.png and b/src/images/blogs/accelerating-llama-2/Figure2-LLaMA-2-7B-Prompt-Latency.png differ
diff --git a/src/images/blogs/accelerating-llama-2/Figure3-LLaMA-2-13B-Tokens-Generated-Throughput.png b/src/images/blogs/accelerating-llama-2/Figure3-LLaMA-2-13B-Tokens-Generated-Throughput.png
index cfcc3016ad794..ae1771bd5e13e 100644
Binary files a/src/images/blogs/accelerating-llama-2/Figure3-LLaMA-2-13B-Tokens-Generated-Throughput.png and b/src/images/blogs/accelerating-llama-2/Figure3-LLaMA-2-13B-Tokens-Generated-Throughput.png differ
diff --git a/src/images/blogs/accelerating-llama-2/Figure3-LLaMA-2-7B-Tokens-Generated-Throughput.png b/src/images/blogs/accelerating-llama-2/Figure3-LLaMA-2-7B-Tokens-Generated-Throughput.png
index 9ba1c57c5b6b7..72dd1f6995cea 100644
Binary files a/src/images/blogs/accelerating-llama-2/Figure3-LLaMA-2-7B-Tokens-Generated-Throughput.png and b/src/images/blogs/accelerating-llama-2/Figure3-LLaMA-2-7B-Tokens-Generated-Throughput.png differ
diff --git a/src/routes/blogs/accelerating-llama-2/+page.svelte b/src/routes/blogs/accelerating-llama-2/+page.svelte
index e8bec9b4fc904..3164ebbf2c007 100644
--- a/src/routes/blogs/accelerating-llama-2/+page.svelte
+++ b/src/routes/blogs/accelerating-llama-2/+page.svelte
@@ -31,15 +31,15 @@
 />
-
+
-
+
-
+
@@ -133,7 +133,7 @@

 Token generation throughput below is the average throughput of the first 256 tokens generated.
-We see up to ~1.4X (7B) and ~1.7X (13B) gains in token generation throughput when compared to
+We see up to ~1.3X (7B) and ~1.5X (13B) gains in token generation throughput when compared to
 PyTorch compile mode.
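
The prose changed above defines token generation throughput as the average throughput over the first 256 generated tokens. A minimal sketch of how such a metric could be computed, assuming a hypothetical `generate_next_token` callback standing in for the model's decode step (this is not the blog's actual benchmark harness):

```python
import time

def average_token_throughput(generate_next_token, num_tokens=256):
    """Return average tokens/second over the first `num_tokens` tokens.

    `generate_next_token` is an illustrative callable that produces one
    token per invocation; a real harness would drive the model's decode
    loop instead.
    """
    start = time.perf_counter()
    for _ in range(num_tokens):
        generate_next_token()
    elapsed = time.perf_counter() - start
    return num_tokens / elapsed
```

A speedup such as "~1.5X" would then be the ratio of this value for two runtimes on the same prompt and hardware.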