From ba05e2c29a7a94e765a4b838340e3a0cae005643 Mon Sep 17 00:00:00 2001
From: MaanavD
Date: Fri, 17 Nov 2023 13:55:54 -0800
Subject: [PATCH] Small llama blog change, readme to contributing.

---
 README.md => CONTRIBUTING.md                       | 0
 src/routes/blogs/accelerating-llama-2/+page.svelte | 2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename README.md => CONTRIBUTING.md (100%)

diff --git a/README.md b/CONTRIBUTING.md
similarity index 100%
rename from README.md
rename to CONTRIBUTING.md
diff --git a/src/routes/blogs/accelerating-llama-2/+page.svelte b/src/routes/blogs/accelerating-llama-2/+page.svelte
index 0f5add02a8d4b..c5adea9cbd88b 100644
--- a/src/routes/blogs/accelerating-llama-2/+page.svelte
+++ b/src/routes/blogs/accelerating-llama-2/+page.svelte
@@ -137,7 +137,7 @@
   shards the PyTorch model with FP16 precision into 4 partitions, converts each partition into
   ONNX format, and then applies a new ONNX Runtime graph fusion on the converted ONNX model.
   The 70B model has ~30 tokens per second throughput for token generation at batch size 1, and
-  end-to-end throughput starts at 30 ms for smaller sequence lengths with these optimizations.
+  end-to-end throughput starts at 30 tps for smaller sequence lengths with these optimizations.
   You can find additional example scripts here.