
Commit ba05e2c
Small llama blog change, readme to contributing.
MaanavD committed Nov 17, 2023
1 parent 0ce38ea commit ba05e2c
Showing 2 changed files with 1 addition and 1 deletion.
File renamed without changes.
src/routes/blogs/accelerating-llama-2/+page.svelte: 2 changes (1 addition, 1 deletion)
@@ -137,7 +137,7 @@
  shards the PyTorch model with FP16 precision into 4 partitions, converts each partition into ONNX
  format, and then applies a new ONNX Runtime graph fusion on the converted ONNX model. The 70B
  model has ~30 tokens per second throughput for token generation at batch size 1, and
- end-to-end throughput starts at 30 ms for smaller sequence lengths with these optimizations.
+ end-to-end throughput starts at 30 tps for smaller sequence lengths with these optimizations.
  You can find additional example scripts <a href="https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/llama/" class="text-blue-500">here</a>.
  </p>

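For context, the paragraph this diff touches describes a three-step pipeline: shard the PyTorch model into FP16 partitions, export each partition to ONNX, then apply ONNX Runtime graph fusions. Below is a minimal Python sketch of the export and fusion steps, using a tiny stand-in module in place of a real Llama 2 shard; the llama_shard name, the FP32 stand-in, and the fusion parameters are illustrative assumptions, not the blog's actual method (the linked onnxruntime example scripts are the real workflow).

import torch
import torch.nn as nn
from onnxruntime.transformers import optimizer

# Stand-in for one partition of the sharded model. The real pipeline uses FP16
# shards of Llama 2 70B on GPU; FP32 is used here only so the sketch runs on CPU.
llama_shard = nn.Linear(8192, 8192).eval()
example_inputs = (torch.randn(1, 8192),)

# Export the partition to ONNX format.
torch.onnx.export(
    llama_shard,
    example_inputs,
    "llama_shard.onnx",
    input_names=["hidden_states"],
    output_names=["output"],
    opset_version=17,
)

# Apply ONNX Runtime transformer graph fusions to the exported model.
# model_type="gpt2" requests generic decoder-style fusions here; the new
# Llama-specific fusions mentioned in the blog live in the linked scripts.
fused = optimizer.optimize_model(
    "llama_shard.onnx",
    model_type="gpt2",
    num_heads=64,      # Llama 2 70B attention heads
    hidden_size=8192,  # Llama 2 70B hidden size
)
fused.save_model_to_file("llama_shard_fused.onnx")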
