Update src/routes/blogs/accelerating-phi-2/+page.svx
Co-authored-by: Sophie Schoenmeyer <[email protected]>
MaanavD and sophies927 authored Feb 27, 2024
1 parent 4d69445 commit 84415fc
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/routes/blogs/accelerating-phi-2/+page.svx
@@ -62,7 +62,7 @@ Optimized CUDA performance for prompt throughput (i.e., the rate at which the mo

<img class="m-auto w50" src="./Phi2_Float16_PromptThroughput.png" alt="Phi2 float16 prompt throughput comparison">

-Token generation throughput is the average throughput of the first 256 tokens generated. ONNX Runtime with float16 is **on average 6.6x faster** than torch.compile and as high as **18.55x** faster. It also performs **up to 1.64x** faster than Llama.cpp.
+Token generation throughput is the average throughput of the first 256 tokens generated. ONNX Runtime with float16 is **on average 6.6x faster** than torch.compile and **as high as 18.55x** faster. It also performs **up to 1.64x** faster than Llama.cpp.
<img class="m-auto w50" src="./Phi2_Float16_TokenGenerationThroughput.png" alt="Phi2 float16 token generation throughput comparison">

### ORT gains with int4