
Remove duplicate wording
kunal-vaishnavi committed Nov 15, 2023
1 parent 80d694e commit a6dc597
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/routes/blogs/accelerating-llama-2/+page.svelte
@@ -127,7 +127,7 @@

 <p class="mb-4">
 ONNX Runtime supports multi-GPU inference to enable serving large models. Even in FP16 precision,
-the LLaMA-2 70B model in FP16 precision requires 140GB. Loading the model requires multiple GPUs
+the LLaMA-2 70B model requires 140GB. Loading the model requires multiple GPUs
 for inference, even with a powerful NVIDIA A100 80GB GPU.
 </p>
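
For context on the corrected sentence, the 140GB figure follows from simple arithmetic over the parameter count. A minimal sketch, assuming 2 bytes per parameter for FP16 and counting model weights only (no KV cache, activations, or runtime overhead):

```python
# Back-of-the-envelope weight memory for LLaMA-2 70B in FP16.
# Assumption: 2 bytes per parameter; ignores KV cache, activations, and overhead.
params = 70e9                  # 70 billion parameters
bytes_per_param = 2            # FP16
weight_gb = params * bytes_per_param / 1e9
print(f"{weight_gb:.0f} GB")   # 140 GB, well beyond a single 80GB A100
```

Since 140GB exceeds the 80GB of one A100, the weights alone must be sharded across at least two GPUs, which is why the paragraph calls out multi-GPU inference.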
