I ran a benchmark of the Llama 3 8B Instruct model on two setups, first vLLM and second TensorRT, and in both cases I got almost exactly the same generation throughput of 16.4 tokens/sec. TensorRT did not have any significant impact on performance. I found that surprising, as the L4 GPU is based on the Ada Lovelace architecture. Is this an expected outcome?
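For reference, a minimal sketch of how single-request generation throughput could be measured on the vLLM side (the model ID, prompt, and sampling parameters here are illustrative assumptions, not the exact benchmark script used above):

```python
# Hypothetical sketch: measure generation throughput for one request with vLLM.
# Model ID, prompt, and sampling parameters are assumptions for illustration.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # assumed HF model ID
params = SamplingParams(temperature=0.0, max_tokens=256)

prompt = "Explain the difference between latency and throughput."
start = time.perf_counter()
outputs = llm.generate([prompt], params)
elapsed = time.perf_counter() - start

# Count only generated (completion) tokens, then report tokens per second.
generated_tokens = len(outputs[0].outputs[0].token_ids)
print(f"{generated_tokens / elapsed:.1f} tokens/sec")
```

Note that measuring a single request like this is largely memory-bandwidth bound at batch size 1, which is one reason two different serving stacks can land on nearly identical numbers.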