⚖️ Surprising performance: `gemma2:8b` 1.40 times faster than `gemma2:2b` on `ollama` 🦙 #372

adriens · 2024-08-03T03:15:48Z

❔ Context

As mentioned earlier:

feat: 🎁 🔀 Semantic Router ollama/gemma2 hotline #346

I gave semantic-router a try and got really impressive results, see 🔀 Semantic Router w. ollama/gemma2 : real life 10ms hotline challenge 🤯 .

... but gemma2:2b was released recently, so I switched to this model, hoping that it would:

- be faster
- be as good

... but surprisingly, it was just as good but slower.

👉 The goal of this issue is to understand why... and what could be done to make semantic-router run even faster than 10 ms.

⚖️ Data

Below are the timings from these runs, with both models giving the same output quality:

| Cell N° | `gemma2:2b` | `gemma2:8b` |
| ---: | ---: | ---: |
| 13 | 44.2 ms | 33 ms |
| 14 | 16 ms | 12.7 ms |
| 15 | 15.9 ms | 12 ms |
| 16 | 17 ms | 12 ms |
| 17 | 15.9 ms | 11.8 ms |
| 18 | 21.4 ms | 11.4 ms |
| 19 | 16.3 ms | 12.6 ms |
| 20 | 15.6 ms | 11.6 ms |

ℹ️ In every run, the 8b is faster than the 2b, which is surprising.

📊 Benchmark conclusion

Averaging the per-cell ratios (`gemma2:2b` time divided by `gemma2:8b` time) gives a speed-up factor of approximately 1.40. In other words, on these runs, gemma2:8b is on average 1.40 times faster than gemma2:2b.
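For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the speed-up from the table. The variable and function names are mine, not from the notebooks. It also prints the ratio of total times and the median ratio, since cell 18 pulls the mean up a bit.

```python
from statistics import mean, median

# Timings in milliseconds, copied from the table in this issue.
# Each entry is: cell number -> (gemma2:2b time, gemma2:8b time).
timings_ms = {
    13: (44.2, 33.0),
    14: (16.0, 12.7),
    15: (15.9, 12.0),
    16: (17.0, 12.0),
    17: (15.9, 11.8),
    18: (21.4, 11.4),
    19: (16.3, 12.6),
    20: (15.6, 11.6),
}


def speedups(timings):
    """Per-cell speed-up of the 8b over the 2b (2b time / 8b time)."""
    return {cell: t2b / t8b for cell, (t2b, t8b) in timings.items()}


ratios = speedups(timings_ms)
mean_of_ratios = mean(ratios.values())
median_of_ratios = median(ratios.values())
ratio_of_totals = (
    sum(t2b for t2b, _ in timings_ms.values())
    / sum(t8b for _, t8b in timings_ms.values())
)

print(f"mean of per-cell ratios:   {mean_of_ratios:.2f}")    # 1.40
print(f"median of per-cell ratios: {median_of_ratios:.2f}")  # 1.34
print(f"ratio of total times:      {ratio_of_totals:.2f}")   # 1.39
```

Whichever aggregate is used, the 8b comes out ahead on this data; only the exact factor shifts.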

👉 Do you see the same performance?
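If you want to compare on your side, a small timing harness like the one below may help. It is only a sketch using the Python standard library: `time_calls` and `fake_route` are hypothetical names, and `fake_route` is a placeholder workload to swap for your own semantic-router call against each model. It discards a warm-up call first, because cell 13 is much slower than the others for both models, which could be a warm-up effect (I have not verified that).

```python
import time
from statistics import mean, median


def time_calls(fn, runs=8, warmup=1):
    """Call fn repeatedly and return per-call latencies in milliseconds.

    The first `warmup` calls are executed but not recorded.
    """
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_route():
    # Placeholder workload (hypothetical): replace the body with the
    # routing call you want to measure for a given model.
    sum(i * i for i in range(10_000))


latencies = time_calls(fake_route, runs=8, warmup=1)
print(f"mean {mean(latencies):.2f} ms, median {median(latencies):.2f} ms")
```

Running it once per model with the same inputs gives two latency lists that can be compared in the same way as the table above.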


adriens · 2024-08-03T03:24:57Z

Depending on your feedback, I'll open an issue on ollama.

adriens changed the title ~~⚖️ Suprising performances : gemma2:8b alsways faster than gemma2:2b~~ ⚖️ Suprising performances : gemma2:8b 1.40 times faster than gemma2:2b on ollama 🦙 Aug 3, 2024
