Below are some performance figures; both models gave the same output quality:
| Cell N° | gemma2:2b | gemma2:8b |
|---|---|---|
| 13 | 44.2 ms | 33 ms |
| 14 | 16 ms | 12.7 ms |
| 15 | 15.9 ms | 12 ms |
| 16 | 17 ms | 12 ms |
| 17 | 15.9 ms | 11.8 ms |
| 18 | 21.4 ms | 11.4 ms |
| 19 | 16.3 ms | 12.6 ms |
| 20 | 15.6 ms | 11.6 ms |
ℹ️ On each test, the 8b is faster than the 2b... and it's surprising:
Benchmark conclusion
The average speed-up factor of gemma2:8b compared to gemma2:2b is approximately 1.40. This means that, on average, gemma2:8b is 1.40 times faster than gemma2:2b.
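The 1.40 figure can be re-derived from the timings in the table. The snippet below is a minimal sketch that assumes "average speed-up" means the mean of the per-cell ratios (2b time divided by 8b time); the variable names are illustrative and not taken from the original notebook.

```python
# Timings in milliseconds, copied from the table: cell number -> (gemma2:2b, gemma2:8b).
timings_ms = {
    13: (44.2, 33.0),
    14: (16.0, 12.7),
    15: (15.9, 12.0),
    16: (17.0, 12.0),
    17: (15.9, 11.8),
    18: (21.4, 11.4),
    19: (16.3, 12.6),
    20: (15.6, 11.6),
}

# Per-cell speed-up: how many times faster the 8b run was than the 2b run.
speedups = [t_2b / t_8b for t_2b, t_8b in timings_ms.values()]

# Assumption: the reported average is the arithmetic mean of these ratios.
mean_speedup = sum(speedups) / len(speedups)

# Alternative view: ratio of total time spent, which weights slow cells more.
total_2b = sum(t_2b for t_2b, _ in timings_ms.values())
total_8b = sum(t_8b for _, t_8b in timings_ms.values())
ratio_of_totals = total_2b / total_8b

print(f"mean of per-cell speed-ups: {mean_speedup:.2f}")  # prints 1.40
print(f"ratio of total times:       {ratio_of_totals:.2f}")
```

The ratio of total times comes out slightly lower than the mean of ratios, because cell 18 (21.4 ms vs 11.4 ms) pulls the per-cell mean upward.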
Do you get the same performance?
adriens changed the title from "Surprising performances: gemma2:8b always faster than gemma2:2b" to "Surprising performances: gemma2:8b 1.40 times faster than gemma2:2b on ollama 🦙" on Aug 3, 2024
Context
As mentioned earlier:
I gave semantic-router a try and got really impressive results, see "Semantic Router w. ollama/gemma2 : real life 10ms hotline challenge 🤯".
... but recently gemma2:2b has been released, so I switched to this model, with the hope that: ... but surprisingly, it did just as well, only slower.
The goal of this issue is to understand why, and what could be done to make semantic router run even faster than 10ms.
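To investigate a gap of a few milliseconds, it can help to time the same call repeatedly under identical conditions for each model. The following is a minimal sketch of such a harness, not the code used in the notebooks: `time_calls` and `fake_route` are hypothetical names, and `fake_route` is only a stand-in for whatever single routing call is being measured.

```python
import statistics
import time


def time_calls(fn, repeats=20, warmup=1):
    """Return per-call latencies in milliseconds for a zero-argument callable."""
    # Warm-up calls are executed but not recorded, so one-off costs
    # (for example model loading) do not skew the measurements.
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_route():
    # Hypothetical placeholder: replace with the real routing call under test.
    sum(range(1000))


latencies = time_calls(fake_route, repeats=5)
print(f"median: {statistics.median(latencies):.3f} ms, max: {max(latencies):.3f} ms")
```

Reporting the median alongside the maximum makes outliers visible, such as the first cell in the table (44.2 ms and 33 ms), which is noticeably slower than the following ones for both models.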
Data
Consider the following runs:
gemma2:2b
gemma2:8b