Thanks for testing it out! A bug fix related to output token length has been cherry-picked into the 0.9 release, which improves latency.
-
Hey, just wanted to leave some feedback.
I have been testing StarCoder2-7b and 15b Q8_0 on a 7900XTX.
RC4 was extremely slow to respond and usually stalled on both models.
RC5 is much faster and more responsive; it's actually usable. Still, it would be great if it could be a bit faster.
I am using ROCm 6.0 on Ubuntu 22.04
7950X
7900XTX
StarCoder2-15b-Q8_0 is using 18023 MiB.
Hopefully you can squeeze more performance out of it.
Thanks.