Slow speed for some models. #33
When loading a model, it tells you the quantization version. Versions 0 and 2 are slow: 0 because it is old, and 2 because upstream GPTQ prefers accuracy over speed. If you want fast models, use version 1. They usually show up on Hugging Face as compatible with KoboldAI.
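If you're not sure which version a given checkpoint is, one way to get a hint without loading the whole model is to inspect the tensor names in the quantized file. A minimal sketch, assuming a safetensors checkpoint; the file name is a placeholder, and the "zeros" vs "qzeros" heuristic is my assumption about how older vs newer GPTQ-for-LLaMa checkpoints differ, not something the loader guarantees:

```python
# Hypothetical diagnostic: list zero-point/scale tensors in a quantized
# checkpoint to get a hint about its GPTQ format. The file name is a
# placeholder, and the "zeros" vs "qzeros" distinction is an assumption
# about how older vs newer checkpoints differ.
from safetensors.torch import load_file

state_dict = load_file("4bit-128g.safetensors")
for name, tensor in state_dict.items():
    if "zeros" in name or "scales" in name:
        print(name, tuple(tensor.shape), tensor.dtype)
```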
If you can't achieve that, then I have two questions:
I've tried a few models and am seeing the same: 2 tokens/s with this version of KoboldAI (the same speed as standard) and 10-12 tokens/s on oobabooga with the same models using ExLlama.
Hey,
I tried using this fork and realized that the speed was really slow for some of the models I was using:
https://huggingface.co/reeducator/vicuna-13b-cocktail/tree/main
For vicuna-cocktail, for example, I get something like 2 tokens/s even though I easily reach 10 tokens/s on ooba's webui.
Some other models (like raw LLaMA 13B) give me 7 tokens/s, which is fine.
I guess this has to do with vicuna-cocktail not having been saved with the "save_pretrained" option? I don't know, just trying to guess here.
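In case it helps, this is the kind of round-trip I mean; a minimal sketch, assuming the full-precision repo is what needs re-saving (the output path is a placeholder):

```python
# Hypothetical round-trip: load a checkpoint with Transformers and write it
# back out with save_pretrained so it ends up in the standard layout.
# The destination path is a placeholder, and this assumes the
# full-precision weights (not the 4-bit file) fit in memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "reeducator/vicuna-13b-cocktail"
dst = "./vicuna-13b-cocktail-resaved"

model = AutoModelForCausalLM.from_pretrained(src, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(src)

model.save_pretrained(dst)
tokenizer.save_pretrained(dst)
```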
Anyway, if you could look at that and try to get "normal" speed in every situation, that would be cool.
Thanks in advance.