Replies: 8 comments
-
Hey - the naming convention of "q8_0" is primarily there for legacy reasons; it doesn't necessarily mean that only Q8 quantization is supported. The approach you took with the registry is actually the recommended method for loading different model checkpoints. Feel free to name your model however fits your use case. Related discussion: #1398
-
You should just add this to the documentation and close this - it's not really an issue. I ran FP16, Q6_K, and Q5_M all using the same q8_0.v2.gguf file name without any problem.
-
Any updates on this?
-
@Mte90 you can make your own registry with models in different quantizations. The file name has no bearing on whether the model runs as Q6 or Q8. Check my registry-tabby project for an example.
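For the record, an entry along these lines is all a custom registry needs - a rough sketch, assuming the models.json layout registry-tabby used around the time of this thread (name, prompt_template, urls, sha256); the URL, checksum, and template below are placeholders:

```json
[
  {
    "name": "DeepseekCoder-6.7B-Q4",
    "prompt_template": "<your model's FIM template>",
    "urls": [
      "https://huggingface.co/<your-repo>/resolve/main/deepseek-coder-6.7b.Q4_K_M.gguf"
    ],
    "sha256": "<sha256 of the Q4_K_M file>"
  }
]
```

If I recall right, you can then point tabby at the fork by prefixing the model name with your registry's namespace (e.g. `--model <your-github-user>/DeepseekCoder-6.7B-Q4`) - check the docs in case that's changed.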
-
It isn't very handy to create a custom registry just to try models. To simplify, I was thinking of downloading the model manually and renaming/placing the file inside the model-specific folder under .tabby.
-
That works too, I did that a lot; you still need to have the prompt_template configured correctly.
-
The problem is that if I put a single file in place of a Tabby model that is split into multiple files in the registry, Tabby automatically deletes it and downloads the q8 again.
-
So I did it this way:
This way it downloads the models without creating a new registry somewhere :-)
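Roughly, the manual route looks like this - the URL is a placeholder, and the ~/.tabby directory layout here is an assumption based on the default setup that may differ between tabby versions:

```sh
# Grab the quantization you actually want (placeholder URL)
wget -O model.Q4_K_M.gguf \
  "https://huggingface.co/<repo>/resolve/main/deepseek-coder-6.7b.Q4_K_M.gguf"

# Drop it into tabby's model folder under the file name tabby expects;
# this path is an assumption based on the default ~/.tabby layout
mkdir -p ~/.tabby/models/TabbyML/DeepseekCoder-6.7B/ggml
mv model.Q4_K_M.gguf \
  ~/.tabby/models/TabbyML/DeepseekCoder-6.7B/ggml/q8_0.v2.gguf
```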
-
Please describe the feature you want
Currently it appears that tabby internally assumes that all models use Q8 quantization, but that doesn't appear to be an actual requirement. I forked the registry and modified a Q8 download to instead fetch a Q4_K_M build of the deepseek 6.7B model, as I needed a smaller RAM footprint so I could run on my nvidia 2080 SUPER.
Tabby still downloads the model to a file named q8_0.v2.gguf, but the sha256sum matches the Q4_K_M.gguf that I substituted in my fork of registry-tabby. Once I performed this "override" via my fork, I was able to load the model without issue, since llama-cpp doesn't require that we use only Q8 models.
I think this would probably require an additional field in the registry-tabby json structure that would allow tabby to map the model to a different filename; in addition, we couldn't hardcode the model filename as has been done here.
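Sketching the idea - the model_file field below is hypothetical, not something the current registry-tabby schema supports; the URL and sha256 are placeholders:

```json
{
  "name": "DeepseekCoder-6.7B",
  "urls": [
    "https://huggingface.co/<repo>/resolve/main/deepseek-coder-6.7b.Q4_K_M.gguf"
  ],
  "sha256": "<sha256 of the Q4_K_M file>",
  "model_file": "q4_k_m.v2.gguf"
}
```

With something like that, tabby could save the download under model_file instead of hardcoding q8_0.v2.gguf.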
Implementation details aside, my main point is that llama-cpp supports loading ggml/gguf models other than Q8, and it would be nice if tabby supported this without the ugly registry hack that I've done.

Additional context
Please reply with a 👍 if you want this feature.