[nnpackage] Define block quantization type on circle format #13743
Comments
Below is a schema draft to represent ggml's quantization type (block quantization).
It introduces new
Below is
#define QK4_0 32
typedef struct {
    ggml_half d;           // delta
    uint8_t qs[QK4_0 / 2]; // nibbles / quants
} block_q4_0;
static_assert(sizeof(block_q4_0) == sizeof(ggml_half) + QK4_0 / 2, "wrong q4_0 block size/padding");
( Addition:
How about adding a new dtype (QK4_0, etc) rather than extending Why?
@jinevening I've updated the circle schema based on your comment.
There is no issue on the runtime side with using this spec.
No. I'll update it to use negative values.
Updated.
Negative-value items are placed at the back. Does the generated header code have any problem with that?
No problem. I checked the generated header code.
If there are no more opinions, I'll update the generated header file for the runtime first (
I've found @jinevening's suggestion now. I think we need a prefix before (ADD) I've found the comment on
It would be better to move the comment immediately before However, if others are OK, I don't oppose.
What do you mean by Assuming you mean
I think using
I think we agree to use new And maybe it will be OK to change the enum name after release, because the name is used for printing only.
It's about the existing C++ class in
The schema is updated.
What?
Let's support a block quantization data type in the circle format so that LLM models can be represented.
Why?
To support LLM models, we need small-size weight quantization with small precision loss.
So we need to introduce chunk (block) quantization such as the quantization types of ggml (llama.cpp).
To represent this, we need to expand the circle schema's QuantizationParameters table and/or QuantizationDetails union.
Related issue: #13742
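To make the motivation concrete, here is a minimal C sketch of 4-bit block quantization in the spirit of the block_q4_0 layout quoted in this thread. It is not ggml's or the runtime's implementation. The names demo_block_q4_0, demo_quant4, demo_quantize_row and demo_dequantize_row are hypothetical, the scale is stored as a plain float instead of ggml_half to keep the example self-contained, and the rounding rule and the pairing of element j with element j + 16 in one byte are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define QK4_0 32 /* weights per block, same value as in the quoted draft */

/* Hypothetical stand-in for block_q4_0. ASSUMPTION: the scale is a float
 * here; the quoted struct stores it as ggml_half (fp16). */
typedef struct {
    float d;               /* delta: per-block scale */
    uint8_t qs[QK4_0 / 2]; /* two 4-bit quants packed per byte */
} demo_block_q4_0;

/* Round v to the nearest integer, add an offset of 8, clamp to 4 bits. */
static int demo_quant4(float v) {
    int q = (int)(v + 8.5f);
    if (q < 0) q = 0;
    if (q > 15) q = 15;
    return q;
}

/* Quantize n floats (n must be a multiple of QK4_0) into n / QK4_0 blocks. */
static void demo_quantize_row(const float *x, demo_block_q4_0 *out, int n) {
    assert(n % QK4_0 == 0);
    for (int b = 0; b < n / QK4_0; ++b) {
        const float *xb = x + b * QK4_0;

        /* Largest magnitude in the block determines the scale. */
        float amax = 0.0f;
        for (int i = 0; i < QK4_0; ++i) {
            float a = xb[i] < 0.0f ? -xb[i] : xb[i];
            if (a > amax) amax = a;
        }

        /* Map [-amax, amax] onto the integer range [-7, 7]. */
        const float d = amax / 7.0f;
        const float inv = (d != 0.0f) ? 1.0f / d : 0.0f;
        out[b].d = d;

        /* ASSUMPTION: low nibble holds element j, high nibble element j + 16. */
        for (int j = 0; j < QK4_0 / 2; ++j) {
            int lo = demo_quant4(xb[j] * inv);
            int hi = demo_quant4(xb[j + QK4_0 / 2] * inv);
            out[b].qs[j] = (uint8_t)(lo | (hi << 4));
        }
    }
}

/* Reconstruct n floats from n / QK4_0 blocks. */
static void demo_dequantize_row(const demo_block_q4_0 *in, float *y, int n) {
    assert(n % QK4_0 == 0);
    for (int b = 0; b < n / QK4_0; ++b) {
        float *yb = y + b * QK4_0;
        for (int j = 0; j < QK4_0 / 2; ++j) {
            yb[j]             = (float)((in[b].qs[j] & 0x0F) - 8) * in[b].d;
            yb[j + QK4_0 / 2] = (float)((in[b].qs[j] >> 4) - 8) * in[b].d;
        }
    }
}
```

With the fp16 scale of the quoted struct, one block takes 2 + 16 = 18 bytes for 32 weights, which is 4.5 bits per weight. Because each block carries its own scale, a single per-tensor scale and zero point cannot describe the data, which is why the schema needs a way to mark a tensor as block-quantized.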