Granite MoE uses a 3D tensor to hold the expert weights, so GPTQModel does not work out of the box.
There are two options:

1. Module-swap `GraniteMoeParallelExperts` to hold a `ModuleList` of `Linear`s; AutoGPTQ will then be able to detect them and replace them with `QuantLinear`s.
2. Write a custom GPTQ module that handles the `GraniteMoeParallelExperts` case.
Either approach solves both the quantization and inference paths. Option 1 should be easier than Option 2, but in some sense Option 2 is the more proper fix.
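A rough sketch of the Option 1 swap, assuming `GraniteMoeParallelExperts` lives in `transformers.models.granitemoe.modeling_granitemoe`, exposes `num_experts` / `input_size` / `output_size`, stores its weights as a 3D parameter of shape `(num_experts, output_size, input_size)`, and is called with inputs already grouped per expert (the class and module names `SequentialExperts` / `swap_parallel_experts` below are made up for illustration):

```python
import torch
import torch.nn as nn

from transformers.models.granitemoe.modeling_granitemoe import GraniteMoeParallelExperts


class SequentialExperts(nn.Module):
    """Holds each expert as a separate nn.Linear so per-module quantizers
    (e.g. GPTQ) can discover and replace them."""

    def __init__(self, num_experts: int, input_size: int, output_size: int):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            nn.Linear(input_size, output_size, bias=False) for _ in range(num_experts)
        )

    @classmethod
    def from_parallel(cls, parallel: GraniteMoeParallelExperts) -> "SequentialExperts":
        # Assumption: parallel.weight has shape (num_experts, output_size, input_size).
        new = cls(parallel.num_experts, parallel.input_size, parallel.output_size)
        for i, linear in enumerate(new.experts):
            linear.weight.data.copy_(parallel.weight.data[i])
        return new

    def forward(self, inputs: torch.Tensor, expert_size) -> torch.Tensor:
        # Assumed contract: inputs are already sorted by expert, and expert_size
        # gives the number of tokens routed to each expert.
        chunks = inputs.split(expert_size, dim=0)
        outputs = [self.experts[i](chunks[i]) for i in range(self.num_experts)]
        return torch.cat(outputs, dim=0)


def swap_parallel_experts(model: nn.Module) -> nn.Module:
    """Recursively replace every GraniteMoeParallelExperts with SequentialExperts."""
    for name, child in model.named_children():
        if isinstance(child, GraniteMoeParallelExperts):
            setattr(model, name, SequentialExperts.from_parallel(child))
        else:
            swap_parallel_experts(child)
    return model
```

After the swap, each expert shows up as an ordinary `nn.Linear`, which is the module type the GPTQ replacement pass already knows how to handle.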
When implementing Option 2, we should reuse code from the original GPTQ implementation. It should also be written generally, so that it handles not just this particular `GraniteMoeParallelExperts` case but any module whose weights are stored as a 3D tensor.
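Conceptually, the generic 3D case reduces to the existing 2D case by treating each slice along the leading (expert) dimension as an ordinary `(out_features, in_features)` weight. A minimal sketch, where `quantize_2d` is a hypothetical stand-in for whatever routine GPTQModel already applies to a plain `Linear` weight:

```python
import torch


def quantize_3d_weight(weight: torch.Tensor, quantize_2d) -> torch.Tensor:
    """Quantize a stacked expert weight of shape (num_experts, out, in) by
    applying the existing 2D routine to each expert slice and re-stacking."""
    assert weight.dim() == 3, "expected a stacked (num_experts, out, in) tensor"
    return torch.stack(
        [quantize_2d(weight[i]) for i in range(weight.shape[0])], dim=0
    )
```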