Granite MoE uses a 3D tensor to hold the expert weights, so GPTQModel does not work out of the box.
There are two options:

1. Module-swap `GraniteMoeParallelExperts` to hold a `ModuleList` of `Linear`s; AutoGPTQ will then be able to detect them and replace them with `QuantLinear`s.
2. Write a custom GPTQ module that handles the `GraniteMoeParallelExperts` case.
Either approach solves both the quantization and inference paths. Option 1 should be easier than Option 2, but in some sense Option 2 is the more proper fix.
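A rough sketch of the Option 1 swap, assuming `GraniteMoeParallelExperts` lives in `transformers.models.granitemoe.modeling_granitemoe`, exposes `num_experts` / `input_size` / `output_size`, stores its weights as a 3D parameter of shape `(num_experts, output_size, input_size)`, and is called with inputs already grouped per expert (the class and module names `SequentialExperts` / `swap_parallel_experts` below are made up for illustration):

```python
import torch
import torch.nn as nn

from transformers.models.granitemoe.modeling_granitemoe import GraniteMoeParallelExperts


class SequentialExperts(nn.Module):
    """Holds each expert as a separate nn.Linear so per-module quantizers
    (e.g. GPTQ) can discover and replace them."""

    def __init__(self, num_experts: int, input_size: int, output_size: int):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            nn.Linear(input_size, output_size, bias=False) for _ in range(num_experts)
        )

    @classmethod
    def from_parallel(cls, parallel: GraniteMoeParallelExperts) -> "SequentialExperts":
        # Assumption: parallel.weight has shape (num_experts, output_size, input_size).
        new = cls(parallel.num_experts, parallel.input_size, parallel.output_size)
        for i, linear in enumerate(new.experts):
            linear.weight.data.copy_(parallel.weight.data[i])
        return new

    def forward(self, inputs: torch.Tensor, expert_size) -> torch.Tensor:
        # Assumed contract: inputs are already sorted by expert, and expert_size
        # gives the number of tokens routed to each expert.
        chunks = inputs.split(expert_size, dim=0)
        outputs = [self.experts[i](chunks[i]) for i in range(self.num_experts)]
        return torch.cat(outputs, dim=0)


def swap_parallel_experts(model: nn.Module) -> nn.Module:
    """Recursively replace every GraniteMoeParallelExperts with SequentialExperts."""
    for name, child in model.named_children():
        if isinstance(child, GraniteMoeParallelExperts):
            setattr(model, name, SequentialExperts.from_parallel(child))
        else:
            swap_parallel_experts(child)
    return model
```

After the swap, each expert shows up as an ordinary `nn.Linear`, which is the module type the GPTQ replacement pass already knows how to handle.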
When implementing Option 2, we should reuse code from the original GPTQ implementation. It should also be written generally, so that it handles not just this particular `GraniteMoeParallelExperts` case but any module whose weights are stored as a 3D tensor.
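Conceptually, the generic 3D case reduces to the existing 2D case by treating each slice along the leading (expert) dimension as an ordinary `(out_features, in_features)` weight. A minimal sketch, where `quantize_2d` is a hypothetical stand-in for whatever routine GPTQModel already applies to a plain `Linear` weight:

```python
import torch


def quantize_3d_weight(weight: torch.Tensor, quantize_2d) -> torch.Tensor:
    """Quantize a stacked expert weight of shape (num_experts, out, in) by
    applying the existing 2D routine to each expert slice and re-stacking."""
    assert weight.dim() == 3, "expected a stacked (num_experts, out, in) tensor"
    return torch.stack(
        [quantize_2d(weight[i]) for i in range(weight.shape[0])], dim=0
    )
```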