Added a tool to quantize Gather to GatherBlockQuantized (#21697) · microsoft/onnxruntime@64674c5

### Description
Added code in MatMul4BitsQuantizer to quantize Gather to
GatherBlockQuantized.

Only Gather with constant data is quantized.

Because the quantized data is stored as int4, the quantized model is
forcibly upgraded to ONNX opset 21.

The implementation relies purely on numpy. If optimization is needed,
C++ kernels can be added later.

Only the default RTN algorithm is supported, since GatherBlockQuantized
requires zero points to have the same type as the quantized data.
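
For illustration only, here is a minimal numpy sketch of block-wise
round-to-nearest (RTN) 4-bit quantization followed by a dequantizing
gather. It is not the code added in this commit: the function names
`rtn_quantize_blockwise` and `gather_dequantized` are hypothetical, the
block layout (blocks along the last axis of a 2-D array) is an assumption,
and 4-bit values are kept one per byte instead of being packed. Note that
the zero points are produced with the same dtype as the quantized data.

```python
import numpy as np


def rtn_quantize_blockwise(data, block_size=32, symmetric=True):
    """RTN 4-bit quantization of a 2-D array, in blocks along the last axis.

    Hypothetical helper. Returns (quantized, scales, zero_points); 4-bit
    values are stored unpacked, one per int8/uint8 element.
    """
    rows, cols = data.shape
    pad = (-cols) % block_size  # zero-pad so the last block is full
    padded = np.pad(data.astype(np.float32), ((0, 0), (0, pad)))
    blocks = padded.reshape(rows, -1, block_size)

    if symmetric:
        # Signed 4-bit range [-8, 7]; zero point is always 0.
        max_abs = np.abs(blocks).max(axis=-1, keepdims=True)
        scales = np.where(max_abs > 0, max_abs / 7.0, 1.0).astype(np.float32)
        zero_points = np.zeros(scales.shape, dtype=np.int8)
        q = np.clip(np.rint(blocks / scales), -8, 7).astype(np.int8)
    else:
        # Unsigned 4-bit range [0, 15]; the value range is widened to include 0.
        lo = np.minimum(blocks.min(axis=-1, keepdims=True), 0.0)
        hi = np.maximum(blocks.max(axis=-1, keepdims=True), 0.0)
        span = hi - lo
        scales = np.where(span > 0, span / 15.0, 1.0).astype(np.float32)
        zero_points = np.clip(np.rint(-lo / scales), 0, 15).astype(np.uint8)
        q = np.clip(np.rint(blocks / scales) + zero_points, 0, 15).astype(np.uint8)

    quantized = q.reshape(rows, -1)[:, :cols]  # drop the padding again
    return quantized, scales[..., 0], zero_points[..., 0]


def gather_dequantized(quantized, scales, zero_points, indices, block_size=32):
    """Gather rows by index, then dequantize them as (q - zero_point) * scale."""
    rows = quantized[indices].astype(np.float32)
    width = rows.shape[-1]
    s = np.repeat(scales[indices], block_size, axis=-1)[..., :width]
    z = np.repeat(zero_points[indices], block_size, axis=-1)[..., :width]
    return (rows - z.astype(np.float32)) * s
```

Gathering before dequantizing means only the selected rows are expanded
back to float, which is the point of keeping the table in 4-bit form.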

### Motivation and Context
Support quantizing Gather to int4 for Web scenarios.

fajin-corp authored Aug 19, 2024

1 parent 7ae0b4c commit 64674c5