Augment blockwise quantization #18101

chenfucn · 2023-10-25T18:49:59Z

Description

Augment block-wise 4-bit quantization: plain CPU implementation.

Motivation and Context

Allow blocks to be either column-wise or row-wise. Experiments show that row-wise quantization of LLM weight matrices achieves better precision.

Added tests for the quantization and dequantization code.
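
To make the column-wise versus row-wise distinction concrete, here is a minimal sketch of block-wise 4-bit quantization with a configurable block shape. It is not the MLAS code from this PR: `BlockQuant4`, `QuantizeBlockwise`, `DequantizeBlockwise`, and `CeilDiv` are hypothetical names, and the asymmetric min/max scheme with a per-block zero point is an assumed choice. The sketch also assumes that a "row-wise" block covers consecutive elements of one row (shape 1 x N) and a "column-wise" block covers consecutive elements of one column (shape N x 1). For readability, each 4-bit code occupies its own byte instead of being packed two per byte.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: every name here is hypothetical, not the MLAS interface.
struct BlockQuant4 {
    size_t rows = 0, cols = 0;              // shape of the row-major source matrix
    size_t block_rows = 1, block_cols = 1;  // block shape, e.g. {1, 32} or {32, 1}
    std::vector<uint8_t> codes;             // one 4-bit code (0..15) per element, unpacked
    std::vector<float> scales;              // one scale per block
    std::vector<uint8_t> zero_points;       // one zero point per block
};

inline size_t CeilDiv(size_t a, size_t b) { return (a + b - 1) / b; }

inline float Clamp4Bit(float v) { return std::min(std::max(v, 0.0f), 15.0f); }

BlockQuant4 QuantizeBlockwise(const std::vector<float>& src, size_t rows, size_t cols,
                              size_t block_rows, size_t block_cols) {
    assert(src.size() == rows * cols && block_rows > 0 && block_cols > 0);
    BlockQuant4 out;
    out.rows = rows;
    out.cols = cols;
    out.block_rows = block_rows;
    out.block_cols = block_cols;
    out.codes.resize(rows * cols);
    const size_t grid_rows = CeilDiv(rows, block_rows);
    const size_t grid_cols = CeilDiv(cols, block_cols);
    out.scales.resize(grid_rows * grid_cols);
    out.zero_points.resize(grid_rows * grid_cols);

    for (size_t br = 0; br < grid_rows; ++br) {
        for (size_t bc = 0; bc < grid_cols; ++bc) {
            // Blocks at the matrix edge may be partial.
            const size_t r0 = br * block_rows, r1 = std::min(r0 + block_rows, rows);
            const size_t c0 = bc * block_cols, c1 = std::min(c0 + block_cols, cols);

            // Value range of this block, widened so that it always contains zero.
            float lo = 0.0f, hi = 0.0f;
            for (size_t r = r0; r < r1; ++r) {
                for (size_t c = c0; c < c1; ++c) {
                    lo = std::min(lo, src[r * cols + c]);
                    hi = std::max(hi, src[r * cols + c]);
                }
            }

            // 16 levels span [lo, hi]; an all-zero block falls back to scale 1.
            const float scale = (hi > lo) ? (hi - lo) / 15.0f : 1.0f;
            const float zp = Clamp4Bit(std::round(-lo / scale));
            out.scales[br * grid_cols + bc] = scale;
            out.zero_points[br * grid_cols + bc] = static_cast<uint8_t>(zp);

            for (size_t r = r0; r < r1; ++r) {
                for (size_t c = c0; c < c1; ++c) {
                    const float q = std::round(src[r * cols + c] / scale) + zp;
                    out.codes[r * cols + c] = static_cast<uint8_t>(Clamp4Bit(q));
                }
            }
        }
    }
    return out;
}

std::vector<float> DequantizeBlockwise(const BlockQuant4& q) {
    std::vector<float> dst(q.rows * q.cols);
    const size_t grid_cols = CeilDiv(q.cols, q.block_cols);
    for (size_t r = 0; r < q.rows; ++r) {
        for (size_t c = 0; c < q.cols; ++c) {
            // Look up the block that owns element (r, c).
            const size_t b = (r / q.block_rows) * grid_cols + c / q.block_cols;
            const int centered = static_cast<int>(q.codes[r * q.cols + c]) -
                                 static_cast<int>(q.zero_points[b]);
            dst[r * q.cols + c] = static_cast<float>(centered) * q.scales[b];
        }
    }
    return dst;
}
```

Under these assumptions, `QuantizeBlockwise(w, rows, cols, 1, 32)` would use row-wise blocks of 32 elements and `QuantizeBlockwise(w, rows, cols, 32, 1)` column-wise blocks of the same size. Only the grouping of elements that share a scale and zero point changes, which is why one orientation can fit a given weight matrix better than the other.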


### Description

Replace block-wise 4b quantization implementation

### Motivation and Context

In #18101 we have an augmented block-wise 4b quantization interface and implementation. Here we use this new implementation in onnxruntime contrib ops

---------

Co-authored-by: Edward Chen <[email protected]>


chenfucn requested a review from a team as a code owner October 25, 2023 18:50

github-advanced-security bot found potential problems Oct 25, 2023

onnxruntime/test/mlas/unittest/test_blockq4.cpp (fixed)

Augment blockwise quantization

7c8b095

chenfucn force-pushed the cfu_blkq branch from b5d03d4 to 7c8b095 Compare October 25, 2023 19:49

yufenglee reviewed Oct 25, 2023

onnxruntime/core/mlas/lib/q4_dq.cpp (resolved)

yufenglee reviewed Oct 25, 2023

onnxruntime/core/mlas/lib/q4_dq.cpp (resolved)

chenfucn added 4 commits October 25, 2023 15:27

fix compiler warnings

6662a31

relax quantized shape to whole blocks

1980500

Towards 2d block specification

d54bd43

fix annoying tabs added by copilot

7b5aa66

yufenglee reviewed Oct 26, 2023

onnxruntime/core/mlas/lib/q4_dq.cpp (resolved)

edgchen1 mentioned this pull request Oct 26, 2023

MLAS AArch64 quantized int4 Gemm kernel #18031

Merged

yufenglee previously approved these changes Oct 26, 2023

edgchen1 reviewed Oct 27, 2023

chenfucn dismissed yufenglee’s stale review via 463cd5f October 27, 2023 21:41

chenfucn force-pushed the cfu_blkq branch from bc7d93c to 7b5aa66 Compare October 27, 2023 22:21

chenfucn added 3 commits October 27, 2023 15:39

address comments

b22e689

function interface

f1b801a

lint complain about empty lines

93af026

edgchen1 reviewed Oct 27, 2023

onnxruntime/core/mlas/inc/mlas_q4.h (outdated, resolved)

edgchen1 previously approved these changes Oct 27, 2023

spelling

8df8f4e

chenfucn dismissed edgchen1’s stale review via 8df8f4e October 29, 2023 00:19

yufenglee approved these changes Oct 30, 2023

chenfucn merged commit 4819fbf into microsoft:main Oct 30, 2023
79 of 84 checks passed

chenfucn deleted the cfu_blkq branch October 30, 2023 16:14

chenfucn mentioned this pull request Oct 30, 2023

Block-wise 4b quantization matmul operator change #18172

Merged
