
webnn: Support block-wise quantization for DirectML backend #49083

Merged
merged 1 commit into master from chromium-export-cl-5964816 on Nov 9, 2024

Conversation

@chromium-wpt-export-bot (Collaborator) commented Nov 9, 2024


Block-wise quantization divides input tensors into smaller blocks that
are quantized independently, resulting in faster optimization and
high-precision quantization [1]. It is used in popular language models,
such as the Phi-3 mini int4 quantized model [2]. A related WG issue [3]
has been opened for discussion.
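
As an illustration of the idea described above, here is a minimal NumPy
sketch of block-wise affine quantization. This is illustrative only, not
the CL's implementation; the block size, the uint4 range, and the
function names are assumptions:

```python
# Illustrative sketch of block-wise quantization (not the CL's code):
# each block along the last axis gets its own scale and zero point.
import numpy as np

def blockwise_quantize(x, block_size=32, qmin=0, qmax=15):
    """Quantize a 2-D float tensor block-wise along its last axis."""
    rows, cols = x.shape
    assert cols % block_size == 0, "block_size must evenly divide the axis"
    blocks = x.reshape(rows, cols // block_size, block_size)

    # One scale/zero point per block, chosen so each block's own
    # min/max maps onto the [qmin, qmax] integer range.
    lo = blocks.min(axis=-1, keepdims=True)
    hi = blocks.max(axis=-1, keepdims=True)
    scale = np.maximum((hi - lo) / (qmax - qmin), 1e-8)
    zero_point = np.round(qmin - lo / scale)

    q = np.clip(np.round(blocks / scale + zero_point), qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def blockwise_dequantize(q, scale, zero_point, shape):
    # Inverse mapping: (q - zero_point) * scale, then restore the layout.
    return ((q.astype(np.float32) - zero_point) * scale).reshape(shape)

x = np.random.randn(4, 64).astype(np.float32)
q, s, z = blockwise_quantize(x)
x_hat = blockwise_dequantize(q, s, z, x.shape)
print("max abs error:", np.abs(x - x_hat).max())
```

Because every block carries its own scale and zero point, an outlier in
one block does not degrade the precision of the others.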

First, this CL validates the scale and zero-point tensors for block-wise
quantization. It also implements block-wise quantization in the DirectML
backend using DML_OPERATOR_QUANTIZE and DML_OPERATOR_DEQUANTIZE, which
are available at feature level (FL) 6.3 and higher.
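
The shape validation mentioned above can be pictured with a small
sketch. Assuming the rule discussed in the WG issue [3], that each input
dimension must be an exact multiple of the corresponding scale dimension
and that the zero-point shape must match the scale shape, a validator
might look like this (hypothetical names, not the CL's code):

```python
# Hedged sketch of block-wise shape validation (illustrative names and
# rules): every input dimension must be evenly divisible by the matching
# scale dimension, so each scale element covers one block of the input.
def validate_blockwise_shapes(input_shape, scale_shape, zero_point_shape):
    if len(scale_shape) != len(input_shape):
        raise ValueError("scale rank must match input rank")
    if zero_point_shape != scale_shape:
        raise ValueError("zero-point shape must match scale shape")
    block_sizes = []
    for dim, (in_d, sc_d) in enumerate(zip(input_shape, scale_shape)):
        if sc_d == 0 or in_d % sc_d != 0:
            raise ValueError(
                f"input dim {dim} ({in_d}) is not divisible by "
                f"scale dim ({sc_d})")
        block_sizes.append(in_d // sc_d)
    return block_sizes  # per-dimension block sizes

# e.g. a [3072, 3072] weight with one scale per 32-element block of the
# last axis has scale shape [3072, 96] -> block sizes [1, 32].
print(validate_blockwise_shapes([3072, 3072], [3072, 96], [3072, 96]))
```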

More validation and conformance tests are added to verify the
implementation.

[1]: https://arxiv.org/abs/2110.02861
[2]: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
[3]: webmachinelearning/webnn#779

Bug: 40206287
Change-Id: I977b0be57deebd7afcae216edc3ddc3818b8c09f
Cq-Include-Trybots: luci.chromium.try:mac14.arm64-blink-rel, mac14-blink-rel, mac15.arm64-blink-rel, mac15-blink-rel, linux-blink-rel
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5964816
Reviewed-by: Rafael Cintron <[email protected]>
Reviewed-by: ningxin hu <[email protected]>
Commit-Queue: ningxin hu <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1380767}
@wpt-pr-bot (Collaborator) left a comment


The review process for this patch is being conducted in the Chromium project.

@chromium-wpt-export-bot merged commit 8686b7a into master on Nov 9, 2024
20 checks passed
@chromium-wpt-export-bot deleted the chromium-export-cl-5964816 branch on November 9, 2024 at 02:58