Bug 1930286 [wpt PR 49083] - webnn: Support block-wise quantization for DirectML backend, a=testonly

Automatic update from web-platform-tests
webnn: Support block-wise quantization for DirectML backend

Block-wise quantization divides input tensors into smaller blocks that
are quantized independently, resulting in faster optimization and
high-precision quantization [1]. It is used by popular language models,
such as the Phi-3 mini int4 quantized model [2]. A related WG issue [3]
has been opened for discussion.
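
As a minimal sketch of the idea (not the code in this CL), the snippet
below quantizes and dequantizes a flat list in which every block has its
own scale and zero point. The function names, the 1-D layout, and the
int4 range [-8, 7] are assumptions made for illustration; the formulas
follow the usual linear scheme q = clamp(round(x / scale) + zero_point)
and x = (q - zero_point) * scale.

```python
def quantize_blockwise(x, scales, zero_points, qmin=-8, qmax=7):
    """Quantize a flat list, one (scale, zero_point) pair per block.

    Hypothetical helper: assumes len(x) is a multiple of len(scales)
    and an int4 target range by default.
    """
    if len(scales) != len(zero_points):
        raise ValueError("scales and zero_points must have the same length")
    if len(scales) == 0 or len(x) % len(scales) != 0:
        raise ValueError("input length must be a multiple of the block count")
    block_size = len(x) // len(scales)
    out = []
    for i, v in enumerate(x):
        b = i // block_size  # index of the block this element belongs to
        q = round(v / scales[b]) + zero_points[b]
        out.append(max(qmin, min(qmax, q)))  # clamp to the integer range
    return out


def dequantize_blockwise(q, scales, zero_points):
    """Inverse of quantize_blockwise, up to rounding and clamping error."""
    if len(scales) != len(zero_points):
        raise ValueError("scales and zero_points must have the same length")
    if len(scales) == 0 or len(q) % len(scales) != 0:
        raise ValueError("input length must be a multiple of the block count")
    block_size = len(q) // len(scales)
    return [
        (v - zero_points[i // block_size]) * scales[i // block_size]
        for i, v in enumerate(q)
    ]


# Two blocks of four values with very different magnitudes.
x = [0.5, 1.0, -1.5, 2.0, 10.0, -20.0, 30.0, 40.0]
scales = [0.5, 10.0]
zero_points = [0, -1]
q = quantize_blockwise(x, scales, zero_points)
restored = dequantize_blockwise(q, scales, zero_points)
```

Because each block carries its own scale, the small values in the first
block keep their resolution even though the second block spans a much
larger range.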

This CL validates the scale and zero point tensors for block-wise
quantization. It also implements block-wise quantization in the
DirectML backend using DML_OPERATOR_QUANTIZE and
DML_OPERATOR_DEQUANTIZE, which are available in feature level (FL) >= 6.3.
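
The kind of shape check involved can be sketched as follows. This is
an illustration, not the validation code in this CL: the helper name and
the exact rules are assumptions, namely that scale and zero point share
one shape, have the same rank as the input, and that every input
dimension is an integer multiple of the matching scale dimension.

```python
def blockwise_block_shape(input_shape, scale_shape, zero_point_shape):
    """Return the per-axis block size, or raise ValueError if invalid.

    Hypothetical helper; the rules checked here are assumptions about
    what block-wise validation typically requires.
    """
    if tuple(scale_shape) != tuple(zero_point_shape):
        raise ValueError("scale and zero point must have the same shape")
    if len(scale_shape) != len(input_shape):
        raise ValueError("scale must have the same rank as the input")
    block = []
    for axis, (n, s) in enumerate(zip(input_shape, scale_shape)):
        if s <= 0 or n % s != 0:
            raise ValueError(
                f"input size {n} on axis {axis} is not a multiple of scale size {s}"
            )
        block.append(n // s)  # number of input elements per scale value
    return tuple(block)


# A [4, 64] weight with a [4, 2] scale uses blocks of 1 x 32 elements.
block_shape = blockwise_block_shape((4, 64), (4, 2), (4, 2))
```

Under these assumed rules, per-tensor and per-axis quantization are
special cases: a scale of shape (1, 1) gives one block covering the whole
tensor, and a scale of shape (4, 1) gives one block per row.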

More validation and conformance tests are added to verify the
implementation.

[1]: https://arxiv.org/abs/2110.02861
[2]: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
[3]: webmachinelearning/webnn#779

Bug: 40206287
Change-Id: I977b0be57deebd7afcae216edc3ddc3818b8c09f
Cq-Include-Trybots: luci.chromium.try:mac14.arm64-blink-rel, mac14-blink-rel, mac15.arm64-blink-rel, mac15-blink-rel, linux-blink-rel
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5964816
Reviewed-by: Rafael Cintron <rafael.cintron@microsoft.com>
Reviewed-by: ningxin hu <ningxin.hu@intel.com>
Commit-Queue: ningxin hu <ningxin.hu@intel.com>
Cr-Commit-Position: refs/heads/main@{#1380767}

--

wpt-commits: 8686b7a6d288d3b2c22b5ddb5a21773619b22b85
wpt-pr: 49083

UltraBlame original commit: 6b8a19bf1f5562bfae60549575af9c2b422b4975
marco-c committed Nov 16, 2024
1 parent 9ae305d commit 5732af4
Showing 4 changed files with 2,350 additions and 241 deletions.