[js/webgpu] optimize MatmulNBits #21747
Conversation
cc @guschmue FYI. Not ready for review yet; I need to make this PR more comprehensive to support outputNumber > 1.
@guschmue @fs-eire @satyajandhyala Please take a look, thanks.
/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline
/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline
/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
@guschmue @fs-eire @satyajandhyala The code is ready for review, but I see that three CI bots have failed.
/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline
/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models
Azure Pipelines could not run because the pipeline triggers exclude this branch/path.
Azure Pipelines successfully started running 1 pipeline(s).
Sure, this is going in the right direction; I see a 2x speedup on Xe.
/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines could not run because the pipeline triggers exclude this branch/path.
Is it possible to add the improvement to the existing code rather than writing a new function? I understand refactoring is more demanding, but less code is easier to maintain.
Yes, my final target is to keep only one blockwise program once the newly added one supports all features (e.g., zeroPoint as input) and the extra limitations are removed.
@satyajandhyala The refactor is done. Only one version of MatmulNBits is provided now, and all limitations are removed.
Does the PR yield a 2x perf improvement on Intel GPUs while keeping perf the same or better on Nvidia?
> Does the PR yield a 2x perf improvement on Intel GPUs while keeping perf the same or better on Nvidia?
I think so. This PR is a general optimization; theoretically it can improve perf on all GPUs. On Nvidia, I see about 80 tokens/s for phi3 on an RTX 4090. At least, I didn't see any regression.
@satyajandhyala @guschmue Please help merge if there are no more issues. Thanks.
Description
This optimization yields a ~2x speedup for phi3 on an integrated Intel GPU.
The optimization mainly stores input A's data in local variables instead of reloading it from global memory every time it is multiplied with B's data.
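To illustrate the idea, here is a minimal WGSL sketch of the caching pattern. This is a hypothetical sketch, not the actual MatmulNBits shader: the buffer names, constants, and data layout are assumptions, and the 4-bit dequantization of B (with scales/zero points) that the real kernel performs is elided by pretending B is already dequantized.

```wgsl
// Hypothetical sketch of the A-caching pattern; names and layout are illustrative.
@group(0) @binding(0) var<storage, read> a : array<vec4<f32>>;
@group(0) @binding(1) var<storage, read> b_dequant : array<vec4<f32>>;
@group(0) @binding(2) var<storage, read_write> output : array<f32>;

const K4 : u32 = 64u;           // K / 4 (vec4 elements per row), assumed fixed at codegen time
const OUTPUT_NUMBER : u32 = 4u; // columns of B handled per invocation, assumed
const N : u32 = 256u;           // total output columns, assumed

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid : vec3<u32>) {
  // Load this invocation's row of A from global memory exactly once
  // into function-scope storage (registers / local memory).
  var a_local : array<vec4<f32>, K4>;
  for (var k = 0u; k < K4; k = k + 1u) {
    a_local[k] = a[gid.y * K4 + k];
  }

  // Reuse the cached A values for every output column instead of
  // re-reading them from global memory in each inner loop.
  for (var n = 0u; n < OUTPUT_NUMBER; n = n + 1u) {
    let col = gid.x * OUTPUT_NUMBER + n;
    var sum = 0.0;
    for (var k = 0u; k < K4; k = k + 1u) {
      sum += dot(a_local[k], b_dequant[col * K4 + k]);
    }
    output[gid.y * N + col] = sum;
  }
}
```

With this structure, each element of A is read from global memory once per invocation rather than once per output column, so the saved memory traffic grows with outputNumber.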
Motivation and Context