
[webgpu] Always use tile matmulnbits for block_size = 32 #23140

Merged: 1 commit merged into microsoft:main on Dec 20, 2024

Conversation

@qjia7 (Contributor) commented on Dec 18, 2024

Description

After the prefill-time optimization in #23102, always using the tiled matmulnbits shader with block_size = 32 appears to bring better performance even on discrete GPUs for the Phi3 model.

Phi3 improves from 32.82 tokens/sec to 42.64 tokens/sec in easy mode on my NVIDIA RTX 2000 GPU.
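As a rough illustration of what the change does, the kernel-selection decision can be sketched as below. All names here are hypothetical; the actual logic lives in the WebGPU EP's MatMulNBits C++ implementation, and the check being dropped is the vendor/GPU-type gate guschmue mentions removing later in this thread.

```python
# Hypothetical sketch of the dispatch change (illustrative names only,
# not the real onnxruntime API).
def use_tiled_matmulnbits(block_size: int, is_discrete_gpu: bool) -> bool:
    """Decide whether to dispatch the tiled MatMulNBits shader.

    Before this PR (roughly): the tiled path was gated so discrete GPUs
    could fall back to the non-tiled program.
    After this PR: the tiled path is used whenever block_size == 32,
    regardless of GPU type.
    """
    return block_size == 32
```

For reference, the quoted Phi3 numbers correspond to roughly a 1.3x generation speedup (42.64 / 32.82 ≈ 1.30).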

@qjia7 (Contributor, Author) commented on Dec 18, 2024

@guschmue @fs-eire Please check the other GPUs you have at hand to see the overall results. Thanks.

@guschmue (Contributor) commented on Dec 19, 2024

Yes, I can confirm. I took out the Intel-only check and saw gains on an A2000, a 3060, and an M4.
I can run a full benchmark later today.

@guschmue (Contributor):

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@guschmue (Contributor):

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue (Contributor):

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models


Azure Pipelines successfully started running 2 pipeline(s).

@guschmue (Contributor):

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline


Azure Pipelines successfully started running 3 pipeline(s).


Azure Pipelines successfully started running 4 pipeline(s).

guschmue added the ep:WebGPU (ort-web webgpu provider) label on Dec 19, 2024

Azure Pipelines successfully started running 9 pipeline(s).

@guschmue (Contributor):

For all models/scenarios on the impacted GPUs:

token/sec speedup: avg ratio = 1.19, >10% speedup = 56.0%, >10% slowdown = 4.0%, inside 10% = 40.0%
prefill(500) speedup: sum avg ratio = 2.20, >10% speedup = 100.0%, >10% slowdown = 0.0%, inside 10% = 0.0%
prefill(1000) speedup: sum_long avg ratio = 2.11, >10% speedup = 100.0%, >10% slowdown = 0.0%, inside 10% = 0.0%
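The summary buckets above can be reproduced from per-case throughput ratios (new tokens/sec divided by old). The 10% thresholds come from the comment itself; the helper name and the sample data below are invented for illustration, not taken from the actual benchmark harness.

```python
# Sketch: classify per-case speedup ratios into the buckets reported above.
# A ratio > 1.10 counts as a >10% speedup, < 0.90 as a >10% slowdown,
# and anything in [0.90, 1.10] as "inside 10%".
def summarize_ratios(ratios):
    n = len(ratios)
    return {
        "avg_ratio": sum(ratios) / n,
        "speedup_gt_10pct": 100.0 * sum(r > 1.10 for r in ratios) / n,
        "slowdown_gt_10pct": 100.0 * sum(r < 0.90 for r in ratios) / n,
        "inside_10pct": 100.0 * sum(0.90 <= r <= 1.10 for r in ratios) / n,
    }

# Invented sample: two clear wins, one regression, one roughly neutral case.
print(summarize_ratios([1.30, 1.19, 0.85, 1.05]))
```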

@guschmue guschmue merged commit 7c782f6 into microsoft:main Dec 20, 2024
75 checks passed
guschmue pushed a commit that referenced this pull request Dec 20, 2024