[js/webgpu] Provide a vectorized algorithm for GroupedConv #18884

Merged (5 commits) on Jan 11, 2024

qjia7 (Contributor) commented Dec 20, 2023

Description

This PR provides a vectorized algorithm for NHWC GroupedConv to improve performance.

The aggregate time of GroupedConv in mobilenetv2-12 drops from ~4 ms to ~1 ms on an Intel Alder Lake machine, which is about a 20% improvement for the whole model.
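
The description does not show the shader itself. As a rough illustration of why the NHWC layout suits vectorization, here is a minimal CPU-side sketch in TypeScript. It assumes the depthwise case (group count equal to channel count), stride 1, no padding, batch size 1, and a hypothetical `[kH, kW, C]` weight layout. The function name `depthwiseConvNhwcVec4` and the `DepthwiseConvShape` interface are made up for this example and are not taken from onnxruntime or from this PR.

```typescript
// Illustrative sketch only: names and layouts are assumptions, not the PR's shader.
interface DepthwiseConvShape {
  height: number;   // input height H
  width: number;    // input width W
  channels: number; // input channels C (equals the group count here)
  kernelH: number;
  kernelW: number;
}

// Depthwise convolution (group == channels), stride 1, no padding, batch 1.
// x: NHWC input of shape [1, H, W, C]; w: weights assumed to be [kH, kW, C].
function depthwiseConvNhwcVec4(
  x: Float32Array,
  w: Float32Array,
  s: DepthwiseConvShape,
): Float32Array {
  if (s.channels % 4 !== 0) {
    throw new Error('channels must be a multiple of 4 in this sketch');
  }
  const outH = s.height - s.kernelH + 1;
  const outW = s.width - s.kernelW + 1;
  const y = new Float32Array(outH * outW * s.channels);

  for (let oh = 0; oh < outH; oh++) {
    for (let ow = 0; ow < outW; ow++) {
      // In NHWC, neighbouring channels are contiguous in memory, so four
      // channels can be handled together, like one vec4<f32> on the GPU.
      for (let c = 0; c < s.channels; c += 4) {
        const acc = [0, 0, 0, 0]; // stands in for a vec4 accumulator
        for (let kh = 0; kh < s.kernelH; kh++) {
          for (let kw = 0; kw < s.kernelW; kw++) {
            const xBase = ((oh + kh) * s.width + (ow + kw)) * s.channels + c;
            const wBase = (kh * s.kernelW + kw) * s.channels + c;
            // Lane-wise multiply-add: each channel only meets its own filter.
            for (let lane = 0; lane < 4; lane++) {
              acc[lane] += x[xBase + lane] * w[wBase + lane];
            }
          }
        }
        const yBase = (oh * outW + ow) * s.channels + c;
        for (let lane = 0; lane < 4; lane++) {
          y[yBase + lane] = acc[lane];
        }
      }
    }
  }
  return y;
}
```

On a GPU, the inner four-lane loop would typically become a single vector load and fused multiply-add per kernel tap. Whether the PR's shader is organized exactly this way is not stated in the description.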

qjia7 (Contributor Author) commented Dec 20, 2023

@fs-eire @guschmue @satyajandhyala Please take a look, thanks.

guschmue (Contributor) commented

/azp run ONNX Runtime Web CI Pipeline

guschmue (Contributor) commented

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).

guschmue (Contributor) commented

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

Azure Pipelines successfully started running 1 pipeline(s).

guschmue previously approved these changes Dec 22, 2023

fs-eire (Contributor) commented Jan 4, 2024

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

fs-eire (Contributor) commented Jan 4, 2024

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-python-checks-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline,Android CI Pipeline

fs-eire (Contributor) commented Jan 4, 2024

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).

gyagp commented Jan 4, 2024

@fs-eire @guschmue @satyajandhyala any more comments on this?

gyagp commented Jan 10, 2024

@guschmue @fs-eire Do you think these unsuccessful checks are relevant?

fs-eire (Contributor) commented Jan 10, 2024

> @guschmue @fs-eire Do you think these unsuccessful checks are relevant?

It looks like the unit tests are failing:

[webgpu]Conv - conv - vectorize group - B
[webgpu]Conv - conv - vectorize group - D

fs-eire (Contributor) commented Jan 11, 2024

I am running the unit tests on my local machine to check whether CI is reporting a false error.

fs-eire (Contributor) commented Jan 11, 2024

> I am running the unit tests on my local machine to check whether CI is reporting a false error.

They passed on my local machine as well, so it should be a false error. Let me merge it and watch the main branch.

fs-eire merged commit fd6bab4 into microsoft:main on Jan 11, 2024
40 of 52 checks passed

qjia7 (Contributor Author) commented Jan 11, 2024

> > @guschmue @fs-eire Do you think these unsuccessful checks are relevant?
>
> It looks like the unit tests are failing:
>
> [webgpu]Conv - conv - vectorize group - B
> [webgpu]Conv - conv - vectorize group - D

I tried NVIDIA and Intel machines locally. All cases pass on both with the latest main, so I can't reproduce those failures. Any ideas?

qjia7 added a commit to qjia7/onnxruntime that referenced this pull request Jan 11, 2024
…icrosoft#18884)"

This reverts commit fd6bab4 because the following
cases fail on the bots:
[webgpu]Conv - conv - vectorize group - B
[webgpu]Conv - conv - vectorize group - D
siweic0 pushed a commit to siweic0/onnxruntime-web that referenced this pull request May 9, 2024
…#18884)
