[js/webgpu] Refactor timestamp-query and introduce timestamp-query-inside-passes #18894

Merged

fs-eire merged 11 commits into microsoft:main from gyagp:timestamp

Jan 13, 2024

gyagp commented Dec 20, 2023

We submit kernels in batches (a fixed size of 16, except for the last batch) for better performance. However, timestamp queries are supported at the pass level, so the previous implementation disabled batch execution in profiling mode. In fact, a batch can contain multiple passes, so batch execution doesn't have to be disabled; this is the first enhancement of this PR.
Furthermore, WebGPU has an extension that supports timestamp queries inside passes, which isn't available on all platforms (e.g., Windows supports it, while macOS doesn't). It is expected to cost less than the multiple-passes solution, so this PR also introduces support for it when available.
This PR also refactors some of the implementation related to kernelInfo and tries to unify the related kernel names.
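
The choice between the two profiling paths described above can be sketched as a small feature check. This is a minimal sketch, not the code in this PR: `selectTimestampQueryMode` and the mode names are hypothetical, and the experimental feature string is an assumption based on Chromium's naming. Only `timestamp-query` is the standard WebGPU feature name.

```typescript
// Hypothetical mode names, loosely following the atPasses/insidePasses terms used in this PR.
type TimestampQueryMode = 'none' | 'at-passes' | 'inside-passes';

// Assumption: the Chromium-specific feature string for timestamp queries inside passes.
const INSIDE_PASSES_FEATURE = 'chromium-experimental-timestamp-query-inside-passes';
// Standard WebGPU feature for pass-level timestamp queries.
const AT_PASSES_FEATURE = 'timestamp-query';

// In a browser, `features` could be `adapter.features` (it is set-like);
// a plain Set<string> is used here so the sketch runs without a GPU.
function selectTimestampQueryMode(
  features: ReadonlySet<string>,
  profiling: boolean,
): TimestampQueryMode {
  if (!profiling) {
    return 'none'; // no queries when profiling is off
  }
  if (features.has(INSIDE_PASSES_FEATURE)) {
    return 'inside-passes'; // preferred when the extension is available
  }
  if (features.has(AT_PASSES_FEATURE)) {
    return 'at-passes'; // fall back to one pass per kernel inside a batch
  }
  return 'none';
}
```

Preferring the inside-passes mode here only mirrors the expectation stated in the description that it costs less; which mode is actually cheaper depends on the platform.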

gyagp closed this

gyagp reopened this

gyagp marked this pull request as draft

December 23, 2023 02:55

gyagp changed the title ~~[js/webgpu] Experiment on timestamp query between atPasses and insidePasses~~ [js/webgpu] Introduce timestamp-query-inside-passes

Author

gyagp commented Dec 23, 2023

I ran some tests on Windows comparing timestamp query at passes with timestamp query inside passes, and didn't find an obvious difference. More details can be found at https://docs.google.com/document/d/1eAavWUvp2YdvfiR1a2kpUH_BvTjF2APwCuYY18l7G9g/edit.
In a later discussion, the WebGPU folks told me that a compute pass on Windows is implemented as a no-op, so the above observation is expected. On some other platforms, like Vulkan, there might be some difference. Timestamp query inside passes is expected to have a lower cost on Vulkan, but it doesn't come totally free (the order of commands has to be kept, so some optimizations can't be as aggressive). We still need to understand more about the difference between the two, so it's better to keep both paths. Please note that we can unify the solution, so supporting timestamp query inside passes doesn't add much code.
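
One way to picture how the two paths could share most of their code: in either mode each kernel owns two slots in a query set (begin and end), and only the way the timestamps get written differs. The sketch below is an illustration under that assumption, not the PR's implementation; `querySlotsFor` and `kernelDurations` are hypothetical names, and the timestamps are taken as plain numbers already converted from the 64-bit values a real query buffer would hold.

```typescript
// Batch size quoted in the PR description.
const MAX_DISPATCHES_PER_BATCH = 16;

interface QuerySlots {
  begin: number;
  end: number;
}

// Assumption: two query slots per kernel, laid out back to back within a batch.
// At-passes mode would pass these indices when beginning the compute pass;
// inside-passes mode would write them right before and after the dispatch.
function querySlotsFor(indexInBatch: number): QuerySlots {
  if (
    !Number.isInteger(indexInBatch) ||
    indexInBatch < 0 ||
    indexInBatch >= MAX_DISPATCHES_PER_BATCH
  ) {
    throw new RangeError(`index ${indexInBatch} is outside the batch`);
  }
  return { begin: indexInBatch * 2, end: indexInBatch * 2 + 1 };
}

// Turn resolved timestamps into one duration per kernel,
// in the same unit as the input values.
function kernelDurations(timestamps: ArrayLike<number>, kernelCount: number): number[] {
  const durations: number[] = [];
  for (let i = 0; i < kernelCount; i++) {
    const { begin, end } = querySlotsFor(i);
    durations.push(timestamps[end] - timestamps[begin]);
  }
  return durations;
}
```

With this layout the readback and reporting code is identical for both modes, which is consistent with the remark that keeping both paths doesn't add much code.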


          [js/webgpu] Experiment on timestamp query between atPasses and insidePasses

4ee5b41

This experiments with the atPasses and insidePasses timestamp query
solutions to see whether insidePasses has an obviously lower cost.

gyagp marked this pull request as ready for review

December 25, 2023 06:30

Author

gyagp commented Dec 25, 2023

@qjia, can you take a first look?

qjia7 reviewed

Contributor

qjia7 left a comment

LGTM with some nits.

js/web/lib/wasm/jsep/webgpu/program-manager.ts Outdated

js/web/lib/wasm/jsep/webgpu/program-manager.ts Outdated

js/web/lib/wasm/jsep/backend-webgpu.ts Outdated


          Address Jiajia's comments and refactor kernels

gyagp commented

js/web/lib/wasm/jsep/backend-webgpu.ts

js/web/lib/wasm/jsep/backend-webgpu.ts Outdated

js/web/lib/wasm/jsep/init.ts Outdated

qjia7 approved these changes

Contributor

qjia7 left a comment

LGTM with some nits.
@fs-eire @guschmue @satyajandhyala Please take a look, thanks.

js/web/lib/wasm/jsep/backend-webgpu.ts

js/web/lib/wasm/jsep/backend-webgpu.ts

js/web/lib/wasm/jsep/backend-webgpu.ts Outdated

js/web/lib/wasm/jsep/init.ts Outdated

js/web/lib/wasm/jsep/backend-webgpu.ts Outdated

js/web/lib/wasm/jsep/backend-webgpu.ts

js/web/lib/wasm/jsep/backend-webgpu.ts Outdated


          Add more comments from Jiajia

c4951a1

gyagp changed the title ~~[js/webgpu] Introduce timestamp-query-inside-passes~~ [js/webgpu] Refactor timestamp-query and introduce timestamp-query-inside-passes

gyagp mentioned this pull request

[js/webgpu] Introduce trace support #18928

Merged

fs-eire reviewed

js/web/lib/wasm/jsep/backend-webgpu.ts Outdated

js/web/lib/wasm/jsep/init.ts Outdated

js/web/lib/wasm/jsep/backend-webgpu.ts Outdated

Yang Gu added 3 commits

January 3, 2024 15:34


          Address Yulong's comments

585ec83


          Unify the terms

40790b2


          Incorporate trace support

3504a34

Author

gyagp commented Jan 4, 2024

@qjia7 @fs-eire @guschmue @satyajandhyala
I just merged the recent changes, including the trace framework. Building on it, I added full trace support to this PR, especially for the GPU timeline. Please take another look!

fs-eire reviewed

js/web/lib/wasm/jsep/webgpu/types.ts Outdated

fs-eire reviewed

js/web/lib/wasm/jsep/backend-webgpu.ts

fs-eire reviewed

js/web/lib/wasm/jsep/backend-webgpu.ts Outdated

Yang Gu added 3 commits

January 11, 2024 22:41


          Address Yulong's comments

e0859ee


          Put TimestampQuery to the end of file

982a99e


          Merge branch 'main' of https://github.com/gyagp/onnxruntime into timestamp

bbe09ad

Author

gyagp commented Jan 11, 2024

@fs-eire, thanks for the comments! I addressed all your comments and pulled the latest code. Please take another look!

fs-eire previously approved these changes

Contributor

fs-eire commented Jan 11, 2024

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

Contributor

fs-eire commented Jan 11, 2024

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-python-checks-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline,Android CI Pipeline

azure-pipelines bot commented Jan 11, 2024

Azure Pipelines successfully started running 9 pipeline(s).

Contributor

fs-eire commented Jan 12, 2024

The IO-binding tests failed: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1264363&view=logs&j=4cf9212c-8936-5e77-cbdb-290c1c5567eb&t=8a4f1cc9-2423-5c1e-71f3-5ad7e67ee036


          Call endComputePass in flush and fix the name of writeTimestamp

ab20d47

gyagp dismissed fs-eire’s stale review via

ab20d47

January 12, 2024 09:30

Author

gyagp commented Jan 12, 2024

The IO-binding tests failed: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1264363&view=logs&j=4cf9212c-8936-5e77-cbdb-290c1c5567eb&t=8a4f1cc9-2423-5c1e-71f3-5ad7e67ee036

The root cause is that I intentionally removed endComputePass() from flush() to avoid duplication. However, with IO binding, runAsync calls flush() without ending the GPUComputePassEncoder. Adding endComputePass() back to flush() fixes the issue. The local tests below pass now.
npm test -- suite1 -b=webgpu --io-binding=gpu-location
npm test -- suite1 -b=webgpu --io-binding=gpu-tensor

I also renamed the function writeTimeStamp() to writeTimestamp().
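
The failure mode described above can be reproduced in miniature. The sketch below uses fake stand-ins for the command encoder and compute pass so that it runs without a GPU. `Backend`, `FakeEncoder`, and `FakePass` are hypothetical, and only the method names `flush()` and `endComputePass()` come from the comment. In real WebGPU, finishing a command encoder while a pass is still open surfaces as a validation error; here it is modelled as a thrown exception.

```typescript
// Stand-in for a compute pass encoder.
class FakePass {
  ended = false;
  end(): void {
    this.ended = true;
  }
}

// Stand-in for a command encoder.
class FakeEncoder {
  private openPass: FakePass | null = null;

  beginComputePass(): FakePass {
    this.openPass = new FakePass();
    return this.openPass;
  }

  // Finishing while a pass is still open is an error.
  finish(): string {
    if (this.openPass !== null && !this.openPass.ended) {
      throw new Error('finish() called while a compute pass is still open');
    }
    return 'command-buffer';
  }
}

// Hypothetical backend that lazily creates an encoder and a pass.
class Backend {
  private encoder: FakeEncoder | null = null;
  private pass: FakePass | null = null;
  readonly submitted: string[] = [];

  getComputePass(): FakePass {
    if (this.encoder === null) {
      this.encoder = new FakeEncoder();
    }
    if (this.pass === null) {
      this.pass = this.encoder.beginComputePass();
    }
    return this.pass;
  }

  endComputePass(): void {
    if (this.pass !== null) {
      this.pass.end();
      this.pass = null;
    }
  }

  flush(): void {
    if (this.encoder === null) {
      return; // nothing recorded since the last flush
    }
    this.endComputePass(); // without this call, finish() below would fail
    this.submitted.push(this.encoder.finish());
    this.encoder = null;
  }
}
```

Because `endComputePass()` is a no-op when no pass is open, calling it unconditionally in `flush()` is safe even for callers that already ended the pass themselves.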

Author

gyagp commented Jan 12, 2024

run web CI

Contributor

fs-eire commented Jan 12, 2024

/azp run ONNX Runtime Web CI Pipeline

azure-pipelines bot commented Jan 12, 2024

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

guschmue commented Jan 12, 2024

/azp run ONNX Runtime Web CI Pipeline

Contributor

guschmue commented Jan 12, 2024

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

Contributor

guschmue commented Jan 12, 2024

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline

azure-pipelines bot commented Jan 12, 2024

Azure Pipelines successfully started running 1 pipeline(s).

guschmue added the ep:WebGPU label

azure-pipelines bot commented Jan 12, 2024

Azure Pipelines successfully started running 7 pipeline(s).

azure-pipelines bot commented Jan 12, 2024

Azure Pipelines successfully started running 9 pipeline(s).


          Merge branch 'main' of https://github.com/gyagp/onnxruntime into timestamp

bfdf6aa

Author

gyagp commented Jan 13, 2024

run web CI

Contributor

fs-eire commented Jan 13, 2024

/azp run ONNX Runtime Web CI Pipeline

azure-pipelines bot commented Jan 13, 2024

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

fs-eire commented Jan 13, 2024

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

Contributor

fs-eire commented Jan 13, 2024

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline

azure-pipelines bot commented Jan 13, 2024

Azure Pipelines successfully started running 9 pipeline(s).

azure-pipelines bot commented Jan 13, 2024

Azure Pipelines successfully started running 7 pipeline(s).

Contributor

fs-eire commented Jan 13, 2024

/azp run Windows ARM64 QNN CI Pipeline

azure-pipelines bot commented Jan 13, 2024

Azure Pipelines successfully started running 1 pipeline(s).

fs-eire approved these changes

fs-eire merged commit e803f8e into microsoft:main

64 checks passed

gyagp deleted the timestamp branch

January 13, 2024 11:08

mszhanyi pushed a commit that referenced this pull request


          [js/webgpu] Refactor timestamp-query and introduce timestamp-query-inside-passes (#18894)

0ed42c5

siweic0 pushed a commit to siweic0/onnxruntime-web that referenced this pull request


          [js/webgpu] Refactor timestamp-query and introduce timestamp-query-inside-passes (microsoft#18894)

296d5c2
