support user_compute_stream for rocm ep #19619
Conversation
Please resolve the build error in the pipelines.
Please add documentation for the ROCm provider options after this pull request.
The modification to the CUDA EP made the user_compute_stream test case fail, so I reverted the CUDA EP change and made the ROCm EP behave the same as the CUDA EP.
onnxruntime/core/providers/rocm/rocm_execution_provider_info.cc
Please add a test case in test/python/onnxruntime_test_python.py
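A minimal sketch of what such a test case could look like, assuming a ROCm build of onnxruntime with torch available; the test name and `MODEL_PATH` are hypothetical placeholders, not the names used in the actual test file:

```python
import onnxruntime as onnxrt
import torch


def test_register_custom_user_compute_stream_rocm():
    # Create a real stream via torch (ROCm builds of torch expose HIP
    # streams through the torch.cuda namespace) and pass its raw handle
    # to the ROCm EP as a string.
    s = torch.cuda.Stream()
    sess = onnxrt.InferenceSession(
        MODEL_PATH,  # hypothetical: path to any small test model
        providers=[("ROCMExecutionProvider",
                    {"user_compute_stream": str(s.cuda_stream)})],
    )
    options = sess.get_provider_options()["ROCMExecutionProvider"]
    # Setting user_compute_stream should implicitly flip
    # has_user_compute_stream to "1".
    assert options.get("user_compute_stream", "") == str(s.cuda_stream)
    assert options.get("has_user_compute_stream", "") == "1"
```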
### Description

* Implement `user_compute_stream` python api for TensorRT EP
* Using this option implicitly sets `has_user_compute_stream` to `true`
* Extend the existing TRT EP unit test to verify the `user_compute_stream` option
* This has been verified in a local pytorch env, with `torch.cuda.Stream()` passed into `user_compute_stream`:

```python
...
# Before inference
if torch.cuda.is_available():
    s = torch.cuda.Stream()
    option = {"user_compute_stream": str(s.cuda_stream)}
    sess.set_providers(["TensorrtExecutionProvider"], [option])
    options = sess.get_provider_options()
    assert "TensorrtExecutionProvider" in options
    assert options["TensorrtExecutionProvider"].get("user_compute_stream", "") == str(s.cuda_stream)
    assert options["TensorrtExecutionProvider"].get("has_user_compute_stream", "") == "1"
...
```

### Motivation and Context

Align with the existing `user_compute_stream` python implementations for the [CUDA EP](https://github.com/microsoft/onnxruntime/pull/19229) and [ROCm EP](#19619).
Description
Following PR #19229, which added support for the CUDA EP to use an external compute stream, we add the same support for the ROCm EP.

While testing this feature with torch, we found that torch uses stream 0 as its default stream: `torch.cuda.current_stream()` returns `0` for the current stream, but ORT treats `0` or `nullptr` as invalid and resets `has_user_compute_stream` to false. This made it confusing to tell whether the option had taken effect, so we add a warning log for an invalid compute_stream. The `has_user_compute_stream` option will be removed in the future.
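A minimal usage sketch of the option described above, assuming a ROCm build of onnxruntime and torch, and a placeholder `model.onnx`; note the explicitly created stream, since the default stream handle `0` is rejected as invalid:

```python
import onnxruntime as onnxrt
import torch

# Explicitly create a stream: torch's default stream has handle 0,
# which ORT treats as invalid (it would reset has_user_compute_stream
# to false and log a warning).
s = torch.cuda.Stream()

sess = onnxrt.InferenceSession("model.onnx", providers=["ROCMExecutionProvider"])
sess.set_providers(
    ["ROCMExecutionProvider"],
    [{"user_compute_stream": str(s.cuda_stream)}],
)
```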
Motivation and Context
The motivation for this PR is that we want to use torch.cuda.graph to capture the kernels ORT launches, which requires torch and ORT to run on the same stream, so we use this API to set ORT's working stream.
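For context, a rough sketch of the capture pattern this enables, under stated assumptions: `sess` was created with `user_compute_stream` pointing at the same stream `s` as above, and `io_binding` is a pre-built hypothetical `IOBinding` with device-resident inputs and outputs:

```python
import torch

g = torch.cuda.CUDAGraph()
with torch.cuda.stream(s):
    # Warm-up run outside capture, as graph capture generally requires.
    sess.run_with_iobinding(io_binding)
    with torch.cuda.graph(g, stream=s):
        # Because ORT's compute stream is s, its kernels are launched on
        # the stream being captured and get recorded into the graph.
        sess.run_with_iobinding(io_binding)

g.replay()  # re-executes the captured ORT kernels
```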