[TensorRT ExecutionProvider] Cannot infer the model on a GPU device with an ID other than 0 #21276
Comments
I tried running ResNet50 on a device other than 0 with the TRT EP, and inference ran successfully. Can you run inference on a device other than 0 with the CUDA EP? It would also help if you turned on verbose logging and shared the log.
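For reference, a minimal sketch of that cross-check, mirroring the builder pattern used in the reporter's snippet below and assuming the ort binding exposes a `CUDAExecutionProvider` with the same `with_device_id` setter:

```rust
// Sketch only: swap the TensorRT EP for the CUDA EP while keeping the same
// non-zero device id, to check whether the failure is TensorRT-specific.
// `ort::CUDAExecutionProvider` / `with_device_id` are assumed to follow the
// same builder pattern as the TensorRT EP shown below.
let provider = ort::CUDAExecutionProvider::default()
    .with_device_id(1)
    .build();

let session = Session::builder()?
    .with_execution_providers([provider])?
    .commit_from_file("warehouse/model.onnx")?;
```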
I am using the `ort` Rust binding (which wraps the C API; I am not familiar with C). Below is a simplified snippet of what I run:

```rust
let model_path = "warehouse/model.onnx";

let global_options = ort::EnvironmentGlobalThreadPoolOptions {
    intra_op_parallelism: Some(16),
    ..Default::default()
};
let _ = ort::init().with_global_thread_pool(global_options).commit()?;

let provider = ort::TensorRTExecutionProvider::default()
    .with_device_id(1)
    .with_engine_cache(true)
    .with_engine_cache_path("warehouse".to_owned())
    .with_profile_min_shapes("images:1x3x256x256".to_owned())
    .with_profile_opt_shapes("images:32x3x256x256".to_owned())
    .with_profile_max_shapes("images:64x3x256x256".to_owned())
    .with_max_partition_iterations(10)
    .with_max_workspace_size(2 * 1024 * 1024 * 1024)
    .build();

let session = Session::builder()?
    .with_optimization_level(ort::GraphOptimizationLevel::Level3)?
    .with_execution_providers([provider])?
    .commit_from_file(model_path)?;
```

Running inference with

```rust
let outputs = session.run_async(inputs)?.await?;
```

I encounter the error mentioned above. However, when I run the same code with the CUDA Execution Provider (EP), it works perfectly fine.
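Regarding the verbose log requested above, here is a minimal sketch of one way to capture it, assuming the ort 2.x binding forwards ONNX Runtime log messages through the `tracing` crate (the filter string and the tracing integration are assumptions, not confirmed in this thread):

```rust
// Hypothetical logging setup: install a tracing subscriber before ort::init()
// so that provider-level messages (including TensorRT EP errors) are printed.
use tracing_subscriber::EnvFilter;

tracing_subscriber::fmt()
    .with_env_filter(EnvFilter::new("ort=debug"))
    .init();
```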
Hi @dat58, do other models (e.g. ResNet50) work with your script using the TensorRT EP and gpu_id=1? By the way, I saw you previously posted pykeio/ort#226 with gpu_id=0 and the same error type. Is that issue still happening on a single GPU? If you already fixed it, what did you do to fix it?
Thanks for sharing. Did you enable both the TRT and CUDA EPs in your multi-GPU script as well? If so, you are welcome to share your script and model (without your IP) as a bundle to help reproduce this issue. By the way, are all of your GPUs the same architecture? Is it possible that an existing engine cache was generated on gpu_id 0 and then consumed on gpu_id 1, which has a different, incompatible architecture? I am not sure whether this Rust binding would allow that.
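On the engine cache point, a small sketch of how the cache directory could be keyed by device, so an engine built on gpu 0 is never picked up on gpu 1. This is only an experiment to rule out cache incompatibility, not a confirmed cause, and the directory name is made up:

```rust
// Sketch: give each GPU its own TensorRT engine cache directory.
let device_id: i32 = 1;
let provider = ort::TensorRTExecutionProvider::default()
    .with_device_id(device_id)
    .with_engine_cache(true)
    // Hypothetical per-device path; any directory unique to the device works.
    .with_engine_cache_path(format!("warehouse/trt_cache_gpu{device_id}"))
    .build();
```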
Reproducing the issue is a bit involved, so I have written a mini Rust project that triggers the problem. Please download it from the source; my shared folder contains three files.

To create the environment and run the Rust project, build the Docker image:

```bash
docker build -t rustai:ort .
```

Then execute:

```bash
bash run.sh
```

and extract the file. After following these steps, the projects directory will be mounted inside the container. Next, enter the container:

```bash
docker exec -it ort bash
```

Then follow these scenarios.

Scenario 1: start the server using the CPU EP to verify that the server is configured correctly.

```bash
# terminal 1
cd /projects/trtsample
cargo run --release

# terminal 2
cd /projects/loadtest
bash run.sh
```

The HTTP server should keep running successfully.

Scenario 2: start the server using the TensorRT EP.

```bash
# terminal 1
cd /projects/trtsample
cargo run --release -F tensorrt

# terminal 2
cd /projects/loadtest
bash run.sh
```

In my test, the HTTP server panics immediately after the load test starts. I have set the default GPU_ID = 1 in the file.

Scenario 3: start the server using the TensorRT EP + CUDA EP.

```bash
# terminal 1
cd /projects/trtsample
cargo run --release -F tensorrt_cuda

# terminal 2
cd /projects/loadtest
bash run.sh
```

You should set GPU_ID = 1 before running this test to observe that the issue no longer occurs once the CUDA EP is registered alongside the TensorRT EP.

NOTE: My machine is equipped with 8 NVIDIA RTX 4090 GPUs, and the driver version is 550.90.07.
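For context on how the `-F tensorrt` and `-F tensorrt_cuda` scenarios differ, here is a hypothetical sketch of the provider wiring the trtsample crate might use; the actual project code may differ, `GPU_ID` is the constant mentioned above, and the `ExecutionProviderDispatch` type name is assumed from the ort 2.x API:

```rust
// Hypothetical feature-gated EP selection: `-F tensorrt` registers only the
// TensorRT EP, `-F tensorrt_cuda` registers TensorRT with CUDA as a fallback,
// both pinned to the same GPU_ID. With no feature, only the CPU EP is used.
let mut providers: Vec<ort::ExecutionProviderDispatch> = Vec::new();

#[cfg(any(feature = "tensorrt", feature = "tensorrt_cuda"))]
providers.push(
    ort::TensorRTExecutionProvider::default()
        .with_device_id(GPU_ID)
        .build(),
);

#[cfg(feature = "tensorrt_cuda")]
providers.push(
    ort::CUDAExecutionProvider::default()
        .with_device_id(GPU_ID)
        .build(),
);

let session = Session::builder()?
    .with_execution_providers(providers)?
    .commit_from_file(model_path)?;
```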
Describe the issue
In a scenario where multiple GPU devices are available, when selecting the TensorrtExecutionProvider and choosing device_id = 0, the model infers perfectly. However, when using a different device_id (not equal to 0), an error is thrown during inference:
I noticed that a similar problem occurred with the CudaExecutionProvider a few years ago, and it was resolved in issue #1815 (I have tested it, and it works correctly). It is possible that a similar issue has occurred with the TensorrtExecutionProvider.
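A possible interim workaround, not mentioned in the thread and relying only on standard CUDA behavior: restrict the process to the target GPU with `CUDA_VISIBLE_DEVICES`, so that GPU appears as device 0 to ONNX Runtime, and keep `device_id = 0` in the EP options.

```rust
// Workaround sketch (assumption): launch the process with
// CUDA_VISIBLE_DEVICES=1 so physical GPU 1 is remapped to index 0,
// then keep device_id = 0 in the TensorRT EP configuration.
let provider = ort::TensorRTExecutionProvider::default()
    .with_device_id(0) // GPU 1 shows up as device 0 under CUDA_VISIBLE_DEVICES=1
    .build();
```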
To reproduce
Tested on both versions:
Urgency
No response
Platform
Linux
OS Version
Ubuntu 22.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16.3 | 1.17.3 | 1.18.1
ONNX Runtime API
Rust with binding from C
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
CUDA 11.8