
[TensorRT EP] Enable a minimal CUDA EP compilation without kernels #19052

Merged · 2 commits merged into microsoft:main on Jan 17, 2024

Conversation

gedoensmax
Contributor

Addresses #18542.
I followed the advice given by @RyanUnderhill here and went with a minimal CUDA EP for now.

@tianleiwu
Contributor

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@tianleiwu
Contributor

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, ONNX Runtime React Native CI Pipeline, Windows x64 QNN CI Pipeline

@tianleiwu
Contributor

/azp run Linux MIGraphX CI Pipeline, orttraining-amd-gpu-ci-pipeline


Azure Pipelines successfully started running 2 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).


@tianleiwu
Contributor

tianleiwu commented Jan 11, 2024

@gedoensmax,

What's an example build command line for this?

I tried the following, but there is a build error:

export CUDA_HOME=/usr/local/cuda-12.2
export CUDNN_HOME=/usr/lib/x86_64-linux-gnu/
export CUDACXX=/usr/local/cuda-12.2/bin/nvcc
export TRT_HOME=/usr/src/tensorrt

sh build.sh --config Release  --build_shared_lib --parallel --cuda_version 12.2 \
            --cuda_home $CUDA_HOME --cudnn_home $CUDNN_HOME --build_wheel --skip_tests \
            --use_tensorrt --tensorrt_home $TRT_HOME \
            --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF \
            --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=80 \
            --cmake_extra_defines onnxruntime_CUDA_MINIMAL=ON

BTW, there are conflicts.

@gedoensmax
Contributor Author

@tianleiwu I believe you are missing --use_cuda, and I also use onnxruntime_DISABLE_CONTRIB_OPS=ON. To be honest, I always build with CMake directly, but I can try the build script tomorrow.

@gedoensmax
Contributor Author

OK, I could not hold back and tried it right away. I verified that this works on my end:

./build.sh --config Release  --build_shared_lib --parallel --cuda_version 12.2 \
            --cuda_home $CUDA_HOME --cudnn_home $CUDNN_HOME --build_wheel --skip_tests \
            --use_tensorrt --tensorrt_home $TRT_HOME \
            --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF \
            --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=89 \
            --cmake_extra_defines onnxruntime_CUDA_MINIMAL=ON \
            --cmake_extra_defines onnxruntime_DISABLE_CONTRIB_OPS=ON \
            --build_dir build_script
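
For anyone who prefers invoking CMake directly (as mentioned above), the build-script flags map roughly to a configure line like the one below. This is only a sketch: the cache-variable names and paths are assumptions and may differ between ORT versions, so check cmake/CMakeLists.txt for the exact options your checkout supports.

```shell
# Rough direct-CMake equivalent of the build.sh invocation above (sketch;
# variable names and paths are assumptions, verify against your ORT version).
cmake -S cmake -B build_script/Release \
      -DCMAKE_BUILD_TYPE=Release \
      -Donnxruntime_USE_CUDA=ON \
      -Donnxruntime_USE_TENSORRT=ON \
      -Donnxruntime_CUDNN_HOME=$CUDNN_HOME \
      -Donnxruntime_TENSORRT_HOME=$TRT_HOME \
      -DCMAKE_CUDA_ARCHITECTURES=89 \
      -Donnxruntime_BUILD_SHARED_LIB=ON \
      -Donnxruntime_BUILD_UNIT_TESTS=OFF \
      -Donnxruntime_CUDA_MINIMAL=ON \
      -Donnxruntime_DISABLE_CONTRIB_OPS=ON
cmake --build build_script/Release --parallel
```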

@tianleiwu
Contributor

@gedoensmax, I tried your build command; it worked before the last merge commit but fails after the merge:

onnxruntime/onnxruntime/core/providers/cuda/cuda_stream_handle.cc:59:1: error: no declaration matches ‘onnxruntime::CudaStream::CudaStream(cudaStream_t, const OrtDevice&, onnxruntime::AllocatorPtr, bool, bool, cudnnHandle_t, cublasHandle_t)’
   59 | CudaStream::CudaStream(cudaStream_t stream,
      | ^~~~~~~~~~

@gedoensmax force-pushed the trt_compile_no_cu_ops branch from beb86b5 to a343bb2 on January 12, 2024 12:53
@gedoensmax
Contributor Author

Sorry, the web UI (or my use of it) must have messed this up.

@tianleiwu
Contributor

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@tianleiwu
Contributor

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, ONNX Runtime React Native CI Pipeline, Windows x64 QNN CI Pipeline

@tianleiwu
Contributor

/azp run Linux MIGraphX CI Pipeline, orttraining-amd-gpu-ci-pipeline


Azure Pipelines successfully started running 9 pipeline(s).


Azure Pipelines successfully started running 2 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).

@tianleiwu
Contributor

/azp run orttraining-amd-gpu-ci-pipeline


Azure Pipelines successfully started running 1 pipeline(s).

@tianleiwu tianleiwu merged commit bc219ed into microsoft:main Jan 17, 2024
72 of 74 checks passed
yf711 added a commit that referenced this pull request Dec 5, 2024
yf711 added a commit that referenced this pull request Dec 19, 2024
### Description
New CI:
[Linux_TRT_Minimal_CUDA_Test_CI](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=230&_a=summary)
and
[Win_TRT_Minimal_CUDA_Test_CI](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=231)

These pipelines are configured to monitor that ORT-TRTEP with minimal CUDA keeps building without issues.
* The YAML follows the Linux TRT CI YAML, with different build args and cache names.
* The build args follow [[TensorRT EP] Enable a minimal CUDA EP compilation without kernels](#19052 (comment)).

### Motivation and Context
Monitor whether users can build ORT-TRTEP with minimal CUDA without any blockers (the build takes ~30 min).
guschmue pushed a commit that referenced this pull request Dec 20, 2024