
[Build] CUDA Execution Provider library is needed even though we only use the TensorRT Execution Provider #22960

Open
jcdatin opened this issue Nov 27, 2024 · 10 comments
Labels
build: build issues; typically submitted using template
ep:CUDA: issues related to the CUDA execution provider
stale: issues that have not been addressed in a while; categorized by a bot

Comments


jcdatin commented Nov 27, 2024

Describe the issue

There must be a way to build onnxruntime with TensorRT without the CUDA execution provider and its unused CUDA dependencies.
libonnxruntime_providers_cuda.so is big (220 MB) and drags in other big dependencies, like libcufft and libcublas, that we don't use in inference (another 400 MB).
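
For context, a quick way to see the footprint is to check the library's size and its CUDA-side link dependencies. A minimal sketch, assuming the library sits in your ORT lib directory (the path below is illustrative):

# Sketch: inspect size and transitive CUDA dependencies of the provider lib.
# Adjust the path to wherever your build placed the shared object.
du -h /usr/local/lib/libonnxruntime_providers_cuda.so
ldd /usr/local/lib/libonnxruntime_providers_cuda.so | grep -E 'cublas|cufft|cudnn'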

Urgency

non blocking

Target platform

Linux

Build script

build.py

Error / output

N/A

Visual Studio Version

N/A

GCC / Compiler Version

gcc11

jcdatin added the build label on Nov 27, 2024
github-actions bot added the ep:CUDA label on Nov 27, 2024
skottmckay (Contributor) commented:

option(onnxruntime_CUDA_MINIMAL "Build CUDA without any operations apart from memcpy ops. Usefuel for a very minial TRT build" OFF)

Add this to your build command line: --cmake_extra_defines onnxruntime_CUDA_MINIMAL=ON
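
For example, a TensorRT-focused invocation might look like the sketch below (paths are placeholders for your installation; the extra define is the relevant part):

# Sketch: build.sh invocation with the minimal-CUDA define added.
./build.sh --config Release \
  --use_cuda --cuda_home /usr/local/cuda \
  --use_tensorrt --tensorrt_home /usr/local/TensorRT \
  --build_shared_lib --parallel --skip_tests \
  --cmake_extra_defines onnxruntime_CUDA_MINIMAL=ON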

jcdatin (Author) commented Nov 27, 2024

Thank you! (I could not find it in build.py.)
Trying this.

jcdatin (Author) commented Nov 27, 2024

Unfortunately this does not work; there are still CUDA dependencies that cause compilation errors:
[ 35%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime/onnxruntime/contrib_ops/cuda/bert/decoder_attention.cc.o
In file included from /onnxruntime/onnxruntime/contrib_ops/cuda/bert/attention.cc:5:
/onnxruntime/onnxruntime/core/providers/cuda/shared_inc/fpgeneric.h:22:8: error: ‘cublasStatus_t’ does not name a type; did you mean ‘cublasStrsv’?
22 | inline cublasStatus_t
| ^~~~~~~~~~~~~~

jcdatin (Author) commented Nov 27, 2024

Same for:
Building CXX object CMakeFiles/onnxruntime_framework.dir/onnxruntime/onnxruntime/core/framework/config_options.cc (refers to cublas)
and a lot more...

skottmckay (Contributor) commented:

@gedoensmax is this expected? Not sure if there are other build settings required to use onnxruntime_CUDA_MINIMAL

gedoensmax (Contributor) commented Dec 3, 2024

The CMake variable you mention, Scott, prunes all the CUDA dependencies and makes the CUDA EP library very small. Essentially it will only be able to execute memory copies and stream management.
It is broken on some commits from time to time, but we will fix it in newer releases.

This build scenario is also published at: https://github.com/NVIDIA/onnxruntime/releases

poweiw (Contributor) commented Dec 3, 2024

Hello @jcdatin! The error comes from some new code in 1.20 that is not guarded by the CUDA-minimal macro. The fix has been merged and will hopefully be included in the next patch release of ORT. #22751
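
For reference, the guard pattern at issue looks roughly like the sketch below. USE_CUDA_MINIMAL is the compile definition driven by onnxruntime_CUDA_MINIMAL=ON; the helper name here is a simplified illustration, not the actual ORT source:

// Sketch: cuBLAS-dependent code must be compiled out in the minimal build,
// otherwise headers fail with "'cublasStatus_t' does not name a type".
#ifndef USE_CUDA_MINIMAL
#include <cublas_v2.h>

inline cublasStatus_t GemmHelper(cublasHandle_t handle) {
  (void)handle;  // A real helper would dispatch to cublasSgemm/cublasGemmEx.
  return CUBLAS_STATUS_SUCCESS;
}
#endif  // USE_CUDA_MINIMAL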

jcdatin (Author) commented Dec 3, 2024

> Hello @jcdatin! The error comes from some new code in 1.20 that is not guarded by the CUDA-minimal macro. The fix has been merged and will hopefully be included in the next patch release of ORT. #22751

Thanks! Happy to contribute :-) Will use the next release.

github-actions bot commented Jan 2, 2025

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label on Jan 2, 2025
jcdatin (Author) commented Jan 13, 2025

Tried again, but got an onnxruntime compilation failure:
[ 46%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/tmp/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.cc.o
In file included from /tmp/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.h:12,
from /tmp/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.cc:9:
/tmp/onnxruntime/onnxruntime/core/providers/cuda/shared_inc/cudnn_fe_call.h:9:10: fatal error: cudnn_frontend.h: No such file or directory
9 | #include <cudnn_frontend.h>
| ^~~~~~~~~~~~~~~~~~

Using TRT 10.7.23, cuDNN 9.6.0.74, and ONNX Runtime 1.20.1 with gcc 11.
Build command:

CC=gcc-11 CXX=g++-11 ./build.sh \
--skip_submodule_sync --nvcc_threads 2 \
--config $ORT_BUILD_MODE --use_cuda \
--cudnn_home /usr/local/cuda/lib64 \
--cuda_home /usr/local/cuda/ \
--use_tensorrt --tensorrt_home /usr/local/TensorRT \
--build_shared_lib --parallel --skip_tests \
--allow_running_as_root \
--cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=89" \
--cmake_extra_defines onnxruntime_CUDA_MINIMAL=ON \
--cmake_extra_defines "CMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-11"

-- ******** Summary ********
-- CMake version : 3.28.3
-- CMake command : /usr/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/g++-11
-- C++ compiler version : 11.3.0
-- CXX flags : -ffunction-sections -fdata-sections -Wno-restrict -DCPUINFO_SUPPORTED -Wnon-virtual-dtor
-- Build type : Release
-- Compile definitions : ORT_ENABLE_STREAM;EIGEN_MPL2_ONLY;_GNU_SOURCE;__STDC_FORMAT_MACROS
-- CMAKE_PREFIX_PATH : /tmp/onnxruntime/build/Linux/Release/installed
-- CMAKE_INSTALL_PREFIX : /usr/local
-- CMAKE_MODULE_PATH : /tmp/onnxruntime/cmake/external

-- ONNX version : 1.16.1
-- ONNX NAMESPACE : onnx
-- ONNX_USE_LITE_PROTO : ON
-- USE_PROTOBUF_SHARED_LIBS : OFF
-- Protobuf_USE_STATIC_LIBS : ON
-- ONNX_DISABLE_EXCEPTIONS : OFF
-- ONNX_DISABLE_STATIC_REGISTRATION : OFF
-- ONNX_WERROR : OFF
-- ONNX_BUILD_TESTS : OFF
-- ONNX_BUILD_BENCHMARKS : OFF
-- ONNX_BUILD_SHARED_LIBS :
-- BUILD_SHARED_LIBS : OFF

-- Protobuf compiler :
-- Protobuf includes :
-- Protobuf libraries :
-- BUILD_ONNX_PYTHON : OFF

Namespace(build_dir='/tmp/onnxruntime/build/Linux', config=['Release'], update=False, build=False, clean=False, parallel=0, nvcc_threads=2, test=False, skip_tests=True, compile_no_warning_as_error=False, enable_nvtx_profile=False, enable_memory_profile=False, enable_training=False, enable_training_apis=False, enable_training_ops=False, enable_nccl=False, mpi_home=None, nccl_home=None, use_mpi=False, enable_onnx_tests=False, path_to_protoc_exe=None, fuzz_testing=False, enable_symbolic_shape_infer_tests=False, gen_doc=None, gen_api_doc=False, use_cuda=True, cuda_version=None, cuda_home='/usr/local/cuda/', cudnn_home='/usr/local/cuda/lib64', enable_cuda_line_info=False, enable_cuda_nhwc_ops=False, enable_pybind=False, build_wheel=False, wheel_name_suffix=None, skip_keras_test=False, build_csharp=False, build_nuget=False, msbuild_extra_options=None, build_java=False, build_nodejs=False, build_objc=False, build_shared_lib=True, build_apple_framework=False, cmake_extra_defines=[['CMAKE_CUDA_ARCHITECTURES=89'], ['onnxruntime_CUDA_MINIMAL=ON'], ['CMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-11']], target=None, x86=False, rv64=False, arm=False, arm64=False, arm64ec=False, buildasx=False, riscv_toolchain_root='', riscv_qemu_path='', msvc_toolset=None, windows_sdk_version=None, android=False, android_abi='arm64-v8a', android_api=27, android_sdk_path='', android_ndk_path='', android_cpp_shared=False, android_run_emulator=False, use_gdk=False, gdk_edition='.', gdk_platform='Scarlett', ios=False, visionos=False, macos=None, apple_sysroot='', ios_toolchain_file='', visionos_toolchain_file='', xcode_code_signing_team_id='', xcode_code_signing_identity='', cmake_generator=None, osx_arch='x86_64', apple_deploy_target=None, enable_address_sanitizer=False, use_binskim_compliant_compile_flags=False, disable_memleak_checker=False, use_vcpkg=False, build_wasm=False, build_wasm_static_lib=False, emsdk_version='3.1.59', enable_wasm_simd=False, enable_wasm_threads=False, disable_wasm_exception_catching=False, enable_wasm_api_exception_catching=False, enable_wasm_exception_throwing_override=True, wasm_run_tests_in_browser=False, enable_wasm_profiling=False, enable_wasm_debug_info=False, wasm_malloc=None, emscripten_settings=None, use_extensions=False, extensions_overridden_path=None, cmake_path='cmake', ctest_path='ctest', skip_submodule_sync=True, use_mimalloc=False, use_dnnl=False, dnnl_gpu_runtime='', dnnl_opencl_root='', use_openvino=None, dnnl_aarch64_runtime='', dnnl_acl_root='', use_coreml=False, use_webnn=False, use_snpe=False, snpe_root=None, use_nnapi=False, use_vsinpu=False, nnapi_min_api=None, use_jsep=False, use_webgpu=False, use_qnn=False, qnn_home=None, use_rknpu=False, use_preinstalled_eigen=False, eigen_path=None, enable_msinternal=False, llvm_path=None, use_vitisai=False, use_tvm=False, tvm_cuda_runtime=False, use_tvm_hash=False, use_tensorrt=True, use_tensorrt_builtin_parser=True, use_tensorrt_oss_parser=False, tensorrt_home='/usr/local/TensorRT', test_all_timeout='10800', use_migraphx=False, migraphx_home=None, use_full_protobuf=False, llvm_config='', skip_onnx_tests=False, skip_winml_tests=False, skip_nodejs_tests=False, enable_msvc_static_runtime=False, use_dml=False, dml_path='', use_winml=False, winml_root_namespace_override=None, dml_external_project=False, use_telemetry=False, enable_wcos=False, enable_lto=False, enable_transformers_tool_test=False, use_acl=False, acl_home=None, acl_libs=None, use_armnn=False, armnn_relu=False, armnn_bn=False, armnn_home=None, armnn_libs=None, 
build_micro_benchmarks=False, minimal_build=None, include_ops_by_config=None, enable_reduced_operator_type_support=False, disable_contrib_ops=False, disable_ml_ops=False, disable_rtti=False, disable_types=[], disable_exceptions=False, rocm_version=None, use_rocm=False, rocm_home=None, code_coverage=False, enable_lazy_tensor=False, ms_experimental=False, enable_external_custom_op_schemas=False, external_graph_transformer_path=None, enable_cuda_profiling=False, use_cann=False, cann_home=None, enable_rocm_profiling=False, use_xnnpack=False, use_avx512=False, use_azure=False, use_cache=False, use_triton_kernel=False, use_lock_free_queue=False, allow_running_as_root=True)
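
One untested workaround sketch for the missing header, assuming the only problem is include visibility: cudnn_frontend.h comes from NVIDIA's header-only cudnn-frontend library (https://github.com/NVIDIA/cudnn-frontend), so exposing its include/ directory to the compiler may unblock the build. The clone path and the use of the CXXFLAGS environment variable (picked up by CMake on a fresh configure) are assumptions, not a confirmed fix; whether cudnn_common.cc should be compiled at all under onnxruntime_CUDA_MINIMAL=ON is a separate question.

# Sketch (untested): make the header-only cudnn-frontend headers visible.
git clone https://github.com/NVIDIA/cudnn-frontend.git /tmp/cudnn-frontend
CXXFLAGS="-I/tmp/cudnn-frontend/include" \
CC=gcc-11 CXX=g++-11 ./build.sh --use_cuda --use_tensorrt \
  --cmake_extra_defines onnxruntime_CUDA_MINIMAL=ON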
