
[Build] CUDA Execution Provider library is needed even though we only use the TensorRT Execution Provider #22960

Open
jcdatin opened this issue Nov 27, 2024 · 10 comments
Labels
build: build issues; typically submitted using template
ep:CUDA: issues related to the CUDA execution provider
stale: issues that have not been addressed in a while; categorized by a bot

Comments


jcdatin commented Nov 27, 2024

Describe the issue

There must be a way to build onnxruntime with TensorRT without the CUDA execution provider and its unused CUDA dependencies.
libonnxruntime_providers_cuda.so is big (220 MB) and drags in other big dependencies, like libcufft and libcublas, that we don't use in inference (another 400 MB).
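
For context, a quick way to see the footprint is to check the library's size and its CUDA-side link dependencies. A minimal sketch, assuming the library sits in your ORT lib directory (the path below is illustrative):

# Sketch: inspect size and transitive CUDA dependencies of the provider lib.
# Adjust the path to wherever your build placed the shared object.
du -h /usr/local/lib/libonnxruntime_providers_cuda.so
ldd /usr/local/lib/libonnxruntime_providers_cuda.so | grep -E 'cublas|cufft|cudnn'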

Urgency

non blocking

Target platform

Linux

Build script

build.py

Error / output

N/A

Visual Studio Version

N/A

GCC / Compiler Version

gcc11

jcdatin added the build label on Nov 27, 2024
github-actions bot added the ep:CUDA label on Nov 27, 2024
skottmckay (Contributor) commented:

option(onnxruntime_CUDA_MINIMAL "Build CUDA without any operations apart from memcpy ops. Usefuel for a very minial TRT build" OFF)

Add this to your build command line: --cmake_extra_defines onnxruntime_CUDA_MINIMAL=ON
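
For example, a TensorRT-focused invocation might look like the sketch below (paths are placeholders for your installation; the extra define is the relevant part):

# Sketch: build.sh invocation with the minimal-CUDA define added.
./build.sh --config Release \
  --use_cuda --cuda_home /usr/local/cuda \
  --use_tensorrt --tensorrt_home /usr/local/TensorRT \
  --build_shared_lib --parallel --skip_tests \
  --cmake_extra_defines onnxruntime_CUDA_MINIMAL=ON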

jcdatin (Author) commented Nov 27, 2024

Thank you! (I could not find it in build.py.)
Trying this.

jcdatin (Author) commented Nov 27, 2024

Unfortunately this does not work; there are still CUDA dependencies that cause compilation errors:
[ 35%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime/onnxruntime/contrib_ops/cuda/bert/decoder_attention.cc.o
In file included from /onnxruntime/onnxruntime/contrib_ops/cuda/bert/attention.cc:5:
/onnxruntime/onnxruntime/core/providers/cuda/shared_inc/fpgeneric.h:22:8: error: ‘cublasStatus_t’ does not name a type; did you mean ‘cublasStrsv’?
22 | inline cublasStatus_t
| ^~~~~~~~~~~~~~

jcdatin (Author) commented Nov 27, 2024

Same for:
Building CXX object CMakeFiles/onnxruntime_framework.dir/onnxruntime/onnxruntime/core/framework/config_options.cc (refers to cublas)
and a lot more...

skottmckay (Contributor) commented:

@gedoensmax is this expected? Not sure if there are other build settings required to use onnxruntime_CUDA_MINIMAL

gedoensmax (Contributor) commented Dec 3, 2024

The CMake variable you mention, Scott, prunes all the CUDA dependencies and makes the CUDA EP library very small. Essentially it will only be able to execute memory copies and stream management.
It is broken on some commits from time to time, but we will fix it in newer releases.

This build scenario is also published at: https://github.com/NVIDIA/onnxruntime/releases

poweiw (Contributor) commented Dec 3, 2024

Hello @jcdatin! The error comes from some new code in 1.20 that is not guarded by the CUDA-minimal macro. The fix has been merged and will hopefully be included in the next patch release of ORT. #22751
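
For reference, the guard pattern at issue looks roughly like the sketch below. USE_CUDA_MINIMAL is the compile definition driven by onnxruntime_CUDA_MINIMAL=ON; the helper name here is a simplified illustration, not the actual ORT source:

// Sketch: cuBLAS-dependent code must be compiled out in the minimal build,
// otherwise headers fail with "'cublasStatus_t' does not name a type".
#ifndef USE_CUDA_MINIMAL
#include <cublas_v2.h>

inline cublasStatus_t GemmHelper(cublasHandle_t handle) {
  (void)handle;  // A real helper would dispatch to cublasSgemm/cublasGemmEx.
  return CUBLAS_STATUS_SUCCESS;
}
#endif  // USE_CUDA_MINIMAL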

jcdatin (Author) commented Dec 3, 2024

> Hello @jcdatin! The error comes from some new code in 1.20 that is not guarded by the CUDA-minimal macro. The fix has been merged and will hopefully be included in the next patch release of ORT. #22751

Thanks! Happy to contribute :-) Will use the next release.

github-actions bot commented Jan 2, 2025

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label on Jan 2, 2025
jcdatin (Author) commented Jan 13, 2025

Tried again, but got an onnxruntime compilation failure:
[ 46%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/tmp/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.cc.o
In file included from /tmp/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.h:12,
from /tmp/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.cc:9:
/tmp/onnxruntime/onnxruntime/core/providers/cuda/shared_inc/cudnn_fe_call.h:9:10: fatal error: cudnn_frontend.h: No such file or directory
9 | #include <cudnn_frontend.h>
| ^~~~~~~~~~~~~~~~~~

Using TRT 10.7.23, cuDNN 9.6.0.74, and ONNX Runtime 1.20.1 with gcc 11.
Build command:

CC=gcc-11 CXX=g++-11 ./build.sh \
--skip_submodule_sync --nvcc_threads 2 \
--config $ORT_BUILD_MODE --use_cuda \
--cudnn_home /usr/local/cuda/lib64 \
--cuda_home /usr/local/cuda/ \
--use_tensorrt --tensorrt_home /usr/local/TensorRT \
--build_shared_lib --parallel --skip_tests \
--allow_running_as_root \
--cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=89" \
--cmake_extra_defines onnxruntime_CUDA_MINIMAL=ON \
--cmake_extra_defines "CMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-11"

-- ******** Summary ********
-- CMake version : 3.28.3
-- CMake command : /usr/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/g++-11
-- C++ compiler version : 11.3.0
-- CXX flags : -ffunction-sections -fdata-sections -Wno-restrict -DCPUINFO_SUPPORTED -Wnon-virtual-dtor
-- Build type : Release
-- Compile definitions : ORT_ENABLE_STREAM;EIGEN_MPL2_ONLY;_GNU_SOURCE;__STDC_FORMAT_MACROS
-- CMAKE_PREFIX_PATH : /tmp/onnxruntime/build/Linux/Release/installed
-- CMAKE_INSTALL_PREFIX : /usr/local
-- CMAKE_MODULE_PATH : /tmp/onnxruntime/cmake/external

-- ONNX version : 1.16.1
-- ONNX NAMESPACE : onnx
-- ONNX_USE_LITE_PROTO : ON
-- USE_PROTOBUF_SHARED_LIBS : OFF
-- Protobuf_USE_STATIC_LIBS : ON
-- ONNX_DISABLE_EXCEPTIONS : OFF
-- ONNX_DISABLE_STATIC_REGISTRATION : OFF
-- ONNX_WERROR : OFF
-- ONNX_BUILD_TESTS : OFF
-- ONNX_BUILD_BENCHMARKS : OFF
-- ONNX_BUILD_SHARED_LIBS :
-- BUILD_SHARED_LIBS : OFF

-- Protobuf compiler :
-- Protobuf includes :
-- Protobuf libraries :
-- BUILD_ONNX_PYTHON : OFF

Namespace(build_dir='/tmp/onnxruntime/build/Linux', config=['Release'], update=False, build=False, clean=False, parallel=0, nvcc_threads=2, test=False, skip_tests=True, compile_no_warning_as_error=False, enable_nvtx_profile=False, enable_memory_profile=False, enable_training=False, enable_training_apis=False, enable_training_ops=False, enable_nccl=False, mpi_home=None, nccl_home=None, use_mpi=False, enable_onnx_tests=False, path_to_protoc_exe=None, fuzz_testing=False, enable_symbolic_shape_infer_tests=False, gen_doc=None, gen_api_doc=False, use_cuda=True, cuda_version=None, cuda_home='/usr/local/cuda/', cudnn_home='/usr/local/cuda/lib64', enable_cuda_line_info=False, enable_cuda_nhwc_ops=False, enable_pybind=False, build_wheel=False, wheel_name_suffix=None, skip_keras_test=False, build_csharp=False, build_nuget=False, msbuild_extra_options=None, build_java=False, build_nodejs=False, build_objc=False, build_shared_lib=True, build_apple_framework=False, cmake_extra_defines=[['CMAKE_CUDA_ARCHITECTURES=89'], ['onnxruntime_CUDA_MINIMAL=ON'], ['CMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-11']], target=None, x86=False, rv64=False, arm=False, arm64=False, arm64ec=False, buildasx=False, riscv_toolchain_root='', riscv_qemu_path='', msvc_toolset=None, windows_sdk_version=None, android=False, android_abi='arm64-v8a', android_api=27, android_sdk_path='', android_ndk_path='', android_cpp_shared=False, android_run_emulator=False, use_gdk=False, gdk_edition='.', gdk_platform='Scarlett', ios=False, visionos=False, macos=None, apple_sysroot='', ios_toolchain_file='', visionos_toolchain_file='', xcode_code_signing_team_id='', xcode_code_signing_identity='', cmake_generator=None, osx_arch='x86_64', apple_deploy_target=None, enable_address_sanitizer=False, use_binskim_compliant_compile_flags=False, disable_memleak_checker=False, use_vcpkg=False, build_wasm=False, build_wasm_static_lib=False, emsdk_version='3.1.59', enable_wasm_simd=False, enable_wasm_threads=False, disable_wasm_exception_catching=False, enable_wasm_api_exception_catching=False, enable_wasm_exception_throwing_override=True, wasm_run_tests_in_browser=False, enable_wasm_profiling=False, enable_wasm_debug_info=False, wasm_malloc=None, emscripten_settings=None, use_extensions=False, extensions_overridden_path=None, cmake_path='cmake', ctest_path='ctest', skip_submodule_sync=True, use_mimalloc=False, use_dnnl=False, dnnl_gpu_runtime='', dnnl_opencl_root='', use_openvino=None, dnnl_aarch64_runtime='', dnnl_acl_root='', use_coreml=False, use_webnn=False, use_snpe=False, snpe_root=None, use_nnapi=False, use_vsinpu=False, nnapi_min_api=None, use_jsep=False, use_webgpu=False, use_qnn=False, qnn_home=None, use_rknpu=False, use_preinstalled_eigen=False, eigen_path=None, enable_msinternal=False, llvm_path=None, use_vitisai=False, use_tvm=False, tvm_cuda_runtime=False, use_tvm_hash=False, use_tensorrt=True, use_tensorrt_builtin_parser=True, use_tensorrt_oss_parser=False, tensorrt_home='/usr/local/TensorRT', test_all_timeout='10800', use_migraphx=False, migraphx_home=None, use_full_protobuf=False, llvm_config='', skip_onnx_tests=False, skip_winml_tests=False, skip_nodejs_tests=False, enable_msvc_static_runtime=False, use_dml=False, dml_path='', use_winml=False, winml_root_namespace_override=None, dml_external_project=False, use_telemetry=False, enable_wcos=False, enable_lto=False, enable_transformers_tool_test=False, use_acl=False, acl_home=None, acl_libs=None, use_armnn=False, armnn_relu=False, armnn_bn=False, armnn_home=None, armnn_libs=None, 
build_micro_benchmarks=False, minimal_build=None, include_ops_by_config=None, enable_reduced_operator_type_support=False, disable_contrib_ops=False, disable_ml_ops=False, disable_rtti=False, disable_types=[], disable_exceptions=False, rocm_version=None, use_rocm=False, rocm_home=None, code_coverage=False, enable_lazy_tensor=False, ms_experimental=False, enable_external_custom_op_schemas=False, external_graph_transformer_path=None, enable_cuda_profiling=False, use_cann=False, cann_home=None, enable_rocm_profiling=False, use_xnnpack=False, use_avx512=False, use_azure=False, use_cache=False, use_triton_kernel=False, use_lock_free_queue=False, allow_running_as_root=True)
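
One untested workaround sketch for the missing header, assuming the only problem is include visibility: cudnn_frontend.h comes from NVIDIA's header-only cudnn-frontend library (https://github.com/NVIDIA/cudnn-frontend), so exposing its include/ directory to the compiler may unblock the build. The clone path and the use of the CXXFLAGS environment variable (picked up by CMake on a fresh configure) are assumptions, not a confirmed fix; whether cudnn_common.cc should be compiled at all under onnxruntime_CUDA_MINIMAL=ON is a separate question.

# Sketch (untested): make the header-only cudnn-frontend headers visible.
git clone https://github.com/NVIDIA/cudnn-frontend.git /tmp/cudnn-frontend
CXXFLAGS="-I/tmp/cudnn-frontend/include" \
CC=gcc-11 CXX=g++-11 ./build.sh --use_cuda --use_tensorrt \
  --cmake_extra_defines onnxruntime_CUDA_MINIMAL=ON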
