[Build] Error building with tensorrt on Linux #17991
Comments
mobile was wrongly guessed by bot. This is on Linux/desktop. |
fyi, to run cmake without building you can omit --build and only pass --update to the build script. onnxruntime/tools/ci_build/github/linux/docker/Dockerfile.ubuntu_cuda11_8_tensorrt8_6 Line 34 in 35ecce4
|
btw, i suspect the source of your issue is that you're installing tensorrt-dev without specifying a version, so I think it's picking up the latest version, which is built against cuda 12.x and may conflict with (or update) the base image's cuda 11.8. that would explain why cmake isn't able to figure out which cuda compiler to use. |
tensorrt install instructions are at https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing the relevant section states sudo apt-mark hold tensorrt-dev If the CUDA network repository and a TensorRT local repository are enabled at the same time you may observe package conflicts with either TensorRT or cuDNN. You will need to configure APT so that it prefers local packages over network packages. You can do this by creating a new file at /etc/apt/preferences.d/local-repo with the following lines: |
After I adjusted the TensorRT install to target CUDA 11.8, compilation seems to start but never finishes (with never being defined as 8 hours). After that, gitlab kills the job and I don't get the printouts I added to build.py, so I added an internal timeout of 2 hours in build.py, and now I see that processing is killed when working on a file: Unfortunately there are no time stamps, so I don't know if it actually took 2 hours to get there. I will retry with a longer timeout. But I also see this in the log: Could this be related, or should it be disregarded even if it says fatal? I found one other bug report (#12922) that had a log pasted with these contents, but the discussion there does not mention it. |
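On the missing time stamps: a minimal sketch, assuming the build is launched from a Python wrapper, of how each output line could be prefixed with the elapsed time so it is clear how long the build took to reach a given file. The function name run_with_timestamps is hypothetical and not part of build.py.

```python
import subprocess
import sys
import time


def run_with_timestamps(cmd, out=sys.stdout):
    """Run cmd, echoing each output line prefixed with elapsed seconds.

    Returns (return_code, lines), where lines are the raw output lines.
    """
    start = time.monotonic()
    lines = []
    # Merge stderr into stdout so the relative order of messages is kept.
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,  # line-buffered in text mode
    )
    for line in proc.stdout:
        elapsed = time.monotonic() - start
        out.write(f"[{elapsed:8.1f}s] {line}")
        out.flush()
        lines.append(line.rstrip("\n"))
    return proc.wait(), lines
```

Because lines are flushed as they arrive, the last line printed before a CI job is killed shows which step was running and when it started.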
does it consistently hang on compiling that obj ? if you adjust your timeout to 3 hours, is it the same? |
Example script to build in Linux (I build only one SM and disable tests, and it usually took a few minutes to build): If your build failed in flash attention, usually the cause is out of memory. Since you are using nvcc_threads 1 (I saw the message onnxruntime/tools/ci_build/build.py Line 161 in 2c6b31c
|
I pip installed psutil in the docker container and allowed it 46 GByte. Compilation still hangs forever. So I changed back to the unpatched onnxruntime repo to see if I had messed something up, but it still hangs, and now I don't see anything as the output is not captured. Overall, building onnxruntime is extremely brittle, and when it fails it is impossible to figure out why. I would love to use precompiled binaries, but I can't, as you don't provide any binaries that can use any make of GPU, which we must support as we don't know what our customers have. I don't actually understand why onnxruntime.dll/so can't always support all providers so that we can then download provider packages which work with any onnxruntime libraries. I have now spent man months trying to compile this (after a consultant tried for 6 months with limited success). And I haven't even started on the worst platforms we need it to work on, Android and iOS. There must be a better way! |
I get this: tools_python_utils [INFO] - flatbuffers module is not installed. parse_config will not be available Which I treated as a warning. I have no idea what parse_config is and I don't know if it wants me to pip install flatbuffers or apt install flatbuffers. My hope was that maybe this has to do with not being able to show any error messages when cmake fails, but that's clutching at straws really. |
that message is coming from onnxruntime/tools/python/util/__init__.py Line 14 in dabd395
you can pip install flatbuffers , but this seems unrelated to your build issues since it's just an INFO message. is it still hanging building the same obj file flash_fwd_hdm224_fp16_sm80 ? onnxruntime/tools/ci_build/build.py Line 881 in dabd395
it tries to estimate the number of nvcc_threads to use, but needs psutil to do so. otherwise it defaults to nvcc_threads=1 there are 2 other experiments you can try.
|
I've had so many different issues with this. I'm still not even sure that I was able to give docker more memory. I put a free -m in the docker run step, but apparently it reports the memory of the host, not of the docker container. So now I'm running with a hard limit nvcc_thread = 1 that I hacked into your build.py in my fork for testing. Before that the latest run ended with this: Finished fetching external dependencies This is the last output on the stderr stream of the cmake config step. cmake then apparently returns 0, as build.py continues with the cmake --build step, which hangs without any output for 8 hours, despite the fact that I give a 2 hour timeout to Python's subprocess.run, so cmake seems not to be responding to signals while building. It's fairly frustrating. Which I have asked about before. It would seem that we have a too-old nvcc, but that doesn't explain why it hangs. Our docker image starts "FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04". We then explicitly install gcc 11.2. I hope they haven't removed this option. We've had strange errors before where it turned out that the problem was not actually in the cuda compiler itself but in how it launched the host compiler. From this: https://itecnote.com/tecnote/c-nvcc-strange-interaction-with-xcompiler/ it seems that the option may be directed to the wrong compiler. Well, I grepped for strict-alias in your .py and .cmake files and came up with a few no-strict-aliases and a few strict-aliases in nlohmann_json's cmake files, but I hope you are not compiling json parsers with nvcc. Now I rebuilt again, and I get another error, possibly related to the new Eigen version that was introduced magically as your deps.txt didn't point to a fixed file.
I blindly updated the hash to "actual" yesterday in my fork and this version causes an error: In file included from /src/onnxruntime/build_gpu/Linux/Release/_deps/eigen-src/unsupported/Eigen/CXX11/Tensor:59, My guess is that main branch already has a fix for this, so I will merge it into my fork again. |
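On the 2 hour timeout that did not stop the build: one possible explanation (an assumption, not confirmed anywhere in this thread) is that only the direct child is killed on timeout while compiler processes it spawned keep running and keep the output pipes open. A minimal POSIX-only sketch that starts the command in its own session and kills the whole process group when the timeout expires; run_with_group_timeout is a hypothetical helper, not part of build.py.

```python
import os
import signal
import subprocess


def run_with_group_timeout(cmd, timeout_s):
    """Run cmd in its own session; on timeout, kill its whole process group.

    Returns the exit code, or None if the timeout expired.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its pid doubles as the process group id.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill the child and every descendant that stayed in its group.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return None
```

Output is not captured here, so it goes straight to the terminal or CI log; that also avoids waiting on pipes that descendants might hold open.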
i don't think your nvcc is too old. cuda 11.8 is fully supported. as i mentioned previously, can you try using our dockerfile as reference |
I finally got this working. I can't really tell at this point what change made it work, too much went on in the process. |
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details. |
Works now. |
Describe the issue
When trying to build onnxruntime with --use_tensorrt, I basically get an error saying that cmake --build can't find any of the files that should have been created by the cmake configuration run.
Unfortunately build.py does not provide a way to see the output of cmake if there is no error return. Equally unfortunately, cmake does not seem to return an error code even though it erred out and didn't produce any files.
I hacked build.py to unconditionally print stdout and stderr from the cmake subprocess and got this error message:
NVCC_ERROR =
NVCC_OUT = No such file or directory
CMake Error at /usr/local/share/cmake-3.27/Modules/CMakeDetermineCUDACompiler.cmake:603 (message):
Failed to detect a default CUDA architecture.
Compiler output:
Call Stack (most recent call first):
CMakeLists.txt:674 (enable_language)
There was also earlier in stderr a complaint about CMP0104 being "OLD":
CMake Deprecation Warning at CMakeLists.txt:14 (cmake_policy):
The OLD behavior for policy CMP0104 will be removed from a future version
of CMake.
This seems to be related to not setting up any default CUDA architecture.
I don't know if it is normal that building tensorrt requires this.
My initial thought was that building tensorrt without also building cuda was not supported, as there seemed to be some code that sets the variable CMAKE_CUDA_ARCHITECTURES mentioned in https://cmake.org/cmake/help/latest/policy/CMP0104.html
This didn't help at all.
Now I'm totally at a loss.
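For reference, the hack mentioned above (printing the subprocess's stdout and stderr whether or not it reports an error) can be sketched roughly as follows. This is an illustration only, not the actual patch to build.py, and run_and_echo is a hypothetical name.

```python
import subprocess
import sys


def run_and_echo(cmd):
    """Run cmd, then print its captured stdout and stderr regardless of exit status."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(f"--- exit code: {result.returncode}")
    print("--- stdout ---")
    print(result.stdout, end="")
    print("--- stderr ---", file=sys.stderr)
    print(result.stderr, end="", file=sys.stderr)
    return result
```

Since the exit code is printed as well, it becomes visible whether the configure step really returned 0 despite the error text.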
Urgency
Medium urgency. There is still time before our deadline; we just want to be able to build onnxruntime on all platforms with the best providers for each make of graphics board. We don't know what make our customers have, so we need an omnipotent onnxruntime library, which does not seem to exist as precompiled binaries.
Target platform
Linux 64 bit, ubuntu 20.04 using NVIDIA docker image
Build script
build.py --build_dir /src/onnxruntime/build/Linux --config Release --parallel 4 --build_shared_lib --build_dir build_gpu/Linux --skip_tests --use_tensorrt --cuda_version=11.8 --cuda_home /usr/local/cuda --cudnn_home /usr/local/cuda --tensorrt_home /usr/include/x86_64-linux-gnu
Note that the --tensorrt_home directory is my best guess after running apt install tensorrt-dev. This is not documented, and the frustrated questions found on the web have not been answered. It would be appreciated if the installation instructions were a bit more elaborate than "install tensorrt". It took me days to figure out that what you meant was tensorrt-dev and then to find the directory of the header files.
I don't think that this directory selection is the root cause in this case though.
Error / output
NVCC_ERROR =
NVCC_OUT = No such file or directory
CMake Error at /usr/local/share/cmake-3.27/Modules/CMakeDetermineCUDACompiler.cmake:603 (message):
Failed to detect a default CUDA architecture.
Compiler output:
Call Stack (most recent call first):
CMakeLists.txt:674 (enable_language)
printed to stderr by the cmake configure step but this does not stop build.py from trying to do cmake --build which of course fails as there are no files for it to act on.
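A guard against this situation could be sketched as below: before calling cmake --build, check that the configure step actually generated a build file. The file names cover only the Makefile and Ninja generators, which is an assumption about the setup, and configure_produced_build_files is a hypothetical helper that does not exist in build.py.

```python
from pathlib import Path

# Files written by CMake's generate step for the Makefile and Ninja
# generators; other generators would need other names (assumption).
GENERATED_FILES = ("Makefile", "build.ninja")


def configure_produced_build_files(build_dir):
    """Return True if build_dir contains a generated build file."""
    build_dir = Path(build_dir)
    return any((build_dir / name).is_file() for name in GENERATED_FILES)
```

A wrapper could call this on the per-config directory (build_gpu/Linux/Release in the log above) and stop with a clear message instead of starting a build that has nothing to act on.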
I also print all the arguments you send to cmake, but I have no idea what to look for, so here you go:
Namespace(acl_home=None, acl_libs=None, allow_running_as_root=False, android=False, android_abi='arm64-v8a', android_api=27, android_cpp_shared=False, android_ndk_path='', android_run_emulator=False, android_sdk_path='', apple_deploy_target=None, arm=False, arm64=False, arm64ec=False, armnn_bn=False, armnn_home=None, armnn_libs=None, armnn_relu=False, build=False, build_apple_framework=False, build_csharp=False, build_dir='build_gpu/Linux', build_java=False, build_micro_benchmarks=False, build_nodejs=False, build_nuget=False, build_objc=False, build_shared_lib=True, build_wasm=False, build_wasm_static_lib=False, build_wheel=False, cann_home=None, clean=False, cmake_extra_defines=None, cmake_generator=None, cmake_path='cmake', code_coverage=False, compile_no_warning_as_error=False, config=['Release'], ctest_path='ctest', cuda_home='/usr/local/cuda', cuda_version='11.8', cudnn_home='/usr/local/cuda', disable_contrib_ops=False, disable_exceptions=False, disable_memleak_checker=False, disable_ml_ops=False, disable_rtti=False, disable_types=[], disable_wasm_exception_catching=False, dml_external_project=False, dml_path='', dnnl_gpu_runtime='', dnnl_opencl_root='', eigen_path=None, emscripten_settings=None, emsdk_version='3.1.44', enable_cuda_line_info=False, enable_cuda_profiling=False, enable_external_custom_op_schemas=False, enable_language_interop_ops=False, enable_lazy_tensor=False, enable_lto=False, enable_memory_profile=False, enable_msinternal=False, enable_msvc_static_runtime=False, enable_nccl=False, enable_nvtx_profile=False, enable_onnx_tests=False, enable_pybind=False, enable_reduced_operator_type_support=False, enable_rocm_profiling=False, enable_symbolic_shape_infer_tests=False, enable_training=False, enable_training_apis=False, enable_training_ops=False, enable_transformers_tool_test=False, enable_wasm_api_exception_catching=False, enable_wasm_debug_info=False, enable_wasm_exception_throwing_override=True, enable_wasm_profiling=False, 
enable_wasm_simd=False, enable_wasm_threads=False, enable_wcos=False, extensions_overridden_path=None, external_graph_transformer_path=None, fuzz_testing=False, gdk_edition='.', gdk_platform='Scarlett', gen_api_doc=False, gen_doc=None, include_ops_by_config=None, ios=False, ios_sysroot='', ios_toolchain_file='', llvm_config='', llvm_path=None, migraphx_home=None, minimal_build=None, mpi_home=None, ms_experimental=False, msbuild_extra_options=None, msvc_toolset=None, nccl_home=None, nnapi_min_api=None, numpy_version=None, nvcc_threads=-1, osx_arch='x86_64', parallel=4, path_to_protoc_exe=None, qnn_home=None, rocm_home=None, rocm_version=None, skip_keras_test=False, skip_nodejs_tests=False, skip_onnx_tests=False, skip_submodule_sync=False, skip_tests=True, skip_winml_tests=False, snpe_root=None, target=None, tensorrt_home='/usr/include/x86_64-linux-gnu', test=False, test_all_timeout='10800', tvm_cuda_runtime=False, update=False, use_acl=None, use_armnn=False, use_azure=False, use_cache=False, use_cann=False, use_coreml=False, use_cuda=False, use_dml=False, use_dnnl=False, use_extensions=False, use_full_protobuf=False, use_gdk=False, use_jsep=False, use_lock_free_queue=False, use_migraphx=False, use_mimalloc=False, use_mpi=False, use_nnapi=False, use_openvino=None, use_preinstalled_eigen=False, use_qnn=False, use_rknpu=False, use_rocm=False, use_snpe=False, use_telemetry=False, use_tensorrt=True, use_tensorrt_builtin_parser=True, use_tensorrt_oss_parser=False, use_triton_kernel=False, use_tvm=False, use_tvm_hash=False, use_vitisai=False, use_webnn=False, use_winml=False, use_xnnpack=False, wasm_malloc=None, wasm_run_tests_in_browser=False, wheel_name_suffix=None, windows_sdk_version=None, winml_root_namespace_override=None, x86=False, xcode_code_signing_identity='', xcode_code_signing_team_id='')
Failed to import psutil. Please
pip install psutil
for better estimation of nvcc threads. Use nvcc_threads=1
Making dir: build_gpu/Linux/Release
Calling cmake with 91 arguments.
Visual Studio Version
No response
GCC / Compiler Version
gcc 10.4, cmake 3.27, python 3.10 running in docker image FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04