[Build] ONNX Runtime build fails OOM (v1.20.x) #22859

Closed
mc-nv opened this issue Nov 15, 2024 · 18 comments
Labels
build build issues; typically submitted using template contributions welcome lower priority issues for the core ORT teams ep:CUDA issues related to the CUDA execution provider

Comments

@mc-nv
Contributor

mc-nv commented Nov 15, 2024

Describe the issue

Compiling against the rel-1.20.0 branch fails.
We are running out of memory on both Linux and Windows platforms.

Windows config (64GB RAM):

BUILDTOOLS_VERSION:17.12.35506.116 
CMAKE_VERSION:3.30.5 
CUDA_VERSION:12.6.2 
CUDNN_VERSION:9.5.1.17 
PYTHON_VERSION:3.12.3 
TENSORRT_VERSION:10.6.0.26 
VCPGK_VERSION:2024.03.19

Linux (64GB RAM):

CMAKE_VERSION:3.28.3
CUDA_VERSION:12.6.2 
CUDNN_VERSION:9.5.1.17 
PYTHON_VERSION:3.12.3 
TENSORRT_VERSION:10.6.0.26 

Urgency

ASAP

Target platform

Linux, Windows

Build script

Windows:

onnxruntime/tools/ci_build/build.py `
   --cmake_generator "Visual Studio 17 2022" `
   --config Release `
   --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=75;80;86;90" `
   --skip_submodule_sync `
   --parallel `
   --build_shared_lib `
   --compile_no_warning_as_error `
   --skip_tests `
   --update `
   --build `
   --build_dir /workspace/build `
   --use_cuda `
   --cuda_home ${env:CUDA_PATH} `
   --cudnn_home ${env:CUDA_PATH} `
   --use_tensorrt --tensorrt_home "/tensorrt" ; `

Linux:

./build.sh \
  --config Release \
  --skip_submodule_sync \
  --parallel \
  --build_shared_lib \
  --compile_no_warning_as_error \
  --build_dir /workspace/build \
  --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES='75;80;86;90' \
  --update \
  --build \
  --use_cuda \
  --cuda_home "/usr/local/cuda" \
  --cudnn_home "/usr" \
  --use_tensorrt \
  --use_tensorrt_builtin_parser \
  --tensorrt_home "/usr/src/tensorrt" \
  --allow_running_as_root \
  --use_openvino CPU

Error / output

No error message; the container fails because it runs out of memory.

Visual Studio Version

No response

GCC / Compiler Version

No response

@mc-nv mc-nv added the build build issues; typically submitted using template label Nov 15, 2024
@mc-nv
Contributor Author

mc-nv commented Nov 15, 2024

@snnn for viz

@mc-nv mc-nv changed the title [Build] ONNX Runtime build fails OOM [Build] ONNX Runtime build fails OOM (v1.20.0) Nov 15, 2024
@snnn
Member

snnn commented Nov 15, 2024

Use " --parallel <n>" to reduce the parallelism.

@snnn
Member

snnn commented Nov 15, 2024

It is more about how much memory you have for each CPU core than how much memory you have in total.

@mc-nv
Contributor Author

mc-nv commented Nov 15, 2024

Note that the Linux build already uses --parallel, and it runs on heavy machines where we have never had an issue building ONNX Runtime.

@snnn
Member

snnn commented Nov 15, 2024

Sorry, part of my response was eaten by the formatting. I meant: put a number after "--parallel" to limit the number of concurrent processes. Let's say you have 64GB of memory and 16 CPUs. By default make/msbuild will create at most 16 subprocesses. Since we do not know whether 4GB is enough for one compiler process, sometimes we need to adjust the parallelism manually to avoid OOM.
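To make the arithmetic in this comment concrete, here is a minimal sketch of the per-process memory budget. The function name `memory_per_job_gb` is made up for illustration and is not part of build.py; the 64GB and 16 CPU figures are the ones from the example above.

```python
def memory_per_job_gb(total_memory_gb: float, jobs: int) -> float:
    """Average memory available to each concurrent compiler process."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return total_memory_gb / jobs


# 64 GB shared by 16 default jobs leaves 4 GB each;
# fewer jobs leave proportionally more memory per process.
for jobs in (16, 8, 4):
    print(f"--parallel {jobs}: {memory_per_job_gb(64, jobs):.0f} GB per process")
```

This is only an average: a single heavy translation unit can use far more than its share, which is why the number may still need tuning by hand.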

@mc-nv
Contributor Author

mc-nv commented Nov 15, 2024

Sounds like a suggestion to have 8GB per process, am I right?

@mc-nv
Contributor Author

mc-nv commented Nov 16, 2024

Quoting @snnn: "Sorry, part of my response was eaten by the formatting. I meant: put a number after "--parallel" to limit the number of concurrent processes. Let's say you have 64GB of memory and 16 CPUs. By default make/msbuild will create at most 16 subprocesses. Since we do not know whether 4GB is enough for one compiler process, sometimes we need to adjust the parallelism manually to avoid OOM."

In my scenario we don't set a limit on parallel jobs and use the default, which is "1": https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/build.py#L171

What would be the reason to set the limit to 2 or 4 if we are failing with OOM using a single process?

@snnn
Member

snnn commented Nov 16, 2024

Actually, the default is not one. If the optional value is 0 or unspecified, it is interpreted as the number of CPUs. Since you know how many CPUs the machine has, you can start by halving that number. For example, if the default value is 16, try 8 first. If the error still occurs, decrease it further. Eventually the build will pass, because 64GB is definitely enough for one single compiler process.
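The halving strategy described here can be sketched as a list of values to try in order. `parallel_candidates` is a hypothetical helper, not something build.py provides; it only enumerates the numbers you would pass to --parallel on successive attempts.

```python
def parallel_candidates(cpu_count: int) -> list[int]:
    """Values to try for --parallel: half the CPU count, then keep halving down to 1."""
    n = max(1, cpu_count // 2)
    candidates = []
    while True:
        candidates.append(n)
        if n == 1:
            return candidates
        n //= 2


# On a 16-CPU machine: try 8 first, then 4, 2, and finally 1.
print(parallel_candidates(16))
```

Each failed attempt costs a full build run, so on a machine with a known memory limit it may be quicker to jump straight to a value derived from the memory budget.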

@snnn
Member

snnn commented Nov 16, 2024

You may also need to tune the "--nvcc_threads" parameter. To be safe, you can set it to one.

@mc-nv
Contributor Author

mc-nv commented Nov 16, 2024

My Windows build environment has 2 CPUs.

@tianleiwu
Contributor

tianleiwu commented Nov 16, 2024

Estimated memory usage is nvcc_threads * parallel * 8GB so you will need at least 16 GB memory for --parallel 2 --nvcc_threads 1. Otherwise, try --parallel 1 --nvcc_threads 1. If you do not set them, nvcc_threads=parallel=vCPU=2, so you will need 32GB.
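As a rough illustration of the estimate above, the following sketch applies the nvcc_threads * parallel * 8GB rule of thumb. The helper names are invented for this example, and the 8GB constant is the figure quoted in this comment, not a measured value.

```python
GB_PER_NVCC_THREAD = 8  # rule of thumb from this thread, not a measurement


def estimated_build_memory_gb(parallel: int, nvcc_threads: int) -> int:
    """Estimated peak memory: nvcc_threads * parallel * 8 GB."""
    return parallel * nvcc_threads * GB_PER_NVCC_THREAD


def max_parallel(memory_gb: int, nvcc_threads: int = 1) -> int:
    """Largest --parallel value whose estimate fits in memory_gb (never below 1)."""
    return max(1, memory_gb // (nvcc_threads * GB_PER_NVCC_THREAD))


print(estimated_build_memory_gb(parallel=2, nvcc_threads=1))  # the 16 GB case
print(estimated_build_memory_gb(parallel=2, nvcc_threads=2))  # the 32 GB case (defaults on 2 vCPUs)
print(max_parallel(memory_gb=64, nvcc_threads=1))
```

Under this estimate a 64GB machine fits --parallel 8 with --nvcc_threads 1, but raising --nvcc_threads to 2 halves the usable --parallel value.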

@mc-nv
Contributor Author

mc-nv commented Dec 2, 2024

With a single thread I no longer see the OOM on Windows with 64GB of RAM.

Need to validate/confirm same behavior on Linux.

@mc-nv mc-nv changed the title [Build] ONNX Runtime build fails OOM (v1.20.0) [Build] ONNX Runtime build fails OOM (v1.20.x) Dec 4, 2024
@mc-nv
Contributor Author

mc-nv commented Dec 4, 2024

Still seeing the same issue on Linux; we may need to add --nvcc_threads 1 explicitly:

./build.sh \
  --config Release \
  --skip_submodule_sync \
  --parallel 2 \
  --build_shared_lib \
  --compile_no_warning_as_error \
  --build_dir /workspace/build \
  --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES='75;80;86;90' \
  --update \
  --build \
  --use_cuda \
  --cuda_home "/usr/local/cuda" \
  --cudnn_home "/usr" \
  --use_tensorrt \
  --use_tensorrt_builtin_parser \
  --tensorrt_home "/usr/src/tensorrt" \
  --allow_running_as_root \
  --use_openvino CPU

@mc-nv
Contributor Author

mc-nv commented Jan 9, 2025

The root cause is that build.py may miscalculate the --nvcc_threads and --parallel values if utils packages are in use.
I would suggest setting the default values to the clear minimum, which is 1.

@snnn
Member

snnn commented Jan 9, 2025

@tianleiwu , what do you think?

@snnn snnn added the ep:CUDA issues related to the CUDA execution provider label Jan 9, 2025
@tianleiwu
Contributor

@snnn, it's fine to use nvcc_threads=1 as the default. We also need to add a proper --nvcc_threads value to the CUDA-related build pipelines to avoid a build time regression.

@snnn snnn added the contributions welcome lower priority issues for the core ORT teams label Jan 9, 2025
@snnn
Member

snnn commented Jan 9, 2025

Why does it miscalculate the --nvcc_threads and --parallel values if utils packages are in use? @mc-nv, could you please elaborate? Is it something that we can fix?

@mc-nv
Contributor Author

mc-nv commented Jan 9, 2025

It seems to rely on the CPU count, but the CPU count can be high while the memory limit is low.
The default value should cover such corner cases, or the script should validate them.
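One way such a validation could look, purely as a sketch and not what build.py does: read the container's cgroup memory limit and cap the job count by it instead of trusting the CPU count alone. The two paths are the standard Linux cgroup v2 and v1 memory limit files; the helper names and the 8GB-per-job budget (borrowed from the estimate earlier in this thread) are assumptions.

```python
import os
from pathlib import Path
from typing import Optional

CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)
BYTES_PER_JOB = 8 * 1024**3  # assumed budget per job, taken from the 8GB estimate above


def parse_limit(text: str) -> Optional[int]:
    """Return the limit in bytes, or None when the file says "max" (no limit)."""
    text = text.strip()
    return None if text == "max" else int(text)


def container_memory_limit() -> Optional[int]:
    """First readable cgroup memory limit, or None if none is found or set."""
    for path in CGROUP_LIMIT_FILES:
        try:
            return parse_limit(path.read_text())
        except (OSError, ValueError):
            continue
    return None


def safe_parallel(cpu_count: int, limit_bytes: Optional[int]) -> int:
    """Cap the job count by the memory limit; fall back to the CPU count if unlimited."""
    if limit_bytes is None:
        return cpu_count
    return max(1, min(cpu_count, limit_bytes // BYTES_PER_JOB))


if __name__ == "__main__":
    print(safe_parallel(os.cpu_count() or 1, container_memory_limit()))
```

With 32 CPUs and a 16GB container limit this yields 2 jobs rather than 32. It does not look at the host's physical RAM when no cgroup limit is set, so it covers only the container case described here.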

@mc-nv mc-nv closed this as completed Jan 9, 2025