[Build] ONNX Runtime build fails OOM (v1.20.x) #22859
Comments
@snnn for viz |
Use "--parallel <n>" to reduce the parallelism. |
It is more about how much memory you have for each CPU core than how much memory you have in total. |
See linux build uses |
Sorry, part of my response was eaten by the formatting. I meant: put a number after "--parallel" to limit the number of concurrent processes. Say you have 64 GB of memory and 16 CPUs. By default make/msbuild will create at most 16 subprocesses, which leaves 4 GB per process. Since we do not know whether 4 GB is enough for one compiler process, we sometimes need to adjust the parallelism manually to avoid OOM. |
Sounds like a suggestion to have 8 GB per process, am I right? |
See, in my scenario we don't set a limit on parallel jobs and use the default, which is "1": https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/build.py#L171 What would be the reason to set the limit to 2 or 4 if we are failing with OOM using a single process? |
Actually, the default is not one. If the optional value is 0 or unspecified, it is interpreted as the number of CPUs. Since you know how many CPUs the machine has, you may start by halving that number. For example, if the default value is 16, try 8 first. If the error persists, decrease it further. Eventually it will pass, because 64 GB is definitely enough for a single compiler process. |
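The halving approach described above can be sketched as a small helper. This is only an illustration: `halving_schedule` is a hypothetical name, not part of build.py, and it assumes you retry the build by hand with each value until it stops running out of memory.

```python
def halving_schedule(cpu_count):
    """Yield candidate --parallel values: half the CPU count first,
    then keep halving until a single job is reached."""
    jobs = max(1, cpu_count // 2)
    while True:
        yield jobs
        if jobs == 1:
            break
        jobs //= 2


# For the 16-CPU example: try 8 first, then 4, 2 and finally 1.
print(list(halving_schedule(16)))  # [8, 4, 2, 1]
```

Each value would be passed as the number after "--parallel"; stop at the first one that builds without OOM.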
You may also need to tune the "--nvcc_threads" parameter. To be safe, you can set it to one. |
My Windows build environment has 2 CPUs. |
Estimated memory usage is nvcc_threads * parallel * 8 GB, so you will need at least 16 GB of memory for
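The estimate in this comment is simple arithmetic and can be written out as a sketch. The function name `estimated_memory_gb` is hypothetical, and the 8 GB per process figure is the rule of thumb quoted in this thread, not a measured value. The example settings (2 parallel jobs, 1 nvcc thread) are an assumption chosen to match the 2-CPU machine mentioned above.

```python
def estimated_memory_gb(nvcc_threads, parallel, gb_per_process=8):
    """Rough peak memory estimate: nvcc_threads * parallel * 8 GB."""
    return nvcc_threads * parallel * gb_per_process


# Assumed settings for a 2-CPU machine: 2 parallel jobs, 1 nvcc thread.
print(estimated_memory_gb(nvcc_threads=1, parallel=2))   # 16

# The 16-CPU example from earlier in the thread, with default parallelism.
print(estimated_memory_gb(nvcc_threads=1, parallel=16))  # 128
```

By this estimate, a 16-CPU machine with 64 GB of RAM would be over budget at the default parallelism, which is consistent with the advice to lower "--parallel".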
With a single thread I no longer see the OOM on Windows with 64 GB of RAM. I still need to confirm the same behavior on Linux. |
I keep seeing the same issue on Linux, may need to add
|
The cause of the problem is the usage of |
@tianleiwu , what do you think? |
@snnn, It's fine to use |
Why does it miscalculate the --nvcc_threads and --parallel values if the utils package is in use? @mc-nv, could you please elaborate? Is it something that we can fix? |
It seems to rely on the CPU count, but the CPU count can be high while the memory limit is low. |
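One way to account for a low memory limit is to cap the job count by available memory as well as by CPU count. The sketch below is an assumption about how that could look, not what build.py does: `memory_aware_parallel` is a hypothetical helper, the memory limit is passed in by the caller (for example, the container limit), and 8 GB per process is the rule of thumb from this thread.

```python
def memory_aware_parallel(cpu_count, memory_limit_gb, nvcc_threads=1, gb_per_process=8):
    """Pick a --parallel value bounded by both CPU count and memory.

    Each parallel job is assumed to need nvcc_threads * gb_per_process GB.
    At least one job is always allowed.
    """
    jobs_by_memory = memory_limit_gb // (nvcc_threads * gb_per_process)
    return max(1, min(cpu_count, jobs_by_memory))


# Many CPUs but a tight memory limit: memory is the binding constraint.
print(memory_aware_parallel(cpu_count=64, memory_limit_gb=16))  # 2

# Few CPUs and plenty of memory: CPU count is the binding constraint.
print(memory_aware_parallel(cpu_count=2, memory_limit_gb=64))   # 2
```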
Describe the issue
We hit an issue trying to compile against the rel-1.20.0 branch. We are getting out-of-memory failures on both Linux and Windows.
Windows config (64 GB RAM):
Linux (64 GB RAM):
Urgency
ASAP
Target platform
Linux, Windows
Build script
Windows:
Linux:
Error / output
No error message; the container fails with out of memory.
Visual Studio Version
No response
GCC / Compiler Version
No response