Can ORT be built with BuildTools 16.11? #17693

Open
nv-kmcgill53 opened this issue Sep 25, 2023 · 8 comments
Labels
ep:CUDA: issues related to the CUDA execution provider
ep:TensorRT: issues related to TensorRT execution provider
platform:windows: issues related to the Windows platform
stale: issues that have not been addressed in a while; categorized by a bot

Comments

@nv-kmcgill53

Describe the issue

Tritonserver is seeing what looks like a compiler regression when using BuildTools 17. My understanding is that ORT 1.16.0 must be built with BuildTools 17. Is it possible to build ORT with BuildTools 16.*?

When Tritonserver is built using BuildTools 17 with Release flags set, we observe a heap corruption error at runtime (see https://stackoverflow.com/questions/23471161/critical-error-detected-c0000374-c-dll-returns-pointer-off-allocated-memory). However, when we build a Debug version of the application, we see no error and the application functions properly.

If we are able to build ORT with an earlier version of BuildTools, this would unblock us as we can revert the version we are using to build our application.

To reproduce

N/A

Urgency

This is urgent as it's blocking our Windows release

Platform

Windows

OS Version

mcr.microsoft.com/windows:10.0.19042.1889

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.16.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA, TensorRT

Execution Provider Library Version

CUDA 12.2.1

@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider ep:TensorRT issues related to TensorRT execution provider platform:windows issues related to the Windows platform labels Sep 25, 2023
@mc-nv
Contributor

mc-nv commented Sep 25, 2023

The build in question used the following tool versions:

BUILDTOOLS_VERSION=17.7.34024.191 
CMAKE_VERSION=3.27.1 
CUDA_VERSION=12.2.1 
CUDNN_VERSION=8.9.5.27 
PYTHON_VERSION=3.8.10 
TENSORRT_VERSION=8.6.1.6 
VCPGK_VERSION=2023.07.21

@snnn
Member

snnn commented Sep 26, 2023

No.
Each ORT release is tested with only one Visual Studio version. The last release used Visual Studio 17.6. To make it work with 16.11, I think you would need to make some code changes. If Visual Studio 17.6 doesn't work for Triton, you could try Visual Studio 17.7, 17.5, or another 17.x version.
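
When several Visual Studio versions are in play, it can help to confirm which toolset actually compiled a given translation unit. A minimal sketch, assuming the standard MSVC predefined macro `_MSC_VER` (1920 to 1929 for Visual Studio 2019 16.x toolsets, 1930 and above for Visual Studio 2022 17.x). The function names `toolset_family` and `current_toolset_family` are hypothetical and not part of ORT or Triton:

```cpp
#include <string>

// Maps an _MSC_VER value to the Visual Studio generation that produced it.
// 1920-1929 are VS 2019 (16.x) toolsets; 1930 and above are VS 2022 (17.x) or later.
// Hypothetical helper for illustration only.
std::string toolset_family(int msc_ver) {
    if (msc_ver >= 1930) return "VS 2022 (17.x) or later";
    if (msc_ver >= 1920) return "VS 2019 (16.x)";
    return "older than VS 2019";
}

// Reports the toolset family of the compiler building this translation unit,
// or "not MSVC" when _MSC_VER is undefined (for example GCC on Linux).
std::string current_toolset_family() {
#ifdef _MSC_VER
    return toolset_family(_MSC_VER);
#else
    return "not MSVC";
#endif
}
```

Logging `current_toolset_family()` once at startup from both the application and each dependency built from source would show whether everything was really compiled with the same toolset generation.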

@mc-nv
Contributor

mc-nv commented Sep 26, 2023

Hi @snnn

Thank you for the recommendation regarding the Visual Studio version change.

I would like to point out that we are using the Visual Studio BuildTools installed from the following Microsoft resource, which is effectively the same as your recommendation.

https://learn.microsoft.com/en-us/visualstudio/releases/2022/release-history

If you have a straightforward recommendation for a specific version of Visual Studio that can solve the issue described above, we would appreciate the guidance.

@snnn
Member

snnn commented Sep 26, 2023

We are preparing a patch release: ONNX Runtime 1.16.1. It will be built with Visual Studio 2022 17.7. I don't have experience with BuildTools and would love to learn more about it.

@pranavsharma
Contributor

@nv-kmcgill53 We've not tested with 17.7. It's best to use 17.6 given this compiler issue.

@mc-nv
Contributor

mc-nv commented Oct 4, 2023

I tried building with BUILDTOOLS_VERSION:17.6.34031.178 and got the same result...

Any ideas are welcome to help us understand the root cause...

@snnn
Member

snnn commented Oct 5, 2023

Even in Release mode, VC can still generate debug symbols. The symbols are usually stored in separate PDB files. As a first step, I think we should get a stack trace from when the crash happens. The stack trace might not contain the culprit function, because the heap corruption might have already happened much earlier, but it is still helpful. I don't expect that something is wrong in the compiler; I expect a code error in Tritonserver or its dependencies (like ORT). We need more information to proceed. You can also try Application Verifier, which is what we usually use for debugging memory issues.
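
To illustrate why the stack trace at crash time may not point at the culprit, here is a toy model, assuming a simplified block layout with guard bytes after the payload. `GuardedBlock` and `corruption_detected_late` are hypothetical names, and this is not how the Windows heap or Application Verifier is implemented. It only shows that the bad write and its detection can be far apart:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a heap block: `size` payload bytes followed by guard bytes.
// Hypothetical illustration only; not the real Windows heap layout.
class GuardedBlock {
public:
    static constexpr std::size_t kGuardSize = 4;
    static constexpr std::uint8_t kGuardByte = 0xFD;

    explicit GuardedBlock(std::size_t size)
        : size_(size), storage_(size + kGuardSize, std::uint8_t{0}) {
        for (std::size_t i = 0; i < kGuardSize; ++i) storage_[size_ + i] = kGuardByte;
    }

    // Deliberately unchecked against the payload size (the simulated bug),
    // but still bounded by the backing vector so the demo has no real
    // undefined behavior.
    void write(std::size_t offset, std::uint8_t value) {
        if (offset < storage_.size()) storage_[offset] = value;
    }

    // The kind of check a heap might run later, e.g. when the block is freed.
    bool guard_intact() const {
        for (std::size_t i = 0; i < kGuardSize; ++i)
            if (storage_[size_ + i] != kGuardByte) return false;
        return true;
    }

private:
    std::size_t size_;
    std::vector<std::uint8_t> storage_;
};

// Simulates an off-by-one write that happens early and is noticed only later.
bool corruption_detected_late() {
    GuardedBlock block(8);
    block.write(8, 0x42);          // bug: one byte past the 8-byte payload
    // ... unrelated work would run here; nothing fails yet ...
    return !block.guard_intact();  // only now is the damage noticed
}
```

In this model the failing check runs in whatever code happens to free or validate the block, which is why a crash-time stack trace is a starting point rather than proof of where the bad write occurred.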

Contributor

github-actions bot commented Nov 4, 2023

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Nov 4, 2023

4 participants