[BUG] Release 3.5.0 build failing on Windows using CUDA 12.6, and VS2022 17.11 #1732

Closed
levicki opened this issue Aug 21, 2024 · 13 comments

levicki commented Aug 21, 2024

Describe the bug
I initially reported this issue to xformers, since the xformers build was failing for me, without realizing the error was in the CUTLASS submodule. After some back and forth and more testing on my end, I realized the issue seems to be with CUTLASS 3.5.0.

Steps/Code to reproduce bug

  1. Install Visual Studio 2022 17.11.0 with C++ Desktop Development workload
  2. Install CUDA toolkit 12.6
  3. git clone https://github.com/NVIDIA/cutlass
  4. cd cutlass
  5. git checkout v3.5.0
  6. cmake-gui
  7. Select VS 2022
  8. Select x64
  9. Leave native compiler
  10. Click Configure
  11. Click Generate
  12. Click Open project
  13. Select Release
  14. Click Build
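
For anyone who prefers to script the GUI steps above, they can be approximated from the command line. The following is only a sketch under stated assumptions: `git` and `cmake` are on PATH, the generator name "Visual Studio 17 2022" corresponds to the "VS 2022" choice in cmake-gui, and the build directory name `build` and the helper names `repro_commands` and `run_all` are hypothetical (none of these appear in the original report).

```python
import subprocess


def repro_commands(tag="v3.5.0", build_dir="build"):
    """Return command lines mirroring the GUI steps; nothing is executed here."""
    return [
        # Steps 3-5: clone CUTLASS and check out the release tag
        ["git", "clone", "https://github.com/NVIDIA/cutlass"],
        ["git", "-C", "cutlass", "checkout", tag],
        # Steps 6-11: configure and generate for VS 2022, x64, default (native) compiler
        ["cmake", "-S", "cutlass", "-B", build_dir,
         "-G", "Visual Studio 17 2022", "-A", "x64"],
        # Steps 12-14: build the Release configuration
        ["cmake", "--build", build_dir, "--config", "Release"],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(repro_commands())` from a developer command prompt would perform the clone, configure, and build in sequence; whether it reproduces the same errors depends on the installed CUDA toolkit and Visual Studio versions.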

Expected behavior
The build should succeed. Instead, it fails with the following errors (please ignore the C:/BUILD/xformers prefix; the same compilation errors occur when building from within Visual Studio):

C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): warning C4346: 'SharedStorage': dependent name is not a type
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): note: prefix the qualified-id with 'typename' to indicate a type
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): note: the template instantiation context (the oldest one first) is
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(60): note: while compiling class template partial specialization 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(124): note: while compiling class 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>::SharedStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(133): note: while compiling class 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>::SharedStorage::PipelineStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): error C2061: syntax error: identifier 'SharedStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(140): error C3646: 'math_wg_order': unknown override specifier
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(140): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int

Note that there might be other build errors as well; this was just the first place where the build failed. It seems as if there might be some compiler issue with the latest Visual Studio update?

Environment details (please complete the following information):

  • Environment location: Bare-metal

Additional context
cl.exe Version 19.41.34120 for x64

levicki added the "? - Needs Triage" and "bug" (Something isn't working) labels Aug 21, 2024
levicki (Author) commented Aug 21, 2024

Your setup.py is using -std=c++17 for CXX options. The MSVC syntax is -std:c++17 or /std:c++17; using the GNU syntax leads to a warning about an unrecognized compiler option (and probably compilation without C++17 support). Also, -O3 doesn't exist for MSVC.
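
One way a setup.py could avoid the flag mismatch described here is to choose the flag spelling per platform. This is a minimal sketch, not the project's actual build script: the function name `cxx_flags` is hypothetical, and `/O2` is an assumption about what the author of the script would want in place of `-O3` on MSVC.

```python
import sys


def cxx_flags(platform=sys.platform):
    """Pick C++17 and optimization flags in the syntax the host compiler expects."""
    if platform == "win32":
        # MSVC (cl.exe) spelling of the standard flag; /O2 is an assumed
        # substitute for -O3, which MSVC does not provide.
        return ["/std:c++17", "/O2"]
    # GCC/Clang-style spelling
    return ["-std=c++17", "-O3"]
```

For example, `cxx_flags("win32")` yields the MSVC-style flags, while any other platform string falls through to the GNU-style ones.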

levicki (Author) commented Aug 21, 2024

The culprit is CUDA 12.6 — I can build with CUDA 12.4.1 just fine.

levicki (Author) commented Aug 22, 2024

NVIDIA bug ID #4820029.

thakkarV (Collaborator) commented

tracking. Does 3.5.1 also fail with the same issue?

levicki (Author) commented Aug 23, 2024

> tracking. Does 3.5.1 also fail with the same issue?

@thakkarV I don't see a tag for 3.5.1 and it's not in releases yet?

thakkarV (Collaborator) commented

Main is 3.5.1. We will tag soon

levicki (Author) commented Aug 23, 2024

> Main is 3.5.1. We will tag soon

@thakkarV Hopefully not before this issue is root-caused and at least worked around?

thakkarV (Collaborator) commented Aug 23, 2024

It appears to be a CUDA toolkit issue. If you could try out with main that would be great cause there were some MSVC fixes in 3.5.1 too

levicki (Author) commented Aug 23, 2024

> If you could try out with main that would be great cause there were some MSVC fixes in 3.5.1 too

If you mean with CUDA 12.6, can you give repro steps for some minimal build that triggers it so I don't have to run the full build?

Even better if you can isolate just the relevant part of the code that causes the compiler errors, so I can try to build just that from the developer command prompt.

EDIT: If I remember correctly I tried with main as well, didn't make any difference.

egortech commented

Any update on it? It's blocking to build onnxruntime with CUDA 12.6 (microsoft/onnxruntime#21676)

levicki (Author) commented Sep 29, 2024

> Any update on it? It's blocking to build onnxruntime with CUDA 12.6 (microsoft/onnxruntime#21676)

I asked on the ticket; no response yet from the engineering team.

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

levicki (Author) commented Oct 29, 2024

Closed because NVIDIA is apparently too lazy to fix it, what with resting on their laurels, it's a full time job.

levicki closed this as completed Oct 29, 2024