[BUG] Release 3.5.0 build failing on Windows using CUDA 12.6, and VS2022 17.11 #1732

levicki · 2024-08-21T12:16:15Z

Describe the bug
I initially reported this issue to xformers because the xformers build was failing for me, without realizing that the error was in the CUTLASS submodule. After some back and forth and more testing on my end, I realized the issue seems to be with CUTLASS 3.5.0.

Steps/Code to reproduce bug

Install Visual Studio 2022 17.11.0 with C++ Desktop Development workload
Install CUDA toolkit 12.6
git clone https://github.com/NVIDIA/cutlass
cd cutlass
git checkout v3.5.0
cmake-gui
Select VS 2022
Select x64
Leave native compiler
Click Configure
Click Generate
Click Open project
Select Release
Click Build

Expected behavior
The build should succeed. Instead, it fails with the errors below (please ignore the C:/BUILD/xformers prefix; the same compilation errors occur when building from within Visual Studio):

C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): warning C4346: 'SharedStorage': dependent name is not a type
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): note: prefix the qualified-id with 'typename' to indicate a type
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): note: the template instantiation context (the oldest one first) is
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(60): note: while compiling class template partial specialization 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(124): note: while compiling class 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>::SharedStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(133): note: while compiling class 'cutlass::gemm::kernel::GemmUniversal<ProblemShape_,CollectiveMainloop_,CollectiveEpilogue_,TileScheduler_,enable_if<std::is_base_of_v<cutlass::gemm::KernelTmaWarpSpecializedPingpong,CollectiveMainloop_::DispatchPolicy::Schedule>,void>::type>::SharedStorage::PipelineStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(136): error C2061: syntax error: identifier 'SharedStorage'
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(140): error C3646: 'math_wg_order': unknown override specifier
C:/BUILD/xformers/third_party/cutlass/include\cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp(140): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int

Note that there might be other build errors as well; this was just the first place where the build failed. Could this be a compiler issue with the latest Visual Studio update?
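For readers unfamiliar with C4346: MSVC can report it when a name that depends on a template parameter is used as a type without the `typename` keyword. The sketch below is a generic C++17 illustration of that rule, not the CUTLASS source and not a diagnosis of this failure. `Barrier`, `Kernel`, and `initial_flag` are hypothetical names chosen for the example.

```cpp
#include <type_traits>

// Hypothetical stand-in for a class that exposes a nested type.
struct Barrier {
  struct SharedStorage {
    int flag = 0;
  };
};

template <class BarrierT>
struct Kernel {
  // BarrierT::SharedStorage depends on the template parameter, so C++17
  // requires 'typename' to mark it as a type. Omitting the keyword here is
  // the kind of code that MSVC can report as C4346.
  using BarrierStorage = typename BarrierT::SharedStorage;

  BarrierStorage order;
};

// Instantiating the template resolves the dependent name to a concrete type.
static_assert(std::is_same<Kernel<Barrier>::BarrierStorage,
                           Barrier::SharedStorage>::value,
              "dependent name should resolve to Barrier::SharedStorage");

// Returns the default value of the nested member, computed at compile time.
constexpr int initial_flag() {
  return Kernel<Barrier>{}.order.flag;
}
```

If source that is written correctly like this still produces C4346, the keyword is probably being lost or misparsed somewhere in the toolchain rather than missing from the code.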

Environment details (please complete the following information):

Environment location: Bare-metal

Additional context
cl.exe Version 19.41.34120 for x64

levicki · 2024-08-21T16:33:12Z

Your setup.py is using -std=c++17 for CXX options. MSVC syntax is -std:c++17 or /std:c++17; using GNU syntax leads to a warning about an unrecognized compiler option (and probably compilation without C++17 support). Also, -O3 doesn't exist for MSVC.
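One way to check whether a translation unit is really being compiled as C++17 is to inspect the language-version macros. This is a minimal sketch, assuming the check is placed in a header the build already includes. `kLanguageVersion` and `has_cpp17` are hypothetical names; `_MSVC_LANG` and `__cplusplus` are the compilers' own predefined macros.

```cpp
// MSVC keeps __cplusplus at 199711L unless /Zc:__cplusplus is passed,
// so _MSVC_LANG is the macro that reflects the /std: setting there.
#if defined(_MSVC_LANG)
constexpr long kLanguageVersion = _MSVC_LANG;
#else
constexpr long kLanguageVersion = __cplusplus;
#endif

// True when the active language standard is C++17 or newer.
constexpr bool has_cpp17() {
  return kLanguageVersion >= 201703L;
}
```

Adding a `static_assert` on `has_cpp17()` to an affected source file would turn a silently ignored standard flag into an explicit compile error.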

levicki · 2024-08-21T17:35:12Z

The culprit is CUDA 12.6 — I can build with CUDA 12.4.1 just fine.

levicki · 2024-08-22T16:44:48Z

NVIDIA bug ID #4820029.

thakkarV · 2024-08-22T16:49:17Z

tracking. Does 3.5.1 also fail with the same issue?

levicki · 2024-08-23T01:38:54Z

> tracking. Does 3.5.1 also fail with the same issue?

@thakkarV I don't see a tag for 3.5.1 and it's not in releases yet?

thakkarV · 2024-08-23T01:55:52Z

Main is 3.5.1. We will tag soon

levicki · 2024-08-23T10:42:54Z

> Main is 3.5.1. We will tag soon

@thakkarV Hopefully not before this issue is root-caused and at least worked around?

thakkarV · 2024-08-23T10:47:18Z

It appears to be a CUDA toolkit issue. If you could try out with main that would be great cause there were some MSVC fixes in 3.5.1 too

levicki · 2024-08-23T10:59:49Z

> If you could try out with main that would be great cause there were some MSVC fixes in 3.5.1 too

If you mean with CUDA 12.6, can you give repro steps for some minimal build that triggers it so I don't have to run the full build?

Even better if you can isolate just relevant code part which causes compiler errors so I can try to build just that from the developer command prompt.

EDIT: If I remember correctly I tried with main as well, didn't make any difference.

egortech · 2024-09-21T23:31:37Z

Any update on it? It's blocking to build onnxruntime with CUDA 12.6 (microsoft/onnxruntime#21676)

levicki · 2024-09-29T10:42:53Z

> Any update on it? It's blocking to build onnxruntime with CUDA 12.6 (microsoft/onnxruntime#21676)

I asked on the ticket, no response yet from engineering team.

github-actions · 2024-10-29T11:05:10Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

levicki · 2024-10-29T14:38:20Z

Closed because NVIDIA is apparently too lazy to fix it, what with resting on their laurels, it's a full time job.

levicki added the "? - Needs Triage" and "bug" labels Aug 21, 2024

tianleiwu mentioned this issue Aug 21, 2024

[Build] fail to build rel-1.19.0 vs CUDA 12.6 on Windows microsoft/onnxruntime#21676

Closed

github-actions bot added the inactive-30d label Oct 29, 2024

levicki closed this as completed Oct 29, 2024
