[Build] update version of "cutlass" #19891

Closed
mc-nv opened this issue Mar 13, 2024 · 7 comments
Labels
build build issues; typically submitted using template ep:CUDA issues related to the CUDA execution provider

Comments

@mc-nv
Contributor

mc-nv commented Mar 13, 2024

Describe the issue

The Triton team is hitting a build issue when compiling ONNX Runtime.
The issue is observed with CUDA 12.4 and "cutlass" v3.1.0.

Note

  1. We were able to suppress that warning with --compile_no_warning_as_error
  2. We haven't seen the issue when compiling against the latest version of "cutlass"

Urgency

No response

Target platform

WIN32

Build script

build.bat --cmake_generator "Visual Studio 17 2022" --config Release --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=60;61;70;75;80;86;90" --skip_submodule_sync --parallel --build_shared_lib --update --build --build_dir /workspace/build --use_cuda --cuda_version "12.4" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" --cudnn_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" --use_tensorrt --tensorrt_home "/tensorrt"

Error / output

         C:\workspace\build\Release\_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(101): error #940-D: missing return statement at end of non-void function "cute::cluster_grid_dims" [C:\workspace\build\Release\onnxruntime_providers_cuda.vcxproj] [C:\tmp\tritonbuild\onnxruntime\build\ort_target.vcxproj]
         C:\workspace\build\Release\_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(120): error #940-D: missing return statement at end of non-void function "cute::cluster_id_in_grid" [C:\workspace\build\Release\onnxruntime_providers_cuda.vcxproj] [C:\tmp\tritonbuild\onnxruntime\build\ort_target.vcxproj]
        

Visual Studio Version

BUILDTOOLS_VERSION:17.9.34622.214 CMAKE_VERSION:3.27.1 CUDA_VERSION:12.4.0 CUDNN_VERSION:9.0.0.312 PYTHON_VERSION:3.8.10 TENSORRT_VERSION:8.6.1.6 VCPGK_VERSION:2023.11.20

GCC / Compiler Version

No response

@mc-nv mc-nv added the build build issues; typically submitted using template label Mar 13, 2024
@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider ep:TensorRT issues related to TensorRT execution provider labels Mar 13, 2024
@jywu-msft jywu-msft removed the ep:CUDA issues related to the CUDA execution provider label Mar 13, 2024
@jywu-msft
Member

+@tianleiwu @yufenglee

@jywu-msft
Member

Yes, I encountered this as well. There seems to be a bug in cutlass 3.1; see:
https://github.com/NVIDIA/cutlass/blob/6f47420213f757831fae65c686aa471749fa8d60/include/cute/arch/cluster_sm90.hpp#L97

The latest cutlass seems to have it fixed:
https://github.com/NVIDIA/cutlass/blob/ffa34e70756b0bc744e1dfcc115b5a991a68f132/include/cute/arch/cluster_sm90.hpp#L118

However, when I update to cutlass 3.4.1 and retry the build, I encounter other build errors, e.g.
C:\ort-118-cuda124-trt10\Release\_deps\cutlass-src\include\cutlass/epilogue/warp/tile_iterator_tensor_op_mixed.h(109): error : expression must have a constant value [C:\ort-118-cuda124-trt10\Release\onnxruntime_providers_cuda.vcxproj]

Which version of cutlass did you use to get a successful build?
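
For readers unfamiliar with diagnostic #940-D ("missing return statement at end of non-void function"), here is a minimal sketch of one common way it arises: a function whose only `return` sits inside a preprocessor guard. This is an illustration under assumptions, not the actual cutlass source; `Dim3`, `SKETCH_ARCH_CLUSTER_ENABLED`, and `cluster_grid_dims_fixed` are hypothetical names, and the real cause in `cluster_sm90.hpp` may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3; not the real type.
struct Dim3 {
  std::uint32_t x, y, z;
};

// Hypothetical architecture guard. It is deliberately left undefined
// so that the fallback branch below is the one that gets compiled.
// #define SKETCH_ARCH_CLUSTER_ENABLED 1

// Problematic shape (kept as a comment so this file compiles cleanly):
//
//   Dim3 cluster_grid_dims_buggy() {
//   #if defined(SKETCH_ARCH_CLUSTER_ENABLED)
//     return Dim3{2, 2, 1};
//   #endif
//   }  // no return when the macro is undefined
//
// With warnings treated as errors, a compiler may reject this.

// Fixed shape: every preprocessor branch ends in a return.
constexpr Dim3 cluster_grid_dims_fixed() {
#if defined(SKETCH_ARCH_CLUSTER_ENABLED)
  return Dim3{2, 2, 1};
#else
  return Dim3{0, 0, 0};
#endif
}

// Compile-time check that the fallback branch returns a value.
static_assert(cluster_grid_dims_fixed().z == 0,
              "fallback branch must return a value");
```

If the sketch matches the real situation, it would also explain why `--compile_no_warning_as_error` hides the problem without fixing it: the diagnostic is downgraded, but the function still lacks a return on that branch.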

@jywu-msft jywu-msft added ep:CUDA issues related to the CUDA execution provider and removed ep:TensorRT issues related to TensorRT execution provider labels Mar 13, 2024
@mc-nv
Contributor Author

mc-nv commented Mar 13, 2024

The "cutlass" compilation was against the main branch.

@jywu-msft
Member

Just to confirm: you had to update cutlass to main AND also suppress warnings-as-errors to get the build to succeed?

Contributor

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Apr 14, 2024
@snnn
Member

snnn commented May 29, 2024

Do we need to upgrade cutlass version?

@tianleiwu
Contributor

tianleiwu commented Jun 5, 2024

Yes, we need to upgrade cutlass to version 3.5, which also requires some code changes in ORT. I'm working on that.

@snnn snnn removed the stale issues that have not been addressed in a while; categorized by a bot label Jun 5, 2024
tianleiwu added a commit that referenced this issue Jun 11, 2024
### Description
Upgrade cutlass to 3.5 to fix build errors when using CUDA 12.4 or 12.5 on
Windows.
- [x] Upgrade cutlass to 3.5.0.
- [x] Fix flash attention build error with latest cutlass header files
and APIs. This fix is provided by @wangyems.
- [x] Update efficient attention to use new cutlass fmha interface.
- [x] Patch cutlass to fix `hrsqrt` not found error for sm < 53.
- [x] Disable TF32 Staged Accumulation to fix blkq4_fp16_gemm_sm80_test
build error for cuda 11.8 to 12.3.
- [x] Disable TRT 10 deprecation warnings.

The following are not included in this PR:
* Replace the deprecated APIs in the TRT provider.
* Fix blkq4_fp16_gemm_sm80_test build error for cuda 12.4 or 12.5. This
test is not built by default unless you add `--cmake_extra_defines
onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON` in build command.

To integrate into rel-1.18.1: either bring in other changes (like onnx
1.16.1), or generate a manifest and upload a new ONNX Runtime Build Time
Deps artifact based on rel-1.18.1.

### Motivation and Context
#19891
#20924
#20953

4 participants