Cannot use with cuda in Window #409

Open
thunanguyen opened this issue Jan 8, 2020 · 3 comments
Comments

@thunanguyen

Hi, I cannot compile with the flag -d:cudnn on Windows.

My computer uses:
GTX 1060
cuda 10.0
cudnn 7

@mratsim
Owner

mratsim commented Jan 8, 2020

Yes, I didn't configure CUDA/cuDNN for Windows. I didn't have a Windows PC with an Nvidia GPU when I wrote the CUDA backend.

Hopefully the only thing needed is to change the paths to the compiler here:

Arraymancer/nim.cfg

Lines 3 to 32 in 5bf4327

@if cudnn:
define:"cuda"
@end
# Nim cfg is not aware of new define within the file: https://github.com/nim-lang/Nim/issues/6698
@if cuda or cudnn:
# compile with "cpp" backend.
# See https://github.com/mratsim/Arraymancer/issues/371
# Nvidia NVCC
cincludes:"/opt/cuda/include"
cc:"gcc"
# Compilation for Cuda requires C++
gcc.cpp.exe:"/opt/cuda/bin/nvcc"
gcc.cpp.linkerexe:"/opt/cuda/bin/nvcc"
gcc.cpp.options.debug: "-Xcompiler -Og" # Additional "-Xcompiler -g3" crashes stb_image macros
gcc.cpp.options.speed: "-Xcompiler -O3 -Xcompiler -fno-strict-aliasing"
gcc.cpp.options.size: "-Xcompiler -Os"
# Important sm_61 architecture corresponds to Pascal and sm_75 to Turing. Change for your own card
gcc.cpp.options.always:"-gencode arch=compute_61,code=sm_61 -gencode arch=compute_75,code=sm_75 --x cu -Xcompiler -fpermissive"
# Clang
# cincludes:"/opt/cuda/include"
# clibdir:"/opt/cuda/lib"
# cc:"clang"
# # Compile for Pascal (6.1) and Turing cards (7.5)
# clang.cpp.options.always:"--cuda-path=/opt/cuda -lcudart_static -x cuda --cuda-gpu-arch=sm_61 --cuda-gpu-arch=sm_75"
@end

@thunanguyen
Author

thunanguyen commented Jan 10, 2020

Thanks, I found that and have already changed it. But I wonder, is there any way to have a unified config file for all OSes?

@mratsim
Owner

mratsim commented Jan 10, 2020

Ideally I want to completely do away with the config files. I don't like them at all but at the moment they are a necessary evil.

Let's go over the config file:

Arraymancer/nim.cfg

Lines 3 to 32 in 71ccad0

@if cudnn:
define:"cuda"
@end
# Nim cfg is not aware of new define within the file: https://github.com/nim-lang/Nim/issues/6698
@if cuda or cudnn:
# compile with "cpp" backend.
# See https://github.com/mratsim/Arraymancer/issues/371
# Nvidia NVCC
cincludes:"/opt/cuda/include"
cc:"gcc"
# Compilation for Cuda requires C++
gcc.cpp.exe:"/opt/cuda/bin/nvcc"
gcc.cpp.linkerexe:"/opt/cuda/bin/nvcc"
gcc.cpp.options.debug: "-Xcompiler -Og" # Additional "-Xcompiler -g3" crashes stb_image macros
gcc.cpp.options.speed: "-Xcompiler -O3 -Xcompiler -fno-strict-aliasing"
gcc.cpp.options.size: "-Xcompiler -Os"
# Important sm_61 architecture corresponds to Pascal and sm_75 to Turing. Change for your own card
gcc.cpp.options.always:"-gencode arch=compute_61,code=sm_61 -gencode arch=compute_75,code=sm_75 --x cu -Xcompiler -fpermissive"
# Clang
# cincludes:"/opt/cuda/include"
# clibdir:"/opt/cuda/lib"
# cc:"clang"
# # Compile for Pascal (6.1) and Turing cards (7.5)
# clang.cpp.options.always:"--cuda-path=/opt/cuda -lcudart_static -x cuda --cuda-gpu-arch=sm_61 --cuda-gpu-arch=sm_75"
@end

This could be removed if NVCC becomes an official compiler in Nim with the proper flags.
This would be configured here I guess: https://github.com/nim-lang/Nim/blob/devel/config/nim.cfg

There is still one issue left though: either I distribute precompiled CUDA-enabled binaries, or users ensure that CUDA/nvcc is in their PATH, which AFAIK isn't automatic on Windows.
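
As an illustration of what an OS-independent setup might look like, here is a minimal sketch of a `config.nims` (NimScript) counterpart to the CUDA block above. It is not part of Arraymancer. It assumes that `-d:cuda` / `-d:cudnn` given on the command line are visible in `config.nims`, that `nvcc` can be found on the PATH, and that the `CUDA_PATH` environment variable (set by the Windows CUDA installer) points at the toolkit root. The host-compiler flags are copied from the Linux config; on Windows, where nvcc drives MSVC, they would likely need changing, and paths containing spaces may need extra quoting.

```nim
# config.nims: hypothetical sketch, not the project's actual configuration.
when defined(cuda) or defined(cudnn):
  # cudnn implies cuda, as in nim.cfg.
  switch("define", "cuda")

  # Look nvcc up on the PATH instead of hard-coding /opt/cuda/bin/nvcc.
  let nvcc = findExe("nvcc")
  if nvcc.len == 0:
    quit("nvcc not found: add the CUDA toolkit's bin directory to your PATH")

  # Assumption: CUDA_PATH is set (Windows installer); otherwise fall back
  # to the /opt/cuda prefix that nim.cfg already uses.
  let cudaRoot =
    if existsEnv("CUDA_PATH"): getEnv("CUDA_PATH")
    else: "/opt/cuda"

  switch("cincludes", cudaRoot & "/include")
  switch("cc", "gcc")
  # Compilation for CUDA requires the C++ backend, with nvcc as the driver.
  switch("gcc.cpp.exe", nvcc)
  switch("gcc.cpp.linkerexe", nvcc)
  # sm_61 is Pascal, sm_75 is Turing. Change for your own card.
  switch("gcc.cpp.options.always",
    "-gencode arch=compute_61,code=sm_61 " &
    "-gencode arch=compute_75,code=sm_75 --x cu -Xcompiler -fpermissive")
```

With a file like this next to the project, the same `nim cpp -d:cudnn ...` invocation would be used on every OS, and the failure mode when nvcc is missing becomes an explicit message rather than a compiler-not-found error.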

I tried using Clang, but Clang's CUDA support always lags behind CUDA releases; see #372 (comment) from when I tried in August:

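| clang++ | supported CUDA release | supported SMs |
|---------|------------------------|---------------|
| 3.9-5.0 | 7.0-8.0 | 2.0-(5.0)6.0 |
| 6.0 | 7.0-9.0 | (2.0)3.0-7.0 |
| 7.0 | 7.0-9.2 | (2.0)3.0-7.2 |
| 8.0 | 7.0-10.0 | (2.0)3.0-7.5 |
| trunk | 7.0-10.1 | (2.0)3.0-7.5 |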

Apparently the new Clang 9.1 supports Cuda 10.1 (though Clang is still annoying to install on Windows)

Arraymancer/nim.cfg

Lines 34 to 41 in 71ccad0

@if openblas:
define:"blas=openblas" # For nimblas
define:"lapack=openblas" # For nimlapack
@if macosx:
clibdir:"/usr/local/opt/openblas/lib"
cincludes:"/usr/local/opt/openblas/include"
@end
@end

and

Arraymancer/nim.cfg

Lines 47 to 57 in 71ccad0

@if mkl: # MKL multi_threaded
define:"openmp"
define:"blas=mkl_intel_lp64"
define:"lapack=mkl_intel_lp64"
clibdir:"/opt/intel/mkl/lib/intel64"
passl:"/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a"
passl:"-lmkl_core"
passl:"-lmkl_gnu_thread"
passl:"-lgomp"
dynlibOverride:"mkl_intel_lp64"
@end

Removing these two sections would require replacing BLAS and LAPACK, which is something I've started to do in Laser. Matrix multiplication in pure Nim is now as fast as OpenBLAS, but I still need QR decomposition, LU decomposition, eigenvalues and a solver implemented as well (from https://github.com/mratsim/Arraymancer/tree/master/src/linear_algebra/helpers)

Arraymancer/nim.cfg

Lines 59 to 67 in 71ccad0

@if openmp or mkl:
stackTrace:off
@if macosx: # Default compiler on Mac is clang without OpenMP and gcc is an alias to clang.
# Use Homebrew GCC instead for OpenMP support. GCC (v7), must be properly linked via `brew link gcc`
cc:"gcc"
gcc.exe:"/usr/local/bin/gcc-7"
gcc.linkerexe:"/usr/local/bin/gcc-7"
@end
@end

Removing this section would require removing OpenMP. Weave is a very viable alternative for that, and probably a much better one, as it does not suffer from the issues shown in https://github.com/zy97140/omp-benchmark-for-pytorch
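
For readers unfamiliar with Weave, the following is a minimal sketch of a data-parallel loop of the kind that OpenMP handles today. It follows the pattern shown in Weave's README. The proc name `scale` and the buffer are hypothetical, and this is not code from Arraymancer. It assumes the `weave` package is installed and the program is compiled with `--threads:on`.

```nim
import weave

proc scale(buf: ptr UncheckedArray[float32], len: int, factor: float32) =
  ## Multiply every element of `buf` by `factor`, splitting the
  ## iteration range across Weave's worker threads.
  parallelFor i in 0 ..< len:
    captures: {buf, factor}   # variables used in the body must be captured
    buf[i] = buf[i] * factor

proc main() =
  var data = newSeq[float32](1000)
  for i in 0 ..< data.len:
    data[i] = i.float32
  # A seq is managed memory, so hand the workers a raw pointer to its buffer.
  let buf = cast[ptr UncheckedArray[float32]](data[0].addr)

  init(Weave)        # start the runtime and its worker threads
  scale(buf, data.len, 2'f32)
  syncRoot(Weave)    # wait for all outstanding tasks before reading results
  exit(Weave)        # shut the runtime down

main()
```

Since the runtime is an ordinary Nim library, no compiler-specific section in nim.cfg is needed for it, unlike the Homebrew GCC workaround above.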

Arraymancer/nim.cfg

Lines 69 to 81 in 71ccad0

# ############################################################
#
# SIMD flags
#
# ############################################################
gemm_ukernel_sse.always = "-msse"
gemm_ukernel_sse2.always = "-msse2"
gemm_ukernel_sse4_1.always = "-msse4.1"
gemm_ukernel_avx.always = "-mavx"
gemm_ukernel_avx_fma.always = "-mavx -mfma"
gemm_ukernel_avx2.always = "-mavx2"
gemm_ukernel_avx512.always = "-mavx512f -mavx512dq"

This isn't necessary anymore since nim-lang/Nim#12662.
I implemented the changes in Weave's benchmarks, and Arraymancer can use a similar scheme in the same files.
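
One way to attach compiler flags to a single module from within the source file is Nim's `localPassc` pragma. Whether this is exactly the mechanism the linked PR provides is an assumption on my part, so treat the following as a sketch only. The file name and the proc are hypothetical, and the flag assumes a GCC or Clang backend on x86.

```nim
# gemm_ukernel_avx2.nim: hypothetical file mirroring the nim.cfg entry
# `gemm_ukernel_avx2.always = "-mavx2"`.

# Pass -mavx2 only when compiling the C file generated from this module,
# so the rest of the project keeps its baseline instruction set.
{.localPassc: "-mavx2".}

proc addVec*(a, b: array[8, float32]): array[8, float32] =
  ## Simple loop that the C compiler is free to vectorize with AVX2.
  for i in 0 ..< 8:
    result[i] = a[i] + b[i]
```

Keeping the flag next to the code that needs it means renaming or moving the file cannot silently detach it from its SIMD flags, which is a risk with file-name-based entries in nim.cfg.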
