Cannot use with cuda in Window #409

thunanguyen · 2020-01-08T02:32:45Z

Hi, I cannot compile with the flag -d:cudnn on Windows.

My setup:
GTX 1060
cuda 10.0
cudnn 7

mratsim · 2020-01-08T17:52:47Z

Yes, I didn't configure Cuda/Cudnn for Windows. I didn't have a Windows PC with an Nvidia GPU when I wrote the Cuda backend.

Hopefully the only thing needed is to change the paths to the compiler here:

Arraymancer/nim.cfg

Lines 3 to 32 in 5bf4327

    
    @if cudnn:
      define:"cuda"
    @end
    # Nim cfg is not aware of new define within the file: https://github.com/nim-lang/Nim/issues/6698
    @if cuda or cudnn:
      # compile with "cpp" backend.
      # See https://github.com/mratsim/Arraymancer/issues/371
      # Nvidia NVCC
      cincludes:"/opt/cuda/include"
      cc:"gcc"
      # Compilation for Cuda requires C++
      gcc.cpp.exe:"/opt/cuda/bin/nvcc"
      gcc.cpp.linkerexe:"/opt/cuda/bin/nvcc"
      gcc.cpp.options.debug: "-Xcompiler -Og" # Additional "-Xcompiler -g3" crashes stb_image macros
      gcc.cpp.options.speed: "-Xcompiler -O3 -Xcompiler -fno-strict-aliasing"
      gcc.cpp.options.size: "-Xcompiler -Os"
      # Important sm_61 architecture corresponds to Pascal and sm_75 to Turing. Change for your own card
      gcc.cpp.options.always:"-gencode arch=compute_61,code=sm_61 -gencode arch=compute_75,code=sm_75 --x cu -Xcompiler -fpermissive"
      # Clang
      # cincludes:"/opt/cuda/include"
      # clibdir:"/opt/cuda/lib"
      # cc:"clang"
      # # Compile for Pascal (6.1) and Turing cards (7.5)
      # clang.cpp.options.always:"--cuda-path=/opt/cuda -lcudart_static -x cuda --cuda-gpu-arch=sm_61 --cuda-gpu-arch=sm_75"
    @end
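
An untested sketch of what that path change might look like if the CUDA section branched on the operating system. The Windows include path is an assumption based on the default location used by the CUDA 10.0 installer, and the bare `nvcc` command assumes the CUDA `bin` directory is on `PATH`. On Windows, nvcc uses MSVC (`cl.exe`) as its host compiler, so the gcc-style `-Xcompiler` options from the original file would probably need adjusting as well; they are left out here.

```nim
# nim.cfg fragment (sketch, not the project's actual configuration)
@if cuda or cudnn:
  cc:"gcc"
  @if windows:
    # Assumption: default CUDA 10.0 install location; adjust to your machine.
    cincludes:"C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/include"
    # Assumption: nvcc is on PATH, which avoids quoting the space in "Program Files".
    gcc.cpp.exe:"nvcc"
    gcc.cpp.linkerexe:"nvcc"
  @else:
    # Paths taken from the existing Linux configuration.
    cincludes:"/opt/cuda/include"
    gcc.cpp.exe:"/opt/cuda/bin/nvcc"
    gcc.cpp.linkerexe:"/opt/cuda/bin/nvcc"
  @end
  # The gcc.cpp.options.* lines (optimization level, -gencode targets) would follow here.
@end
```

Branching with `@if windows:` keeps a single config file for every OS, at the cost of hard-coding one install path per platform.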

thunanguyen · 2020-01-10T16:53:05Z

Thanks, I've found that and changed it already. But I wonder, is there any way to have a unified config file for all OSes?

mratsim · 2020-01-10T19:13:40Z

Ideally I want to completely do away with the config files. I don't like them at all but at the moment they are a necessary evil.

Let's go over the config file:

Arraymancer/nim.cfg

Lines 3 to 32 in 71ccad0

    
    @if cudnn:
      define:"cuda"
    @end
    # Nim cfg is not aware of new define within the file: https://github.com/nim-lang/Nim/issues/6698
    @if cuda or cudnn:
      # compile with "cpp" backend.
      # See https://github.com/mratsim/Arraymancer/issues/371
      # Nvidia NVCC
      cincludes:"/opt/cuda/include"
      cc:"gcc"
      # Compilation for Cuda requires C++
      gcc.cpp.exe:"/opt/cuda/bin/nvcc"
      gcc.cpp.linkerexe:"/opt/cuda/bin/nvcc"
      gcc.cpp.options.debug: "-Xcompiler -Og" # Additional "-Xcompiler -g3" crashes stb_image macros
      gcc.cpp.options.speed: "-Xcompiler -O3 -Xcompiler -fno-strict-aliasing"
      gcc.cpp.options.size: "-Xcompiler -Os"
      # Important sm_61 architecture corresponds to Pascal and sm_75 to Turing. Change for your own card
      gcc.cpp.options.always:"-gencode arch=compute_61,code=sm_61 -gencode arch=compute_75,code=sm_75 --x cu -Xcompiler -fpermissive"
      # Clang
      # cincludes:"/opt/cuda/include"
      # clibdir:"/opt/cuda/lib"
      # cc:"clang"
      # # Compile for Pascal (6.1) and Turing cards (7.5)
      # clang.cpp.options.always:"--cuda-path=/opt/cuda -lcudart_static -x cuda --cuda-gpu-arch=sm_61 --cuda-gpu-arch=sm_75"
    @end

This section could be removed if NVCC became an officially supported compiler in Nim with the proper flags.
I guess that would be configured here: https://github.com/nim-lang/Nim/blob/devel/config/nim.cfg

There is still one issue left though: either I distribute precompiled CUDA-enabled binaries, or users ensure that CUDA/nvcc is on their PATH, which AFAIK isn't automatic on Windows.

I tried using Clang, but Clang support always lags behind CUDA releases; see #372 (comment) from when I tried in August:

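| clang++ | supported CUDA release | supported SMs |
|---------|------------------------|---------------|
| 3.9-5.0 | 7.0-8.0 | 2.0-(5.0)6.0 |
| 6.0 | 7.0-9.0 | (2.0)3.0-7.0 |
| 7.0 | 7.0-9.2 | (2.0)3.0-7.2 |
| 8.0 | 7.0-10.0 | (2.0)3.0-7.5 |
| trunk | 7.0-10.1 | (2.0)3.0-7.5 |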

Apparently the new Clang 9.1 supports Cuda 10.1 (though Clang is still annoying to install on Windows)

Arraymancer/nim.cfg

Lines 34 to 41 in 71ccad0

    
    @if openblas:
      define:"blas=openblas" # For nimblas
      define:"lapack=openblas" # For nimlapack
      @if macosx:
        clibdir:"/usr/local/opt/openblas/lib"
        cincludes:"/usr/local/opt/openblas/include"
      @end
    @end

and

Arraymancer/nim.cfg

Lines 47 to 57 in 71ccad0

    
    @if mkl: # MKL multi_threaded
      define:"openmp"
      define:"blas=mkl_intel_lp64"
      define:"lapack=mkl_intel_lp64"
      clibdir:"/opt/intel/mkl/lib/intel64"
      passl:"/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a"
      passl:"-lmkl_core"
      passl:"-lmkl_gnu_thread"
      passl:"-lgomp"
      dynlibOverride:"mkl_intel_lp64"
    @end

Removing these two sections would require replacing BLAS and LAPACK, which is something I've started to do in Laser. Matrix multiplication in pure Nim is now as fast as OpenBLAS, but I still need QR decomposition, LU decomposition, eigenvalue and solver routines implemented as well (from https://github.com/mratsim/Arraymancer/tree/master/src/linear_algebra/helpers)

Arraymancer/nim.cfg

Lines 59 to 67 in 71ccad0

    
    @if openmp or mkl:
      stackTrace:off
      @if macosx: # Default compiler on Mac is clang without OpenMP and gcc is an alias to clang.
                  # Use Homebrew GCC instead for OpenMP support. GCC (v7), must be properly linked via `brew link gcc`
        cc:"gcc"
        gcc.exe:"/usr/local/bin/gcc-7"
        gcc.linkerexe:"/usr/local/bin/gcc-7"
      @end
    @end

Removing this section would require dropping OpenMP. Weave is a very viable alternative, and probably a much better one, as it does not suffer from the problems shown in https://github.com/zy97140/omp-benchmark-for-pytorch

Arraymancer/nim.cfg

Lines 69 to 81 in 71ccad0

    
    # ############################################################
    #
    #                    SIMD flags
    #
    # ############################################################
    gemm_ukernel_sse.always = "-msse"
    gemm_ukernel_sse2.always = "-msse2"
    gemm_ukernel_sse4_1.always = "-msse4.1"
    gemm_ukernel_avx.always = "-mavx"
    gemm_ukernel_avx_fma.always = "-mavx -mfma"
    gemm_ukernel_avx2.always = "-mavx2"
    gemm_ukernel_avx512.always = "-mavx512f -mavx512dq"

This section isn't necessary anymore since nim-lang/Nim#12662.
I implemented the changes in Weave's benchmarks, and Arraymancer can use a similar scheme in the same files.
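
A minimal sketch of such a per-file scheme, assuming the referenced change is Nim's `localPassC` pragma (available since Nim 1.2), which passes options only to the C file generated from the current module. The module name mirrors the `gemm_ukernel_avx2` entry in the config above; the `scale` proc is a hypothetical placeholder, not Arraymancer code.

```nim
# gemm_ukernel_avx2.nim (hypothetical module, named after the cfg entry)

# Assumption: Nim >= 1.2. The flag applies only to the C file generated
# from this module, replacing `gemm_ukernel_avx2.always = "-mavx2"` in nim.cfg.
{.localPassC: "-mavx2".}

# Placeholder kernel so the module has something to compile with AVX2 enabled.
proc scale*(x: var array[8, float32], a: float32) =
  for i in 0 ..< x.len:
    x[i] *= a
```

Keeping the flag next to the code that needs it means users of the library no longer have to copy the SIMD section into their own nim.cfg.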

mratsim added Cuda Documentation ergonomics Windows labels Jan 8, 2020
