Add single-precision kernel for calcMahalDistGpu

We have substantially simplified the kernel, which has made it faster
and able to handle larger D. We now use this kernel for D <= 64.

For larger D, we have switched from a triangular solve (TRSM) to
computing the explicit inverse (TRTRI) and performing a general matrix
multiplication (GEMM). Although TRTRI + GEMM performs more operations,
it is faster overall because TRSM is much slower than GEMM on the GPU.
For D > 2048, we stick with TRSM.
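
The trade-off described above can be illustrated with a small CPU-side sketch. It is not the code from this commit: it assumes the distance is computed from a lower-triangular Cholesky factor L of the covariance, and the function names (forward_solve, tri_inverse, choose_path) are hypothetical. It shows that solving L y = x directly (the TRSM-style path) and multiplying by an explicit inverse of L (the TRTRI + GEMM-style path) give the same squared distance, and how the size thresholds from the message might be expressed.

```python
def forward_solve(L, b):
    """Solve L y = b for lower-triangular L (analogous to a TRSM call)."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        partial = sum(L[i][k] * y[k] for k in range(i))
        y[i] = (b[i] - partial) / L[i][i]
    return y


def tri_inverse(L):
    """Explicit inverse of lower-triangular L (analogous to TRTRI)."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        unit = [1.0 if i == j else 0.0 for i in range(n)]
        column = forward_solve(L, unit)  # j-th column of the inverse
        for i in range(n):
            inv[i][j] = column[i]
    return inv


def sq_mahal_solve(L, x):
    """Squared distance ||L^-1 x||^2 via a triangular solve."""
    return sum(v * v for v in forward_solve(L, x))


def sq_mahal_inverse(L_inv, x):
    """Squared distance via a plain matrix product (analogous to GEMM)."""
    y = [sum(a * b for a, b in zip(row, x)) for row in L_inv]
    return sum(v * v for v in y)


def choose_path(D):
    """Hypothetical dispatch using the thresholds stated in the message."""
    if D <= 64:
        return "custom kernel"
    if D <= 2048:
        return "TRTRI + GEMM"
    return "TRSM"


if __name__ == "__main__":
    L = [[2.0, 0.0], [1.0, 3.0]]  # assumed Cholesky factor
    x = [2.0, 4.0]                # assumed mean-subtracted data point
    print(sq_mahal_solve(L, x), sq_mahal_inverse(tri_inverse(L), x))
    print(choose_path(64), choose_path(65), choose_path(2049))
```

The explicit inverse costs extra work up front, but it is computed once and then reused for every data point through a dense product, which is the kind of operation the message says runs much faster on the GPU.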

Also, add -lmwlapack option to buildMexFiles, since calcMahalDistGpu
needs to be linked to LAPACK for the triangular inverse.
kqshan committed May 1, 2017
1 parent 320d62a commit 9b921e3
Showing 2 changed files with 467 additions and 445 deletions.
1 change: 1 addition & 0 deletions @MoDT/buildMexFiles.m
@@ -52,6 +52,7 @@ function buildMexFiles()
 % Compiler/linker options
 mexcuda_opts = {
 '-lcublas' % Link to cuBLAS
+'-lmwlapack' % Link to LAPACK
 ['NVCCFLAGS="' nvcc_opts '"']
 ['CXXFLAGS="--compiler-options=' compile_opts '"']
 '-L/usr/local/cuda/lib64' % Location of CUDA libraries