Add single-precision kernel for calcMahalDistGpu
We have substantially simplified the kernel, which makes it faster and lets it handle larger dimensions D; it is now used for D <= 64. For 64 < D <= 2048, we have switched from a triangular solve (TRSM) to computing the explicit triangular inverse (TRTRI) followed by a general matrix multiplication (GEMM). Although this route performs more floating-point operations, GEMM runs much faster than TRSM on the GPU. For D > 2048, we stick with TRSM.

Also add the -lmwlapack option to buildMexFiles, since calcMahalDistGpu must be linked against LAPACK for the triangular inverse.
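The two strategies can be sketched in plain Python. This is an illustration only, assuming the triangular factor is the lower Cholesky factor L of the covariance (Sigma = L L^T), so the squared Mahalanobis distance is (x - mu)^T Sigma^-1 (x - mu) = ||L^-1 (x - mu)||^2. The functions `choose_method`, `mahal_trsm`, `invert_lower` and `mahal_trtri_gemm` are hypothetical names, not the actual calcMahalDistGpu code, and Python floats are double precision rather than single. The sketch demonstrates only that the TRSM route and the TRTRI+GEMM route produce the same distances; it says nothing about GPU performance.

```python
import math

# Hypothetical dispatch thresholds taken from the commit message.
SMALL_D_MAX = 64        # custom kernel for D <= 64
TRTRI_GEMM_MAX = 2048   # explicit inverse + GEMM for 64 < D <= 2048


def choose_method(d):
    """Return which strategy the commit message describes for dimension d."""
    if d <= SMALL_D_MAX:
        return "kernel"
    if d <= TRTRI_GEMM_MAX:
        return "trtri+gemm"
    return "trsm"


def cholesky_lower(a):
    """Lower Cholesky factor L with a = L L^T (a must be SPD)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mahal_trsm(L, diffs):
    """TRSM analogue: solve L y = (x - mu) by forward substitution."""
    n = len(L)
    out = []
    for x in diffs:
        y = [0.0] * n
        for i in range(n):
            y[i] = (x[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
        out.append(sum(v * v for v in y))
    return out


def invert_lower(L):
    """TRTRI analogue: explicit inverse of a lower-triangular matrix."""
    n = len(L)
    Linv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        Linv[j][j] = 1.0 / L[j][j]
        for i in range(j + 1, n):
            s = sum(L[i][k] * Linv[k][j] for k in range(j, i))
            Linv[i][j] = -s / L[i][i]
    return Linv


def mahal_trtri_gemm(L, diffs):
    """Invert L once, then multiply: y = L^-1 (x - mu) for every point."""
    Linv = invert_lower(L)
    n = len(L)
    out = []
    for x in diffs:
        # On a GPU this loop would be a single batched GEMM over all points.
        y = [sum(Linv[i][k] * x[k] for k in range(n)) for i in range(n)]
        out.append(sum(v * v for v in y))
    return out


if __name__ == "__main__":
    L = cholesky_lower([[4.0, 2.0], [2.0, 3.0]])
    diffs = [[2.0, 1.0], [0.5, -1.0]]
    print(mahal_trsm(L, diffs), mahal_trtri_gemm(L, diffs))
```

The inverse-then-multiply route pays an up-front O(D^3) cost for TRTRI but turns the per-point work into a dense matrix product, which is the operation GPUs execute most efficiently.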