RPCA-MKL

RPCA-MKL is a Intel MKL-based, out-of-core C++ implementation of randomized SVD for rank-k approximation of large matrices. It is primarily intended to be used via an R wrapper.

R Package Installation

Clone this git repository
cd fastRPCA
R CMD INSTALL . --no-staged-install

Please see the documentation for usage: ?fastPCA

Note: The pre-compiled MKL libraries are only included for Linux and OS X, so this installation will not work for Windows. Windows users can see Development for details about compling the code.

R Testing Suite

Test the code by running the following:

source(sprintf("%s/test.R", system.file("tests", package="fastRPCA")))

Features

All matrix algebra is done with Intel MKL (pre-compiled version already linked) making it extremely fast
Row-centering and column-centering (without duplicating the matrix)
The calculations are 'blocked' allowing it to be 'out-of-core'. This functionality has not been tested appropriately; please see oocRPCA for a functioning version.
- CSV files: when too large for the memory, read block by block from the hard drive
- BED files: when too large for the memory, stored in a compressed 2 bit-per-SNP format, and then decompressed block by block for calculations

Compiling from source

This packages uses Intel Math Kernel Library, which has highly optimized implementations of BLAS and LAPACK (Free download here ). The lib folder contains a custom built shared library (see lib/generate_custom_mkl.sh), but the headers cannot be distributed. As such, to compile from source, Intel MKL must be installed on your machine.

Update the MKL_INCLUDE path in the Makefile. Make sure the compiler in the CC variable is openmp compatible. On OS X, for example, you can use homebrew to install libiomp5 for Clang/LLVM. Just be sure that it is the compiler being used.
Run make in the src folder.
You may need to export the LD_LIBRARY_PATH or (on OS X) DYLD_LIBRARY_PATH so that the new executable can find the necessary dynamic libraries in the lib folder. For example: export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/user/Downloads/fastPCA/lib".

References

This implementation is based on Algorithm 2 of the following paper:

George C. Linderman, Manas Rachh, Jeremy G. Hoskins, Stefan Steinerberger, Yuval Kluger. (2019). Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nature Methods. (link)

An excellent reference for randomized SVD is the following paper:

Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp. "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions." SIAM review 53.2 (2011): 217-288.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

RPCA-MKL

R Package Installation

R Testing Suite

Features

Compiling from source

References

Files

README.md

Latest commit

History

README.md

File metadata and controls

RPCA-MKL

R Package Installation

R Testing Suite

Features

Compiling from source

References