This project is a C++ implementation of the Randomized Singular Value Decomposition (rSVD) algorithm. We only used the matrix operations of the Eigen
library to implement our algorithm. We do some benchmarks to compare the performance of our implementation with the Eigen
library, the result indicates that our implementation can enhance the performance of handling large or sparse matrices.
Our project leverages template programming to provide flexible and efficient implementations of key linear algebra algorithms. Specifically, the classes GivensRotationQR
, PowerMethodSVD
, and RandomizedSVD
are designed to accommodate a wide range of data types and storage formats. These classes support both dense and sparse matrices. For example:
// Dense matrix
Eigen::RandomizedSVD<double, Eigen::Dynamic, Eigen::Dynamic> rsvd;
// Sparse matrix
Eigen::RandomizedSVD<Eigen::SparseMatrix<double>>
// Row-major matrix
Eigen::RandomizedSVD<Eigen::SparseMatrix<double, Eigen::RowMajor>>
The PowerMethodSVD
class is a template-based implementation for computing the Singular Value Decomposition (SVD) of a matrix (dense or sparse) using the Power Method with Deflation. This approach is designed for approximating a few dominant singular values and their associated singular vectors efficiently.
In order to make the operation more efficient, we have optimized the algorithm. Alternately apply matrix A and its transpose
To compute the largest singular value
- Initialize
$v_1$ as a random vector:
- Normalize
$v_1$ :
- Iteratively compute:
- Left singular vector:
- Right singular vector:
- Convergence is checked by monitoring the change in
$v_1$ :
- After convergence, the largest singular value is:
- The left singular vector is computed as:
After computing
The GivensRotationQR
class is a template-based implementation for computing the QR decomposition of a matrix using the Givens Rotation method. This approach is designed for sparse matrices, because of generics, it can be used for both dense and sparse matrices.
The Givens Rotation QR decomposition is a method for decomposing a matrix
A Givens rotation matrix is used to zero out specific elements of a matrix. For two elements
The Givens rotation matrix for rows
The steps for the Givens Rotation QR decomposition are:
- Initialize
$R$ as a copy of the input matrix$A$ and$Q$ as the identity matrix. - For each column
$j$ :- Apply Givens rotations to zero out elements below the diagonal in column
$j$ . - Update
$R$ by applying Givens rotations from the left. - Accumulate rotations into
$Q$ by applying Givens rotations from the right (transpose of$G$ ).
- Apply Givens rotations to zero out elements below the diagonal in column
- Return
$Q$ and$R$ .
For step2 we do some optimizations:
- Instead of applying matrix multiplications:
$Q = I \times G_1^T \times G_2^T \times ... \times G_n^T$ and$R = G_n \times ... \times G_2 \times G_1 \times A$ , we apply Gi vens rotations directly to the rows of$R$ and$Q$ . We use eigen's vectorized operations instead of dual loops. - Instead of computing the transpose of the rotation matrix, we apply the rotation to the columns of the matrix.
Randomized Singular Value Decomposition is a fast probabilistic algorithm that can be used to compute the near optimal low-rank singular value decomposition of massive data sets with high accuracy. The key idea is to compute a compressed representation of the data to capture the essential information. This compressed representation can then be used to obtain the low-rank singular value decomposition decomposition. The larger the matrix, the higher the computational advantages of this algorithm are, considering that classical SVD computation requires π(ππ min(π,π)) operations. It's expecially efficient when the matrix is sparse or when only a small subset of singular values and vectors is needed.
The steps of the algorithm are:
- Draw an
$n \times k$ Gaussian random matrix$\Omega$ - From the
$m \times k$ sketch matrix$Y=A \Omega$ - From an
$m \times k$ orthonormal matrix$Q$ such that$Y=QR$ - From the
$n \times k$ matrix$B=Q^TA$ - Compute the SVD of
$\hat{U} \Sigma V^{T}$ - From the matrix
$U=Q\hat{U}$
rSVD reaches a complexity of
We use CMake
to build the project, and vcpkg
to manage dependencies, our project can run across platforms.
Prerequisites:
For MacOS
, install CMake
and Vcpkg
. If you need OpenMP
support, you must install llvm
:
brew install cmake
brew install vcpkg
brew install llvm
To use vcpkg
:
git clone https://github.com/microsoft/vcpkg "$HOME/vcpkg"
export VCPKG_ROOT="$HOME/vcpkg
$VCPKG_ROOT/bootstrap-vcpkg.sh
After installing the above packages, you need to load them into PATH
, for MacOS
, edit the .zshrc
file:
# Add vcpkg to PATH
export VCPKG_ROOT=... # your vcpkg path
export PATH=$VCPKG_ROOT:$PATH
# Add llvm to PATH
export CC=/opt/homebrew/opt/llvm/bin/clang
export CXX=/opt/homebrew/opt/llvm/bin/clang++
export LDFLAGS="-L/opt/homebrew/opt/libomp/bin"
export CPPFLAGS="-I/opt/homebrew/opt/libomp/include"
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
And then MacOS
environment settings completed.
- install CMake and Vcpkg
sudo apt update
sudo apt install cmake
git clone https://github.com/microsoft/vcpkg "$HOME/vcpkg"
$HOME/vcpkg/bootstrap-vcpkg.sh
- set vcpkg to PATH
echo 'export VCPKG_ROOT=$HOME/vcpkg' >> ~/.bashrc
echo 'export PATH=$VCPKG_ROOT:$PATH' >> ~/.bashrc
source ~/.bashrc
Fork and Clone this project to your own repo.
For MacOS
, you can use clang
or gcc
, for Windows
, you can use MSVC
or gcc
. First you should add CMakeUserPresets.json
to the project root directory, for example I have configured the CMakeUserPresets.json
file as follows, you should replace the compiler path with your own path, and the VCPKG_ROOT
with your own path:
{
"version": 2,
"configurePresets": [
{
"name": "gcc",
"inherits": "vcpkg",
"environment": {
"VCPKG_ROOT": "/Users/raopend/vcpkg"
},
"generator": "Ninja",
"binaryDir": "${sourceDir}/build",
"cacheVariables": {
"CMAKE_TOOLCHAIN_FILE": "$env{VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake",
"CMAKE_C_COMPILER": "/opt/homebrew/bin/gcc-14",
"CMAKE_CXX_COMPILER": "/opt/homebrew/bin/g++-14"
}
},
{
"name": "clang",
"inherits": "vcpkg",
"environment": {
"VCPKG_ROOT": "/Users/raopend/vcpkg"
},
"generator": "Ninja",
"binaryDir": "${sourceDir}/build",
"cacheVariables": {
"CMAKE_TOOLCHAIN_FILE": "$env{VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake",
"CMAKE_C_COMPILER": "/opt/homebrew/opt/llvm/bin/clang",
"CMAKE_CXX_COMPILER": "/opt/homebrew/opt/llvm/bin/clang++"
}
}
]
}
You can replace the compiler path with your own path.
After setting up the environment, you can build the project. The keyword default
is the preset name in the CMakeUserPresets.json
file, for example, I use the clang
preset:
- Configure the build using CMake:
cmake --preset=clang
- Build the project
cmake --build build
- Run the application
./build/benchmarks/BenchRSVD
We have implemented a benchmark to compare the performance of our implementation. The benchmark is run on a dense matrix of size 1000x1000 and a sparse matrix of size 1000x1000. The benchmark measures the time taken to compute the SVD of the matrix using our implementation and the Eigen
library. The results show that our implementation is faster than the Eigen
library for both dense and sparse matrices.
The following figure summarizes the results of givens rotation QR for sparse matrices: This figure shows we should do more to optimize the givens rotation QR for sparse matrices.
The following figure summarizes the results of multiple SVD method for sparse matrices:
mpirun -np 4 ./build/profiling/mpi_omp_random_matrix
We have implemented benchmarks to compare the performance to generate random matrix using MPI and OpenMP. The following table summarizes the results:
Configuration | Number of Processes | Time Taken (seconds) |
---|---|---|
RandomMTX mpi | 1 | 0.954016 |
RandomMTX mpi | 2 | 0.686535 |
RandomMTX mpi | 4 | 0.594157 |
RandomMTX mpi_omp | 1 | 0.998558 |
RandomMTX mpi_omp | 2 | 3.01405 |
RandomMTX mpi_omp | 4 | 0.635416 |
RandomMTX omp | 1 | 0.0920939 |
- Notably, the "RandomMTX omp" configuration with 1 process shows an outstanding performance with a time taken of just 0.0920939 seconds, which is the fastest.
- Excessive parallelization may lead to a huge amount of creation and destruction of resources, and as a result, the performance may turn out to be worse.