Skip to content

Microbenchmarks: RcppArmadillo and RcppEigen

zdebruine edited this page Apr 16, 2021 · 1 revision
  • Rcpp::dgCMatrix is passed by reference from R to C++ while RcppArmadillo/Eigen sparse matrices are created by a deep copy, creating significant overhead.
  • Rcpp::dgCMatrix sparse iterators are faster than arma::SpMat and slightly slower than Eigen::SparseMatrix iterators.

Overhead of deep-copy

Consider the cost of copying a 10000 x 10000 random sparse matrix into the RcppEigen and RcppArmadillo sparse matrix classes.

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
#include <RcppSparse.h>

//[[Rcpp::export]]
Eigen::SparseMatrix<double> Eigen_conv_to_from(Eigen::SparseMatrix<double>& mat){
  return mat;
}

//[[Rcpp::export]]
arma::SpMat<double> Armadillo_conv_to_from(arma::SpMat<double>& mat){
  return mat;  
}

//[[Rcpp::export]]
Rcpp::dgCMatrix rcppSparse_conv_to_from(Rcpp::dgCMatrix& mat){
  return mat;
}
library(Matrix)
library(microbenchmark)
mat <- rsparsematrix(10000, 10000, 0.2)
microbenchmark(
   "eigen" = Eigen_conv_to_from(mat),
   "armadillo" = Armadillo_conv_to_from(mat),
   "rcppSparse" = rcppSparse_conv_to_from(mat),
   times = 10
)
# Unit: microseconds
#                        expr     min      lq     mean  median      uq      max neval
#       Eigen_conv_to_from(A) 70361.3 74059.2 89142.77 81492.5 87029.3 176105.7    10
#   Armadillo_conv_to_from(A) 57647.4 61506.6 80136.39 79728.5 96738.0 105297.9    10
#  rcppSparse_conv_to_from(A)     5.0     6.4    23.66    28.9    34.7     52.8    10

The overhead of copying this matrix to C++ in RcppArmadillo and RcppEigen is ~80 milliseconds on my desktop machine (there is actually no overhead copying back to R because these are doubles). In contrast, Rcpp::dgcMatrix is passed by reference, and therefore takes just a few microseconds.

Iterator performance

After loading the sparse matrices into C++, we can compare the performance of Rcpp::dgCMatrix iterators to the counterparts in arma::SpMat and Eigen::SparseMatrix.

This must be done in C++ as the overhead of copying the matrix to/from R will interfere with the timing, and in this case we are only interested in the C++ runtime.

C++

// use std::chrono to time function runtime
template<typename F, typename... Args>
int funtime(F func, int rep, Args&&... args) {
    auto start = std::chrono::high_resolution_clock::now();

    while (rep-- > 0) func(std::forward<Args>(args)...);

    auto stop = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    return duration.count();
}

//[[Rcpp::export]]
Rcpp::NumericVector rcpp_colsums(Rcpp::dgCMatrix& mat) {
    Rcpp::NumericVector sums(mat.cols());
    for (int i = 0; i < mat.cols(); ++i)
        for (Rcpp::dgCMatrix::col_iterator it = mat.begin_col(i); it != mat.end_col(i); it++)
            sums(i) += *it;
    return sums;
}

//[[Rcpp::export]]
arma::dvec arma_colsums(arma::SpMat<double>& mat) {
    arma::dvec sums = arma::zeros<arma::dvec>(mat.n_cols);
    for (int i = 0; i < mat.n_cols; ++i)
        for (arma::SpMat<double>::col_iterator it = mat.begin_col(i); it != mat.end_col(i); it++)
            sums(i) += *it;
    return sums;
}

//[[Rcpp::export]]
Eigen::VectorXd eigen_colsums(Eigen::SparseMatrix<double>& mat) {
    Eigen::VectorXd sums(mat.cols());
    for (int i = 0; i < mat.outerSize(); ++i)
        for (Eigen::SparseMatrix<double>::InnerIterator it(mat, i); it; ++it)
            sums(i) += it.value();
    return sums;
}

//[[Rcpp::export]]
void cpp_benchmark(Rcpp::dgCMatrix& mat_dgCMatrix, arma::SpMat<double>& mat_arma, Eigen::SparseMatrix<double>& mat_eigen, int rep) {
    Rprintf("colSums: \n %20s %10d\n %20s %10d\n %20s %10d\n",
            "Rcpp::dgCMatrix", funtime(rcpp_colsums, rep, mat_dgCMatrix),
            "arma::SpMat", funtime(arma_colsums, rep, mat_arma),
            "Eigen::SparseMatrix", funtime(eigen_colsums, rep, mat_eigen)
    );
}

R:

library(Matrix)
mat <- rsparsematrix(10000, 1000, 0.1)
cpp_benchmark(mat, mat, mat, 100)

Results: (in milliseconds after 100 repetitions)

colSums: 
      Rcpp::dgCMatrix       1639
          arma::SpMat       2010
  Eigen::SparseMatrix       1507

The dgCMatrix iterator outperforms Armadillo, likely because range-checking is not performed. However, it slightly underperforms Eigen's InnerIterator, a special rendition of iterator that uses a boolean range check to avoid an extra post-increment and runs logical comparisons between iterators and primitive types, not iterators and iterators. This collectively facilitates better compiler optimization.