Microbenchmarks: RcppArmadillo and RcppEigen

Rcpp::dgCMatrix is passed by reference from R to C++ while RcppArmadillo/Eigen sparse matrices are created by a deep copy, creating significant overhead.
Rcpp::dgCMatrix sparse iterators are faster than arma::SpMat and slightly slower than Eigen::SparseMatrix iterators.

Overhead of deep-copy

Consider the cost of copying a 10000 x 10000 random sparse matrix into the RcppEigen and RcppArmadillo sparse matrix classes.

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
#include <RcppSparse.h>

//[[Rcpp::export]]
Eigen::SparseMatrix<double> Eigen_conv_to_from(Eigen::SparseMatrix<double>& mat){
  return mat;
}

//[[Rcpp::export]]
arma::SpMat<double> Armadillo_conv_to_from(arma::SpMat<double>& mat){
  return mat;  
}

//[[Rcpp::export]]
Rcpp::dgCMatrix rcppSparse_conv_to_from(Rcpp::dgCMatrix& mat){
  return mat;
}

library(Matrix)
library(microbenchmark)
mat <- rsparsematrix(10000, 10000, 0.2)
microbenchmark(
   "eigen" = Eigen_conv_to_from(mat),
   "armadillo" = Armadillo_conv_to_from(mat),
   "rcppSparse" = rcppSparse_conv_to_from(mat),
   times = 10
)

# Unit: microseconds
#                        expr     min      lq     mean  median      uq      max neval
#       Eigen_conv_to_from(A) 70361.3 74059.2 89142.77 81492.5 87029.3 176105.7    10
#   Armadillo_conv_to_from(A) 57647.4 61506.6 80136.39 79728.5 96738.0 105297.9    10
#  rcppSparse_conv_to_from(A)     5.0     6.4    23.66    28.9    34.7     52.8    10

The overhead of copying this matrix to C++ in RcppArmadillo and RcppEigen is ~80 milliseconds on my desktop machine (there is actually no overhead copying back to R because these are doubles). In contrast, Rcpp::dgcMatrix is passed by reference, and therefore takes just a few microseconds.

Iterator performance

After loading the sparse matrices into C++, we can compare the performance of Rcpp::dgCMatrix iterators to the counterparts in arma::SpMat and Eigen::SparseMatrix.

This must be done in C++ as the overhead of copying the matrix to/from R will interfere with the timing, and in this case we are only interested in the C++ runtime.

C++

// use std::chrono to time function runtime
template<typename F, typename... Args>
int funtime(F func, int rep, Args&&... args) {
    auto start = std::chrono::high_resolution_clock::now();

    while (rep-- > 0) func(std::forward<Args>(args)...);

    auto stop = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    return duration.count();
}

//[[Rcpp::export]]
Rcpp::NumericVector rcpp_colsums(Rcpp::dgCMatrix& mat) {
    Rcpp::NumericVector sums(mat.cols());
    for (int i = 0; i < mat.cols(); ++i)
        for (Rcpp::dgCMatrix::col_iterator it = mat.begin_col(i); it != mat.end_col(i); it++)
            sums(i) += *it;
    return sums;
}

//[[Rcpp::export]]
arma::dvec arma_colsums(arma::SpMat<double>& mat) {
    arma::dvec sums = arma::zeros<arma::dvec>(mat.n_cols);
    for (int i = 0; i < mat.n_cols; ++i)
        for (arma::SpMat<double>::col_iterator it = mat.begin_col(i); it != mat.end_col(i); it++)
            sums(i) += *it;
    return sums;
}

//[[Rcpp::export]]
Eigen::VectorXd eigen_colsums(Eigen::SparseMatrix<double>& mat) {
    Eigen::VectorXd sums(mat.cols());
    for (int i = 0; i < mat.outerSize(); ++i)
        for (Eigen::SparseMatrix<double>::InnerIterator it(mat, i); it; ++it)
            sums(i) += it.value();
    return sums;
}

//[[Rcpp::export]]
void cpp_benchmark(Rcpp::dgCMatrix& mat_dgCMatrix, arma::SpMat<double>& mat_arma, Eigen::SparseMatrix<double>& mat_eigen, int rep) {
    Rprintf("colSums: \n %20s %10d\n %20s %10d\n %20s %10d\n",
            "Rcpp::dgCMatrix", funtime(rcpp_colsums, rep, mat_dgCMatrix),
            "arma::SpMat", funtime(arma_colsums, rep, mat_arma),
            "Eigen::SparseMatrix", funtime(eigen_colsums, rep, mat_eigen)
    );
}

R:

library(Matrix)
mat <- rsparsematrix(10000, 1000, 0.1)
cpp_benchmark(mat, mat, mat, 100)

Results: (in milliseconds after 100 repetitions)

colSums: 
      Rcpp::dgCMatrix       1639
          arma::SpMat       2010
  Eigen::SparseMatrix       1507

The dgCMatrix iterator outperforms Armadillo, likely because range-checking is not performed. However, it slightly underperforms Eigen's InnerIterator, a special rendition of iterator that uses a boolean range check to avoid an extra post-increment and runs logical comparisons between iterators and primitive types, not iterators and iterators. This collectively facilitates better compiler optimization.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Microbenchmarks: RcppArmadillo and RcppEigen

Overhead of deep-copy

Iterator performance

Clone this wiki locally