Sparse matrix-vector multiplication (SpMV) implementations for each of the four formats -- COO, CSR, DIA and ELL.
y = Ax
where A is an M x N sparse matrix with nnz non-zeros, x is a dense input vector of size N, and y is a dense output vector of size M.
Sequential implementations are provided for C, JavaScript, and WebAssembly via Emscripten. They closely follow conventional implementations of SpMV that target cache-based superscalar uniprocessor machines.
Parallel implementations are provided for C using pthreads and for WebAssembly via Emscripten.
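As a reference point for the sequential kernels, a minimal single-precision CSR SpMV in C might look like the following sketch (the array names row_ptr, col_idx, and val are our own; the repository's actual identifiers may differ):

```c
#include <stddef.h>

/* y = A*x for an M-row sparse matrix stored in CSR:
 * row_ptr[i]..row_ptr[i+1] delimits row i's slice of col_idx/val. */
static void spmv_csr(size_t M, const int *row_ptr, const int *col_idx,
                     const float *val, const float *x, float *y)
{
    for (size_t i = 0; i < M; i++) {
        float sum = 0.0f;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```

The inner loop runs nnz_row times per row, which is why per-row statistics of the non-zero count matter for performance (see the feature discussion below).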
The input matrix is required to be in Matrix Market format (.mtx). Some real-life examples of sparse matrices in this external format can be obtained from The SuiteSparse Matrix Collection (formerly the University of Florida Sparse Matrix Collection) at https://sparse.tamu.edu
Our implementations currently support square sparse matrices, where M == N. We categorize the matrices by the combined size of the dense vectors, sizeof(x) + sizeof(y):
1. X-Large : don't fit in the L3 cache.
2. Large : fit in L3 cache, but don't fit in the L2 cache.
3. Medium : fit in L2 cache, but don't fit in the L1 cache.
4. Small : fit in the L1 cache.
For our cheetah machine, L1 = 32 KB, L2 = 256 KB, and L3 = 12 MB. Therefore, for single-precision storage:
X-Large : N > 1,572,864
Large : 32,768 < N < 1,572,864
Medium : 4,096 < N < 32,768
Small : N < 4,096
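These boundaries follow from solving 2 * sizeof(float) * N = cache size, since x and y together occupy 8 * N bytes in single precision. A one-line helper (ours, for illustration only) makes the arithmetic explicit:

```c
/* Boundary matrix dimension N at which the two single-precision
 * dense vectors x and y exactly fill a cache of the given size:
 * 2 * 4 * N = cache_bytes, so N = cache_bytes / 8. */
static long boundary_N(long cache_bytes)
{
    return cache_bytes / 8;
}
```

Plugging in the cheetah cache sizes gives 4,096 (L1), 32,768 (L2), and 1,572,864 (L3), matching the thresholds above.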
make features
./run_features <matrix_market_input_file_path>
- Matrix dimension, N
- Number of non-zeros, nnz
- Density, nnz/(N * N)
- Number of non-zeros per row, nnz_row (min, max, mean, sd)
- Column width per row, col_width_row (min, max, mean, sd)
- Scatter ratio per row, scatter_row (min, max, mean, sd)
- Miss density per row, miss_density_row
Flop:Byte is the ratio of the total number of floating-point operations to the total number of bytes of the storage format. For example, for a single-precision sparse matrix in CSR format:
Flop : Byte = (2 * nnz) / (8 * nnz + 12 * N)
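As a cross-check of the formula, here is a small helper (ours, not part of the repository) with the byte count broken down: 2 flops per non-zero (one multiply, one add); 4-byte values plus 4-byte column indices give 8 * nnz; and the row pointer, x, and y give roughly 12 * N (the row pointer's extra entry is ignored, matching the formula above):

```c
/* Flop:Byte for a single-precision CSR matrix with nnz non-zeros
 * and dimension N: (2 * nnz) / (8 * nnz + 12 * N). */
static double flop_byte_csr(long nnz, long N)
{
    return (2.0 * nnz) / (8.0 * nnz + 12.0 * N);
}
```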
Following are the scatter plots for our benchmark suite (correlation coefficient: 0.607) based on different sizes, comparing SpMV performance in MFLOPS vs Flop:Byte:
Matrices with a high Flop:Byte ratio tend to perform better than others. However, the low performance of some matrices with a high Flop:Byte ratio can be explained by a low nnz_row, which equals the number of iterations of the inner loop of SpMV CSR; the loop overhead therefore drags performance down. In addition, large variation in the number of non-zeros per row potentially leads to frequent branch mispredictions.
Density = nnz/(N * N)
Following are the scatter plots for our benchmark suite based on different sizes, comparing SpMV performance in MFLOPS vs Density:
Theoretically, a low miss density leads to better performance, which means the correlation coefficient should be closer to -1. Following are the scatter plots for our benchmark suite (correlation coefficient: -0.372) based on different sizes, comparing SpMV performance in MFLOPS vs Miss Density:
Please see ManLang18-SpMV for the experimental data and scripts.
make float
./run_float <matrix_market_input_file_path> <format_string>
make double
./run_double <matrix_market_input_file_path> <format_string>
where <format_string> is one of COO, CSR, DIA, or ELL.
make float DEBUG=1
./run_float <matrix_market_input_file_path> <format_string>
./run.py -b <browser> -p single <matrix_market_input_file_path>
./run.py -b <browser> -p double <matrix_market_input_file_path>
where <browser> is chrome for Google Chrome and firefox for Mozilla Firefox
./index.js -b <browser> -p single -f <matrix_market_input_file_path>
./index.js -b <browser> -p double -f <matrix_market_input_file_path>
where <browser> is chrome for Google Chrome and firefox for Mozilla Firefox
Please contact Prabhjot.