
Ruby Grant 2019 proposal


Contact information

Name: Udit Gulati

E-mail: [email protected]

GitHub: uditgulati

Blog: uditgulati.github.io/blog

Phone: +91 9896363996

Brief biography

I am an active SciRuby contributor. I worked on NumRuby (a re-implementation of NMatrix) during the summer of 2019 as part of the Google Summer of Code program. I am currently a final-year computer science undergraduate at Indian Institute of Information Technology, Una.

Projects

  • NumRuby
  • Ruby-Sparse

Talks

Project Title

Sparse storage support for the Ruby environment.

Abstract

Sparse matrices are a very important part of scientific computing. Many popular modern languages like Python, Julia and R have sparse matrix support (some even as part of their standard libraries), while Ruby doesn't have a good library for sparse matrices.

This project is to build an efficient, feature-rich sparse matrix library in Ruby intended for end users, with interfaces to popular dense matrix libraries (like Numo-Narray and NumRuby [NMatrix]) and support for linear algebra and graph functionality.

What is Sparse storage?

Sparse arrays, or arrays that are mostly empty or filled with zeros, are common in many scientific applications. To save space we often avoid storing these arrays in traditional dense formats, and instead choose different data structures. The choice of data structure can significantly affect our storage and computational costs when working with these arrays.

Large sparse matrices often appear in scientific or engineering applications when solving partial differential equations. When storing and manipulating sparse matrices on a computer, it is beneficial and often necessary to use specialized algorithms and data structures that take advantage of the sparse structure of the matrix. Operations using standard dense-matrix structures and algorithms are slow and inefficient when applied to large sparse matrices as processing and memory are wasted on the zeroes. Sparse data is by nature more easily compressed and thus requires significantly less storage.

Some very large sparse matrices are infeasible to manipulate using standard dense-matrix algorithms, so completely different algorithms are required for sparse matrices, which in turn calls for a completely separate sparse library.
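
As a minimal illustration (plain Ruby, independent of any library), a mostly-zero matrix can be stored as three parallel arrays of row indices, column indices and values — the coordinate (COO) scheme described below:

# A 4x4 matrix with only 3 non-zero entries:
#   [0 0 5 0]
#   [0 8 0 0]
#   [0 0 0 0]
#   [0 0 3 0]
# Dense storage needs 16 slots; COO needs 3 triplets.
rows   = [0, 1, 3]
cols   = [2, 1, 2]
values = [5, 8, 3]

# Reading entry (i, j) means scanning the triplets:
def coo_get(rows, cols, values, i, j)
  rows.each_index do |k|
    return values[k] if rows[k] == i && cols[k] == j
  end
  0 # implicit zero
end

puts coo_get(rows, cols, values, 1, 1) # => 8
puts coo_get(rows, cols, values, 2, 2) # => 0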

Sparse storage can be implemented in the following ways:

  • Compressed sparse row (CSR)
  • Compressed sparse column (CSC)
  • Coordinate list (COO)
  • Diagonal

These types are explained in the Technical details section along with implementation details.

Utilities of Sparse storage

  • Faster sparse matrix operations, improving both speed and memory usage.
  • Graph algorithms support.
  • Sparse linear algebra that performs better and requires less memory than its dense counterpart.

Motivation

While many popular languages have sparse storage support (either in the standard library or as a third-party library), Ruby doesn't have good support for it. This project aims to fill this gap by implementing a sparse storage library for the Ruby environment which is fast, easy to use, API-rich and highly compatible with popular dense storage libraries such as Numo-Narray and NumRuby (a re-implementation of NMatrix).

Languages that support Sparse storage

1. Python

  • Python has 2 major libraries for sparse matrix support, namely scipy.sparse and pydata.sparse.
  • scipy.sparse is for 2-dimensional matrices only. It supports quite a few sparse matrix types: BSR, COO, CSC, CSR, DIA, DOK and LIL.
  • scipy.sparse has a sub-module scipy.sparse.csgraph for graph algorithms. It consists of algorithms such as Dijkstra shortest path, Floyd-Warshall, Bellman-Ford, BFS & DFS, minimum spanning tree, maximum bipartite matching etc.
  • scipy.sparse has another sub-module scipy.sparse.linalg, which provides linear algebra capabilities for sparse matrices, such as finding the inverse, determinant or norm of a matrix, matrix factorization and solving linear systems.
  • pydata.sparse is for arbitrary-dimensional sparse arrays. It supports only 2 sparse types, namely COO and DOK.
  • pydata.sparse also supports basic arithmetic operations on sparse matrices and has useful utility functions, random, eye and ones to name a few.
  • pydata.sparse is compatible with numpy.ndarray while scipy.sparse is compatible with numpy.matrix.

2. Julia

  • Julia has support for sparse vectors and sparse matrices in the SparseArrays std-lib module.
  • In Julia, sparse matrices are stored in the Compressed Sparse Column (CSC) format.
  • There is support for sparse matrix arithmetic operations and indexing.

Already present Sparse libraries in Ruby

There are a few gems for sparse matrices, but none of them is complete enough to be used for practical work, and none is currently being maintained. Some lack performance (being written purely in Ruby), some support only one sparse type (such as CSR), and some lack primary functionality such as conversion to/from dense storage or sparse 2-D matrix operations. There is also no proper documentation for these gems. The reason is that most of them were written as pet projects or experiments, not as full-fledged libraries intended for end users.

Below are some gems that I believe have some resemblance to sparse storage implementations:

  • sparse_matrix: Supports only creation and element-wise operations on a key-value pair type Sparse matrix. Written purely in Ruby.
  • csrmatrix: An implementation of CSR sparse matrix written purely in Ruby. Provides arithmetic operations and matrix operations such as taking transpose or finding determinant etc.
  • sparsematrix: An implementation of Yale sparse matrix written purely in Ruby. Only supports creation and elements access of sparse matrix.
  • libsmatrix: Written in C with APIs provided for C, Ruby and Java/Scala. Supports only key-value pair type Sparse matrix.

Codebase

https://github.com/SciRuby/ruby-sparse

Technical Details

Project Layout

A proposed project layout is as follows:

ruby-sparse  
└── ext  
    ├── blas   
    │   └── sparse_blas.c  
    ├── elementwise  
    │   ├── arithmetic.c  
    │   ├── exponential.c  
    │   ├── hyperbolic.c  
    │   ├── numeric.c  
    │   └── trigonometric.c  
    ├── indexing  
    │   ├── index.c  
    │   ├── iterators.c  
    │   ├── slicing.c 
    │   └── broadcasting.c 
    ├── extconf.rb  
    ├── interfaces  
    │   ├── narray.c  
    │   └── nmatrix.c  
    ├── io  
    │   └── serialize.c  
    ├── graph  
    │   ├── narray.c  
    │   └── nmatrix.c  
    ├── mkmf.rb   
    ├── ruby_sparse.h  
    └── ruby_sparse.c  

Module structure

RubySparse  
	RubySparse::CSR  
	RubySparse::CSC  
	RubySparse::COO  
	RubySparse::Dia  
	RubySparse::Graph  
	RubySparse::Linalg  
	RubySparse::SparseMatrix  

Classes

1. RubySparse::CSR

Class for Compressed Sparse Row matrix.

2. RubySparse::CSC

Class for Compressed Sparse Column matrix.

3. RubySparse::COO

Class for COOrdinate format sparse matrix.

4. RubySparse::Dia

Class for DIAgonal sparse matrix.

5. RubySparse::SparseMatrix

This class provides a base class for all sparse matrices. It cannot be instantiated. Most of the work is provided by its subclasses.

Functions

1. eye(m, n, dtype: 'float', format: 'DIA')

Sparse matrix with ones on the diagonal. Returns a sparse (m x n) matrix where the diagonal is all ones and everything else is zero.

2. kron(A, B, format: 'CSR')

Kronecker product of sparse matrices A and B. Returns Kronecker product in a sparse matrix format.

3. tril(A, format: 'CSR')

Return the lower triangular portion of a matrix in sparse format.

4. triu(A, format: 'CSR')

Return the upper triangular portion of a matrix in sparse format.

5. hstack(blocks, format: 'CSR', dtype: 'float')

Stack sparse matrices horizontally (column wise).

6. vstack(blocks, format: 'CSR', dtype: 'float')

Stack sparse matrices vertically (row wise).

7. dot(A, B)

Perform the dot product on two sparse matrices A and B.

8. save(file, matrix)

Save a sparse matrix to a file.

9. load(file)

Load a sparse matrix from a file. Returns a sparse matrix containing the loaded data.
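
As a sketch of intended usage, assuming the signatures listed above (none of this API exists yet; a and b stand for sparse matrices built earlier):

i = RubySparse.eye(3, 3, dtype: 'float', format: 'DIA') # 3x3 identity
k = RubySparse.kron(a, b, format: 'CSR')                # Kronecker product
l = RubySparse.tril(a, format: 'CSR')                   # lower triangle
s = RubySparse.hstack([a, b], format: 'CSR')            # side-by-side stack
d = RubySparse.dot(a, b)                                # sparse dot product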

Submodules

1. RubySparse::Graph

This module consists of graph algorithms for sparse matrices.

  • dijkstra(graph, directed: true, weighted: true): Dijkstra's algorithm using Fibonacci heaps.
  • floyd_warshall(graph, directed: true, weighted: true): Compute the shortest path lengths using the Floyd-Warshall algorithm.
  • bellman_ford(graph, directed: true, weighted: true): Compute the shortest path lengths using the Bellman-Ford algorithm.
  • breadth_first_order(graph, start_node, directed: true): Return a breadth-first ordering starting with the specified node.
  • depth_first_order(graph, start_node, directed: true): Return a depth-first ordering starting with the specified node.
  • breadth_first_tree(graph, start_node, directed: true): Return the tree generated by a breadth-first search.
  • depth_first_tree(graph, start_node, directed: true): Return the tree generated by a depth-first search.
  • minimum_spanning_tree(graph): Return a minimum spanning tree of an undirected graph.
  • maximum_bipartite_matching(graph): Return an array of row or column permutations that makes the diagonal of a non-singular square sparse matrix zero-free.
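
A sketch of how this sub-module might be driven, assuming the signatures above (graph stands for a sparse adjacency matrix built earlier):

dist  = RubySparse::Graph.dijkstra(graph, directed: true, weighted: true)
order = RubySparse::Graph.breadth_first_order(graph, 0, directed: true)
tree  = RubySparse::Graph.minimum_spanning_tree(graph)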

2. RubySparse::Linalg

This module consists of linear algebra capabilities for sparse matrices.

  • inv(A): Compute the inverse of a sparse matrix. Returns the inverse of A.
  • expm(A): Compute the matrix exponential using the Padé approximation. Returns the matrix exponential of A.
  • norm(A, order, axis): Norm of a sparse matrix. This function is able to return one of seven different matrix norms, depending on the value of the order parameter. Returns the norm of the matrix.
  • solve(A, b): Solve the sparse linear system Ax = b, where b may be a vector or a matrix. Returns x, the solution of the linear system.
  • eigs(A, k): Find k eigenvalues and eigenvectors of the square matrix A. Returns an array of k eigenvalues and an array of k eigenvectors.
  • eigsh(A, k): Find k eigenvalues and eigenvectors of the real symmetric or complex Hermitian square matrix A. Returns an array of k eigenvalues and an array of k eigenvectors.
  • svds(A, k): Compute the largest k singular values/vectors of a sparse matrix.
  • lu(A): Compute the LU decomposition of a sparse, square matrix.
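
And similarly for the Linalg sub-module (a sketch under the same assumptions; a is a square sparse matrix, b a right-hand side):

x          = RubySparse::Linalg.solve(a, b) # solution of ax = b
a_inv      = RubySparse::Linalg.inv(a)      # sparse inverse
vals, vecs = RubySparse::Linalg.eigs(a, 6)  # 6 eigenvalues and eigenvectors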

Supported sparse types

CSR

The compressed sparse row (CSR) or Yale format represents a matrix M by three (one-dimensional) arrays that respectively contain the non-zero values, the extents of rows, and the column indices. Methods to be implemented:

  • get_dense_from_csr: This method will convert CSR storage to dense storage.
  • get_csr_from_dense: This method will convert dense storage to CSR storage.
typedef struct CSR_STRUCT
{
  sp_dtype dtype;
  size_t ndims;
  size_t count;     // count of non-zero elements
  size_t* shape;
  double* elements; // non-zero values
  size_t* ia;       // row pointers (extents of rows)
  size_t* ja;       // column indices
} csr_matrix;
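
To make the "extents of rows" idea concrete, here is a small matrix laid out in CSR form in plain Ruby (an illustration only, independent of the struct above):

# Matrix:
#   [0 0 5 0]
#   [0 8 0 0]
#   [0 0 0 0]
#   [0 0 3 0]
values  = [5, 8, 3]        # non-zero values in row-major order
col_idx = [2, 1, 2]        # column index of each value
row_ptr = [0, 1, 2, 2, 3]  # row i owns values[row_ptr[i]...row_ptr[i+1]]

# Print all non-zeros of row 1:
(row_ptr[1]...row_ptr[2]).each do |k|
  puts "(1, #{col_idx[k]}) = #{values[k]}" # => (1, 1) = 8
end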

CSC

CSC is similar to CSR except that values are read first by column, a row index is stored for each value, and column pointers are stored. Methods to be implemented:

  • get_dense_from_csc: This method will convert CSC storage to dense storage.
  • get_csc_from_dense: This method will convert dense storage to CSC storage.
typedef struct CSC_STRUCT
{
  sp_dtype dtype;
  size_t ndims;
  size_t count;     // count of non-zero elements
  size_t* shape;
  double* elements; // non-zero values
  size_t* ia;       // row indices
  size_t* ja;       // column pointers (extents of columns)
} csc_matrix;
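
For comparison, the same example matrix in CSC form (values walked column by column, with a row index per value and column pointers; again just an illustration in plain Ruby):

values  = [8, 5, 3]        # non-zero values in column-major order
row_idx = [1, 0, 3]        # row index of each value
col_ptr = [0, 0, 1, 3, 3]  # column j owns values[col_ptr[j]...col_ptr[j+1]]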

COO

COO stores a list of (row, column, value) tuples. Ideally, the entries are sorted first by row index and then by column index, to improve random access times. Methods to be implemented:

  • get_dense_from_coo: This method will convert COO storage to dense storage.
  • get_coo_from_dense: This method will convert dense storage to COO storage.
typedef struct COO_STRUCT
{
  sp_dtype dtype;
  size_t ndims;
  size_t count;     // count of non-zero elements
  size_t* shape;
  double* elements; // non-zero values
  size_t* ia;       // row index of each value
  size_t* ja;       // column index of each value
} coo_matrix;

Diagonal

Diagonal stores just the entries in the main diagonal as a one-dimensional array, so a diagonal n × n matrix requires only n entries. It is used when only the diagonal entries are non-zero. Methods to be implemented:

  • get_dense_from_dia: This method will convert Dia storage to dense storage.
  • get_dia_from_dense: This method will convert dense storage to Dia storage.
typedef struct DIA_STRUCT
{
  sp_dtype dtype;
  size_t ndims;
  size_t count;     // count of non-zero elements
  size_t* shape;
  double* elements; // values along the stored diagonals
  size_t* offset;   // offset of each stored diagonal from the main diagonal
} dia_matrix;
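
As an illustration of diagonal storage (plain Ruby, independent of the struct above), each stored diagonal is identified by its offset from the main diagonal: 0 for the main diagonal, +1 for the first super-diagonal, -1 for the first sub-diagonal:

# Tridiagonal 4x4 matrix:
#   [4 1 0 0]
#   [2 4 1 0]
#   [0 2 4 1]
#   [0 0 2 4]
offsets   = [-1, 0, 1]
diagonals = [[2, 2, 2], [4, 4, 4, 4], [1, 1, 1]]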

Modules and classes initialization

Initialization code for RubySparse and its modules and classes is as follows:

VALUE RubySparse = Qnil;
VALUE SparseMatrix = Qnil;
VALUE COO = Qnil;

void  Init_sparse() {
  RubySparse   =  rb_define_module("RubySparse");
  SparseMatrix =  rb_define_class_under(RubySparse, "SparseMatrix", rb_cObject);
  COO          =  rb_define_class_under(RubySparse, "COO", rb_cObject);
  
  rb_define_alloc_func(COO, coo_alloc);
  rb_define_method(COO, "initialize", coo_init, -1);
  rb_define_method(COO, "shape", coo_get_shape, 0);
  rb_define_method(COO, "elements", coo_get_elements, 0);
  rb_define_method(COO, "coords", coo_get_coords, 0);
  rb_define_method(COO, "nzcount", coo_get_count, 0);
  rb_define_method(COO, "dim", coo_get_ndims, 0);
  
  rb_define_method(COO, "+", coo_add, 1);
}

The sparse matrix initialization for COO is as follows:

VALUE coo_init(int argc, VALUE* argv, VALUE self) {
  coo_matrix* mat;
  Data_Get_Struct(self, coo_matrix, mat);

  if(argc == 4) { // expects shape, elements, row indices and column indices
    mat->ndims = 2;
    mat->count = (size_t)RARRAY_LEN(argv[1]);
    mat->shape = ALLOC_N(size_t, mat->ndims);
    for(size_t index = 0; index < mat->ndims; index++) {
      mat->shape[index] = (size_t)FIX2LONG(RARRAY_AREF(argv[0], index));
    }
    mat->elements = ALLOC_N(double, mat->count);
    for(size_t index = 0; index < mat->count; index++) {
      mat->elements[index] = (double)NUM2DBL(RARRAY_AREF(argv[1], index));
    }
    mat->ia = ALLOC_N(size_t, mat->count);
    for(size_t index = 0; index < mat->count; index++) {
      mat->ia[index] = (size_t)NUM2SIZET(RARRAY_AREF(argv[2], index));
    }
    mat->ja = ALLOC_N(size_t, mat->count);
    for(size_t index = 0; index < mat->count; index++) {
      mat->ja[index] = (size_t)NUM2SIZET(RARRAY_AREF(argv[3], index));
    }
  }

  return self;
}


VALUE coo_alloc(VALUE klass) {
  coo_matrix* mat = ALLOC(coo_matrix);

  return Data_Wrap_Struct(klass, NULL, coo_free, mat);
}


void coo_free(coo_matrix* mat) {
  xfree(mat);
}

Similarly, it'll be implemented for other sparse types.
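
With the initialization above, the COO class becomes usable from Ruby roughly as follows (a sketch; the constructor arguments are shape, elements, row indices and column indices, matching coo_init):

mat = RubySparse::COO.new [3, 3], [1, 2, 3], [0, 1, 2], [0, 1, 2]

mat.shape    # => [3, 3]
mat.elements # => [1.0, 2.0, 3.0]
mat.nzcount  # => 3
mat.dim      # => 2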

Dtypes

The library will be implemented only for the int and float64 dtypes during the grant period. Other dtypes, such as bool, complex64 and string, will be implemented if time allows, or after the grant period.

Indexing

Sparse matrix indexing works differently from dense indexing, and each sparse type indexes elements differently because each stores element positions differently. For example, COO stores the row and column indices directly alongside each element, so an element's position can be read off directly; CSR, on the other hand, stores the column index of each element but only row extents (how many elements each row holds), which doesn't give direct access to an element's row index.
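
For instance, recovering the row of the k-th stored element in CSR means finding which row-pointer interval contains k; a sketch in plain Ruby:

# row_ptr[i]...row_ptr[i+1] holds the stored elements of row i
def csr_row_of(row_ptr, k)
  (0...(row_ptr.length - 1)).each do |i|
    return i if k >= row_ptr[i] && k < row_ptr[i + 1]
  end
  nil
end

puts csr_row_of([0, 1, 2, 2, 3], 2) # => 3 (the third stored element sits in row 3)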

Once indexing is implemented, it can be used to iterate over each element of the sparse matrix. Other types of iterators can be implemented, such as:

  • each_with_index: Iterate over each element with index of occurrence.
  • each_with_indices: Iterate over each element with row and column index.
  • each_row: Iterate over each row of sparse matrix.
  • each_column: Iterate over each column of sparse matrix.
  • each_row_with_index: Iterate over each row of sparse matrix with corresponding index.
  • each_column_with_index: Iterate over each column of sparse matrix with corresponding index.
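
A sketch of how these iterators might read in practice (the block parameters are assumptions on my part):

mat.each_with_indices do |value, i, j|
  puts "(#{i}, #{j}) = #{value}"
end

mat.each_row_with_index do |row, i|
  puts "row #{i} has #{row.nzcount} non-zeros"
end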

Indexing will also be used to implement slicing of sparse matrices. Examples of slicing:

  • mat[1, 0:2]: Row with index 1 and columns with indices from 0 to 2, not including 2.
  • mat[0:5, 3:4]: Rows with indices from 0 to 5, not including 5, and columns with indices from 3 to 4, not including 4.

Sparse matrix operations

Elementwise

The following element-wise operations would be supported:

  • Basic arithmetic operations: +, -, *, /, >>, <<.
  • Exponential and logarithmic functions: exp, log, expm1, log1p, etc.
  • Hyperbolic functions: sinh, cosh, tanh, etc.
  • Logical operations: &&, ||, |, &, <, >, <=, >=, ==, !.
  • Numeric functions: floor, round, min, max, etc.
  • Trigonometric functions: sin, cos, tan, etc.

Elementwise addition for COO is as follows:

VALUE coo_add(VALUE self, VALUE another) {
  coo_matrix* left;
  coo_matrix* right;
  Data_Get_Struct(self, coo_matrix, left);
  Data_Get_Struct(another, coo_matrix, right);

  coo_matrix* result = ALLOC(coo_matrix);
  result->count = 0;
  result->ndims = left->ndims;
  result->shape = ALLOC_N(size_t, result->ndims);

  for(size_t index = 0; index < result->ndims; index++){
    result->shape[index] = left->shape[index];
  }

  // worst case: no indices overlap, so the result holds all entries of both
  result->elements = ALLOC_N(double, left->count + right->count);
  result->ia = ALLOC_N(size_t, left->count + right->count);
  result->ja = ALLOC_N(size_t, left->count + right->count);

  size_t left_index = 0, right_index = 0, result_index = 0;
  while(left_index < left->count || right_index < right->count) {
    if(left_index < left->count && right_index < right->count
    && left->ia[left_index] == right->ia[right_index] //left and right indices equal
    && left->ja[left_index] == right->ja[right_index]) {
      result->elements[result_index] = left->elements[left_index] + right->elements[right_index];
      result->ia[result_index] = left->ia[left_index];
      result->ja[result_index] = left->ja[left_index];

      left_index++, right_index++, result_index++;
    }
    else if(right_index == right->count //left index smaller, or right exhausted
    || (left_index < left->count
    && (left->ia[left_index] < right->ia[right_index]
    || (left->ia[left_index] == right->ia[right_index] && left->ja[left_index] < right->ja[right_index])))) {
      result->elements[result_index] = left->elements[left_index];
      result->ia[result_index] = left->ia[left_index];
      result->ja[result_index] = left->ja[left_index];

      left_index++, result_index++;
    }
    else {  //right index smaller, or left exhausted
      result->elements[result_index] = right->elements[right_index];
      result->ia[result_index] = right->ia[right_index];
      result->ja[result_index] = right->ja[right_index];

      right_index++, result_index++;
    }

    result->count++;
  }

  return Data_Wrap_Struct(COO, NULL, coo_free, result);
}

Matrix multiplication

There are multiple sparse BLAS libraries available that support:

  • matrix-vector multiplication
  • matrix-matrix multiplication

I'll use the NIST Sparse BLAS library, written in FORTRAN, for these operations. Details for the library can be found at https://math.nist.gov/spblas/.
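
For intuition, sparse matrix-vector multiplication over COO storage reduces to a single pass over the stored triplets; a pure-Ruby sketch of what the BLAS routine computes:

# y = A * x for a COO matrix A given as (ia, ja, elements)
def coo_matvec(ia, ja, elements, x, nrows)
  y = Array.new(nrows, 0.0)
  elements.each_index { |k| y[ia[k]] += elements[k] * x[ja[k]] }
  y
end

puts coo_matvec([0, 1, 3], [2, 1, 2], [5, 8, 3], [1, 1, 1, 1], 4).inspect
# => [5.0, 8.0, 0.0, 3.0]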

Sparse linear algebra

The sparse linear algebra sub-module needs further research on which back-end to use, as there isn't a direct LAPACK counterpart for sparse matrices. I have found a few possible alternatives that need further evaluation. There's also a list of low-level libraries for sparse matrices at http://www.netlib.org/utk/people/JackDongarra/la-sw.html.

For the grant period, I'll focus on completing the graph sub-module; I'll also research possible solutions for the sparse linear algebra sub-module and implement basic linear algebra functionality such as finding a matrix inverse, norm and determinant, and solving systems of linear equations.

Graph algorithms

A graph is defined as a pair G = (V, E), where V is a set whose elements are called vertices (singular: vertex), and E is a set of two-sets (sets with two distinct elements) of vertices, whose elements are called edges (sometimes links or lines).

Graphs can be stored in two possible ways:

  • A dense matrix. Uses O(V^2) space. Usually used when the graph is close to being a complete graph, i.e. E approaches O(V^2).
  • A sparse matrix. Uses O(V + E) space. Used when E is of the order of O(V), as in a tree.

The second possibility is quite frequent and can be implemented using sparse storage.
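
As a small example, a weighted graph maps directly onto COO storage, with one stored entry per edge:

# Edges: 0->1 (weight 4), 0->2 (weight 1), 2->1 (weight 2)
ia       = [0, 0, 2] # source vertices
ja       = [1, 2, 1] # destination vertices
elements = [4, 1, 2] # edge weights
# The full adjacency matrix has V^2 = 9 slots; only E = 3 entries are stored.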

Following are some of the popular graph algorithms which will be implemented as part of this module:

Note: The implementations below are in C++, making use of STL containers for data storage. When the module is implemented, this code will be ported to C, along with implementations of the required data structures.

Dijkstra's shortest path

Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.

/* 
  Dijkstra's Algorithm
  Time Complexity : O(ElogN)
*/
const int siz = 1e6 + 2;
const int inf = 1e9;
vector < pair<int, int> > graph[siz];

void dijkstra(int source, int n) {
  set < pair<int, int> > s; // set of (distance, node) pairs to process
  vector <int> dist(n+1, inf); // store distance from source
  s.insert(make_pair(0, source));
  dist[source] = 0;
  while (!s.empty()) {
    pair <int, int> tmp = *(s.begin());
    s.erase(s.begin());
    int u = tmp.second;
    for (vector < pair<int,int> >::iterator i = graph[u].begin(); i != graph[u].end(); i++) {
      int v = (*i).first;
      int weight = (*i).second;
      if (dist[v] > dist[u] + weight) {
        if (dist[v] != inf)
        s.erase(s.find(make_pair(dist[v], v)));
        dist[v] = dist[u] + weight;
        s.insert(make_pair(dist[v], v));
      }
    }
  }
  // final shortest distance from every other node
  for(int i = 1; i <= n; i++) {
    printf("Dist From %d To %d : %d\n", source, i, dist[i]);
  }
}

Breadth first search

Breadth-first search is an algorithm for traversing or searching tree or graph data structures. It starts at the tree root (or some arbitrary node of a graph), and explores all of the neighbor nodes at the present depth prior to moving on to the nodes at the next depth level.

/*
  Breadth-first search
  Time complexity: O(V + E)
*/
const int num = 1e5 + 2;
vector <int> adj[num];
bool visited[num];

void bfs(int source) {
  queue <int> que;
  visited[source] = true;
  que.push(source);
  while(!que.empty()) {
    int temp = que.front();
    que.pop();
    int len = adj[temp].size();
    for(int i = 0; i < len; i++) {
      int v = adj[temp][i];
      if(!visited[v]) {
        visited[v] = true; // mark before pushing so each node enters the queue once
        que.push(v);
      }
    }
  }
}

Depth first search

Depth-first search is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible along each branch before backtracking.

/*
  Depth-first search
  Time complexity: O(V + E)
*/
const int num = 1e5 + 2;
vector <int> adj[num];
bool visited[num];

void dfs(int s) {
  visited[s] = true; // mark this node as visited
  for(int i = 0; i < (int)adj[s].size(); i++) { // explore all adjacent nodes
    if(visited[adj[s][i]] == false) // recurse into unvisited neighbours
      dfs(adj[s][i]);
  }
}

Minimum spanning tree

MST using Prim's algorithm. This sample implementation finds the total weight of a minimum spanning tree (the tree itself is collected in conn1).

/* 
  Prim's algorithm
  Time Complexity : O(ElogN)
*/
typedef long long ll;
typedef pair<int, int> pii;
typedef pair<int, pii> piii;

const int siz = 1e6 + 10;
const ll modu = 1e9 + 7;

vector < pair<int, int> > conn[siz], conn1[siz];
//conn is original graph
//conn1 is the MST
int marked[siz];

int prim(int source) {
  priority_queue<piii, vector<piii>, greater<piii> > Q;
  piii curr;
  int cost = 0;
  Q.push(make_pair(0, make_pair(source, source)));
  while(!Q.empty()) {
    curr = Q.top();
    Q.pop();
    int next = curr.second.first;
    int prev = curr.second.second;
    int weight = curr.first;
    if(marked[next])
      continue;
    cost += weight;
    if(next != prev) {
      conn1[next].push_back(make_pair(weight, prev));
      conn1[prev].push_back(make_pair(weight, next));
    }
    marked[next] = true;
    int len = conn[next].size();
    for(int i = 0; i < len; i++) {
      int temp = conn[next][i].second;
      int w1 = conn[next][i].first;
      if(!marked[temp])
        Q.push(make_pair(w1, make_pair(temp, next)));
    }
  }
  return cost;
}

Maximum bipartite matching

A matching in a bipartite graph is a set of edges chosen in such a way that no two edges share an endpoint. A maximum matching is a matching of maximum size (maximum number of edges); if any edge is added to it, it is no longer a matching. There can be more than one maximum matching for a given bipartite graph.

/*
 Maximum bipartite matching
 Time complexity: O(E * V)
*/
typedef vector<int> VI;
typedef vector<VI> VVI;
bool FindMatch(int i, const VVI &w, VI &mr, VI &mc, VI &seen) {
  for(int j = 0; j < w[i].size(); j++) {
    if(w[i][j] && !seen[j]) {
      seen[j] = true;
      if(mc[j] < 0 || FindMatch(mc[j], w, mr, mc, seen)) {
        mr[i] = j;
        mc[j] = i;
        return true;
      }
    }
  }
  return false;
}
int BipartiteMatching(const VVI &w, VI &mr, VI &mc) {
  mr = VI(w.size(), -1);
  mc = VI(w[0].size(), -1);
  int ct = 0;
  for(int i = 0; i < w.size(); i++) {
    VI seen(w[0].size());
    if(FindMatch(i, w, mr, mc, seen))
    ct++;
  }
  return ct;
}

Floyd-Warshall and Bellman-Ford will be implemented in the same way.

Interfaces

NumRuby (Re-implementation of NMatrix)

Each of the sparse classes will have a class method called from_nmatrix() and an instance method called to_nmatrix(). from_nmatrix() converts an NMatrix to the given sparse type, and to_nmatrix() converts the given sparse type back to an NMatrix.

Following is the implementation of these methods for RubySparse::COO:

  • NMatrix to COO
VALUE coo_from_nmatrix(VALUE self, VALUE nmat) {
  VALUE r_elements = rb_funcall(nmat, rb_intern("elements"), 0);
  VALUE r_shape = rb_funcall(nmat, rb_intern("shape"), 0);

  if((size_t)RARRAY_LEN(r_shape) != 2) {
    rb_raise(rb_eStandardError, "NMatrix must be of 2 dimensions.");
  }
  coo_matrix* mat = ALLOC(coo_matrix); // was left unallocated before

  mat->ndims = 2;
  mat->count = 0;
  mat->shape = ALLOC_N(size_t, mat->ndims);
  for(size_t index = 0; index < mat->ndims; index++) {
    mat->shape[index] = (size_t)NUM2SIZET(RARRAY_AREF(r_shape, index));
  }

  // first pass: count the non-zero elements
  size_t dense_elements_count = (mat->shape[0]) * (mat->shape[1]);
  size_t sparse_elements_count = 0;
  for(size_t index = 0; index < dense_elements_count; ++index) {
    double value = (double)NUM2DBL(RARRAY_AREF(r_elements, index));
    if(fabs(value - 0.0) < 1e-6)
      continue;
    sparse_elements_count++;
  }

  mat->count = sparse_elements_count;
  mat->elements = ALLOC_N(double, mat->count);
  mat->ia = ALLOC_N(size_t, mat->count);
  mat->ja = ALLOC_N(size_t, mat->count);

  // second pass: record each non-zero value with its row and column index
  size_t element_index = 0, index = 0;
  for(size_t i = 0; i < mat->shape[0]; ++i) {
    for(size_t j = 0; j < mat->shape[1]; ++j) {
      double value = (double)NUM2DBL(RARRAY_AREF(r_elements, element_index++));
      if(fabs(value - 0.0) < 1e-6)
        continue;
      mat->elements[index] = value;
      mat->ia[index] = i;
      mat->ja[index] = j;
      index++;
    }
  }

  return Data_Wrap_Struct(COO, NULL, coo_free, mat);
}
  • COO to NMatrix
class COO

  def to_nmatrix
    ia = self.coords[0]
    ja = self.coords[1]
  
    nm_elements = Array.new(self.shape[0]*self.shape[1], 0)
    for i in (0...self.nzcount)
      nm_index = (self.shape[1] * ia[i]) + ja[i]
      nm_elements[nm_index] = self.elements[i]
    end
  
    m = NMatrix.new self.shape, nm_elements
    return m
  end
end

Numo-Narray

The interfaces for Numo-Narray will also be implemented similarly as NMatrix.
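
As a sketch, the Numo::NArray interface could mirror the NMatrix one; the method names from_narray and to_narray below are hypothetical placeholders chosen by analogy with from_nmatrix/to_nmatrix:

require 'numo/narray'

na  = Numo::DFloat[[0, 0, 5], [0, 8, 0], [0, 0, 0]]
coo = RubySparse::COO.from_narray(na)  # hypothetical, mirrors from_nmatrix
na2 = coo.to_narray                    # hypothetical, mirrors to_nmatrix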

IO support

There will be support for serializing a sparse storage object to disk so that it can later be read back. Object serialization in Ruby is also known as object marshalling.

Serialization support will be implemented as the load and save methods in the io/serialize.c file.
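
Usage might look as follows (a sketch; the save/load signatures follow the Functions section above, and the file name and extension are arbitrary):

mat = RubySparse::COO.new [3, 3], [1, 2, 3], [0, 1, 2], [0, 1, 2]
RubySparse.save('mat.rsp', mat)   # serialize the sparse matrix to disk
mat2 = RubySparse.load('mat.rsp') # deserialize it back into a COO object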

Tests and Documentation methodology

Tests

Minitest will be used for tests. Tests can be found in the project repository.

Tests structure will be as follows:

$LOAD_PATH.unshift File.expand_path('../../lib', __FILE__)
require 'sparse'
require 'minitest/autorun'

class COO::CreationTest < Minitest::Test

  def setup
    @n = RubySparse::COO.new [3, 3], [1, 2, 3], [0, 1, 2], [0, 1, 2]
  end

  def test_ndims
    assert_equal 2, @n.dim
  end

  def test_shape
    assert_equal [3, 3], @n.shape
  end

  def test_elements
    assert_equal [1, 2, 3], @n.elements
  end

  def test_coords
    assert_equal [[0, 1, 2], [0, 1, 2]], @n.coords
  end

  def test_count
    assert_equal 3, @n.nzcount
  end

end


class COO::ElementWiseTest < Minitest::Test

  def setup
    @left = RubySparse::COO.new [3, 3], [1, 2, 3], [0, 1, 2], [0, 1, 2]
    @right = RubySparse::COO.new [3, 3], [3, 2, 3], [0, 1, 2], [2, 2, 2]
  end

  def test_add
    result = RubySparse::COO.new [3, 3], [1, 3, 2, 2, 6], [0, 0, 1, 1, 2], [0, 2, 1, 2, 2]
    answer = @left + @right
    assert_equal answer.dim, result.dim
    assert_equal answer.shape, result.shape
    assert_equal answer.nzcount, result.nzcount
    assert_equal answer.elements, result.elements
    assert_equal answer.coords, result.coords
  end

end

YARD will be used for documentation.

Library usage examples

The example files give a gist of how the library is used. The syntax is Ruby-like.

Timeline

28th Oct - 6th Nov

  • Implement the RubySparse::COO class with element-wise operations.
  • Implement indexing for RubySparse::COO.
  • Implement Numo::Narray and NMatrix interfaces for RubySparse::COO.

7th Nov - 16th Nov

  • Implement the RubySparse::CSR class with element-wise operations.
  • Implement indexing for RubySparse::CSR.
  • Implement Numo::Narray and NMatrix interfaces for RubySparse::CSR.

17th Nov - 26th Nov

  • Implement the RubySparse::CSC class with element-wise operations.
  • Implement indexing for RubySparse::CSC.
  • Implement Numo::Narray and NMatrix interfaces for RubySparse::CSC.

27th Nov - 6th Dec

  • Implement the RubySparse::DIA class with element-wise operations.
  • Implement indexing for RubySparse::DIA.
  • Implement Numo::Narray and NMatrix interfaces for RubySparse::DIA.

7th Dec - 16th Dec

  • Implement pretty print for RubySparse::COO, RubySparse::CSR, RubySparse::CSC and RubySparse::DIA.
  • Implement resize() and reshape().

17th Dec - 26th Dec

  • Add serialization support for sparse matrix classes.
  • Implement load() and save() methods.
  • Implement tests for sparse matrix element-wise operations.

27th Dec - 5th Jan

  • Add conversion support between the sparse classes.
  • Implement methods such as to_coo(), to_csr(), to_csc() and to_dia().

6th Jan - 13th Jan

  • Add tests for RubySparse::COO, RubySparse::CSR, RubySparse::CSC and RubySparse::DIA.
  • Refactor and optimize code for RubySparse::COO, RubySparse::CSR, RubySparse::CSC and RubySparse::DIA.

Mid-term deliverables (14th Jan)

  • Implementation of sparse storage classes, namely, COO, CSR, CSC and DIA.
  • Implement serialization support.
  • Implement interfaces to dense matrix libraries and conversion between sparse types.
  • Implement pretty print for all sparse types.

15th Jan - 24th Jan

  • Implement RubySparse::Graph BFS and DFS algorithms.
  • Add depth_first_order() and breadth_first_order() functions.
  • Add depth_first_tree() and breadth_first_tree() functions.
  • Implement Dijkstra's shortest path algorithm.

25th Jan - 3rd Feb

  • Implement the Floyd-Warshall algorithm.
  • Implement the Bellman-Ford algorithm.
  • Implement the minimum spanning tree algorithm.
  • Implement the maximum bipartite matching algorithm.

4th Feb - 13th Feb

  • Implement RubySparse::Linalg module.
  • Add matrix-vector, matrix-matrix dot product support.
  • Add sparse matrix linear equation solution support.

14th Feb - 23rd Feb

  • Implement triu() and tril() methods for sparse types classes.
  • Implement hstack() and vstack() methods.

24th Feb - 4th Mar

  • Add tests for RubySparse::Linalg and RubySparse::Graph.
  • Write documentation for ruby sparse API.
  • Refactor and optimize code for RubySparse::Linalg and RubySparse::Graph.

5th Mar - 12th Mar

  • Release ruby-sparse gem.
  • Create benchmarks for the library.
  • Create graph plots for the benchmarks and comparisons with other libraries.

Final deliverables (13th Mar)

  • RubySparse::Linalg and RubySparse::Graph modules with proper tests.
  • Documentation for the code written so far.
  • Proper benchmarks for core functionalities.
  • Graph plots of benchmarks and comparison with respect to other similar libraries.

Deliverables

  • Implementation of Sparse storage library.
  • IO (serialization) support for the Sparse storage.
  • Implementation of proper interfaces for dense storage (or dense arrays) libraries, namely, Numo-Narray and NumRuby.
  • Implement a linear algebra and a graph module for sparse matrices.
  • Proper documentation and strong tests.
  • Release of the ruby-sparse gem.

Future Work

  • The library can be extended to support arbitrary dimensional sparse arrays.
  • More complex linear algebra and graph algorithms can be added if required.
  • Add broadcasting support to sparse matrices.

Technical Advisor

I have asked my GSoC 2019 mentor Prasun Anand to advise me on this project. Prasun works as a software engineer at Quansight and is currently working on the PyTorch library.

https://www.prasunanand.com/about/