From 85d01f55b5ba3de8e90c61abcf29478602c6b7aa Mon Sep 17 00:00:00 2001 From: stevo15025 Date: Tue, 6 Aug 2019 17:10:41 +0300 Subject: [PATCH 01/13] Add 1/2 draft of sparse matrix --- designs/0004-sparse-matrices.md | 312 ++++++++++++++++++++++++++++++++ 1 file changed, 312 insertions(+) create mode 100644 designs/0004-sparse-matrices.md diff --git a/designs/0004-sparse-matrices.md b/designs/0004-sparse-matrices.md new file mode 100644 index 0000000..1c93966 --- /dev/null +++ b/designs/0004-sparse-matrices.md @@ -0,0 +1,312 @@ +- Feature Name: sparse_matrices +- Start Date: August 1st, 2019 +- RFC PR: (leave this empty) +- Stan Issue: (leave this empty) + +# Summary +[summary]: #summary + +Add a sparse matrix type to the Stan language and sparse matrix operations to Stan-math to utilize operations that can take advantage of the sparsity structure. + +# Motivation +[motivation]: #motivation + +Data for models such as [ICAR](https://mc-stan.org/users/documentation/case-studies/icar_stan.html) come in as sparse matrix. (i.e. a large number of elements in the matrix are zero). There are often methods for storing and computing these matrices which utilize the sparsity structure to give better performance. Currently we have some Compressed Sparse Row (CSR) [methods](https://mc-stan.org/docs/2_19/functions-reference/sparse-matrices.html) for going from dense matrices to simpler representations that ignore zeros and vice versa. Though the only exposed operation is [`csr_matrix_times_vector`](https://mc-stan.org/docs/2_19/functions-reference/sparse-matrix-arithmetic.html). The CSR methods currently supported are limited and require a good deal of fan-dangling from the user. + +A `sparse_matrix` type directly in the stan language would support all existing methods available for `matrix` types. 
From Dan and Aki's previous [proposal](https://aws1.discourse-cdn.com/standard14/uploads/mc_stan/original/2X/1/13fda4102c8f48e5aadbf0fbe75d3641a187d0a3.pdf) this would include full auto-diff and algebra support for: + +- Addition (returns sparse matrix, possibly with different sparsity structure) +- Sparse matrix transpose (returns sparse matrix, different sparsity structure) +- Sparse matrix-vector multiplication (returns dense vector) +- Sparse matrix-constant multiplication (returns sparse matrix, same sparsity) +- Sparse matrix-matrix multiplication (returns a sparse matrix that likely has a different sparsity structure) +- Sparse matrix-dense matrix multiplication (should return a dense matrix) +- Sparse matrix-diagonal matrix multiplication on the left and right (returns sparse matrix, same sparsity) +- Sparse inner product and quadratic form (returns scalars) +- Operations to move from sparse to dense matrices and vice versa. +- Fill-reducing reorderings +- A sparse Cholesky for a matrix of doubles. +- The computation of the log-determinant as the product of Cholesky diagonals +- Sparse linear solves and sparse triangular solves (for sampling from the marginalized out parameters in the generated quantities block) +- Sparse algorithms to compute the required elements of the inverse. +- An implementation of the reverse mode derivative of the log determinant +- Reverse-mode derivative of a Cholesky decomposition of a sparse matrix. + - This is the same algorithm that was initially used in the dense case before Rob implemented Iain Murray’s blocked version. But it will need to be implemented in a “data structure” aware way +- Specialization for Poisson, binomial and maybe gamma likelihoods. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +Most of the below comes from the [functional-spec](https://github.com/stan-dev/stan/wiki/Functional-Spec:-Sparse-Matrix-Data-Types) for sparse matrices. 
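Most of the operations listed above win by never touching structural zeros. As a rough stand-alone illustration of that idea (plain C++, not Stan-math code — all names here are invented for the sketch), a CSR matrix-vector product only loops over the stored non-zeros:

```cpp
#include <vector>

// Multiply a CSR-stored sparse matrix by a dense vector.
//   values:     stored non-zero coefficients
//   col_idx:    column index of each stored value
//   row_starts: offsets into values/col_idx; row r spans
//               [row_starts[r], row_starts[r + 1]), so its size is rows + 1
std::vector<double> csr_times_vector(int rows,
                                     const std::vector<double>& values,
                                     const std::vector<int>& col_idx,
                                     const std::vector<int>& row_starts,
                                     const std::vector<double>& x) {
  std::vector<double> y(rows, 0.0);
  for (int r = 0; r < rows; ++r)
    for (int k = row_starts[r]; k < row_starts[r + 1]; ++k)
      y[r] += values[k] * x[col_idx[k]];  // structural zeros are never visited
  return y;
}
```

The work is proportional to the number of stored non-zeros rather than to the full rows-times-columns product, which is the performance argument behind the whole list of operations.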
## Data

Sparse matrix types in the Stan language can be constructed in the data block via Coordinate List Notation using the rows, cols, non-empty row/col indices, and values for those index positions.

```stan
int N; // Rows
int M; // Cols
int K; // number non-empty
int nonzero_row_index[K]; // Non-empty row positions
int nonzero_col_index[K]; // Non-empty col positions
vector[K] vals; // Values in each position
// Direct way
sparse_matrix[N, M, nonzero_row_index, nonzero_col_index, vals] A;
// Can we do this?
sparse_matrix[N, M, nonzero_row_index, nonzero_col_index] B;


```

Alternatively a sparse matrix can be constructed via Compressed Sparse Row (CSR) Notation That's used within Eigen (See appendix below for description on how Eigen handles sparse matrix storage)

```stan
data {
  int N; // rows
  int M; // cols
  int K; // Number of non-empty values
  int starting_row_idx[N + 1]; // Index of the first non-zero in each row
  int column_idx[K]; // Column index for each value
  vector[K] vals; // Values in each position
  sparse_matrix A;
  // Again if we can be clever on input
  sparse_matrix B;
}
```

Sparse vectors are the same only with a single size an a single index array.

To users, sparse matrices for data should operate the same way as normal matrices.

## Transformed Data

Sparse matrices in these blocks can be defined dynamically and declared such as

```stan
transformed data {
// Could construct here as well
sparse_matrix[N, M] A = to_sparse_matrix(N, M, nonzero_row_index, nonzero_col_index, vals);

// Linear Algebra is cool
sparse_matrix[N, N] C = A * A';
// For data this is fine
C[10, 10] = 100.0;
}
```

## Parameters, Transformed Parameters, and Generated Quantities

Parameters can be defined as above for data or deduced from the output of other functions. 
+```stan +data { + int N; // Rows + int M; // Cols + int K; // # of nonzero elements + int non_zero_x_index[K]; // Indices for nonzero elements + int nz_foo_row_index[K]; // Indices for nonzero elements + int nz_foo_col_index[K]; // Indices for nonzero elements + + sparse_vector[N, non_zero_index] x; +} +parameters { + real rho; + real alpha; + real sigma; + sparse_matrix[N, M, nz_foo_row_index, nz_foo_row_index] foo; +} +transformed parameters { + sparse_matrix[N, N] K = cov_exp_quad(x, alpha, rho); +} + +``` + +The size and non-zero indices for sparse matrices in the must be defined in the data block. This is because Stan's I/O and posterior analysis infrastructure assumes the same full specification in each iteration of the model. + +## Full Example Model + +In my dream mindscape, here is how a user could write a gaussian process that uses sparse matrices. + +```stan +data { + int N; // Vec size + int K; // # of nonzero elements + int non_zero_index[K]; // Indices for nonzero elements + sparse_vector[N, non_zero_index] x; // [1] + vector[N] y; +} +transformed data { + sparse_vector[N] mu = rep_sparse_vector(0, N, non_zero_index); // [1] +} +parameters { + real rho; + real alpha; + real sigma; +} +model { + sparse_cholesky_matrix[N] L_K; // [2] + sparse_matrix[N, N] K = cov_exp_quad(x, alpha, rho); + real sq_sigma = square(sigma); + + // Assign to pre-existing diagonal + for (n in 1:N) + K[n, n] = K[n, n] + sq_sigma; + + L_K = cholesky_decompose(K); + + rho ~ inv_gamma(5, 5); + alpha ~ std_normal(); + sigma ~ std_normal(); + + y ~ multi_normal_cholesky(mu, L_K); +} +``` + +[1] Because Stan is told what values are true zeros vs. sparse zeros we can set this up fine. +[2] The `sparse_cholesky_matrix` holdds the ordering of the sparse matrix after decomposition. + + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +## Stan Math + +There are two approaches to coding with Eigen's sparse matrices in Stan-math. 
### The Hard way

Sparse matrices can be supported in Stan-math by either moving to `EigenBase` as the default in the metaprogramming or by having separate methods for Sparse Matrices.

Let's look at primitive add for an example. One implementation of `add` in stan math is

```cpp
template <typename T1, typename T2, int R, int C>
inline Eigen::Matrix<return_type_t<T1, T2>, R, C> add(
    const Eigen::Matrix<T1, R, C>& m1, const Eigen::Matrix<T2, R, C>& m2) {
  check_matching_dims("add", "m1", m1, "m2", m2);
  return m1 + m2;
}
```

Would change to

```cpp
template <typename Derived1, typename Derived2>
inline return_derived_obj<Derived1, Derived2> add(
    const Eigen::EigenBase<Derived1>& m1, const Eigen::EigenBase<Derived2>& m2) {
  check_matching_dims("add", "m1", m1, "m2", m2);
  return m1 + m2;
}
```

Where `return_derived_obj` would deduce the correct the correct return type based on the `Scalar` value of the `Derived*` types. This is nice because for places where Eigen supports both dense and sparse operations we do not have to duplicate code. For places where the sparse and dense operations differ we can have sparse matrix tempalte specializations. There has been a lot of discussion on this refactor in the past (see [this](https://groups.google.com/forum/#!topic/stan-dev/ZKYCQ3Y7eY0) Google groups post and [this](https://github.com/stan-dev/math/issues/62) issue). Though looking at the two forms it seems like using `A.coeff()` for access instead of `operator()` would be sufficient to handle the coefficient access error Dan saw.

### The Simple Way

If we would rather not refactor the math library we can keep our current templates and have specializations for Sparse matrices.

```cpp
template <typename T1, typename T2, int R, int C>
inline Eigen::SparseMatrix<return_type_t<T1, T2>, R, C> add(
    const Eigen::SparseMatrix<T1, R, C>& m1, const Eigen::SparseMatrix<T2, R, C>& m2) {
  check_matching_dims("add", "m1", m1, "m2", m2);
  return m1 + m2;
}
```

# Drawbacks
[drawbacks]: #drawbacks

Doing this improperly could lead to serious amounts of "code smell" aka duplicated code, confusing templates, etc. 
+ +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +- Why is this design the best in the space of possible designs? + +This is the most full-feature design I can think of that makes sparse matrices simple for stan users. + +- What other designs have been considered and what is the rationale for not choosing them? + +Instead of the full feature spec like the above we could have sparse_matrix only be a result type + +```stan +parameters { + real theta_raw[K]; +... +model { + sparse_matrix[M, N] theta = to_sparse_matrix(M, N, mm, nn, theta_raw); + ... +``` + +- What is the impact of not doing this? + +Kind of a bummer, it will be awkward for users to fit models whose data or parameters are sparsely defined. + +# Prior art +[prior-art]: #prior-art + +Discuss prior art, both the good and the bad, in relation to this proposal. +A few examples of what this can include are: + +- Talk about tensorflow and mxnet. + +- For language, library, tools, and compiler proposals: Does this feature exist in other programming languages and what experience have their community had? +- For community proposals: Is this done by some other community and what were their experiences with it? +- For other teams: What lessons can we learn from what other communities have done here? +- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background. + +This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your RFC with a fuller picture. +If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages. + +Note that while precedent set by other languages is some motivation, it does not on its own motivate an RFC. 
+Please also take into consideration that rust sometimes intentionally diverges from common language features. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +- What parts of the design do you expect to resolve through the RFC process before this gets merged? +- What parts of the design do you expect to resolve through the implementation of this feature before stabilization? +- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC? + + +# Appendix: (Eigen Sparse Matrix formats) + +From the Eigen [Sparse Matrix Docs](https://eigen.tuxfamily.org/dox/group__TutorialSparse.html), Eigen matrices use a Compressed Column Scheme (CCS) to represent sparse matrices. + +Sparse matrices in Eigen are stored using four compact arrays. + +> Values: stores the coefficient values of the non-zeros. +> InnerIndices: stores the row (resp. column) indices of the non-zeros. +> OuterStarts: stores for each column (resp. row) the index of the first non-zero in the previous two arrays. +> InnerNNZs: stores the number of non-zeros of each column (resp. row). The word inner refers to an inner vector that is a column for a column-major matrix, or a row for a row-major matrix. The word outer refers to the other direction. 
| row/col | 0 | 1 | 2 | 3 | 4 |
|---------|----|---|----|---|----|
| 0 | 0 | 3 | 0 | 0 | 0 |
| 1 | 22 | 0 | 0 | 0 | 17 |
| 2 | 7 | 5 | 0 | 1 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 14 | 0 | 8 |

| Inner Values: |----|---|---|---|---|----|---|---|---|---|----|---|
|----------------|----|---|---|---|---|----|---|---|---|---|----|---|
| Values: | 22 | 7 | _ | 3 | 5 | 14 | _ | _ | 1 | _ | 17 | 8 |
| InnerIndices: | 1 | 2 | _ | 0 | 2 | 4 | _ | _ | 2 | _ | 1 | 4 |

| Meta Info: |---|---|---|---|----|----|
|--------------|---|---|---|---|----|----|
| OuterStarts | 0 | 3 | 5 | 8 | 10 | 12 |
| InnerNNZs | 2 | 2 | 1 | 1 | 2 | |

The `_` indicates available free space to insert new elements. In the example above, the value 14 has inner (row) index 4 (`InnerIndices`); its column begins at position 5 of the storage arrays (`OuterStarts`), counting the free `_` slots, and holds 1 non-zero (`InnerNNZs`), while the first column holds 2. This layout lets new elements be inserted into the sparse matrix cheaply, and the matrix can be compressed further with `makeCompressed()`

|Compressed Vals:|----|---|---|---|----|---|----|---|
|----------------|----|---|---|---|----|---|----|---|
| Values: | 22 | 7 | 3 | 5 | 14 | 1 | 17 | 8 |
| InnerIndices: | 1 | 2 | 0 | 2 | 4 | 2 | 1 | 4 |

|Meta Info: |---|---|---|---|---|---|
|--------------|---|---|---|---|---|---|
| OuterStarts: | 0 | 2 | 4 | 5 | 6 | 8 |

This is now fully compressed column storage: the value 14 still has inner (row) index 4, and its column's `OuterStarts` entry is 4, i.e. four stored non-zeros come before it in the `Values` array. 
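To make the storage scheme above concrete, here is a small self-contained sketch (plain C++ with `std::vector`, not Eigen's actual implementation) that builds the three compressed arrays from column-major triplets; run on the example matrix above (0-based indices) it reproduces the compressed `Values`/`InnerIndices`/`OuterStarts` tables:

```cpp
#include <tuple>
#include <vector>

// Compressed column storage (CCS): the three arrays from the tables above.
struct Ccs {
  std::vector<double> values;      // stored non-zero coefficients
  std::vector<int> inner_indices;  // row index of each stored value
  std::vector<int> outer_starts;   // first non-zero of each column (size cols + 1)
};

// Build CCS arrays from (col, row, value) triplets already sorted in
// column-major order, as in the example above.
Ccs make_ccs(int cols, const std::vector<std::tuple<int, int, double>>& trips) {
  Ccs m;
  m.outer_starts.assign(cols + 1, 0);
  for (const auto& [col, row, val] : trips) {
    m.values.push_back(val);
    m.inner_indices.push_back(row);
    ++m.outer_starts[col + 1];  // count non-zeros per column
  }
  // prefix-sum the per-column counts into start offsets
  for (int c = 0; c < cols; ++c) m.outer_starts[c + 1] += m.outer_starts[c];
  return m;
}
```

With the 5x5 example matrix this yields `Values = [22, 7, 3, 5, 14, 1, 17, 8]`, `InnerIndices = [1, 2, 0, 2, 4, 2, 1, 4]`, and `OuterStarts = [0, 2, 4, 5, 6, 8]`, matching the compressed tables.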
From 4c9990e60ce611774b35a5839aa5fa06c84b7665 Mon Sep 17 00:00:00 2001
From: stevo15025
Date: Tue, 6 Aug 2019 17:21:46 +0300
Subject: [PATCH 02/13] fixup a bit

---
designs/0004-sparse-matrices.md | 47 +++++++++++++++++++++------------
1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/designs/0004-sparse-matrices.md b/designs/0004-sparse-matrices.md
index 1c93966..be827e1 100644
--- a/designs/0004-sparse-matrices.md
+++ b/designs/0004-sparse-matrices.md
@@ -11,7 +11,7 @@ Add a sparse matrix type to the Stan language and sparse matrix operations to St
# Motivation
[motivation]: #motivation

-Data for models such as [ICAR](https://mc-stan.org/users/documentation/case-studies/icar_stan.html) come in as sparse matrix. (i.e. a large number of elements in the matrix are zero). There are often methods for storing and computing these matrices which utilize the sparsity structure to give better performance. Currently we have some Compressed Sparse Row (CSR) [methods](https://mc-stan.org/docs/2_19/functions-reference/sparse-matrices.html) for going from dense matrices to simpler representations that ignore zeros and vice versa. Though the only exposed operation is [`csr_matrix_times_vector`](https://mc-stan.org/docs/2_19/functions-reference/sparse-matrix-arithmetic.html). The CSR methods currently supported are limited and require a good deal of fan-dangling from the user.
+Data for models such as [ICAR](https://mc-stan.org/users/documentation/case-studies/icar_stan.html) come in as a sparse matrix (i.e. a large number of elements in the matrix are zero). There are often methods for storing and computing with these matrices which utilize the sparsity structure to give better performance and use less memory. Currently we have some Compressed Sparse Row (CSR) [methods](https://mc-stan.org/docs/2_19/functions-reference/sparse-matrices.html) for going from dense matrices to simpler representations that ignore zeros and vice versa. 
However, the only exposed operation is [`csr_matrix_times_vector`](https://mc-stan.org/docs/2_19/functions-reference/sparse-matrix-arithmetic.html). The CSR methods currently supported are limited and require a good deal of fiddling from the user.

A `sparse_matrix` type directly in the stan language would support all existing methods available for `matrix` types. From Dan and Aki's previous [proposal](https://aws1.discourse-cdn.com/standard14/uploads/mc_stan/original/2X/1/13fda4102c8f48e5aadbf0fbe75d3641a187d0a3.pdf) this would include full auto-diff and algebra support for:
@@ -41,7 +41,7 @@ Most of the below comes from the [functional-spec](https://github.com/stan-dev/stan/wiki/Functional-Spec:-Sparse-Matrix-Data-Types) for sparse matrices.

## Data

-Sparse matrix types in the Stan language can be constructed in the data block via Coordinate List Notation using the rows, cols, non-empty row/col indices, and values for those index positions.
+Sparse matrix types in the Stan language can be constructed in the data block via Coordinate List Notation using the row and column sizes, non-empty row/column indices, and values for those index positions.

```stan
int N; // Rows
@@ -54,11 +54,9 @@ vector[K] vals; // Values in each position
sparse_matrix[N, M, nonzero_row_index, nonzero_col_index, vals] A;
// Can we do this?
sparse_matrix[N, M, nonzero_row_index, nonzero_col_index] B;
-
-
```

-Alternatively a sparse matrix can be constructed via Compressed Sparse Row (CSR) Notation That's used within Eigen (See appendix below for description on how Eigen handles sparse matrix storage)
+Alternatively, a sparse matrix can be constructed via the Compressed Sparse Row (CSR) notation that's used within Eigen (see the appendix below for a description of how Eigen handles sparse matrix storage):

```stan
data {
@@ -74,13 +72,24 @@ data {
}
```

-Sparse vectors are the same only with a single size an a single index array.
+Sparse vectors are the same, only with a single size and a single index array. 
+ +```stan +int N; // Rows +int K; // number non-empty +int nonzero_row_index[K]; // Non-empty row positions +vector[N] vals; // Values in each position +// Direct way +sparse_matrix[N, nonzero_row_index, val] A +// Can we do this? +sparse_vector[N, nonzero_row_index] B; +``` To users, sparse matrices for data should operate the same way as normal matrices. ## Transformed Data -Sparse matrices in these blocks can be defined dynamically and declared such as +Sparse matrices in this blocks can be defined dynamically and declared such as ```stan transformed data { @@ -120,12 +129,11 @@ transformed parameters { ``` -The size and non-zero indices for sparse matrices in the must be defined in the data block. This is because Stan's I/O and posterior analysis infrastructure assumes the same full specification in each iteration of the model. +The size and non-zero indices for sparse matrices in the paramter blocks must be defined in the data block. This is because Stan's I/O and posterior analysis infrastructure assumes the same full specification in each iteration of the model. ## Full Example Model -In my dream mindscape, here is how a user could write a gaussian process that uses sparse matrices. - +Below is a full example model that uses sparse vectors and matrices for data and parameters ```stan data { int N; // Vec size @@ -161,8 +169,8 @@ model { } ``` -[1] Because Stan is told what values are true zeros vs. sparse zeros we can set this up fine. -[2] The `sparse_cholesky_matrix` holdds the ordering of the sparse matrix after decomposition. +- [1] Because Stan is told what values are true zeros vs. sparse zeros we can set this up fine. +- [2] The `sparse_cholesky_matrix` holds the ordering of the sparse matrix after decomposition. # Reference-level explanation @@ -176,7 +184,7 @@ There are two approaches to coding with Eigen's sparse matrices in Stan-math. 
Sparse matrices can be supported in Stan-math by either moving to `EigenBase` as the default in the metaprogramming or by having separate methods for Sparse Matrices. -Let's look at primitive add for an example. One implementation of `add` in stan math is +Let's look at primitive add for an example. One implementation of `add` in Stan math is ```cpp template @@ -198,21 +206,23 @@ inline return_derived_obj add( } ``` -Where `return_derived_obj` would deduce the correct the correct return type based on the `Scalar` value of the `Derived*` types. This is nice because for places where Eigen supports both dense and sparse operations we do not have to duplicate code. For places where the sparse and dense operations differ we can have sparse matrix tempalte specializations. There has been a lot of discussion on this refactor in the past (see [this](https://groups.google.com/forum/#!topic/stan-dev/ZKYCQ3Y7eY0) Google groups post and [this](https://github.com/stan-dev/math/issues/62) issue). Though looking at the two forms it seems like using `A.coeff()` for access instead of `operator()` would be sufficient to handle the coefficient access error Dan saw. +Where `return_derived_obj` would deduce the correct the correct return type based on the `Scalar` value of the `Derived*` types. This is nice because for places where Eigen supports both dense and sparse operations we do not have to duplicate code. For places where the sparse and dense operations differ we can have sparse matrix template specializations. There has been a lot of discussion on this refactor in the past (see [this](https://groups.google.com/forum/#!topic/stan-dev/ZKYCQ3Y7eY0) Google groups post and [this](https://github.com/stan-dev/math/issues/62) issue). Though looking at the two forms it seems like using `A.coeff()` for access instead of `operator()` would be sufficient to handle the coefficient access error Dan saw. 
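The `EigenBase` idea can be sketched without Eigen at all: one function template written against a CRTP base type serves every derived representation, so long as it only relies on coefficient access (the `A.coeff()` point above). Everything here (`ToyBase`, `DenseVec`, `SparseVec`) is invented for the sketch — it is an analogy for the refactor, not Stan-math code:

```cpp
#include <array>
#include <map>

// Toy stand-in for Eigen's CRTP hierarchy (EigenBase<Derived>).
template <typename Derived>
struct ToyBase {
  const Derived& derived() const { return static_cast<const Derived&>(*this); }
};

// Dense representation: every coefficient stored.
struct DenseVec : ToyBase<DenseVec> {
  std::array<double, 3> v{};
  double coeff(int i) const { return v[i]; }
};

// Sparse representation: only non-zeros stored; absent means structural zero.
struct SparseVec : ToyBase<SparseVec> {
  std::map<int, double> v;
  double coeff(int i) const {
    auto it = v.find(i);
    return it == v.end() ? 0.0 : it->second;
  }
};

// One generic add covers dense/dense, dense/sparse, and sparse/sparse,
// because it only uses coeff() -- no per-representation overloads needed.
template <typename D1, typename D2>
DenseVec add(const ToyBase<D1>& a, const ToyBase<D2>& b) {
  DenseVec out;
  for (int i = 0; i < 3; ++i)
    out.v[i] = a.derived().coeff(i) + b.derived().coeff(i);
  return out;
}
```

A real implementation would of course deduce the result representation (the role `return_derived_obj` plays above) instead of always returning dense; the point is only that a single shared template can carry both worlds.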
### The Simple Way If we would rather not refactor the math library we can keep our current templates and have specializations for Sparse matrices. ```cpp -template -inline Eigen::SparseMatrix, R, C> add( - const Eigen::SparseMatrix& m1, const Eigen::SparseMatrix& m2) { +template +inline Eigen::SparseMatrixBase> add( + const Eigen::SparseMatrixBase& m1, const Eigen::SparseMatrixBase& m2) { check_matching_dims("add", "m1", m1, "m2", m2); return m1 + m2; } ``` +## TODO: talk about specialized rev stuff. + # Drawbacks [drawbacks]: #drawbacks @@ -276,8 +286,11 @@ From the Eigen [Sparse Matrix Docs](https://eigen.tuxfamily.org/dox/group__Tutor Sparse matrices in Eigen are stored using four compact arrays. > Values: stores the coefficient values of the non-zeros. + > InnerIndices: stores the row (resp. column) indices of the non-zeros. + > OuterStarts: stores for each column (resp. row) the index of the first non-zero in the previous two arrays. + > InnerNNZs: stores the number of non-zeros of each column (resp. row). The word inner refers to an inner vector that is a column for a column-major matrix, or a row for a row-major matrix. The word outer refers to the other direction. | row/col | 0 | 1 | 2 | 3 | 4 | From 0dc7e3f7d4afa5864f0380632bb6eb63d6ff3a46 Mon Sep 17 00:00:00 2001 From: stevo15025 Date: Wed, 7 Aug 2019 16:20:59 +0300 Subject: [PATCH 03/13] update with more prior art --- designs/0004-sparse-matrices.md | 143 +++++++++++++++++++++++--------- 1 file changed, 102 insertions(+), 41 deletions(-) diff --git a/designs/0004-sparse-matrices.md b/designs/0004-sparse-matrices.md index be827e1..213892c 100644 --- a/designs/0004-sparse-matrices.md +++ b/designs/0004-sparse-matrices.md @@ -39,11 +39,17 @@ A `sparse_matrix` type directly in the stan language would support all existing Most of the below comes from the [functional-spec](https://github.com/stan-dev/stan/wiki/Functional-Spec:-Sparse-Matrix-Data-Types) for sparse matrices. 
+A sparse matrix/vector differs from a standard matrix/vector only in the underlying representation of the data.
+
+Below we go over how sparse matrices can be constructed and operated on within each Stan block.
+
+
## Data

Sparse matrix types in the Stan language can be constructed in the data block via Coordinate List Notation using the row and column sizes, non-empty row/column indices, and values for those index positions.

```stan
+data {
int N; // Rows
int M; // Cols
int K; // number non-empty
int nonzero_row_index[K]; // Non-empty row positions
int nonzero_col_index[K]; // Non-empty col positions
vector[K] vals; // Values in each position
// Direct way
sparse_matrix[N, M, nonzero_row_index, nonzero_col_index, vals] A;
// Can we do this?
sparse_matrix[N, M, nonzero_row_index, nonzero_col_index] B;
-```
-
-Alternatively, a sparse matrix can be constructed via the Compressed Sparse Row (CSR) notation that's used within Eigen (see the appendix below for a description of how Eigen handles sparse matrix storage):
-
-```stan
-data {
-  int N; // rows
-  int M; // cols
-  int K; // Number of non-empty values
-  int starting_row_idx[N + 1]; // Index of the first non-zero in each row
-  int column_idx[K]; // Column index for each value
-  vector[K] vals; // Values in each position
-  sparse_matrix A;
-  // Again if we can be clever on input
-  sparse_matrix B;
-}
+}
```

@@ -85,11 +76,31 @@ sparse_vector[N, nonzero_row_index, vals] A;
sparse_vector[N, nonzero_row_index] B;
```

-To users, sparse matrices for data should operate the same way as normal matrices.
+The above sparse matrix example with values
+
+```stan
+N = 5
+M = 5
+K = 8
+// Column major order
+nonzero_row_index[K] = [2, 3, 1, 3, 5, 3, 2, 5]
+nonzero_col_index[K] = [1, 1, 2, 2, 3, 4, 5, 5]
+vals[K] = [22, 7, 3, 5, 14, 1, 17, 8]
+```
+
+Would have the dense form of
+
+| row/col | 1 | 2 | 3 | 4 | 5 |
+|---------|----|---|----|---|----|
+| 1 | 0 | 3 | 0 | 0 | 0 |
+| 2 | 22 | 0 | 0 | 0 | 17 |
+| 3 | 7 | 5 | 0 | 1 | 0 |
+| 4 | 0 | 0 | 0 | 0 | 0 |
+| 5 | 0 | 0 | 14 | 0 | 8 |

## Transformed Data

-Sparse matrices in this blocks can be defined dynamically and declared such as
+Sparse matrices in this block can be defined dynamically and declared such as

```stan
transformed data {
@@ -103,9 +114,12 @@
C[10, 10] = 100.0;
}
```
+
+The assignment operation `C[10, 10] = 100.0;` works fine with Eigen as the implementation leaves room in the arrays for quick insertions.
+
## Parameters, Transformed Parameters, and Generated Quantities

Parameters can be defined as above for data or deduced from the output of other functions.
+
```stan
data {
-The size and non-zero indices for sparse matrices in the paramter blocks must be defined in the data block. This is because Stan's I/O and posterior analysis infrastructure assumes the same full specification in each iteration of the model.
+The size and non-zero indices for sparse matrices in the parameter block must be from either the data block or transformed data block. This is because Stan's I/O and posterior analysis infrastructure assumes the same full specification in each iteration of the model. 
## Full Example Model -Below is a full example model that uses sparse vectors and matrices for data and parameters +Below is a full example model that uses sparse vectors and matrices for data and parameters for a gaussian process. + ```stan data { int N; // Vec size int K; // # of nonzero elements int non_zero_index[K]; // Indices for nonzero elements - sparse_vector[N, non_zero_index] x; // [1] + sparse_vector[N, non_zero_index] x; vector[N] y; } transformed data { @@ -170,17 +187,36 @@ model { ``` - [1] Because Stan is told what values are true zeros vs. sparse zeros we can set this up fine. -- [2] The `sparse_cholesky_matrix` holds the ordering of the sparse matrix after decomposition. + +- [2] The `sparse_cholesky_matrix` is optional and holds the permutation of the sparse matrix after decomposition using the `EIGEN_SPARSEMATRIX_BASE_PLUGIN`. Permuting the rows of the sparse matrix before decomposition can often increase performance. Instead of having to recompute the permutation on each call to `cholesky_decompose`, after the first iteration we can keep the [`PermutationMatrix`](https://eigen.tuxfamily.org/dox/classEigen_1_1SimplicialCholeskyBase.html#aff1480e595a21726beaec9a586a94d5a) inside of the `sparse_cholesky_matrix` and reuse it on subsequent operations. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation +## I/O + +Data input formats should not need to change, less R and Python want a particular method that does not exist at this time. + +At the C++ level input data can be constructed to the sparse matrix through soemthing like what Ben did [here](https://github.com/stan-dev/rstanarm/blob/master/inst/include/csr_matrix_times_vector2.hpp#L18) for a revised `csr_matrix_times_vector`. 
+
+```cpp
+int outerIndexPtr[cols + 1];
+int innerIndices[nnz];
+double values[nnz];
+// read-write (parameters)
+Map<SparseMatrix<double>> sm1(rows, cols, nnz, outerIndexPtr, innerIndices, values);
+// read only (data)
+Map<const SparseMatrix<double>> sm2(rows, cols, nnz, outerIndexPtr, innerIndices, values);
+```
+
## Stan Math

+### Templating
+
+There are two approaches to have Stan support Eigen's Sparse Matrix format.

-### The Hard way
+#### The Hard way

Sparse matrices can be supported in Stan-math by either moving to `EigenBase` as the default in the metaprogramming or by having separate methods for Sparse Matrices.

Let's look at primitive add for an example. One implementation of `add` in Stan math is

```cpp
template <typename T1, typename T2, int R, int C>
inline Eigen::Matrix<return_type_t<T1, T2>, R, C> add(
    const Eigen::Matrix<T1, R, C>& m1, const Eigen::Matrix<T2, R, C>& m2) {
  check_matching_dims("add", "m1", m1, "m2", m2);
  return m1 + m2;
}
```

Would change to

```cpp
template <typename Derived1, typename Derived2>
-inline return_derived_obj<Derived1, Derived2> add(
+inline Eigen::PlainObjectBase<return_derived_obj<Derived1, Derived2>> add(
    const Eigen::EigenBase<Derived1>& m1, const Eigen::EigenBase<Derived2>& m2) {
  check_matching_dims("add", "m1", m1, "m2", m2);
  return m1 + m2;
}
```

-Where `return_derived_obj` would deduce the correct the correct return type based on the `Scalar` value of the `Derived*` types. This is nice because for places where Eigen supports both dense and sparse operations we do not have to duplicate code. For places where the sparse and dense operations differ we can have sparse matrix template specializations. There has been a lot of discussion on this refactor in the past (see [this](https://groups.google.com/forum/#!topic/stan-dev/ZKYCQ3Y7eY0) Google groups post and [this](https://github.com/stan-dev/math/issues/62) issue). Though looking at the two forms it seems like using `A.coeff()` for access instead of `operator()` would be sufficient to handle the coefficient access error Dan saw.
+Where `return_derived_obj` would deduce the correct derived return type based on the `Scalar` value of the `Derived*` types. This is nice because for places where Eigen supports both dense and sparse operations we do not have to duplicate code. 
For places where the sparse and dense operations differ we can have sparse matrix template specializations. There has been a lot of discussion on this refactor in the past (see [this](https://groups.google.com/forum/#!topic/stan-dev/ZKYCQ3Y7eY0) Google groups post and [this](https://github.com/stan-dev/math/issues/62) issue). Though looking at the two forms it seems like using `A.coeff()` for access instead of `operator()` would be sufficient to handle the coefficient access error Dan saw.

### The Simple Way

If we would rather not refactor the math library we can keep our current templates and have specializations for Sparse matrices.

```cpp
template <typename Derived1, typename Derived2>
inline Eigen::SparseMatrixBase<return_derived_obj<Derived1, Derived2>> add(
    const Eigen::SparseMatrixBase<Derived1>& m1, const Eigen::SparseMatrixBase<Derived2>& m2) {
  check_matching_dims("add", "m1", m1, "m2", m2);
  return m1 + m2;
}
```

-## TODO: talk about specialized rev stuff.
+### Keeping Permutation Matrix from Cholesky
+
+`SimplicialCholeskyBase` keeps the permutation matrix; when the user uses a `sparse_cholesky_matrix` we can pull out this permutation matrix and keep it to use in the next iteration. We do this through `EIGEN_SPARSEMATRIX_BASE_PLUGIN`, adding the permutation matrix to the input matrix. This adds a bit of state, but assuming the sparse matrices are fixed in size and sparsity this should be fine.

# Drawbacks
[drawbacks]: #drawbacks
@@ -233,7 +271,7 @@ Doing this improperly could lead to serious amounts of "code smell" aka duplicat

- Why is this design the best in the space of possible designs?

-This is the most full-feature design I can think of that makes sparse matrices simple for stan users.
+This is the most full-featured design I can think of that makes sparse matrices simple for Stan users while giving the Stan-math library a lot of flexibility.

- What other designs have been considered and what is the rationale for not choosing them?

Instead of the full feature spec like the above we could have sparse_matrix only be a result type

```stan
parameters {
  real theta_raw[K];
...
model {
  sparse_matrix[M, N] theta = to_sparse_matrix(M, N, mm, nn, theta_raw);
  ...
```

+Or alternatively, we can extend the `csr_*` functions, though this requires a lot of recomputation.
+
- What is the impact of not doing this?

Kind of a bummer, it will be awkward for users to fit models whose data or parameters are sparsely defined. 
@@ -255,28 +295,49 @@ Kind of a bummer, it will be awkward for users to fit models whose data or param # Prior art [prior-art]: #prior-art -Discuss prior art, both the good and the bad, in relation to this proposal. -A few examples of what this can include are: +There has been a _lot_ of previous discussion about sparse matrices in Stan which are listed below in no particular order. -- Talk about tensorflow and mxnet. +- [Sparse Matrix Roadmap](https://discourse.mc-stan.org/t/sparse-matrix-roadmap/5493/24) +- [Sparse Matrix Functionality without the Sparse Matrix](https://discourse.mc-stan.org/t/sparse-model-matrix-functionality-without-the-sparse-matrix/5759) +- [Spline Fitting Demo in Comparison of Sparse Vs. Non-Sparse](https://discourse.mc-stan.org/t/spline-fitting-demo-inc-comparison-of-sparse-vs-non-sparse/3287) +- [Should Sparse Matrices in Stan be Row Major or Column Major](https://discourse.mc-stan.org/t/should-sparse-matrices-in-stan-be-row-major-or-column-major/1563) +- [A Proposal for Sparse Matrices and GPs in Stan](https://discourse.mc-stan.org/t/a-proposal-for-sparse-matrices-and-gps-in-stan/2183) +- [Sparse Matrix Use Cases (Resolving Declaration Issues)](https://discourse.mc-stan.org/t/a-proposal-for-sparse-matrices-and-gps-in-stan/2183) +- [Sparse Matrices in Eigen with Autodiff](https://discourse.mc-stan.org/t/sparse-matrices-in-eigen-with-autodiff/3324/6) +- [Sparse Matrix Functional Spec](https://discourse.mc-stan.org/t/sparse-matrix-functional-spec/109) +- [Sparse Matrices in Stan](https://discourse.mc-stan.org/t/sparse-matrices-in-stan/3129) -- For language, library, tools, and compiler proposals: Does this feature exist in other programming languages and what experience have their community had? -- For community proposals: Is this done by some other community and what were their experiences with it? -- For other teams: What lessons can we learn from what other communities have done here? 
-- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background.
+### Tensorflow
 
-This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your RFC with a fuller picture.
-If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages.
+Tensorflow uses the same data storage schema inside of [`tf.sparse.SparseTensor`](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/sparse/SparseTensor) with a limited set of specific [methods](https://www.tensorflow.org/api_docs/python/tf/sparse). It does not seem that they have [Sparse Cholesky support](https://github.com/tensorflow/tensorflow/issues/15910).
 
-Note that while precedent set by other languages is some motivation, it does not on its own motivate an RFC.
-Please also take into consideration that rust sometimes intentionally diverges from common language features.
+It seems like OpenAI has methods like matrix multiplication for block sparse matrices (sparse matrices with dense sub-blocks) in TensorFlow available on [GitHub](https://github.com/openai/blocksparse).
+
+### Keras
+
+Sparse support in both Keras and TensorFlow is very [low level](https://stackoverflow.com/a/43188970). Though it seems like work is going on for RaggedTensors and SparseTensors in [Keras](https://github.com/tensorflow/tensorflow/issues/27170#issuecomment-509296615).
+
+### MxNet
+
+MxNet's [`potrf`](https://mxnet.incubator.apache.org/api/python/symbol/linalg.html#mxnet.symbol.linalg.potrf) (Cholesky) function can handle sparse arrays, but there is not a lot of documentation saying that explicitly. Besides that they also store things in the compressed storage schema.
+
+### PyTorch
+
+Pytorch has an experimental section for limited sparse tensor operations in [`torch.sparse`](https://pytorch.org/docs/stable/sparse.html).
+
+
+It looks like sparse matrices have been difficult for a lot of groups!
 
 # Unresolved questions
 [unresolved-questions]: #unresolved-questions
 
 - What parts of the design do you expect to resolve through the RFC process before this gets merged?
-- What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
-- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
+
+- Language Design
+- How will this work with the new Stan compiler?
+  - Can we deduce the sparse matrix non-zero indices for some functions like I have in the above?
+- Will the cost of the refactor of Stan math to `EigenBase` be worth it?
+- Any prior art that I've missed?
 
 # Appendix: (Eigen Sparse Matrix formats)
 

From acac6897a07a9d3ee603797038bd05b3250274ae Mon Sep 17 00:00:00 2001
From: Steve Bronder
Date: Thu, 8 Aug 2019 20:50:49 +0300
Subject: [PATCH 04/13] Update 0004-sparse-matrices.md

---
 designs/0004-sparse-matrices.md | 36 +--------------------------------
 1 file changed, 1 insertion(+), 35 deletions(-)

diff --git a/designs/0004-sparse-matrices.md b/designs/0004-sparse-matrices.md
index 213892c..4153a8e 100644
--- a/designs/0004-sparse-matrices.md
+++ b/designs/0004-sparse-matrices.md
@@ -212,41 +212,7 @@ Map> sm2(rows, cols, nnz, outerIndexPtr, innerIndices
 
 ## Stan Math
 
-### Templating
-
-There are two approaches to have Stan support Eigen's Sparse Matrix format.
-
-#### The Hard way
-
-Sparse matrices can be supported in Stan-math by either moving to `EigenBase` as the default in the metaprogramming or by having separate methods for Sparse Matrices.
-
-Let's look at primitive add for an example.
One implementation of `add` in Stan math is - -```cpp -template -inline Eigen::Matrix, R, C> add( - const Eigen::Matrix& m1, const Eigen::Matrix& m2) { - check_matching_dims("add", "m1", m1, "m2", m2); - return m1 + m2; -} -``` - -Would change to - -```cpp -template -inline Eigen::PlainObjectBase> add( - const Eigen::EigenBase& m1, const Eigen::EigenBase& m2) { - check_matching_dims("add", "m1", m1, "m2", m2); - return m1 + m2; -} -``` - -Where `return_derived_obj` would deduce the correct derived return type based on the `Scalar` value of the `Derived*` types. This is nice because for places where Eigen supports both dense and sparse operations we do not have to duplicate code. For places where the sparse and dense operations differ we can have sparse matrix template specializations. There has been a lot of discussion on this refactor in the past (see [this](https://groups.google.com/forum/#!topic/stan-dev/ZKYCQ3Y7eY0) Google groups post and [this](https://github.com/stan-dev/math/issues/62) issue). Though looking at the two forms it seems like using `A.coeff()` for access instead of `operator()` would be sufficient to handle the coefficient access error Dan saw. - -### The Simple Way - -If we would rather not refactor the math library we can keep our current templates and have specializations for Sparse matrices. 
+Eigen has already done a lot here, we can steal most of the functionality from them like so ```cpp template From ed49dfb22e0c041b730b1fd8ceb8557183ac5af0 Mon Sep 17 00:00:00 2001 From: Steve Bronder Date: Fri, 9 Aug 2019 13:58:22 +0300 Subject: [PATCH 05/13] Add stuff so we can use only one function --- designs/0004-sparse-matrices.md | 37 ++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/designs/0004-sparse-matrices.md b/designs/0004-sparse-matrices.md index 4153a8e..77e607c 100644 --- a/designs/0004-sparse-matrices.md +++ b/designs/0004-sparse-matrices.md @@ -212,8 +212,43 @@ Map> sm2(rows, cols, nnz, outerIndexPtr, innerIndices ## Stan Math -Eigen has already done a lot here, we can steal most of the functionality from them like so +### Templating +There are two approaches to have Stan support Eigen's Sparse Matrix format. + +#### The Hard way + +Sparse matrices can be supported in Stan-math by either moving to `EigenBase` as the default in the metaprogramming or by having separate methods for Sparse Matrices. + +Let's look at primitive add for an example. One implementation of `add` in Stan math is + +```cpp +template +inline Eigen::Matrix, R, C> add( + const Eigen::Matrix& m1, const Eigen::Matrix& m2) { + check_matching_dims("add", "m1", m1, "m2", m2); + return m1 + m2; +} +``` + +Based on [this](https://stackoverflow.com/questions/57426417/function-that-accepts-both-eigen-dense-and-sparse-matrices/57427407#57427407) stackoverflow post we can do something like the following to allow this to handle both sparse and dense matrices. + +```cpp +template +eigen_return_t add(const Eigen::EigenBase& A_, + const Eigen::EigenBase& B_) { + // Pull out the Derived type + Derived1 const& A = A_.derived(); + Derived2 const& B = B_.derived(); + return ((A+B).eval()).transpose(); +} +``` + +Where `return_derived_t` would deduce the correct derived return type based on the `Scalar` value of the `Derived*` types. 
This is nice because for places where Eigen supports both dense and sparse operations we do not have to duplicate code. For places where the sparse and dense operations differ we can have sparse matrix template specializations. There has been a lot of discussion on this refactor in the past (see [this](https://groups.google.com/forum/#!topic/stan-dev/ZKYCQ3Y7eY0) Google groups post and [this](https://github.com/stan-dev/math/issues/62) issue). Though looking at the two forms it seems like using `A.coeff()` for access instead of `operator()` would be sufficient to handle the coefficient access error Dan saw. + +### The Simple Way + +If we would rather not refactor the math library we can keep our current templates and have specializations for Sparse matrices. ```cpp template inline Eigen::SparseMatrixBase> add( From 38ed2f3449da8087268b0927f48f1124f6ad40b8 Mon Sep 17 00:00:00 2001 From: stevo15025 Date: Tue, 13 Aug 2019 10:46:58 +0300 Subject: [PATCH 06/13] Fix some grammar and add links --- designs/0004-sparse-matrices.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/designs/0004-sparse-matrices.md b/designs/0004-sparse-matrices.md index 213892c..893a51a 100644 --- a/designs/0004-sparse-matrices.md +++ b/designs/0004-sparse-matrices.md @@ -46,7 +46,7 @@ Below we go over how sparse matrices can be constructed and operated on within e ## Data -Sparse matrix types in the Stan language can be constructed in the data block via Coordinate List Notation using the row and column sizes, non-empty row/column indices, and values for those index positions. +Sparse matrix types in the Stan language can be constructed in the data block via [Coordinate List](https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)) notation using the row and column sizes, non-empty row/column indices, and values for those index positions. 
```stan data { @@ -57,7 +57,7 @@ int nonzero_row_index[K]; // Non-empty row positions int nonzero_col_index[K]; // Non-empty col positions vector[N] vals; // Values in each position // Direct way -sparse_matrix[N, M, nonzero_row_index, nonzero_col_index, val] A +sparse_matrix[N, M, nonzero_row_index, nonzero_col_index, vals] A // Can we do this? sparse_matrix[N, M, nonzero_row_index, nonzero_col_index] B; } @@ -90,7 +90,7 @@ val[K] = [22, 7, 3, 5, 14, 1, 17, 8] Would have the dense form of -| row/col | 1 | 2 | 3 | 4 | 5 | +| col/row | 1 | 2 | 3 | 4 | 5 | |---------|----|---|----|---|----| | 1 | 0 | 3 | 0 | 0 | 0 | | 2 | 22 | 0 | 0 | 0 | 17 | @@ -105,7 +105,8 @@ Sparse matrices in this block can be defined dynamically and declared such as ```stan transformed data { // Could construct here as well -sparse_matrix[N, M] A = to_sparse_matrix(N, M, nonzero_row_index, nonzero_col_index, vals); +sparse_matrix[N, M, nonzero_row_index, nonzero_col_index] A = + to_sparse_matrix(N, M, nonzero_row_index, nonzero_col_index, vals); // Linear Algebra is cool sparse_matrix[N, N] C = A * A'; @@ -201,13 +202,13 @@ Data input formats should not need to change, less R and Python want a particula At the C++ level input data can be constructed to the sparse matrix through soemthing like what Ben did [here](https://github.com/stan-dev/rstanarm/blob/master/inst/include/csr_matrix_times_vector2.hpp#L18) for a revised `csr_matrix_times_vector`. 
```cpp
-int outerIndexPtr[cols+1];
+int outerIndex[cols+1];
 int innerIndices[nnz];
 double values[nnz];
 // read-write (parameters)
-Map<SparseMatrix<double>> sm1(rows, cols, nnz, outerIndexPtr, innerIndices, values);
+Map<SparseMatrix<double>> sm1(rows, cols, nnz, outerIndex, innerIndices, values);
 // read only (data)
-Map<const SparseMatrix<double>> sm2(rows, cols, nnz, outerIndexPtr, innerIndices, values);
+Map<const SparseMatrix<double>> sm2(rows, cols, nnz, outerIndex, innerIndices, values);
 ```
 
 ## Stan Math
 

From 5b85083cdb7d2b86be5051dfbcbe3f3824673299 Mon Sep 17 00:00:00 2001
From: stevo15025
Date: Mon, 26 Aug 2019 11:44:47 +0300
Subject: [PATCH 07/13] update doc

---
 designs/0004-sparse-matrices.md | 167 ++++++++++++++------------------
 1 file changed, 71 insertions(+), 96 deletions(-)

diff --git a/designs/0004-sparse-matrices.md b/designs/0004-sparse-matrices.md
index b86eae2..d7cfe2c 100644
--- a/designs/0004-sparse-matrices.md
+++ b/designs/0004-sparse-matrices.md
@@ -6,7 +6,7 @@
 # Summary
 [summary]: #summary
 
-Add a sparse matrix type to the Stan language and sparse matrix operations to Stan-math to utilize operations that can take advantage of the sparsity structure.
+Add a sparse matrix type to the Stan language and sparse matrix operations to Stan math to utilize operations that can take advantage of the matrix's sparsity structure.
 
 # Motivation
 [motivation]: #motivation
@@ -46,34 +46,37 @@ Below we go over how sparse matrices can be constructed and operated on within e
 
 ## Data
 
-Sparse matrix types in the Stan language can be constructed in the data block via [Coordinate List](https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)) notation using the row and column sizes, non-empty row/column indices, and values for those index positions.
+Sparse matrix types in the Stan language can be constructed in the data block via [Coordinate List](https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)) notation using the row and column sizes, non-empty row/column indices, and values for those index positions.
The non-zero (NZ) row and column indices are brought in as bounds in the `<>`. Setting these as bounds expands the definition of what can go into `<>` and should be discussed more. ```stan data { int N; // Rows int M; // Cols int K; // number non-empty -int nonzero_row_index[K]; // Non-empty row positions -int nonzero_col_index[K]; // Non-empty col positions -vector[N] vals; // Values in each position -// Direct way -sparse_matrix[N, M, nonzero_row_index, nonzero_col_index, vals] A -// Can we do this? -sparse_matrix[N, M, nonzero_row_index, nonzero_col_index] B; +int nz_row_ind[K]; // Non-empty row positions +int nz_col_ind[K]; // Non-empty col positions + +sparse_matrix[N, M] A } ``` -Sparse vectors are the same only with a single size on a single index array. +With proper IO the data sparsity pattern can be brought in automatically via a json or rdump list of lists. +```stan +data { +int N; // Rows +int M; // Cols +sparse_matrix[N, M] A +} +``` + +Though not necessary, we can also add sparse vectors. ```stan int N; // Rows int K; // number non-empty -int nonzero_row_index[K]; // Non-empty row positions -vector[N] vals; // Values in each position -// Direct way -sparse_matrix[N, nonzero_row_index, val] A +int nz_row_ind[K]; // Non-empty row positions // Can we do this? 
-sparse_vector[N, nonzero_row_index] B; +sparse_vector[N] B; ``` The above sparse matrix example with values @@ -83,9 +86,9 @@ N = 5 M = 5 K = 7 // Column major order -nonzero_row_index[K] = [2, 3, 1, 3, 5, 3, 2, 5] -nonzero_col_index[K] = [1, 1, 2, 2, 3, 4, 5, 5] -val[K] = [22, 7, 3, 5, 14, 1, 17, 8] +nonzero_row_index[K] = [ 2, 3, 1, 3, 5, 3, 2, 5] +nonzero_col_index[K] = [ 1, 1, 2, 2, 3, 4, 5, 5] +val[K] = [22, 7, 3, 5, 14, 1, 17, 8] ``` Would have the dense form of @@ -105,8 +108,8 @@ Sparse matrices in this block can be defined dynamically and declared such as ```stan transformed data { // Could construct here as well -sparse_matrix[N, M, nonzero_row_index, nonzero_col_index] A = - to_sparse_matrix(N, M, nonzero_row_index, nonzero_col_index, vals); +sparse_matrix[N, M] A = + to_sparse_matrix(N, M, nz_col_ind, nz_col_ind, vals); // Linear Algebra is cool sparse_matrix[N, N] C = A * A'; @@ -115,29 +118,18 @@ C[10, 10] = 100.0; } ``` -The assignment operation `C[10, 10] = 100.0;` works fine with Eigen as the implementation leaves room in the arrays for quick insertions. +The assignment operation `C[10, 10] = 100.0;` works fine with Eigen as the implementation leaves room in the arrays for quick insertions. Though because the rest of Stan assumes the amount of coefficients are fixed this should be the only block where sparse matrix access to elements defined as zero valued in the bounds should be allowed. + +The because the sparsity pattern is given in the bounds `<>` the above multiplication result `C` will have it's own sparsity pattern deduced from the result of `A * A'`. This is a concern of Eigen and Stan-math and I'm pretty sure would not effect the Stan language. ## Parameters, Transformed Parameters, and Generated Quantities -Parameters be defined as above for data or deduced from the output of other functions. +Parameters can be defined as above for data or deduced from the output of other functions. 
```stan -data { - int N; // Rows - int M; // Cols - int K; // # of nonzero elements - int non_zero_x_index[K]; // Indices for nonzero elements - int nz_foo_row_index[K]; // Indices for nonzero elements - int nz_foo_col_index[K]; // Indices for nonzero elements - - sparse_vector[N, non_zero_index] x; -} parameters { - real rho; - real alpha; - real sigma; // Defining sparse matrices in parameters needs the non-zero elements - sparse_matrix[N, M, nz_foo_row_index, nz_foo_col_index] foo; + sparse_matrix[N, M] foo; } transformed parameters { // Non-zero elements are deduced by the operation on x @@ -148,56 +140,35 @@ transformed parameters { The size and non-zero indices for sparse matrices in the parameter block must be from either the data block or transformed data block. This is because Stan's I/O and posterior analysis infrastructure assumes the same full specification in each iteration of the model. -## Full Example Model +## Helper Functions -Below is a full example model that uses sparse vectors and matrices for data and parameters for a gaussian process. +We can also include helper functions to extract the sparsity pattern's row and column information. ```stan -data { - int N; // Vec size - int K; // # of nonzero elements - int non_zero_index[K]; // Indices for nonzero elements - sparse_vector[N, non_zero_index] x; - vector[N] y; -} -transformed data { - sparse_vector[N] mu = rep_sparse_vector(0, N, non_zero_index); // [1] -} -parameters { - real rho; - real alpha; - real sigma; -} -model { - sparse_cholesky_matrix[N] L_K; // [2] - sparse_matrix[N, N] K = cov_exp_quad(x, alpha, rho); - real sq_sigma = square(sigma); - - // Assign to pre-existing diagonal - for (n in 1:N) - K[n, n] = K[n, n] + sq_sigma; - - L_K = cholesky_decompose(K); - - rho ~ inv_gamma(5, 5); - alpha ~ std_normal(); - sigma ~ std_normal(); - - y ~ multi_normal_cholesky(mu, L_K); -} +int K = num_nz_elements(x); +// Extract effectively a tuple representation of the sparse matrix. 
+matrix[K, 3] = get_nz_elements(x); ``` -- [1] Because Stan is told what values are true zeros vs. sparse zeros we can set this up fine. - -- [2] The `sparse_cholesky_matrix` is optional and holds the permutation of the sparse matrix after decomposition using the `EIGEN_SPARSEMATRIX_BASE_PLUGIN`. Permuting the rows of the sparse matrix before decomposition can often increase performance. Instead of having to recompute the permutation on each call to `cholesky_decompose`, after the first iteration we can keep the [`PermutationMatrix`](https://eigen.tuxfamily.org/dox/classEigen_1_1SimplicialCholeskyBase.html#aff1480e595a21726beaec9a586a94d5a) inside of the `sparse_cholesky_matrix` and reuse it on subsequent operations. - - # Reference-level explanation [reference-level-explanation]: #reference-level-explanation ## I/O -Data input formats should not need to change, less R and Python want a particular method that does not exist at this time. +Data can be read in from json via a list of lists and the Rdump from the equivalent. + +```js +{ + ... + {'col': 1, 'row': 20, 'val': 1.85}, + {'col': 11, 'row': 3, 'val': 2.71}, + ... +} +``` + +```R +X <- list(..., list(1, 20, 1.85), list(11, 3, 2.71),...) +``` At the C++ level input data can be constructed to the sparse matrix through soemthing like what Ben did [here](https://github.com/stan-dev/rstanarm/blob/master/inst/include/csr_matrix_times_vector2.hpp#L18) for a revised `csr_matrix_times_vector`. @@ -232,20 +203,19 @@ inline Eigen::Matrix, R, C> add( } ``` -Based on [this](https://stackoverflow.com/questions/57426417/function-that-accepts-both-eigen-dense-and-sparse-matrices/57427407#57427407) stackoverflow post we can do something like the following to allow this to handle both sparse and dense matrices. +We can use a form of pseudo-concepts to write something like the following for functions which share both Eigen dense and sparse matrix equivalents. 
```cpp
-template <typename Derived1, typename Derived2>
-eigen_return_t<Derived1> add(const Eigen::EigenBase<Derived1>& A_,
-                   const Eigen::EigenBase<Derived2>& B_) {
-  // Pull out the Derived type
-  Derived1 const& A = A_.derived();
-  Derived2 const& B = B_.derived();
-  return ((A+B).eval()).transpose();
+template <typename Mat1, typename Mat2, typename = enable_if_all_eigen<Mat1, Mat2>>
+inline auto add(Mat1&& m1, Mat2&& m2) {
+  return (m1 + m2).eval();
 }
 ```
 
-Where `return_derived_t` would deduce the correct derived return type based on the `Scalar` value of the `Derived*` types. This is nice because for places where Eigen supports both dense and sparse operations we do not have to duplicate code. For places where the sparse and dense operations differ we can have sparse matrix template specializations. There has been a lot of discussion on this refactor in the past (see [this](https://groups.google.com/forum/#!topic/stan-dev/ZKYCQ3Y7eY0) Google groups post and [this](https://github.com/stan-dev/math/issues/62) issue). Though looking at the two forms it seems like using `A.coeff()` for access instead of `operator()` would be sufficient to handle the coefficient access error Dan saw.
+In the above, the return type without the `.eval()` will be something of the form `Eigen::CwiseBinaryOp` which all of the other math functions would have to also take in. Adding a `.eval()` at the end will force the matrix to evaluate to either a sparse or dense matrix. Once more of Stan math can take in the Eigen expression types we can remove the `.eval()` at the end.
+
+For places where the sparse and dense operations differ we can have dense/sparse matrix template specializations. There has been a lot of discussion on this refactor in the past (see [this](https://groups.google.com/forum/#!topic/stan-dev/ZKYCQ3Y7eY0) Google groups post and [this](https://github.com/stan-dev/math/issues/62) issue). Though looking at the two forms it seems like using `A.coeff()` for access instead of `operator()` would be sufficient to handle the coefficient access error Dan saw.
+ ### The Simple Way @@ -261,7 +231,7 @@ inline Eigen::SparseMatrixBase> add( ### Keeping Permutation Matrix from Cholesky -`SimplicalCholeskybase` keeps the permutation matrix, when the user uses a `sparse_cholesky_matrix` we can pull out this permutation matrix and keep it to use in the next iteration. We do this through `EIGEN_SPARSEMATRIX_BASE_PLUGIN`, adding the permutation matrix to the input matrix. This adds a bito of state, but assuming the sparse matrices are fixed in size and sparsity this should be fine. +`SimplicalCholeskybase` keeps the permutation matrix, when the user uses a `sparse_cholesky_matrix` we can pull out this permutation matrix and keep it to use in the next iteration. We do this through `EIGEN_SPARSEMATRIX_BASE_PLUGIN`, adding the permutation matrix to the input matrix. This adds a bit of state, but assuming the sparse matrices are fixed in size and sparsity this should be fine. # Drawbacks [drawbacks]: #drawbacks @@ -273,22 +243,27 @@ Doing this improperly could lead to serious amounts of "code smell" aka duplicat - Why is this design the best in the space of possible designs? -This is the most full-feature design I can think of that makes sparse matrices simple for stan users while giving the stan-math library a lot of flexibility. +This is the most full fledged design that seems to have a reasonable consensus so far. - What other designs have been considered and what is the rationale for not choosing them? -Instead of the full feature spec like the above we could have sparse_matrix only be a result type +There are two other designs possible for the stan language. within the `[]` operator or as a new concept called an attribute. ```stan -parameters { - real theta_raw[K]; -... -model { - sparse_matrix[M, N] theta = to_sparse_matrix(M, N, mm, nn, theta_raw); - ... 
+// with operator[]
+sparse_matrix[N, M, nz_rows_ind, nz_cols_ind, vals] A;
+
+// With attribute
+@sparse(nz_rows=nz_rows_ind, nz_cols=nz_cols_ind) Matrix[N, M] A;
+```
+
+The attribute method would require its own design document, though it would allow a separation between computational and statistical concerns. Multiple attributes could also be placed into a tuple such as
+
+```stan
+(@sparse(nz_rows=nz_rows_ind, nz_cols=nz_cols_ind), @opencl) Matrix[N, M] A;
+```
 
-Or alternativly we can extend the `csr_*` functions, though this requires a lot of recomputation.
+Another alternative would be to extend the `csr_*` functions, though this requires a lot of recomputation.
 
 - What is the impact of not doing this?
 
@@ -338,7 +313,7 @@ It looks like sparse matrices have been difficult for a lot of groups!
 - Language Design
 - How will this work with the new Stan compiler?
   - Can we deduce the sparse matrix non-zero indices for some functions like I have in the above?
-- Will the cost of the refactor of Stan math to `EigenBase` be worth it?
+- Will the cost of the refactor of Stan math to a more generic form be worth it?
 - Any prior art that I've missed?
 

From 8c65419af2eddfb3e5404913e6c59cfbbf75b0f0 Mon Sep 17 00:00:00 2001
From: stevo15025
Date: Mon, 26 Aug 2019 12:50:19 +0300
Subject: [PATCH 08/13] update docs

---
 designs/0004-sparse-matrices.md | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/designs/0004-sparse-matrices.md b/designs/0004-sparse-matrices.md
index d7cfe2c..4745aed 100644
--- a/designs/0004-sparse-matrices.md
+++ b/designs/0004-sparse-matrices.md
@@ -120,7 +120,7 @@ C[10, 10] = 100.0;
 }
 ```
 
 The assignment operation `C[10, 10] = 100.0;` works fine with Eigen as the implementation leaves room in the arrays for quick insertions. Though because the rest of Stan assumes the number of coefficients is fixed, this should be the only block where sparse matrix access to elements defined as zero valued in the bounds is allowed.
-The because the sparsity pattern is given in the bounds `<>` the above multiplication result `C` will have it's own sparsity pattern deduced from the result of `A * A'`. This is a concern of Eigen and Stan-math and I'm pretty sure would not effect the Stan language.
+Because the sparsity pattern is given in the bounds `<>`, the above multiplication result `C` will have its own sparsity pattern deduced from the result of `A * A'`. This is a concern of Eigen and Stan-math and I'm pretty sure would not need anything particular in the Stan compiler.
 
 ## Parameters, Transformed Parameters, and Generated Quantities
 
@@ -138,7 +138,7 @@ transformed parameters {
 ```
 
-The size and non-zero indices for sparse matrices in the parameter block must be from either the data block or transformed data block. This is because Stan's I/O and posterior analysis infrastructure assumes the same full specification in each iteration of the model.
+The size and non-zero indices for sparse matrices in the parameter block must be from either the data block or transformed data block. This is because Stan's I/O and posterior analysis infrastructure assumes the same sparsity pattern in each iteration of the model.
 
 ## Helper Functions
 
@@ -186,11 +186,10 @@ Map> sm2(rows, cols, nnz, outerIndex, innerIndices, v
 
 ### Templating
 
-There are two approaches to have Stan support Eigen's Sparse Matrix format.
+Sparse matrices can be supported in Stan-math by either moving to more thorough templating along with pseudo-concepts in the metaprogramming or by having separate methods for Sparse Matrices.
 
 #### The Hard way
 
-
 Let's look at primitive add for an example.
One implementation of `add` in Stan math is
 
 ```cpp
 template <typename T1, typename T2, int R, int C>
 inline Eigen::Matrix<return_type_t<T1, T2>, R, C> add(
     const Eigen::Matrix<T1, R, C>& m1, const Eigen::Matrix<T2, R, C>& m2) {
   check_matching_dims("add", "m1", m1, "m2", m2);
   return m1 + m2;
 }
 ```
 
-We can use a form of pseudo-concepts to write something like the following for functions which share both Eigen dense and sparse matrix equivalents.
+We can use a form of pseudo-concepts to write something like the following for functions which share the same Eigen dense and sparse methods.
 
 ```cpp
 template <typename Mat1, typename Mat2, typename = enable_if_all_eigen<Mat1, Mat2>>
 inline auto add(Mat1&& m1, Mat2&& m2) {
+  check_matching_dims("add", "m1", m1, "m2", m2);
   return (m1 + m2).eval();
 }
 ```
 
-In the above, the return type without the `.eval()` will be something of the form `Eigen::CwiseBinaryOp` which all of the other math functions would have to also take in. Adding a `.eval()` at the end will force the matrix to evaluate to either a sparse or dense matrix. Once more of Stan math can take in the Eigen expression types we can remove the `.eval()` at the end.
+In the above, the return type without the `.eval()` will be something of the form `Eigen::CwiseBinaryOp<...>` which all of the other math functions would have to also take in. Adding a `.eval()` at the end will force the matrix to evaluate to either a sparse or dense matrix. Once more of Stan math can take in the Eigen expression types we can remove the `.eval()` at the end.
 
 For places where the sparse and dense operations differ we can have dense/sparse matrix template specializations. There has been a lot of discussion on this refactor in the past (see [this](https://groups.google.com/forum/#!topic/stan-dev/ZKYCQ3Y7eY0) Google groups post and [this](https://github.com/stan-dev/math/issues/62) issue). Though looking at the two forms it seems like using `A.coeff()` for access instead of `operator()` would be sufficient to handle the coefficient access error Dan saw.
@@ -231,7 +231,7 @@ inline Eigen::SparseMatrixBase> add( ### Keeping Permutation Matrix from Cholesky -`SimplicalCholeskybase` keeps the permutation matrix, when the user uses a `sparse_cholesky_matrix` we can pull out this permutation matrix and keep it to use in the next iteration. We do this through `EIGEN_SPARSEMATRIX_BASE_PLUGIN`, adding the permutation matrix to the input matrix. This adds a bit of state, but assuming the sparse matrices are fixed in size and sparsity this should be fine. +`SimplicalCholeskybase` keeps the permutation matrix, when the user does a cholesky decomposition we can pull out this permutation matrix and keep it to use in the next iteration. We do this through `EIGEN_SPARSEMATRIX_BASE_PLUGIN`, adding the permutation matrix to the input matrix. This adds a bit of state, but assuming the sparse matrices are fixed in size and sparsity this should be fine. # Drawbacks [drawbacks]: #drawbacks @@ -303,7 +303,9 @@ MxNets [`portri`](https://mxnet.incubator.apache.org/api/python/symbol/linalg.ht Pytorch has an experimental section for limited sparse tensor operations in [`torch.sparse`](https://pytorch.org/docs/stable/sparse.html). -It looks like sparse matrices has been difficult for a lot of groups! +### Template Model Builder (TMB) + +TMB uses a forked version of ADcpp for auto-diff and from their arvix paper seems to include sparse matrix types as well. Arvix [here](https://arxiv.org/pdf/1509.00660.pdf). 
# Unresolved questions
[unresolved-questions]: #unresolved-questions

From 8596f22c486e33c762c52169ae908bf8988d6522 Mon Sep 17 00:00:00 2001
From: stevo15025
Date: Mon, 26 Aug 2019 12:54:25 +0300
Subject: [PATCH 09/13] update docs

---
 designs/0004-sparse-matrices.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/designs/0004-sparse-matrices.md b/designs/0004-sparse-matrices.md
index 4745aed..7a2586b 100644
--- a/designs/0004-sparse-matrices.md
+++ b/designs/0004-sparse-matrices.md
@@ -314,11 +314,10 @@ TMB uses a forked version of CppAD for auto-diff and from their arXiv paper seem
 
 - Language Design
   - How will this work with the new Stan compiler?
-  - Can we deduce the sparse matrix non-zero indices for some functions like I have in the above?
+- Right now it appears that the requirement of the same number of coefficients in each iteration comes from how our posterior analysis tooling has historically worked. Are there also statistical concerns?
 - Will the cost of the refactor of Stan math to a more generic form be worth it?
 - Any prior art that I've missed?
-
 # Appendix: (Eigen Sparse Matrix formats)
 
 From the Eigen [Sparse Matrix Docs](https://eigen.tuxfamily.org/dox/group__TutorialSparse.html), Eigen matrices use a Compressed Column Scheme (CCS) to represent sparse matrices.
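For reference, the compressed column layout the appendix describes can be sketched in a few lines of plain C++ (no Eigen; `CcsMatrix` and `multiply` are illustrative names only): one value and one row index per nonzero, plus `cols + 1` outer offsets delimiting each column's slice.

```cpp
#include <cassert>
#include <vector>

// Minimal Compressed Column Storage (CCS) sketch mirroring the layout in
// the Eigen docs: values and row indices stored nonzero-by-nonzero, plus
// one outer pointer per column delimiting that column's slice.
struct CcsMatrix {
  int rows, cols;
  std::vector<double> values;    // nonzero values, column by column
  std::vector<int> inner_index;  // row index of each nonzero
  std::vector<int> outer_index;  // cols + 1 offsets into values
};

// y = A * x touching only the stored nonzeros.
std::vector<double> multiply(const CcsMatrix& A, const std::vector<double>& x) {
  std::vector<double> y(A.rows, 0.0);
  for (int j = 0; j < A.cols; ++j)
    for (int k = A.outer_index[j]; k < A.outer_index[j + 1]; ++k)
      y[A.inner_index[k]] += A.values[k] * x[j];
  return y;
}
```

The mat-vec loop is why the format matters: work is proportional to the number of nonzeros, not to `rows * cols`.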
From e322c9d567a0c75e6e54df8c6b97f776380dc48 Mon Sep 17 00:00:00 2001
From: stevo15025
Date: Mon, 26 Aug 2019 13:31:38 +0300
Subject: [PATCH 10/13] add latest discussion to prior art

---
 designs/0004-sparse-matrices.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/designs/0004-sparse-matrices.md b/designs/0004-sparse-matrices.md
index 7a2586b..d912db7 100644
--- a/designs/0004-sparse-matrices.md
+++ b/designs/0004-sparse-matrices.md
@@ -283,6 +283,7 @@ There has been a _lot_ of previous discussion about sparse matrices in Stan whic
 
 - [Sparse Matrices in Eigen with Autodiff](https://discourse.mc-stan.org/t/sparse-matrices-in-eigen-with-autodiff/3324/6)
 - [Sparse Matrix Functional Spec](https://discourse.mc-stan.org/t/sparse-matrix-functional-spec/109)
 - [Sparse Matrices in Stan](https://discourse.mc-stan.org/t/sparse-matrices-in-stan/3129)
+- [Sparse Matrices discussion that took over another thread](https://discourse.mc-stan.org/t/stancon-cambridge-developer-meeting-opportunities/9743/46?u=stevo15025)
 
 ### Tensorflow

From a938b45eff9a5869933e73057ad9694cd0c51c1e Mon Sep 17 00:00:00 2001
From: stevo15025
Date: Thu, 26 Sep 2019 23:34:39 +0300
Subject: [PATCH 11/13] Add updates from conversation with Dan and Aki on Sept
 26th.

Meeting notes:
- When we declare a sparse matrix the sparsity structure should be fixed.
- How to declare a sparse matrix without assigning a value
- Need to make our own sparse matrix that holds the sparsity pattern
- No explicit indexing for WIP
- Sparse matrix in data block, not params or transformed params block
- Do ordering for all square matrices.
---
 designs/0004-sparse-matrices.md | 40 ++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/designs/0004-sparse-matrices.md b/designs/0004-sparse-matrices.md
index d912db7..7422aaf 100644
--- a/designs/0004-sparse-matrices.md
+++ b/designs/0004-sparse-matrices.md
@@ -39,14 +39,27 @@ A `sparse_matrix` type directly in the Stan language would support all existing
 
 Most of the below comes from the [functional-spec](https://github.com/stan-dev/stan/wiki/Functional-Spec:-Sparse-Matrix-Data-Types) for sparse matrices.
 
-A sparse matrix/vector differs from a standard matrix/vector only in the underlying representation of the data.
+A sparse matrix/vector differs from a standard matrix/vector in the underlying representation of the data and in that, at least for the initial design, no explicit indexing into a sparse matrix is allowed.
+
+A sparse matrix's dimensions are defined by the dimensions of the entire matrix as well as by which elements are nonzero. Just as you cannot assign to a matrix of different dimensions, you cannot assign a sparse matrix with one sparsity pattern to another with a different sparsity pattern. The example below shows an illegal assignment.
+
+```stan
+sparse_matrix[nz_row, nz_col, N, M] A;
+
+sparse_matrix[nz_row_alt, nz_col_alt, N, M] B;
+
+B = A; // illegal!
+```
+
+For sparse matrices that are used in inversion and factorization, a common optimization is to reorder the rows and columns so that the structure is easier for algorithms to traverse. Upon creation of square sparse matrices it may therefore be beneficial to reorder them once after initial construction. Below we go over how sparse matrices can be constructed and operated on within each Stan block.
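The illegal-assignment rule above amounts to a runtime check that dimensions and nonzero coordinate lists match exactly. A minimal sketch in plain C++ (the function name `same_sparsity` is hypothetical):

```cpp
#include <cassert>
#include <vector>

// Sketch of the check behind the "illegal assignment" rule: two sparse
// matrices are assignment-compatible only if their dimensions and their
// nonzero (row, col) coordinate lists match exactly.
bool same_sparsity(int rows1, int cols1,
                   const std::vector<int>& nz_row1,
                   const std::vector<int>& nz_col1,
                   int rows2, int cols2,
                   const std::vector<int>& nz_row2,
                   const std::vector<int>& nz_col2) {
  return rows1 == rows2 && cols1 == cols2
      && nz_row1 == nz_row2 && nz_col1 == nz_col2;
}
```

Because the pattern is fixed at declaration, this check only ever needs to run at assignment boundaries, not on every element access.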
 ## Data
 
-Sparse matrix types in the Stan language can be constructed in the data block via [Coordinate List](https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)) notation using the row and column sizes, non-empty row/column indices, and values for those index positions. The non-zero (NZ) row and column indices are brought in as bounds in the `<>`. Setting these as bounds expands the definition of what can go into `<>` and should be discussed more.
+Sparse matrix types in the Stan language can be constructed in the data block via [Coordinate List](https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)) notation using the row and column sizes, non-empty row/column indices, and values for those index positions. The non-zero (NZ) row and column indices are passed in the `[]` along with the matrix dimensions.
 
 ```stan
 data {
@@ -56,11 +69,12 @@ int K; // number non-empty
 int nz_row_ind[K]; // Non-empty row positions
 int nz_col_ind[K]; // Non-empty col positions
 
-sparse_matrix[N, M] A
+sparse_matrix[nz_row_ind, nz_col_ind, N, M] A;
 }
 ```
 
 With proper IO the data sparsity pattern can be brought in automatically via a JSON or rdump list of lists.
+
 ```stan
 data {
 int N; // Rows
@@ -69,7 +83,7 @@ sparse_matrix[N, M] A
 }
 ```
 
-Though not necessary, we can also add sparse vectors.
+Though not in the initial implementation, we can also add sparse vectors.
 ```stan
 int N; // Rows
@@ -108,18 +122,14 @@ Sparse matrices in this block can be defined dynamically and declared such as
 
 ```stan
 transformed data {
 // Could construct here as well
-sparse_matrix[N, M] A =
-  to_sparse_matrix(N, M, nz_col_ind, nz_col_ind, vals);
+sparse_matrix[nz_row_ind, nz_col_ind, N, M] A =
+  to_sparse_matrix(nz_row_ind, nz_col_ind, N, M, vals);
 
 // Linear Algebra is cool
-sparse_matrix[N, N] C = A * A';
-// For data this is fine
-C[10, 10] = 100.0;
+sparse_matrix C = A * A';
 }
 ```
 
-The assignment operation `C[10, 10] = 100.0;` works fine with Eigen as the implementation leaves room in the arrays for quick insertions. Though because the rest of Stan assumes the amount of coefficients are fixed this should be the only block where sparse matrix access to elements defined as zero valued in the bounds should be allowed.
-
 Because the sparsity pattern is given in the `[]`, the above multiplication result `C` will have its own sparsity pattern deduced from the result of `A * A'`. This is a concern of Eigen and Stan math, and I'm pretty sure it would not need anything particular in the Stan compiler.
 
 ## Parameters, Transformed Parameters, and Generated Quantities
@@ -140,6 +150,10 @@ transformed parameters {
 
 The size and non-zero indices for sparse matrices in the parameter block must come from either the data block or the transformed data block. This is because Stan's I/O and posterior analysis infrastructure assumes the same sparsity pattern in each iteration of the model.
 
+**NOTE**
+
+After speaking with Dan Simpson, he and Aki would not like sparse matrices to be available in parameters unless they are used in temporary scopes (at least that's what I have written in my notes, so apologies if I wrote that down incorrectly). I'm not sure about this and think we should discuss it.
+
 ## Helper Functions
 
 We can also include helper functions to extract the sparsity pattern's row and column information.
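One such helper can be sketched in plain C++ (no Eigen; `nz_coordinates` is a hypothetical name): given the column-compressed index arrays, it recovers the (row, col) coordinate list of the stored nonzeros.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sketch of a sparsity-pattern helper: recover the (row, col)
// coordinate list from column-compressed index arrays, where `outer` holds
// cols + 1 offsets and `inner` holds the row index of each nonzero.
std::vector<std::pair<int, int>> nz_coordinates(const std::vector<int>& outer,
                                                const std::vector<int>& inner) {
  std::vector<std::pair<int, int>> coords;
  for (int j = 0; j + 1 < static_cast<int>(outer.size()); ++j)
    for (int k = outer[j]; k < outer[j + 1]; ++k)
      coords.push_back({inner[k], j});  // (row, col) of each stored nonzero
  return coords;
}
```

This is the kind of information a `get_nz_elements`-style function would expose to users at the Stan language level.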
@@ -182,8 +196,12 @@ Map<SparseMatrix<double>> sm1(rows, cols, nnz, outerIndex, innerIndices, values);
 Map<SparseMatrix<double>> sm2(rows, cols, nnz, outerIndex, innerIndices, values);
 ```
 
+For the initial implementation, data will be read in via a for loop and an insertion method.
+
 ## Stan Math
 
+To preserve the ordering of a matrix for factorization and inversion, we should use `EIGEN_SPARSEMATRIX_BASE_PLUGIN` to store the sparsity structure.
+
 ### Templating
 
 Sparse matrices can be supported in Stan-math by either moving to more thorough templating along with pseudo-concepts in the metaprogramming or by having separate methods for Sparse Matrices.

From 2cc059500a516bc88189005de356bb3d0241f22f Mon Sep 17 00:00:00 2001
From: stevo15025
Date: Thu, 26 Sep 2019 23:42:15 +0300
Subject: [PATCH 12/13] update with Dan's example code

---
 designs/0004-sparse-matrices.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/designs/0004-sparse-matrices.md b/designs/0004-sparse-matrices.md
index 7422aaf..5d92b8d 100644
--- a/designs/0004-sparse-matrices.md
+++ b/designs/0004-sparse-matrices.md
@@ -164,6 +164,13 @@ int K = num_nz_elements(x);
 matrix[K, 3] = get_nz_elements(x);
 ```
 
+There can also be methods to bind sparse matrices together by rows or columns:
+
+```stan
+sparse_matrix G = rbind(A, B);
+sparse_matrix F = cbind(A, B);
+```
+
 # Reference-level explanation
 [reference-level-explanation]: #reference-level-explanation

From 6b7e894818c3149b3d3dc9b7dc8ac6f02b616818 Mon Sep 17 00:00:00 2001
From: stevo15025
Date: Thu, 26 Sep 2019 23:44:41 +0300
Subject: [PATCH 13/13] grammatical update

---
 designs/0004-sparse-matrices.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/designs/0004-sparse-matrices.md b/designs/0004-sparse-matrices.md
index 5d92b8d..d0eefa3 100644
--- a/designs/0004-sparse-matrices.md
+++ b/designs/0004-sparse-matrices.md
@@ -90,7 +90,7 @@ int N; // Rows
 int K; // number non-empty
 int nz_row_ind[K]; // Non-empty row positions
 // Can we do this?
-sparse_vector[N] B;
+sparse_vector[nz_row_ind, N] B;
 ```
 
 The above sparse matrix example with values
@@ -122,7 +122,7 @@ Sparse matrices in this block can be defined dynamically and declared such as
 ```stan
 transformed data {
 // Could construct here as well
-sparse_matrix[nz_row_ind, nz_col_ind, N, M] A = 
+sparse_matrix[nz_row_ind, nz_col_ind, N, M] A =
   to_sparse_matrix(nz_row_ind, nz_col_ind, N, M, vals);
 
 // Linear Algebra is cool
@@ -139,7 +139,7 @@ Parameters can be defined as above for data or deduced from the output of other
 ```stan
 parameters {
   // Defining sparse matrices in parameters needs the non-zero elements
-  sparse_matrix[N, M] foo;
+  sparse_matrix[nz_row_ind, nz_col_ind, N, M] foo;
 }
 transformed parameters {
 // Non-zero elements are deduced by the operation on x