
Sparse Matrices #8

Open · wants to merge 14 commits into master
Conversation

@SteveBronder (Collaborator) commented Aug 26, 2019

Sparse matrix support in the stan language and backends.

Rendered Version

Summary

There has been much discussion about sparse matrices in Stan. This design doc brings together the discussions on how to implement them at the language, I/O, and math levels. The below gives a TL;DR for each section.

Language

There will be a new sparse_matrix type with the non-zero (NZ) sparsity structure defined as bounds.*

```stan
sparse_matrix<nz_rows=nz_row_ind, nz_cols=nz_col_ind>[N, M] A;
```

Bounds make specifying the sparsity optional, so the sparsity pattern can be deduced under the hood for algebra etc. at the Stan Math level.

```stan
sparse_matrix[N, N] B = A * A';
```

I/O

Sparse matrices come in as lists of lists from JSON or Rdump.

Stan math

We can either do a big refactoring to simplify the codebase or include specializations for the functions that take sparse matrices.

* I personally prefer the attribute style mentioned in the alternatives section, but Dan and Aki have both expressed interest in the <> style and did not like the attribute style. While it's only an N of 2, I like to think the user is right about what is aesthetically pleasing to them.

@SteveBronder (Collaborator, Author)

I had this open on a separate draft PR and am going to bring over some of the discussion.


## Parameters, Transformed Parameters, and Generated Quantities

Parameters can be defined as above for data or deduced from the output of other functions.

@seantalts

We don't have type inference yet in Stan - I think so far declarations will all need to be of the same form... not sure. Might want to make an open item of that in the design doc.

@SteveBronder

> I think so far declarations will all need to be of the same form

One thing I'm a bit worried by is that users would then have to know the non-zero elements of the Cholesky or any sparse matrix output.

When I'm thinking about this, we need to specify the nonzero rows/columns when we are writing data into a sparse matrix via triplets. But the Eigen declaration for sparse matrices is really just Eigen::SparseMatrix<double>, so it doesn't really need to know the nonzero elements. So doing something like

```cpp
Eigen::SparseMatrix<double> A = inv_solver.solve(B);
```

will not need to know the non-zero elements.

Does it make sense what I'm getting at here? i.e. that specifying the non-zero elements is much more for Eigen matrix initialization than it is for the math library.

@seantalts

Yeah, that makes sense. It's very good that code generation can support not knowing the sparsity ahead of time for Eigen in the cases where it is immediately assigned to something with Eigen sparse structure. It makes it a little more difficult for the compiler to check that this is true, though probably still mostly possible. We'd need to basically add sparsity to our type system somehow, possibly just as two new Eigen types, SparseMatrix and SparseVector (there are no SparseRowVectors that we care about, right?). And then we can annotate all of the Stan Lib functions with additional sparsity type signatures so we know when a return value will be sparse given the input types. And then the compiler can check that we don't need the sparsity on the declaration.

That brings up another awkward point - do we need to check the sparsity structure if someone creates a full declaration but then assigns to some function output? Or do we only allow the full sparsity declaration in the data and parameters blocks?

@dpsimpson

The sparsity pattern has to be a known-at-compile-time deterministic function of the data. Thankfully the Eigen sparse Cholesky (etc) does a two-phase sweep that can be separated out. Phase one is called a "symbolic factorization", which works out the sparsity pattern and does all the appropriate allocation. This is 100% a thing that can be used to infer that sparsity structure.
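
For reference, a minimal sketch of that two-phase API in Eigen (matrix values invented for illustration; the point is just that `analyzePattern` depends only on the sparsity structure):

```cpp
#include <Eigen/SparseCholesky>
#include <Eigen/SparseCore>
#include <vector>

int main() {
  // Small SPD example matrix built from triplets (values are arbitrary).
  std::vector<Eigen::Triplet<double>> trips
      = {{0, 0, 4.0}, {0, 1, 1.0}, {1, 0, 1.0}, {1, 1, 5.0}, {2, 2, 6.0}};
  Eigen::SparseMatrix<double> A(3, 3);
  A.setFromTriplets(trips.begin(), trips.end());

  Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> solver;
  solver.analyzePattern(A);  // symbolic phase: pattern + allocation only
  solver.factorize(A);       // numeric phase: reusable for any matrix
                             // with the same sparsity pattern as A
}
```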

Some possible problems would be if someone wanted something like (apologies for the messy pseudocode)

```stan
sparse_matrix A; // structure given
sparse_matrix B; // structure given
sparse_matrix L; // structure to be inferred
real c; // data
L = chol(A + c*B);  // A problem if c = 0!
```

> That brings up another awkward point - do we need to check the sparsity structure if someone creates a full declaration but then assigns to some function output? Or do we only allow the full sparsity declaration in the data and parameters blocks?

Yeah. This is the issue here. I had a shot at it here: https://aws1.discourse-cdn.com/standard14/uploads/mc_stan/original/2X/1/13fda4102c8f48e5aadbf0fbe75d3641a187d0a3.pdf

Essentially it would allow for two different declarations of sparse matrices:

```stan
sparse_matrix[N, M, nz_foo_row_index, nz_foo_col_index] A;
sparse_matrix[foo(A)] L;
```

where foo is a function that returns a sparse matrix. Steve's version doesn't have the explicit [foo(A)] bit and just has sparse_matrix L = foo(A), which is probably more in line with the idea of bringing declaration closer to use.

This will work if we have a requirement that every function foo_sparse in math that returns a sparse matrix has a foo_sparse_pattern_ variant that takes only data. (This is the same pattern Eigen uses for sparse matrix factorizations.)

@seantalts Would it be possible for the new compiler to find these things and just execute them once even though it would be in the model block? Because when this is declared in the parameter block, the pattern is data, while the "values" are vars (Does this question make sense?)

A thing that would need to be avoided is

```stan
// Declare B
sparse_matrix A = foo_sparse(B);
A = bar_sparse(B);
```

if foo and bar return different patterns. This means that the compound declaration has to be "special" (i.e., call the foo_sparse_pattern_ function to declare the sparse matrix and then call foo_sparse to get the values), while the ordinary bar_sparse call doesn't call bar_sparse_pattern_.
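
A hedged sketch of what that "special" handling could compile to (all names here are the placeholders from this thread; `check_pattern` is invented, and a real check would compare the index arrays rather than just counts):

```cpp
#include <Eigen/SparseCore>
#include <stdexcept>

// Hypothetical runtime guard for plain assignments to an already-declared
// sparse matrix: rejects a right-hand side whose shape or non-zero count
// differs from the fixed pattern (sketch only).
inline void check_pattern(const Eigen::SparseMatrix<double>& fixed,
                          const Eigen::SparseMatrix<double>& candidate) {
  if (fixed.rows() != candidate.rows() || fixed.cols() != candidate.cols()
      || fixed.nonZeros() != candidate.nonZeros())
    throw std::domain_error("assignment would change the sparsity pattern");
}

// Compound declaration: foo_sparse_pattern_(B) fixes the structure once,
// then foo_sparse(B) fills the values.
// Plain assignment: bar_sparse(B) is computed, checked with check_pattern
// against the declared structure, and only then copied in.
```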

### Keeping Permutation Matrix from Cholesky

`SimplicialCholeskyBase` keeps the permutation matrix; when the user does a Cholesky decomposition we can pull out this permutation matrix and keep it to use in the next iteration. We do this through `EIGEN_SPARSEMATRIXBASE_PLUGIN`, adding the permutation matrix to the input matrix. This adds a bit of state, but assuming the sparse matrices are fixed in size and sparsity this should be fine.


@seantalts

Simplicial?
Also we might want to think about the Math library as being stateless and thread state through everywhere with the compiler going forward... can talk this over in person, not sure it's fully formed in my head yet.

@SteveBronder

> Simplicial?

That's the word Eigen uses.

> Also we might want to think about the Math library as being stateless and thread state through everywhere with the compiler going forward... can talk this over in person, not sure it's fully formed in my head yet.

Yeah, I worry about adding state like this. Another option is to do the sorting immediately when a sparse matrix is created.

@dpsimpson

We could assume that every square sparse matrix is going to be decomposed and just keep a permutation around at all times. Eigen's AMD re-ordering works regardless of symmetry, so it's all good. (It's also cheap to compute relative to everything else.) This would make it a deterministic function of the data that is stored in the same object as the rest of the sparse matrix (assuming we make a simple container class around Eigen::SparseMatrix), so that should keep things stateless and thread safe.
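
A minimal sketch of that container idea (the wrapper name is hypothetical; `AMDOrdering` is Eigen's, and it symmetrizes the pattern internally, so non-symmetric input is fine):

```cpp
#include <Eigen/OrderingMethods>
#include <Eigen/SparseCore>

// Hypothetical wrapper: the fill-reducing permutation is computed once from
// the data's sparsity pattern and travels with the matrix, so the math
// library itself stays stateless and thread safe.
struct stan_sparse_matrix {
  Eigen::SparseMatrix<double> mat;
  Eigen::PermutationMatrix<Eigen::Dynamic, Eigen::Dynamic, int> perm;

  explicit stan_sparse_matrix(const Eigen::SparseMatrix<double>& m) : mat(m) {
    if (mat.rows() == mat.cols()) {
      Eigen::AMDOrdering<int> amd;  // works on the pattern of m + m^T
      amd(mat, perm);
    }
  }
};
```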


TensorFlow uses the same data storage schema inside of [`tf.sparse.SparseTensor`](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/sparse/SparseTensor) with a limited set of specific [methods](https://www.tensorflow.org/api_docs/python/tf/sparse). It does not seem that they have [sparse Cholesky support](https://github.com/tensorflow/tensorflow/issues/15910).

It seems like OpenAI has methods like matrix multiplication for block-sparse matrices (sparse matrices with dense sub-blocks) in TensorFlow, available on [GitHub](https://github.com/openai/blocksparse).

@seantalts

Raises a question about our need (or lack thereof) for blocked operations too maybe?

@SteveBronder

Yes, @dpsimpson has brought this up, as has another person in my lab.

@dpsimpson

We don't need blocked operations for sparse matrices. They're only efficient for sparse matrices with dense sub-blocks, and we don't have any algorithms that take advantage of that structure. It's a real speed-up in some sense (replacing level 1 with level 2 and 3 BLAS operations lets you be more cache efficient, for example), but it would be a weird thing to target on the first go.


The UGA animal breeding group has a sparse-dense matrix package, YAMS - it's Fortran though. See slides at http://nce.ads.uga.edu/wiki/lib/exe/fetch.php?media=uga2018_yams.pdf


Neat, thanks! I'll try to find their code online.

```stan
(@sparse(nz_rows=nz_rows_ind, nz_cols=nz_cols_ind), @opencl) Matrix[N, M] A;
```

Another alternative would be to extend the `csr_*` functions, though this requires a lot of recomputation.

@dpsimpson

The csr_* functions are not good and should go away. Also, it's inconsistent to have row-oriented sparse matrices and column-oriented dense matrices.


# Appendix: (Eigen Sparse Matrix formats)

From the Eigen [Sparse Matrix Docs](https://eigen.tuxfamily.org/dox/group__TutorialSparse.html), Eigen matrices use a Compressed Column Scheme (CCS) to represent sparse matrices.
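
For concreteness, a small worked example of the compressed-column layout the tutorial describes (values invented for illustration):

```cpp
//     [ 3 0 0 ]
// A = [ 0 0 7 ]     values:       3 4 5 7   non-zeros, stored column by column
//     [ 4 5 0 ]     inner index:  0 2 2 1   row index of each stored value
//                   outer starts: 0 2 3 4   offset where each column begins,
//                                           plus a final entry equal to nnz
```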

@dpsimpson

One very important but minimally documented part of Eigen's sparse libraries is that it tries to remove "lucky" zeros (aka the symbolic step computes the structural non-zeros of A*B, but some of the elements that could be non-zero end up being zero when the product is done; these are the "lucky" zeros that Eigen removes). This is very bad for us as it changes the dimension of the resulting std::vector. The key function in Eigen is prune, and a quick grep through the library looks like it's only used for matrix-matrix products. So this is a danger zone!

@SteveBronder

I'm looking at godbolt and it looks like Eigen keeps the sparse matrix values for multiplication even when they are zero?

https://godbolt.org/z/JJsyp0

Which is odd, because looking at their unary_evaluator it looks like you are right and they do have a pruning mechanism?

https://github.com/eigenteam/eigen-git-mirror/blob/c399f06d3f3a77b5bd2a74c11e635e4952b72a4b/Eigen/src/SparseCore/SparseProduct.h#L139

It looks like the pruning happens here. Maybe we can fiddle the value of epsilon to turn off pruning?
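
A small check one could compile to see the difference (a sketch; the default behavior should be verified against the Eigen version in use):

```cpp
#include <Eigen/SparseCore>
#include <iostream>
#include <vector>

int main() {
  // A = [1 -1; 0 0], B = [1 0; 1 0], so (A*B)(0,0) = 1 - 1 = 0: a "lucky" zero.
  std::vector<Eigen::Triplet<double>> ta = {{0, 0, 1.0}, {0, 1, -1.0}};
  std::vector<Eigen::Triplet<double>> tb = {{0, 0, 1.0}, {1, 0, 1.0}};
  Eigen::SparseMatrix<double> A(2, 2), B(2, 2);
  A.setFromTriplets(ta.begin(), ta.end());
  B.setFromTriplets(tb.begin(), tb.end());

  Eigen::SparseMatrix<double> C = A * B;             // keeps the explicit zero
  Eigen::SparseMatrix<double> D = (A * B).pruned();  // drops the lucky zero
  std::cout << C.nonZeros() << " vs " << D.nonZeros() << "\n";  // expect: 1 vs 0
}
```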

@SteveBronder changed the title from Spec/sparse matrices to Sparse Matrices on Aug 26, 2019
@seantalts (Member)

I think the doc could use a little discussion on whether sparse matrices deserve their own type in Stan. Some starter points off the top of my head:
pros:

  1. Can add functions dealing with sparsity in an ad hoc fashion to the math library slowly over time.
  2. Can prevent users from doing inefficient things with sparse matrices, like using one in a function that would just convert it to dense first.

cons:

  1. Users don't necessarily care that their matrices are stored in sparse or dense form; they want efficient inference. I think there's some wisdom in trying to keep a 1-to-1 correspondence between a Stan model and the mathematical representation of it, if we can keep computational issues behind the silk screen.
  2. I think everyone universally regrets adding both real[] and vector to Stan, which is a pretty similar use-case in that both are representing the same mathematical object just with slightly different computational properties. Keeping users from doing linear algebra operations on real[] doesn't seem to have served any real purpose.

The "what do users want" question difficult to test because anyone who is already aware of sparse matrices will probably consider them necessary as first-class objects, but there's a whole class of Stan user who would just like their model to run faster and wouldn't care if we switched to sparse under the hood. I wonder if @lauren can offer advice here?

@bob-carpenter (Collaborator) commented Aug 28, 2019 via email

@dpsimpson commented Aug 28, 2019 via email

@bob-carpenter (Collaborator) commented Aug 28, 2019

> Not sure why any function would cast a sparse matrix to a different type automatically rather than just throw a "not defined for type sparse" warning.

The motivation would be to make sure user programs with the right math don't break because they got the data types wrong. We see that all the time now because our users aren't programmers.

Of course, if we have a function that takes a sparse matrix in and then outputs a dense matrix, that's going to be considered broken by most people who use sparse matrices.

> No earthly idea how to "pretend" sparse matrices are just matrices that are different in hidden, under-the-hood ways. Neither MATLAB nor R does this.

No pretending necessary. Mathematically, a sparse matrix is just a dense matrix with a lot of zeros (no surprise to anyone there, I hope). So the user just treats everything as a matrix (except, perhaps, during I/O) and leaves the system to figure out where things should be dense or sparse under the hood. There's a question of feasibility, but we're not violating any laws of computation here.

R doesn't have a built-in sparse matrix type (though there is a contributed Matrix package with sparsity).

MATLAB does have built-in sparsity and uses a separate type system.

https://www.mathworks.com/help/matlab/sparse-matrices.html

Under the hood, it just represents dense matrix types as an array, and there are no specific vector or row-vector types, just matrix types.

https://www.mathworks.com/help/matlab/matrices-and-arrays.html

Maybe we should've done that. I really wanted to have row vector times vector to be a scalar and really wanted matrix times vector to be a vector. In MATLAB, everything's a matrix and they just use vector constructors. In MATLAB, [1 2 3] is a 1 x 3 matrix (i.e., a row vector) whereas [1; 2; 3] is a 3 x 1 matrix and [1 2; 3 4] is a 2 x 2 matrix.

Consider what R does for integer vs. real vs. boolean distinctions:

```r
> is.integer(1)
[1] FALSE

> is.integer(as.integer(1))
[1] TRUE
```

I find this kind of behavior in R very confusing. Same with transposing a vector, which turns it into a matrix.

```r
> c(1, 2, 3)
[1] 1 2 3

> t(c(1, 2, 3))
     [,1] [,2] [,3]
[1,]    1    2    3
```

It gets really fun trying to predict the result of mixed operations

```r
> t(t(c(1, 2, 3))) == c(1, 2, 3)
     [,1]
[1,] TRUE
[2,] TRUE
[3,] TRUE
```

I really don't like that t(t(x)) != x.

@betanalpha commented Aug 28, 2019 via email

@dpsimpson commented Aug 28, 2019 via email

@dpsimpson

This would be extremely bad design. It should throw an informative error. Otherwise we'll spend our whole time explaining to people why they added one line that had an implicit cast and their code no longer worked because it ran out of memory.

Just to be concrete, the model Mitzi used in the ICAR case study uses an extremely small sparse matrix. A cast to dense would go from a vector of 2055 elements to a vector (flattened column-major matrix) of 709 * 709 = 502681 elements. So this is very different from casting from real[] to vector, which are roughly the same size in memory (modulo possible locality stuff and maybe some small class overhead).

@dpsimpson

> One that comes to mind is incomplete implementation of sparse functionality. For example, if I wanted to compute a matrix exponential but no one has yet implemented a sparse matrix exponential, then can I no longer proceed, or can I cast my sparse matrix to a dense one and proceed, albeit with the more expensive calculation?

There's no way to compute a sparse matrix exponential and get a sparse matrix out (because maths). So that would be easy to specialize.

@betanalpha commented Aug 28, 2019 via email

@dpsimpson commented Aug 28, 2019 via email

@seantalts (Member)

Very cool, thank you all for the discussion! My one request for this PR would be to capture a summary of this in the "Rationale and Alternatives" heading.

Another possible point for discussion - I think adding GPU matrix support to the Stan language is almost exactly the same, modulo some degree of difference on the break-even matrix size, where converting to a dense [CPU] matrix is so inefficient as to be impossible and should be outlawed. Meaning that in GPU-calculation land I think it will actually be somewhat common to want to go back and forth as we flesh out the GPU-aware implementations. So I might propose that we try to treat those the same way, meaning if we go with the full new-types approach we'd have all new cl(_cholesky_factor_corr | _cov | cholesky_factor_cov | ... )_matrix types as well (luckily someone said you'd never want to put a sparse matrix on a GPU, or we'd have that combinatoric explosion as well).

@betanalpha commented Aug 28, 2019 via email

@seantalts (Member)

Also curious about the ICAR model - is that a good representative example? It seems to just want to store an adjacency matrix efficiently, but otherwise to not use any sparse computation at all...? I thought someone at StanCon had said we would want to completely disallow element-wise indexing into a sparse matrix, which would appear to disallow this use-case... I am assuming I'm confused here and misheard probably multiple parts of this so please correct me with exuberance.

> I understand the technical utility of keeping a sparse type completely encapsulated in its own world, but those not as familiar with sparse linear algebra as yourself will need something to guide them beyond the interpretational baggage they bring with them, for example very careful compiler error messages, extensive documentation, etc.

100% agreed - I still don't really get why you would disallow to_dense in the type system instead of just throwing a runtime warning like we do for deepcopies and other inefficient stuff. But I'm happy to defer to folks who actually write models that need sparse computation. I still dream of a world in which folks don't even need to learn about sparse matrix computations in order to experience their benefit, but perhaps that's a Stan 3 (or 4) thing.

```
{'col': 1, 'row': 20, 'val': 1.85},
{'col': 11, 'row': 3, 'val': 2.71},
...
}
```
@rok-cesnovar (Member) commented Aug 29, 2019

Not sure we need to be nit-picking here, but this should be written as

```
X = [
  ...
  {'col': 1, 'row': 20, 'val': 1.85},
  {'col': 11, 'row': 3, 'val': 2.71},
  ...
]
```

Currently, JSON objects are only supported at the top level (CmdStan at least), so we will need to remove that restriction.


## I/O

Data can be read in from JSON via a list of lists, and from Rdump via the equivalent.

Should we also support input in dense matrix form in the input file, with the parser & data handler automatically constructing the sparse matrix, ignoring zeroes? We would need to somehow inform the parser to ignore zeroes during parsing; otherwise we would first create the dense matrix and then transform it, which defeats the purpose of introducing sparsity.
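
A sketch of that idea (the helper name and interface are hypothetical, not CmdStan's actual reader; a real implementation would consume the parser's token stream rather than a materialized vector):

```cpp
#include <Eigen/SparseCore>
#include <vector>

// Hypothetical helper: stream dense column-major entries and keep only the
// non-zeros as triplets, so no dense Eigen matrix is ever built.
inline Eigen::SparseMatrix<double> read_dense_as_sparse(
    int rows, int cols, const std::vector<double>& column_major_values) {
  std::vector<Eigen::Triplet<double>> trips;
  for (int j = 0; j < cols; ++j)
    for (int i = 0; i < rows; ++i) {
      double v = column_major_values[j * rows + i];
      if (v != 0.0) trips.emplace_back(i, j, v);
    }
  Eigen::SparseMatrix<double> out(rows, cols);
  out.setFromTriplets(trips.begin(), trips.end());
  return out;
}
```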

@dpsimpson commented Aug 29, 2019 via email

@bob-carpenter (Collaborator) commented Aug 30, 2019 via email

@seantalts (Member)

Okay, to try to corral some of the conversation back: it seems like folks basically want to go with what our potential sparse matrix users would like to see in Stan's type system here, and so far the only person like that we have commenting is @dpsimpson. Dan's proposal is that we should have a separate type for every kind of matrix that could also be sparse, and that we would only allow efficient operations on them in the type system. I think we want to add some clarification for which types and operations those actually are, concretely. Is it enough to add a single sparse_matrix type to Stan with the operations listed here? That's what's in this proposal now, just double checking that's still kosher.

If so, I think we need to change the design doc to add a different model as an exemplar model; the ICAR model would not benefit from sparse matrices as described in this proposal as we wouldn't be allowed to do element-wise access. Does anyone have another model that would benefit from the design in this proposal they can contribute?

@seantalts (Member)

Or I may be wrong about ICAR being a bad fit - if there's a more efficient version possible once we have sparse matrices, relevant lines from that would be good to include in this design doc to show the before-and-after.

@bob-carpenter (Collaborator) commented Sep 1, 2019 via email

@dpsimpson commented Sep 1, 2019 via email

@dpsimpson commented Sep 1, 2019 via email

@bob-carpenter (Collaborator) commented Sep 1, 2019 via email

@bob-carpenter (Collaborator) commented Sep 1, 2019 via email

@betanalpha commented Sep 1, 2019 via email

@gregorgorjanc

The point of a Gaussian model with a sparse precision matrix is efficiency - we can fit models that we couldn't if the "model" were dense. Think millions of parameters. This is why Dan was so adamant about avoiding casting sparse to dense.

The critical point is conditional independence, which gives sparse structure. Examples are time-series (banded precision), some spatial (CAR with neighbourhood structure, SPDE with some meshing magic), phylogeny (banded precision), pedigree (expanded band) and some other models.

@seantalts (Member)

Thanks all, and especially @gregorgorjanc for joining the conversation and explaining more of the background for me :)

So we will need to allow element-wise reads after all, is that right? I'm thinking that because that is the most obvious way to code up the ICAR example, but it's also possible we would still want to disallow that and only let users use some new distribution functions we'd be creating (which I guess would be a bunch of our multivariate functions but with sparse_ prepended to the name, to follow the convention; e.g. `sparse_multi_normal`).

I want to also think ahead and think about how we can define more of the Math library in Stan itself (if we can do that efficiently), so if we can think ahead and not put in any constraints that prevent a user from coding their own sparse multivariate distributions in Stan itself that's probably a good forward-looking move in my opinion.

@gregorgorjanc

For completeness of discussion, the BUGS "language" specifies the Gaussian distribution with a precision parameter. I believe this was due to the natural/canonical parametrisation of the Gaussian and the conditional (graph) model structure embedded implicitly in the "language".

@dpsimpson commented Sep 1, 2019 via email

@seantalts (Member)

Ok cool! If people can build their own I'm happy with the design. Would someone mind adding what the ICAR model would look like with sparse matrices to the design doc as an example?

@seantalts (Member)

(Or post it as a comment in this thread and I can add it to the PR for the doc)

@bob-carpenter (Collaborator) commented Sep 1, 2019 via email

@seantalts (Member)

So I think this is just waiting on @SteveBronder for:

  1. A before-and-after example illustrating what the new syntax will provide, and
  2. confirmation from Aki that we will need to represent the sparsity structure of a matrix in the Stan program (because parameters will otherwise have no way of knowing how to construct themselves).

I think this is ready to go as-is after that.

@dpsimpson commented Sep 16, 2019 via email

@seantalts (Member)

Are you guys all together there? Want to finish this design doc up? Just need that before-and-after example assuming we're not changing the syntax.

@SteveBronder (Collaborator, Author)

> Are you guys all together there? Want to finish this design doc up? Just need that before-and-after example assuming we're not changing the syntax.

Apologies! In Paris until Monday. I'll update this on Tuesday, ask Aki about (2), and get on the phone with Dan.

@dpsimpson commented Sep 26, 2019

```stan
// Define real a, b, c;
// Define sparse_matrix A, B, C, D; (different sparsity patterns)
// We need
sparse_matrix E = a*A + b*B + C*D;
E = 4*F;  // ERROR if F does not have the same sparsity

sparse_matrix G = rbind(A, B);
```

Notes:

- When we declare a sparse matrix the sparsity structure should be fixed.
- How to declare a sparse matrix without assigning a value
- Need to make our own sparse matrix that holds the sparsity pattern
- No explicit indexing for the WIP
- Sparse matrix in data block, not params or transformed params block
- Do ordering for all square matrices.
@SteveBronder (Collaborator, Author)

Sorry for my radio silence, I've been pretty deep in the compiler. I just updated this with @seantalts's notes above and a recent conversation with Aki and Dan. You can see those changes here and the notes from the meeting in the commit message of a938b45.

@bob-carpenter (Collaborator)

Overall

This doc should be self-contained and include any necessary material from earlier proposals rather than linking. Reference links at the end are really nice.

Summary

the matrices -> a matrix's

The exclusion of sparse vectors from the initial spec should be mentioned up front.

Motivation

ICAR does not use sparse matrices. A lot of big data regressions in fields like natural language processing are sparse.

fan-dangling -> finagling (?), index fiddling (?)

Can we really support all existing methods for matrix types? Does that mean all the mixed-type operations too, such as sparse matrix times dense matrix? Is there a useful way to roll this out more incrementally?

Do all existing methods extend to types? That is, will we have a sparse simplex or sparse positive-definite matrix (unfortunately called "covariance" because I was young and didn't know better---it should be called "spd" for symmetric-positive-definite)?

I think the list here should just be things that wouldn't be included under "all existing methods".

Given the spec to support all existing functions (methods are technically non-static class functions), this list should just be for new things not already in the library.

Is return always sparse unless one of the arguments is dense? If so, we can just say that.

And there needs to be more description for the new operations:

  • fill-reducing reordering
  • required elements of the inverse
  • operations to move from sparse to dense and vice versa
  • specializations for some log pdfs and pmfs

likelihood -> density (our lpdf and lpmf can also be used in other contexts than just likelihoods, such as in priors)

Notes on implementation (like the reverse-mode derivative of Cholesky) can go in an implementation notes section below the spec. I think this spec should be more functional in terms of what things will look like for the user.

Guide-level

Move citation of previous specs to the references section at the end.

What does no explicit indexing within a sparse matrix mean? If a is a sparse matrix, does that mean a[i, j] isn't legal as either an rvalue or lvalue?

Sparse matrices dimensions -> Sparse matrix size specifications

Note: Stan separates dimension and size specifications in that function arguments are dimensioned but not sized; try to use "rows" and "columns" or "size" rather than "dimension" because "dimension" is ambiguous in the context of a spec (a vector is a one-dimensional data structure, but a vector of size K is used to represent a point in K-dimensional space).

Could you elaborate on what it means to order a square sparse matrix after initial construction? Is that a new function? Something done automatically under the hood?

The enumeration of blocks needs to include function arguments. In Stan, the function arguments are not sized, whereas block and local arguments are. Similarly, there are constrained types which only show up as block variables.

I'd be extra explicit in the example that nz_row_alt is different than nz_row, because it should be legal if both sparsity patterns are the same even if they're not specified by the same variable or expression.

Data

The overall description of coordinate list notation should be pulled up front, with the example. The col/row label is reversed.

This data section can then stick to how sparse matrices are used in the data block.

What is proper IO? There needs to be a spec for the data formats in both JSON and rdump at this point.

If vectors are not in the initial implementation, please pull them out into a section of things to do later or otherwise indicate how things are going to be staged.

We've thought about having sizes also be read from the data structure directly, but haven't done that anywhere. That's not to say we shouldn't do it here.

The example should have

```stan
int nz_row_ind[K] = { 2, 3, ..., 5 };
```

to be well-formed Stan code. But the example needs to move down to the transformed data section where it's relevant because no definitions are allowed in the data block.

What type are the values that are set? Are those in the form of a dense vector or an array of reals?

Transformed data

The Stan code for actually building up nz_row_ind, etc. should be moved down here.

The to_sparse_matrix function should be specified directly. You don't need to say much as it should be clear to everyone what it does.

Linear Algebra -> linear algebra (capitalize names, not concepts)

Does the linear algebra example imply that we can use sparse_matrix in the data block with no sizing whatsoever? Or should that be sparse_matrix[N, N]?

There's nothing in the bounds, just <>?

The sense in which the sparsity pattern is a matter for the math library is that the sparsity pattern checks have to be run time, not compile time. That still leaves them in scope for this spec, as it has to cover both the math lib part and the language.

Parameters, ...

Break these three into their own sections.

Parameters must specify sizes in their declarations---there's no way to deduce them. They must have the same sparsity pattern in each iteration, too. Mention that an alternative parameter declaration is just the values, with the sparse matrix being created elsewhere (model or transformed parameters).

Transformed parameters and generated quantities should presumably work the same way as transformed data, with the caveat noted that the size/sparsity-pattern not change iteration to iteration. The way to say that is that they have to be data-only variables, which is already a requirement for sizes in these (and the parameters) block.

If you want to add historical notes, put them later.

There are no temporary scopes in the parameters block. I don't see why we couldn't use them as parameters given your spec. If they are not going to be allowed as parameters, that needs to be clearly indicated. And I think it would cause a lot of user confusion.

Helper functions

These should be laid out explicitly. Propose the ones we'll start with and then we can add more later. The point is that this spec should be as specific as possible.

Reference-level explanation

The JSON I/O is user level.

It would be much more compact to use three parallel arrays in JSON rather than breaking it down by value this way. That also matches the way it's declared, so that's probably not such a hardship for users. Same in Rdump. So how about JSON like

```json
{
  ...
  "X" : {
    "row" : [ 20, 3, ... ],
    "col" : [ 1, 11, ... ],
    "val" : [ 1.85, 2.71, ... ]
  }
  ...
}
```

where the row and col values are integers and the val floats (this needs to be in the spec even though JSON does not distinguish numerical types). Numbers will be represented as elsewhere (there's some complication around NaN and inf, but that's not this spec's responsibility).

And then in Rdump:

```r
...
X <- list(row = c(20, 3, ...),
          col = c(1, 11, ...),
          val = c(1.85, 2.71, ...))
...
```

"Something like what Ben did [here]" needs to be unfolded so the spec is self-contained and precise about what's being proposed.

The implementation details aren't so important and can be saved for later. If they're included, the outerIndex, innerIndices, and nnz variables need to be explained.

This spec doesn't need implementation details down to the for-loop level.

Stan Math

Templating should be described.

In the hard way, the "coefficient access" error Dan saw needs to be described. Is the proposal to use A.coeff() everywhere in the code to make this work?

The simple way should be at the same heading level.

Those specializations for sparse matrices need to be careful to use Eigen's mixed types properly. I hadn't realized we were doing that and could use it for addition.

For reverse mode, these implementations should all move to the adjoint-partial approach, which will dramatically cut down on stack functions and virtual function calls compared to the way the examples are coded here, by delegating to operator+ in Eigen.

Keeping Permutation Matrix

What does "this adds a bit of state" mean? That sounds scary as the entire math library is assumed to be stateless as is the generated Stan model code.

Drawbacks

I'd remove "code smell" and just list the drawback---everything's going to get more complicated on the template and code duplication front as well as in the parser as this spec has us overloading more heavily.

Rationale

Why did this get consensus? Is it the close connection to Eigen? The simpler long-form inputs? Both are positives here, I think. We definitely don't want to be innovating in the sparse matrix space!

I strongly prefer the type name to attributes. Once you have sparsity, you are normally having to deal with it both computationally and statistically.

Prior art

The prior art I'd look for would be in R, Python (NumPy/pandas), and MATLAB. Any idea what those do?

Unresolved questions

That first bullet point should be removed---it's instructions, not an unresolved question.

Language design is not an unresolved question. What needs to be described more fully is how type inference is going to work and how assignment is going to work. Can we assign sparse matrices to dense matrices, for example? Is any indexing allowed? These absolutely must be answered and not left unresolved before going forward with this.

The new Stan compiler is just the implementation of this. I don't think we need a lot of details on that in this spec unless there's some potential pitfall that's not obvious.

The whole theory of MCMC changes when you allow different numbers of parameters per iteration. For example, what's it mean to converge? We can leave that can of worms closed for now.

Cost not being worth it is the overall question for the spec, not an unresolved question.

You can leave it to reviewers to chime in on prior art.

Eigen

So our constructors are going to require conversion back and forth to the inner representation of Eigen. That seems like it should be listed under drawbacks.

- Addition (returns sparse matrix, possibly with different sparsity structure)
- Sparse matrix transpose (returns sparse matrix, different sparsity structure)
- Sparse matrix-vector multiplication (returns dense vector)
- Sparse matrix-constant multiplication (returns sparse matrix, same sparsity)


Is "constant" correct work here? Should it be "real" or "scalar"?


- Addition (returns sparse matrix, possibly with different sparsity structure)
- Sparse matrix transpose (returns sparse matrix, different sparsity structure)
- Sparse matrix-vector multiplication (returns dense vector)


Later there is "Sparse matrix-dense matrix"; should this then also be "Sparse matrix-dense vector", or do we assume that if sparse is not mentioned then it is dense?

- Sparse inner product and quadratic form (returns scalars)
- Operations to move from sparse to dense matrices and vice versa.
- Fill-reducing reorderings
- A sparse Cholesky for a matrix of doubles.


Which models would benefit from additional sparse QR and sparse SVD?

```stan
int nz_row_ind[K]; // Non-empty row positions
int nz_col_ind[K]; // Non-empty col positions

sparse_matrix[nz_row_ind, nz_col_ind, N, M] A
```


missing ;

```stan
data {
int N; // Rows
int M; // Cols
sparse_matrix[N, M] A
```


missing ;

Parameters can be defined as above for data or deduced from the output of other functions.

```stan
parameters {
```


Add a comment that currently we don't know a use case where a sparse matrix would be defined as a parameter. It would be common to declare, in the transformed parameters block, a sparse matrix which depends on a few lower-dimensional parameters.

```stan
}
transformed parameters {
// Non-zero elements are deduced by the operation on x
sparse_matrix[N, N] K = cov_exp_quad(x, alpha, rho);
```


cov_exp_quad is a bad example, as it will always produce a dense matrix. There are some covariance functions which produce a sparse covariance, but they are not useful in Stan as the sparsity structure depends on the length-scale parameter. A better example could be, e.g.,

```stan
sparse_matrix[N, M] P = A * alpha;
```

```stan
matrix[K, 3] = get_nz_elements(x);
```

There can also be methods to bind sparse matrices together by rows or columns


There can also be methods -> There have to be methods
