Sparse Matrices #8
Conversation
I had this open on a separate draft PR and am going to bring over some of the discussion.

## Parameters, Transformed Parameters, and Generated Quantities

Parameters can be defined as above for data or deduced from the output of other functions.
We don't have type inference yet in Stan - I think so far declarations will all need to be of the same form... not sure. Might want to make an open item of that in the design doc.
> I think so far declarations will all need to be of the same form
One thing I'm a bit worried by is that users would then have to know the non-zero elements of the Cholesky or any sparse matrix output.
When I'm thinking about this, we need to specify the nonzero rows/columns when we are writing data into a sparse matrix via triplets. But the Eigen declaration for sparse matrices is really just `Eigen::SparseMatrix<double>`, so it doesn't really need to know the nonzero elements. So doing something like

```cpp
Eigen::SparseMatrix<double> A = inv_solver.solve(B);
```

will not need to know the non-zero elements.
Does what I'm getting at here make sense? I.e., specifying the non-zero elements is much more for Eigen matrix initialization than it is for the math library.
Yeah, that makes sense. It's very good that code generation can support not knowing the sparsity ahead of time for Eigen in the cases where it is immediately assigned to something with Eigen sparse structure. It makes it a little more difficult for the compiler to check that this is true, though probably still mostly possible. We'd need to basically add sparsity to our type system somehow, possibly just as two new Eigen types, SparseMatrix and SparseVector (there are no SparseRowVectors that we care about, right?). And then we can annotate all of the Stan Lib functions with additional sparsity type signatures so we know when a return value will be sparse given the input types. And then the compiler can check that we don't need the sparsity on the declaration.
That brings up another awkward point - do we need to check the sparsity structure if someone creates a full declaration but then assigns to some function output? Or do we only allow the full sparsity declaration in the data and parameters blocks?
The sparsity pattern has to be a known-at-compile-time deterministic function of the data. Thankfully the Eigen sparse Cholesky (etc) does a two-phase sweep that can be separated out. Phase one is called a "symbolic factorization", which works out the sparsity pattern and does all the appropriate allocation. This is 100% a thing that can be used to infer that sparsity structure.
Some possible problems would be if someone wanted something like (apologies for the messy pseudocode)

```stan
sparse_matrix A;   // structure given
sparse_matrix B;   // structure given
sparse_matrix L;   // structure to be inferred
real c;            // data
L = chol(A + c*B); // a problem if c = 0!
```
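As a concrete illustration of the two-phase sweep mentioned above, here is a minimal C++ sketch (an editor's example using Eigen's documented API, not code from the proposal):

```cpp
// Eigen's sparse Cholesky separates a symbolic phase (sparsity pattern
// and allocation, a function of the structure only) from a numeric
// phase (the values). The symbolic phase runs once; the numeric phase
// reruns whenever the values change but the pattern stays fixed.
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

void two_phase_factorization(const Eigen::SparseMatrix<double>& A0,
                             const Eigen::SparseMatrix<double>& A1) {
  Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> chol;
  chol.analyzePattern(A0);  // symbolic factorization: pattern only
  chol.factorize(A0);       // numeric factorization: uses the values
  chol.factorize(A1);       // A1 assumed to share A0's pattern; the
                            // symbolic step is not repeated
}
```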
> That brings up another awkward point - do we need to check the sparsity structure if someone creates a full declaration but then assigns to some function output? Or do we only allow the full sparsity declaration in the data and parameters blocks?
Yeah. This is the issue here. I had a shot at it here: https://aws1.discourse-cdn.com/standard14/uploads/mc_stan/original/2X/1/13fda4102c8f48e5aadbf0fbe75d3641a187d0a3.pdf
Essentially it would allow for two different declarations of sparse matrices:

```stan
sparse_matrix[N, M, nz_foo_row_index, nz_foo_col_index] A;
sparse_matrix[foo(A)] L;
```

where `foo` is a function that returns a sparse matrix. Steve's version doesn't have the explicit `[foo(A)]` bit and just has `sparse_matrix L = foo(A)`, which is probably more in line with the idea of bringing declaration closer to use.
This will work if we have a requirement that every function `foo_sparse` in Math that returns a sparse matrix has a `foo_sparse_pattern_` variant that takes only data. (This is the same pattern Eigen uses for sparse matrix factorizations.)

@seantalts Would it be possible for the new compiler to find these things and just execute them once even though it would be in the model block? Because when this is declared in the parameters block, the pattern is data, while the "values" are vars. (Does this question make sense?)
A thing that would need to be avoided is

```stan
// declare B
sparse_matrix A = foo_sparse(B);
A = bar_sparse(B);
```

if `foo` and `bar` return different patterns. This means that the compound declaration has to be "special" (i.e., call the `foo_sparse_pattern_` function to declare the sparse matrix and then call `foo_sparse` to get the values), while the ordinary `bar_sparse` call doesn't call `bar_sparse_pattern_`.
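To make the convention concrete, here is a hedged C++ sketch of the `foo_sparse_pattern_` / `foo_sparse` split (the function names are the hypothetical convention from above, not existing Stan Math functions; `foo(A) = A + A'` is a stand-in computation):

```cpp
#include <Eigen/Sparse>

using SpMat = Eigen::SparseMatrix<double>;

// Data-only phase: computes the structural nonzeros of foo(A).
// Run once, e.g. at the compound declaration, to fix the pattern.
SpMat foo_sparse_pattern_(const SpMat& A) {
  SpMat pattern = A + SpMat(A.transpose());  // structure of the result
  pattern.makeCompressed();
  return pattern;
}

// Per-iteration phase: recomputes the values. In a real implementation
// these would be written into the already-allocated pattern rather
// than reallocated on each call.
SpMat foo_sparse(const SpMat& A) {
  return A + SpMat(A.transpose());
}
```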
### Keeping Permutation Matrix from Cholesky

`SimplicialCholeskyBase` keeps the permutation matrix; when the user does a Cholesky decomposition we can pull out this permutation matrix and keep it to use in the next iteration. We do this through `EIGEN_SPARSEMATRIX_BASE_PLUGIN`, adding the permutation matrix to the input matrix. This adds a bit of state, but assuming the sparse matrices are fixed in size and sparsity this should be fine.
Simplicial?

Also we might want to think about the Math library as being stateless and thread state through everywhere with the compiler going forward... can talk this over in person; not sure it's fully formed in my head yet.

> Simplicial?

That's the word Eigen uses.

> Also we might want to think about the Math library as being stateless and thread state through everywhere with the compiler going forward...

Yeah, I worry about adding state like this. Another option is to do the sorting immediately when a sparse matrix is created.

We could assume that every square sparse matrix is going to be decomposed and just keep a permutation around at all times. Eigen's AMD reordering works regardless of symmetry, so it's all good. (It's also cheap to compute relative to everything else.) This would make it a deterministic function of the data that is stored in the same object as the rest of the sparse matrix (assuming we make a simple container class around `Eigen::SparseMatrix`), so that should keep things stateless and thread-safe.
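A sketch of that idea, assuming a thin hypothetical container around `Eigen::SparseMatrix`; the AMD ordering call itself is real Eigen API from the `OrderingMethods` module:

```cpp
#include <Eigen/Sparse>
#include <Eigen/OrderingMethods>

// Hypothetical container: the matrix plus its fill-reducing permutation,
// computed once at construction so no later state changes are needed.
struct stan_sparse_matrix {
  Eigen::SparseMatrix<double> m;
  Eigen::PermutationMatrix<Eigen::Dynamic, Eigen::Dynamic, int> perm;
};

stan_sparse_matrix make_sparse(const Eigen::SparseMatrix<double>& m) {
  stan_sparse_matrix out{m, {}};
  if (m.rows() == m.cols()) {
    Eigen::AMDOrdering<int> amd;
    amd(out.m, out.perm);  // deterministic function of the pattern alone
  }
  return out;
}
```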
TensorFlow uses the same data storage schema inside of [`tf.sparse.SparseTensor`](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/sparse/SparseTensor) with a limited set of specific [methods](https://www.tensorflow.org/api_docs/python/tf/sparse). It does not seem that they have [sparse Cholesky support](https://github.com/tensorflow/tensorflow/issues/15910).

It seems like OpenAI has methods like matrix multiplication for block-sparse matrices (sparse matrices with dense sub-blocks) in TensorFlow, available on [GitHub](https://github.com/openai/blocksparse).
Raises a question about our need (or lack thereof) for blocked operations too, maybe?

Yes, @dpsimpson has brought this up, as well as another person in my lab.

We don't need blocked operations for sparse matrices. They're only efficient for sparse matrices with dense sub-blocks, and we don't have any algorithms that take advantage of that structure. It's a real speedup in some sense (replacing level 1 with level 2 and 3 BLAS operations lets you be more cache-efficient, for example), but it would be a weird thing to target on the first go.
UGA animal breeding has a sparse-dense matrix package, YAMS - it's Fortran, though. See slides at http://nce.ads.uga.edu/wiki/lib/exe/fetch.php?media=uga2018_yams.pdf
Neat, thanks! I'll try to find their code online.
```
(@sparse(nz_rows=nz_rows_ind, nz_cols=nz_cols_ind), @opencl) Matrix[N, M] A;
```

Another alternative would be to extend the `csr_*` functions, though this requires a lot of recomputation.
The `csr_*` functions are not good and should go away. Also, it's inconsistent to have row-oriented sparse matrices and column-oriented dense matrices.
# Appendix: Eigen Sparse Matrix formats

From the Eigen [Sparse Matrix docs](https://eigen.tuxfamily.org/dox/group__TutorialSparse.html): Eigen sparse matrices use a Compressed Column Scheme (CCS) to represent sparse matrices.
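For illustration (an editor's example using Eigen's documented accessors), the three arrays behind the compressed column scheme can be inspected directly:

```cpp
#include <Eigen/Sparse>
#include <iostream>
#include <vector>

int main() {
  std::vector<Eigen::Triplet<double>> ts{
      {0, 0, 3.0}, {2, 0, 4.0}, {1, 2, 5.0}};
  Eigen::SparseMatrix<double> A(3, 3);
  A.setFromTriplets(ts.begin(), ts.end());
  A.makeCompressed();  // CCS form: no free slots between columns
  // One entry per stored value: the value and its row index.
  for (int k = 0; k < A.nonZeros(); ++k)
    std::cout << A.valuePtr()[k] << " (row " << A.innerIndexPtr()[k] << ")\n";
  // One offset per column (plus a terminator): where each column starts.
  for (int j = 0; j <= A.cols(); ++j)
    std::cout << A.outerIndexPtr()[j] << " ";  // prints: 0 2 2 3
  std::cout << "\n";
}
```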
One very important but minimally documented part of Eigen's sparse libraries is that it tries to remove "lucky" zeros (i.e., the symbolic step computes the structural non-zeros of A*B, but some of the elements that could be non-zero end up being zero when the product is done; these are the "lucky" zeros that Eigen removes). This is very bad for us, as it changes the length of the resulting `std::vector`. The key function in Eigen is `prune`, and a quick grep through the library suggests it's only used for matrix-matrix products. So this is a danger zone!

I'm looking at godbolt, and it looks like Eigen keeps the sparse matrix values for multiplication even when they are zero?

Which is odd, because looking at their `unary_evaluator` it looks like you are right and they do have a pruning mechanism?

It looks like the pruning happens here. Maybe we can fiddle with the value of epsilon to turn off pruning?
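A minimal demonstration of an explicit ("lucky") zero and `prune()` (an editor's example: duplicate triplets that sum to zero leave a stored zero entry, and pruning it changes the number of stored values that autodiff would have to line up with):

```cpp
#include <Eigen/Sparse>
#include <iostream>
#include <vector>

int main() {
  std::vector<Eigen::Triplet<double>> ts{
      {0, 0, 1.0}, {0, 0, -1.0},  // duplicates sum to an explicit zero
      {1, 1, 2.0}};
  Eigen::SparseMatrix<double> A(2, 2);
  A.setFromTriplets(ts.begin(), ts.end());
  std::cout << A.nonZeros() << "\n";  // 2: the zero at (0,0) is stored
  A.prune(0.0);                       // drops entries with |value| <= 0
  std::cout << A.nonZeros() << "\n";  // 1: the stored zero is gone
}
```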
I think the doc could use a little discussion on whether sparse matrices deserve their own type in Stan. Some starter points off the top of my head:

pros:

1. can add functions dealing with sparsity in an ad hoc fashion to the math library slowly over time.
2. can prevent users from doing inefficient things with sparse matrices, like using one in a function that would just convert it to dense first.

cons:

1. Users don't necessarily care that their matrices are stored in sparse or dense form; they want efficient inference. I think there's some wisdom in trying to keep a 1-to-1 correspondence between a Stan model and the mathematical representation of it, if we can keep computational issues behind the silk screen.
2. I think everyone universally regrets adding both `real[]` and `vector` to Stan, which is a pretty similar use case in that both are representing the same mathematical object just with slightly different computational properties. Keeping users from doing linear algebra operations on `real[]` doesn't seem to have served any real purpose.

The "what do users want" question is difficult to test, because anyone who is already aware of sparse matrices will probably consider them necessary as first-class objects, but there's a whole class of Stan users who would just like their model to run faster and wouldn't care if we switched to sparse under the hood. I wonder if @lauren can offer advice here?
On Aug 28, 2019, at 3:05 PM, seantalts wrote:

> I think the doc could use a little discussion on whether sparse matrices deserve their own type in Stan.

If they don't have their own type, how does the decision to represent a matrix get handled? At the very least, we'll need a new I/O format so that we don't need quadratic memory in and out.

What happens when we print or save parameters to a CSV file during sampling? Does it do one thing for sparse and one thing for dense? How will the user be able to figure out what's going to happen? What's going to ensure we have the same parameters every iteration if there's no way to type the sparsity pattern?

What about row vectors? That's a third way to represent a sequence of real values. Then at higher orders, we get the same issue with arrays of vectors (the type of vectorized outcomes for multi_normal), arrays of row vectors, a matrix, and a 2D array. Now we have four Stan types for the same abstract 2D structure.

Is this at all like integer vs. double? We can assign integer to double (perhaps with loss of precision, and vice versa with rounding), but mathematically we tend to think of them differently. That difference doesn't make much difference in stats, other than that there are naturally integer distributions like Poisson or Bernoulli. How do you think about integer vs. double? That's a common distinction in programming languages that languages like R decided to sweep under the rug. It's there in the guts of R implicitly, as well as a distinction from bool; I imagine something similar is being suggested for Stan, where the decision is implicit on the framework's part (in R, you can modify and inspect, but I think the proposal for Stan is to literally collapse the types from the user's perspective so that there's no way to tell if a matrix is sparse or dense).

> Some starter points off the top of my head:
>
> pros:
>
> 1. can add functions dealing with sparsity in an ad hoc fashion to the math library slowly over time.
> 2. can prevent users from doing inefficient things with sparse matrices, like using one in a function that would just convert it to dense first.

It's easy enough to add converters all over the place that convert sparse to dense when necessary. But that won't give us the right result. We can't just remove zero values, as we may need them for autodiff (this is a problem with current Eigen implementations). In particular, we need to be sure that we get the same sparsity in output for parameters at each iteration. If we just willy-nilly promote to dense, that won't be the case, or might not be predictable.

> cons:
>
> 1. Users don't necessarily care that their matrices are stored in sparse or dense form; they want efficient inference. I think there's some wisdom in trying to keep a 1-to-1 correspondence between a Stan model and the mathematical representation of it, if we can keep computational issues behind the silk screen.

They will very much care during I/O.

What mathematical representation is supposed to be one-to-one with a Stan program? The mapping from programs to posterior densities is many-to-one.

> 2. I think everyone universally regrets adding both `real[]` and `vector` to Stan, which is a pretty similar use case in that both are representing the same mathematical object just with slightly different computational properties. Keeping users from doing linear algebra operations on `real[]` doesn't seem to have served any real purpose.

I don't regret it! It's a natural consequence of having primitive types int, real, vector, row_vector, and matrix, and closing under taking arrays. If you don't allow it, it complicates every definition and lots of code where it has to be excluded from the natural definition. I'm not saying it's impossible, just that it's going to add its own set of complications.

I don't think of arrays and vectors as the same mathematical object any more than I think of a complex number as a pair of real numbers, despite their being the same abstract data type in terms of data.

I made the same initial distinction as is made in Eigen, namely that linear algebra operations apply to matrix types. It's also clunky in Eigen. But it is worth keeping in mind that the major C++ matrix lib did find it useful to make that distinction. They're even stricter, I think, in limiting elementwise operations to arrays.

What about row vectors? That's a third way of representing a sequence of real numbers. The reason it's nice to distinguish these types is that multiplication signatures are real(row_vector, vector) and matrix(vector, row_vector).

I think the argument is that real arrays are redundant because there's nothing you can do with an array you can't do with a vector or row_vector instead. That may be true now, so that we wouldn't lose any functionality by getting rid of them. I don't think it would fail anywhere.

> The "what do users want" question is difficult to test, because anyone who is already aware of sparse matrices will probably consider them necessary as first-class objects, but there's a whole class of Stan users who would just like their model to run faster and wouldn't care if we switched to sparse under the hood. I wonder if @lauren can offer advice here?

Lauren is @lauken13, but I'm answering via email, so my @ won't work.

Dan Simpson probably has the most experience with this on the project, though I'd suggest asking Andrew, too.
If it's not a type, it might as well not be done. It would be extremely difficult to use, akin to not having a dense matrix type and making users deal with flattened representations.

Not sure why any function would cast a sparse matrix to a different type automatically rather than just throw a "not defined for type sparse" warning. I struggle to think of a use case for a sparse-to-dense cast. What's your use case?

No earthly idea how to "pretend" sparse matrices are just matrices that are different in hidden, under-the-hood ways. Neither MATLAB nor R does this.
> Not sure why any function would cast a sparse matrix to a different type automatically rather than just throw a "not defined for type sparse" warning.

The motivation would be to make sure user programs with the right math don't break because they got the data types wrong. We see that all the time now because our users aren't programmers. Of course, if we have a function that takes a sparse matrix in and then outputs a dense matrix, that's going to be considered broken by most people who use sparse matrices.

> No earthly idea how to "pretend" sparse matrices are just matrices that are different in hidden, under-the-hood ways. Neither MATLAB nor R does this.

No pretending necessary. Mathematically, a sparse matrix is just a dense matrix with a lot of zeros (no surprise to anyone there, I hope). So the user just treats everything as a matrix (except, perhaps, during I/O) and leaves the system to figure out where things should be dense or sparse under the hood. There's a question of feasibility, but we're not violating any laws of computation here.

R doesn't have a built-in sparse matrix type (though there is a contributed Matrix package with sparsity). MATLAB does have built-in sparsity and uses a separate type system: https://www.mathworks.com/help/matlab/sparse-matrices.html

Under the hood, MATLAB just represents dense matrix types as an array, and there are no specific vector or row-vector types, just matrix types: https://www.mathworks.com/help/matlab/matrices-and-arrays.html Maybe we should've done that. I really wanted row vector times vector to be a scalar, and really wanted matrix times vector to be a vector. In MATLAB, everything's a matrix and they just use vector constructors: [1 2 3] is a 1 x 3 matrix (i.e., a row vector), whereas [1; 2; 3] is a 3 x 1 matrix and [1 2; 3 4] is a 2 x 2 matrix.

Consider what R does for integer vs. real vs. boolean distinctions:

is.integer(1)
[1] FALSE
is.integer(as.integer(1))
[1] TRUE

I find this kind of behavior in R very confusing. Same with transposing a vector, which turns it into a matrix:

c(1, 2, 3)
[1] 1 2 3
t(c(1, 2, 3))
     [,1] [,2] [,3]
[1,]    1    2    3

It gets really fun trying to predict the result of mixed operations:

t(t(c(1, 2, 3))) == c(1, 2, 3)
     [,1]
[1,] TRUE
[2,] TRUE
[3,] TRUE

I really don't like that t(t(x)) != x.
> I don't think of arrays and vectors as the same mathematical object any more than I think of a complex number as a pair of real numbers, despite their being the same abstract data type in terms of data.

I agree — arrays are a general container for any type, and vectors and row vectors are linear algebraic objects with well-defined operations one can apply to them. If one defines a type by its admissible operations, then the similarity between `real[]` and `vector`/`row_vector`, not to mention `matrix[1, N]` and `matrix[N, 1]`, is only a superficial one.

I think lots of the user confusion comes from people accustomed to R and its dynamic/weak type system. They expect the structural similarity to carry over to functional similarity and get frustrated when it doesn't. At the very least, this is something that should be emphasized very strongly in introductory material, so that users are at least aware of the issue.

> Not sure why any function would cast a sparse matrix to a different type automatically rather than just throw a "not defined for type sparse" warning. I struggle to think of a use case for a sparse-to-dense cast. What's your use case?

One that comes to mind is incomplete implementation of sparse functionality. For example, if I wanted to compute a matrix exponential but no one has yet implemented a sparse matrix exponential, then can I no longer proceed, or can I cast my sparse matrix to a dense one and proceed, albeit with the more expensive calculation?

> No earthly idea how to "pretend" sparse matrices are just matrices that are different in hidden, under-the-hood ways. Neither MATLAB nor R does this.

There is potential for some expression template methods that would lazily evaluate expressions containing dense and sparse matrices and figure out how to propagate the sparsity only at that final evaluation, but that would be a pain to implement.

I think that the only useful "hidden" abstraction would be hiding sparse matrices entirely and just having functions that take in sparse inputs and return dense or low-dimensional outputs, like quadratic forms, determinants, or densities.
On Wed, Aug 28, 2019 at 16:09 Bob Carpenter wrote:

>> Not sure why any function would cast a sparse matrix to a different type automatically rather than just throw a "not defined for type sparse" warning.
>
> The motivation would be to make sure user programs with the right math don't break because they got the data types wrong. We see that all the time now because our users aren't programmers. Of course, if we have a function that takes a sparse matrix in and then outputs a dense matrix, that's going to be considered broken by most people who use sparse matrices.

This would be extremely bad design. It should throw an informative error. Otherwise we'll spend our whole time explaining to people why they added one line that had an implicit cast and their code no longer worked because it ran out of memory.
Just to be concrete, the model Mitzi used in the ICAR case study uses an extremely small sparse matrix. A cast to dense would cast from a vector of …
There's no way to compute a sparse matrix exponential and get a sparse matrix out (because maths). So that would be easy to specialize.

> There's no way to compute a sparse matrix exponential and get a sparse matrix out (because maths). So that would be easy to specialize.

Sure; the question is what the user experience would be if it hadn't been specialized.

I feel like at some point users will have some corner applications where they need to finish a calculation using dense methods. Given the potential memory blowout from naively going from a sparse matrix to a giant dense matrix, it seems prudent not to do this implicitly, but an explicit casting function that maps sparse to dense (and potentially even throws an informative error if the dimensions are too big) might be a good compromise initially.
I can't disagree more.

These aren't incomplete or unimplemented features; these are features that should not exist. For scale, it's like when people say that we should have a Gibbs sampler "just in case".

For any example where you should use sparse matrices, a to_dense will break everything. A model with sparse matrices that needs a matrix exponential (or an eigendecomposition, or anything that is naturally dense) is just not a realistic case for using a sparse matrix and should throw an error.

Either it's a different type, or it will basically not work for 90% of people who use it (because then you need to know all of the internals to work out how to use it well, rather than just having to accept "I can't do xxxx with a sparse matrix").
Very cool, thank you all for the discussion! My one request for this PR would be to capture a summary of this in the "Rationale and Alternatives" heading.

Another possible point for discussion - I think adding GPU matrix support to the Stan language is almost exactly the same, modulo some degree of difference on the break-even matrix size where converting to a dense [CPU] matrix is so inefficient as to be impossible and should be outlawed. Meaning that in GPU-calculation land, I think it will actually be somewhat common to want to go back and forth as we flesh out the GPU-aware implementations. So I might propose that we try to treat those the same way, meaning if we go with the full new-types approach we'd have all new …
The challenge then is designing a sparse matrix type, and the accompanying documentation, that clearly communicates those limitations.

At best I think that most users will interpret sparse matrices as "special kinds of matrices that sometimes admit much faster calculations" and not "special kinds of matrices that do not play with the standard matrices". I understand the technical utility of keeping a sparse type completely encapsulated in its own world, but those not as familiar with sparse linear algebra as yourself will need something to guide them beyond the interpretational baggage they bring with them: for example, very careful compiler error messages, extensive documentation, etc.
Also curious about the ICAR model - is that a good representative example? It seems to just want to store an adjacency matrix efficiently, but otherwise not to use any sparse computation at all...? I thought someone at StanCon had said we would want to completely disallow element-wise indexing into a sparse matrix, which would appear to disallow this use case... I am assuming I'm confused here and misheard probably multiple parts of this, so please correct me with exuberance.

100% agreed - I still don't really get why you would disallow …
{'col': 1, 'row': 20, 'val': 1.85},
{'col': 11, 'row': 3, 'val': 2.71},
...
}
Not sure we need to be nit-picking here, but this should be written as

```
X = [
  ...
  {'col': 1, 'row': 20, 'val': 1.85},
  {'col': 11, 'row': 3, 'val': 2.71},
  ...
]
```

Currently JSON objects are only supported at the top level (in CmdStan, at least), so we will need to remove that restriction.
## I/O

Data can be read in from JSON via a list of lists, and from Rdump via the equivalent.
Should we also support the input being in dense matrix form in the input file, with the parser & data handler automatically constructing the sparse matrix, ignoring zeros? We would need to somehow inform the parser to ignore zeros during parsing; otherwise we would first create the dense matrix and then transform it, which defeats the purpose of introducing sparsity.

I don't think we should do this. Huge memory waste limits it to very small matrices. R/Python have easy-to-access sparse matrix support.
On Aug 28, 2019, at 9:56 PM, Andrew Gelman wrote:

> Things I can do with a vector or matrix but not a 1-d or 2-d real array:

The question was whether the problem exists the other way around. Is there something you can do with a `real[]` that you can't do with a vector or row_vector?

> - vector and matrix operations (multiplication etc.)

That's intentional. It doesn't make sense to do matrix operations on arrays. You have to guess about orientation like R does, and it's a mess.

> - various functions that are vectorized. For example, I think you can do theta ~ normal(0, 1); if theta is a vector but not if it's a 1-d array.

Yes, that works if it's a 1D array. But theta ~ multi_normal(mu, Sigma) doesn't work if mu or theta are `real[]` types---they have to be vectors. I did it that way because they're considered as a whole in multi_normal. It wouldn't kill me to loosen this.

> I might be wrong about that particular example, but there are other examples like that.

If you can ping this thread (or just me) when one comes up, that'd be great.

> - various functions for subsetting, concatenation, etc., work for vectors and matrices but not arrays.

I'm not sure what you mean here. All the slicing works the same way. If what you mean is that you can't concatenate a `real[]` and a vector, that's by design. What would the result be?

> The main things I can do with a real array but not a vector or matrix:
>
> - pass it into an ode function

This is a real limitation. But this is the simplest possible `to_array` or `to_vector` conversion, right? We could generalize this, but I think we'll be going with an even better solution based on closures. These higher-order functions are special in that they're implemented very differently. We could generalize the signatures without much effort if you really find `to_vector` to be that onerous. But it'd have to wait until stanc3 gets rolled out, as work on the old parser is frozen pending that release.

> - easily change code from linear to logistic regression.

That doesn't seem like it should be so easy. The likelihood changes, you need to add a link function, and there's no more scale parameter. The constraints on y also change, to lower = 0 and upper = 1 from unconstrained.

> (It's natural for me to write my linear regression data as vector[N] y; but then if I want to switch to logistic regression it becomes int y[N]; and then various other aspects of the code have to be changed because I'm switching from vector to array.)

This is because Stan distinguishes integer and real types (again, so it doesn't have to guess like R). I'm strongly opposed to having integer vector types because we're not doing number theory. As a programmer, I like that the types reflect the underlying data type rather than being implicit.
Okay, to try to corral some of the conversation back: it seems like folks basically want to go with what our potential sparse matrix users would like to see in Stan's type system here, and so far the only person like that we have commenting is @dpsimpson. Dan's proposal is that we should have a separate type for every kind of matrix that could also be sparse, and that only efficient operations would be allowed on them in the type system. I think we want to add some clarification for which types and operations those actually are, concretely. Is it enough to add a single `sparse_matrix` type?

If so, I think we need to change the design doc to add a different model as an exemplar; the ICAR model would not benefit from sparse matrices as described in this proposal, as we wouldn't be allowed to do element-wise access. Does anyone have another model that would benefit from the design in this proposal they can contribute?
Or I may be wrong about ICAR being a bad fit - if there's a more efficient version possible once we have sparse matrices, relevant lines from that would be good to include in this design doc to show the before-and-after.
No, the ICAR isn't a good fit. Sparse GPs would be where I think the plan's to add sparse log determinant and sparse Cholesky factorization (on precision matrices, I think---you'd need to ask Dan!).

You can also think smaller, in terms of a big regression problem where the data matrix is sparse and the coefficient vector is dense. So we definitely need some mixed operations.
Sparse GPs are a different thing that don't involve sparse matrices. (I know!) So they aren't a use case.

Multivariate Gaussians with sparse precision matrices are the main use case.
A good use case along the lines of the ICAR that needs full linear algebra is this model: https://mc-stan.org/users/documentation/case-studies/mbjoseph-CARStan.html (Note the case study doesn't use the sparsity to do anything, so it's just the model that's relevant.) In this case you need the log determinant of a matrix with `var` values. You probably need to use a non-centred parameterization.
What's the application of multivariate Gaussians with sparse precision matrices? I had wrongly supposed it was GPs.

Setting aside generic sparsity, aren't there some GPs where the covariance matrix is structured, like banded or something like that?
I thought the full CAR model was a GP. That's the kind of model where I thought we'd need the sparse log determinant.
Gaussian processes specify a Gaussian density through a structured covariance matrix which is inverted, while CAR models specify a Gaussian density through a sparse precision matrix which does not need to be inverted. In the latter case the precision matrix is ill-posed and cannot be inverted into a covariance matrix, and the log determinant term isn't an exact log determinant but something like a log determinant with a singularity subtracted away.

The kind of random sparsity that sparse linear algebra is most effective for arises naturally when modeling precision matrices (which encode conditional independencies, like the adjacency graph of a CAR model), whereas Gaussian processes yield covariances that aren't sparse but rather dense and structured (Kronecker, Toeplitz, etc.).

At least that is my understanding.
The point of a Gaussian model with a sparse precision matrix is efficiency - we can fit models that we couldn't if the "model" were dense. Think millions of parameters. This is why Dan was so adamant about avoiding casting sparse to dense. The critical point is conditional independence, which gives sparse structure. Examples are time series (banded precision), some spatial models (CAR with neighbourhood structure, SPDE with some meshing magic), phylogeny (banded precision), pedigree (expanded band), and some other models.
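To make the efficiency point concrete, here is a sketch (an editor's example, not Stan Math code) of evaluating the log density of x ~ N(0, Q^{-1}) directly from a sparse precision matrix Q: the log determinant comes from a sparse factorization and the quadratic form from a single sparse matvec, so the dense covariance is never formed. Q is assumed symmetric positive definite:

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>
#include <cmath>

double normal_sparse_precision_lpdf(const Eigen::VectorXd& x,
                                    const Eigen::SparseMatrix<double>& Q) {
  // Q = P' L D L' P, so log|Q| = sum(log(D_i)).
  Eigen::SimplicialLDLT<Eigen::SparseMatrix<double>> ldlt(Q);
  double log_det = ldlt.vectorD().array().log().sum();
  Eigen::VectorXd Qx = Q * x;  // sparse matvec: O(nnz), not O(n^2)
  double quad = x.dot(Qx);     // x' Q x
  const double log_2pi = std::log(2.0 * 3.14159265358979323846);
  return 0.5 * log_det - 0.5 * quad - 0.5 * x.size() * log_2pi;
}
```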
Thanks all, and especially @gregorgorjanc for joining the conversation and explaining more of the background for me :)

So we will need to allow element-wise reads after all, is that right? I'm thinking that because that is the most obvious way to code up the ICAR example, but it's also possible we would still want to disallow that and only let users use some new distribution functions we'd be creating (which I guess would be a bunch of our multivariate functions but with `sparse_` prepended to the name, to follow the convention; e.g. `sparse_multi_normal`).

I want to also think ahead and think about how we can define more of the Math library in Stan itself (if we can do that efficiently), so if we can think ahead and not put in any constraints that prevent a user from coding their own sparse multivariate distributions in Stan itself, that's probably a good forward-looking move in my opinion.
For completeness of discussion, the BUGS "language" specifies the Gaussian distribution with a precision parameter. I believe this was due to the natural/canonical parametrisation of the Gaussian and the conditional (graph) model structure embedded implicitly in the "language".
Sean: No. You build the matrix with sums and scalar products; the density is implemented with matvecs. Linear algebra is your friend.

Gregor: Hi!! Yes! And thanks!

Michael: ugh. Kinda? "Random" is a weird word there. CAR models have invertible precisions (the ICAR that Mitzi used is a boundary case; we've not looked at others because we don't have the linear algebra). GPs are a much broader class of things than you talk about there.

Not sure a design doc is the place to learn about statistical uses of sparse matrices. I recommend Rue and Held's book. Or that we move this less relevant portion of the discussion to the Discourse.
Ok cool! If people can build their own, I'm happy with the design. Would someone mind adding what the ICAR model would look like with sparse matrices to the design doc as an example?

(Or post it as a comment in this thread and I can add it to the PR for the doc.)
On Sep 1, 2019, at 12:04 PM, seantalts wrote:

> Thanks all, and especially @gregorgorjanc for joining the conversation and explaining more of the background for me :)
>
> So we will need to allow element-wise reads after all, is that right? I'm thinking that because that is the most obvious way to code up the ICAR example,

It's the CAR example that has the full structure here. The ICAR is easy to compute as is.

> but it's also possible we would still want to disallow that and only let users use some new distribution functions we'd be creating (which I guess would be a bunch of our multivariate functions but with `sparse_` prepended to the name, to follow the convention; e.g. `sparse_multi_normal`).

I'm not sure we need to change the name here for sparse cases, but we could. I just worry about crossing that with `_precision`, etc.

> I want to also think ahead and think about how we can define more of the Math library in Stan itself (if we can do that efficiently), so if we can think ahead and not put in any constraints that prevent a user from coding their own sparse multivariate distributions in Stan itself, that's probably a good forward-looking move in my opinion.

Right now, we can't code the density functions in Stan because we don't have type traits and we don't have the vector views.
So I think this is just waiting on @SteveBronder for:

1. A before-and-after example illustrating what the new syntax will provide, and
2. confirmation from Aki that we will need to represent the sparsity structure of a matrix in the Stan program (because parameters will otherwise have no way of knowing how to construct themselves).

I think this is ready to go as-is after that.
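For reference while that example is pending, here is a rough sketch of the kind of before-and-after being requested, for a regression with a sparse design matrix. The "before" half uses the existing `csr_matrix_times_vector` signature; the "after" half uses the hypothetical `sparse_matrix` declaration proposed earlier in this thread (none of the sparse syntax exists yet):

```stan
// Before: sparse matrix smuggled in as CSR arrays (current Stan).
data {
  int<lower=0> N;              // rows
  int<lower=0> M;              // columns
  int<lower=0> K;              // number of stored (non-zero) values
  vector[K] w;                 // values
  int v[K];                    // column index of each value
  int u[N + 1];                // where each row starts in w
  vector[N] y;
}
parameters {
  vector[M] beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(csr_matrix_times_vector(N, M, w, v, u, beta), sigma);
}

// After (hypothetical syntax from this proposal):
data {
  int<lower=0> N;
  int<lower=0> M;
  int<lower=0> K;
  int nz_row_ind[K];
  int nz_col_ind[K];
  sparse_matrix[N, M, nz_row_ind, nz_col_ind] X;
  vector[N] y;
}
parameters {
  vector[M] beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(X * beta, sigma);  // mixed sparse-dense multiply
}
```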
I'm in Helsinki next week, so maybe we should finish it off then.
Are you guys all together there? Want to finish this design doc up? Just need that before-and-after example, assuming we're not changing the syntax.

Apologies! In Paris until Monday. I'll update this on Tuesday, ask Aki about (2), and get on the phone with Dan.
…notes:
- When we declare a sparse matrix, the sparsity structure should be fixed.
- How to declare a sparse matrix without assigning a value.
- Need to make our own sparse matrix type that holds the sparsity pattern.
- No explicit indexing for the WIP.
- Sparse matrices in the data block, not the params or transformed params block.
- Do ordering for all square matrices.
Sorry for my radio silence, I've been pretty deep in the compiler. I just updated this with @seantalts's notes above and a recent conversation with Aki and Dan. You can see those changes here, and the notes from the meeting in the commit message of a938b45.
OverallThis doc should be self-contained and include any necessary material from earlier proposals rather than linking. Reference links at the end are really nice. Summarythe matrices -> a matrix's The exclusion of sparse vectors from the initial spec should be mentioned up front. MotivationICAR does not use sparse matrices. A lot of big data regressions in fields like natural lanaguage processing are sparse. fan-dangling -> finagling (?), index fiddling (?) Can we really support all existing methods for matrix types? Does that mean all the mixed type operations, too such as sparse matrix times dense matrix? Is there a useful way to roll this out more incrementally? Does all existing methods extend to types? That is, will we have a sparse simplex or sparse positive definite matrix (unfortunately called "covariance" because I was young and didn't know better---it should be called "spd" for symmetric-positive-definite). I think the list here should just be things that wouldn't be included under "all existing methods". Given the spec to support all existing functions (methods are technically non-static class functions), this list should just be for new things not already in the library. Is return always sparse unless one of the arguments is dense? If so, we can just say that. And there needs to be more description for the new operations:
likelihood -> density (our lpdf and lpmf can also be used in other contexts than just likelihoods, such as in priors) Notes on implementation (like the reverse-mode derivative of Cholesky) can go in an implementation notes section below the spec. I think this spec should be more functional in terms of what things will look like for the user. Guide-levelMove citation of previous specs to the references section at the end. What does no explicit indexing within a sparse matrix mean? If Sparse matrices dimensions -> Sparse matrix size specifications Note: Stan separates dimension an size specifications in that function arguments are dimensioned but not sized; try to use "rows" and "columns" or "size" rather than "dimension" because "dimension" is ambiguous in the context of a spec (a vector is a one-dimensional data structure, but a vector of size K is used to represent a point in K-dimensional space). Could you elaborate on what it means to order a square sparse matrix after initial construction. Is that a new function? Something done automatically under the hood? The enumeration of blocks needs to include function arguments. In Stan, the function arguments are not sized, whereas block and local arguments are. Similarly, there are constrained types which only show up as block variables. I'd be extra explicit in the example that DataThe overall description of coordinate list notation should be pulled up front, with the example. The This data section can then stick to how sparse matrices are used in the data block. What is proper IO? There needs to be a spec for the data formats in both JSON and rdump at this point. If vectors are not in the initial implementation, please pull them out into a section of things to do later or otherwise indicate how things are going to be staged. We've thought about having sizes also be read from the data structure directly, but haven't done that anywhere. That's not to say we shouldn't do it here. The example should have
to be well-formed Stan code. But the example needs to move down to the transformed data section where it's relevant because no definitions are allowed in the data block. What type are the values that are set? Are those in the form of a dense vector or an array of reals? Transformed dataThe Stan code for actually building up The Linear Algebra -> linear algebra (capitalize names, not concepts) Does the linear algebra example imply that we can use There's nothing in the bounds, just The sense in which the sparsity pattern is a matter for the math library is that the sparsity pattern checks have to be run time, not compile time. That still leaves them in-scope for this spec as it has to cover both the math lib part and the languiage. Parameters, ...Break these three into their own sections. Parameters must specify sizes in their declarations---there's no way to deduce them. They must have th same sparsity pattern in each iteration, too. Mention that an alternative parameter declaration is just the values, with the sparse matrix being created elsewhere (model or transformed parameters). Transformed parameters and generated quantities should presumably work the same way as transformed data, with the caveat noted that the size/sparsity-pattern not change iteration to iteration. The way to say that is that they have to be data-only variables, which is already a requirement for sizes in these (and the parameters) block. If you want to add historical notes, put them later. There are no temporary scopes in the parameters block. I don't see why we couldn't use them as parameters given your spec. If they are not going to be allowed as parameters, that needs to be clearly indicated. And I think it would cause a lot of user confusion. Helper functionsThese should be laid out explicitly. Propose the ones we'll start with and then we can add more later. The point is that this spec should be as specific as possible. Reference-level explanationThe JSON I/O is user level. It would be much more compact to use three parallel arays in JSON rather than breaking it down by value this way. That also matches the way it's declared, so that's probably not such a hardship for users. Same in Rdump. So how about JSON like
where the three arrays are parallel: the k-th entries give the row, column, and value of the k-th non-zero. And then in Rdump:
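A matching Rdump sketch, with the same caveat that the flattened variable_field naming convention is illustrative:

```R
A_nz_row_ind <- c(1, 2, 3)
A_nz_col_ind <- c(1, 3, 2)
A_val <- c(0.5, -1.2, 3.0)
```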
"Something like what Ben did [here]" needs to be unfolded so the spec is self-contained and precise about what's being proposed. The implementation details aren't so important and can be saved for later. If they're included, This spec doesn't need implementation details down to the for-loop level. Stan MathTemplating should be described. In the hard way, The "coefficient access error Dan saw" needs to be described. Is the proposal to use The simple way should be at same level of heading. Those specializations for spare matrices need to be careful to use Eigen's mixed types properly. I hadn't realized we were doing that and could use it for addition. These implementations should all move for reverse mode into the adjoint-partial approach which will dramatically cut down on stack funtions and virtual function calls over the way the examples are coded here by delegating to Keeping Permutation MatrixWhat does "this adds a bit of state" mean? That sounds scary as the entire math library is assumed to be stateless as is the generated Stan model code. DrawbacksI'd remove "code smell" and just list the drawback---everything's going to get more complicated on the template and code duplication front as well as in the parser as this spec has us overloading more heavily. RationaleWhy did this get consensus? Is it the close connection to Eigen? The simpler long-form inputs? Both are positives here, I think. We definitely don't want to be innovating in the sparse matrix space! I strongly prefer the type name to attributes. Once you have sparsity, you are normally having to deal with it both computationally and statistically. Prior artThe prior art I'd look for would be in R, Python (NumPy/pandas), and MATLAB. Any idea what those do? Unresolved questionsThat first bullet point should be removed---it's instructions, not an unresolved question. Language design is not an unresolved question. What needs to be described more fully is how type inference is going to work and how assignment is going to work. Can we assign sparse matrices to dense matrices, for example? Is any indexing allowed? These absolutely must be answered and not left unresolved before going forward with this. The new Stan compiler is just the implementation of this. I don't think we need a lot of details on that in this spec unless there's some potential pitfall that's not obvious. The whole theory of MCMC changes when you allow different numbers of parameters per iteration. For example, what's it mean to converge? We can leave that can of worms closed for now. Cost not being worth it is the overall question for the spec, not an unresolved question. You can leave it to reviewers to chime in on prior art. EigenSo our constructors are going to require conversion back and forth to the inner representation of Eigen. That seems like it should be listed under drawbacks. |
- Addition (returns sparse matrix, possibly with different sparsity structure)
- Sparse matrix transpose (returns sparse matrix, different sparsity structure)
- Sparse matrix-vector multiplication (returns dense vector)
- Sparse matrix-constant multiplication (returns sparse matrix, same sparsity)
Is "constant" correct work here? Should it be "real" or "scalar"?
- Addition (returns sparse matrix, possibly with different sparsity structure)
- Sparse matrix transpose (returns sparse matrix, different sparsity structure)
- Sparse matrix-vector multiplication (returns dense vector)
Later there is "Sparse matrix-dense matrix"; should this then also be "Sparse matrix-dense vector", or do we assume that if "sparse" is not mentioned then it is dense?
- Sparse inner product and quadratic form (returns scalars)
- Operations to move from sparse to dense matrices and vice versa.
- Fill-reducing reorderings
- A sparse Cholesky for a matrix of doubles.
Which models would benefit from additional sparse QR and sparse SVD?
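For a sense of how the operations quoted above might surface at the language level, a sketch (A is assumed to be a sparse data matrix and x a dense vector; dense_matrix is a hypothetical conversion name, not part of the proposal):

```stan
transformed data {
  sparse_matrix[N, N] B = A + A';   // addition and transpose: new sparsity pattern
  vector[N] v = A * x;              // sparse matrix-vector product: dense vector
  sparse_matrix[N, N] C = 2.0 * A;  // scalar multiplication: same pattern
  real q = quad_form(A, x);         // quadratic form: scalar
  matrix[N, N] D = dense_matrix(A); // explicit sparse-to-dense conversion
}
```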
int nz_row_ind[K]; // Non-empty row positions
int nz_col_ind[K]; // Non-empty col positions

sparse_matrix[nz_row_ind, nz_col_ind, N, M] A
missing ;
data {
  int N; // Rows
  int M; // Cols
  sparse_matrix[N, M] A
missing ;
Parameters can be defined as above for data or deduced from the output of other functions.

```stan
parameters {
Add a comment that currently we don't know of a use case where a sparse matrix would be defined as a parameter. It would be common to declare a sparse matrix in the transformed parameters block which depends on a few lower-dimensional parameters.
}
transformed parameters {
  // Non-zero elements are deduced by the operation on x
  sparse_matrix[N, N] K = cov_exp_quad(x, alpha, rho);
cov_exp_quad is a bad example, as it will always produce a dense matrix. There are some covariance functions which produce a sparse covariance matrix, but they are not useful in Stan because the sparsity structure depends on the length-scale parameter. A better example could be, e.g.,
sparse_matrix[N, M] P = A * alpha;
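In context, that replacement example would look roughly like this (assuming A is a sparse data matrix and alpha a scalar parameter, so the sparsity pattern is fixed across iterations):

```stan
transformed parameters {
  // Non-zero pattern is that of A; only the values depend on alpha
  sparse_matrix[N, M] P = A * alpha;
}
```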
  matrix[K, 3] = get_nz_elements(x);
```

There can also be methods to bind sparse matrices together by rows or columns
There can also be methods -> There have to be methods
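Presumably these would look like the existing dense append_row / append_col, overloaded for sparse arguments — a sketch (the sparse overloads are hypothetical; A and B are assumed to have M columns, E and F to have N rows):

```stan
transformed data {
  // Row-binding stacks the patterns vertically; column-binding side by side.
  sparse_matrix[N1 + N2, M] C = append_row(A, B);
  sparse_matrix[N, M1 + M2] D = append_col(E, F);
}
```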
Sparse matrix support in the Stan language and backends.
Rendered Version
Summary
There has been much discussion about sparse matrices in Stan. This design doc brings together the discussions on how to implement them at the language, I/O, and math levels. The following gives a TL;DR for each section.
Language
There will be a new sparse_matrix type with the non-zero (NZ) sparsity structure defined as bounds.* Bounds make specifying the sparsity optional, so that the sparsity pattern can be deduced under the hood for algebra etc. at the Stan math level.
I/O
Sparse matrices come in as lists of lists from JSON or rdump.
Stan math
We can either do a big refactoring to simplify the codebase or include specializations for the functions that take sparse matrices.
* I personally prefer the attribute style mentioned in the alternatives section, but Dan and Aki have both expressed interest in the <> style and did not like the attribute style. While it's only an N of 2, I like to think the user is right about what is aesthetically pleasing to them.