
Sparse Matrices #8

Open · wants to merge 14 commits into master
Conversation

@SteveBronder (Collaborator) commented Aug 26, 2019

Sparse matrix support in the stan language and backends.

Rendered Version

Summary

There has been much discussion about sparse matrices in Stan. This design doc brings together the discussions on how to implement them at the language, I/O, and math levels. The below gives a TL;DR for each section.

Language

There will be a new sparse_matrix type with the non-zero (NZ) sparsity structure defined as bounds.*

```stan
sparse_matrix<nz_rows=nz_row_ind, nz_cols=nz_col_ind>[N, M] A;
```

Bounds make specifying the sparsity optional, so the sparsity pattern can be deduced under the hood for algebra etc. at the Stan Math level.

```stan
sparse_matrix[N, N] B = A * A';
```

I/O

Sparse matrices come in as lists of lists from JSON or Rdump.

Stan math

We can either do a big refactoring to simplify the codebase or include specializations for the functions that take sparse matrices.

* I personally prefer the attribute style mentioned in the alternatives section, but Dan and Aki have both expressed interest in the <> style and did not like the attribute style. While it's only an N of 2, I like to think the user is right about what is aesthetically pleasing to them.

@SteveBronder (Collaborator, Author)

I had this open on a separate draft PR and am going to bring over some of the discussion.


## Parameters, Transformed Parameters, and Generated Quantities

Parameters can be defined as above for data or deduced from the output of other functions.

@seantalts

We don't have type inference yet in Stan - I think so far declarations will all need to be of the same form... not sure. Might want to make an open item of that in the design doc.

@SteveBronder

> I think so far declarations will all need to be of the same form

One thing I'm a bit worried by is that users would then have to know the non-zero elements of the Cholesky or any sparse matrix output.

When I'm thinking about this, we need to specify the nonzero rows/columns when we are writing data into a sparse matrix via triplets. But the Eigen declaration for sparse matrices is really just Eigen::SparseMatrix<double>, so it doesn't really need to know the nonzero elements. So doing something like

```cpp
Eigen::SparseMatrix<double> A = inv_solver.solve(B);
```

will not need to know the non-zero elements.

Does it make sense what I'm getting at here? i.e. that specifying the non-zero elements is much more for Eigen matrix initialization than it is for the math library.

@seantalts

Yeah, that makes sense. It's very good that code generation can support not knowing the sparsity ahead of time for Eigen in the cases where it is immediately assigned to something with Eigen sparse structure. It makes it a little more difficult for the compiler to check that this is true, though probably still mostly possible. We'd need to basically add sparsity to our type system somehow, possibly just as two new Eigen types, SparseMatrix and SparseVector (there are no SparseRowVectors that we care about, right?). And then we can annotate all of the Stan Lib functions with additional sparsity type signatures so we know when a return value will be sparse given the input types. And then the compiler can check that we don't need the sparsity on the declaration.

That brings up another awkward point - do we need to check the sparsity structure if someone creates a full declaration but then assigns to some function output? Or do we only allow the full sparsity declaration in the data and parameters blocks?

@dpsimpson

The sparsity pattern has to be a known-at-compile-time deterministic function of the data. Thankfully the Eigen sparse Cholesky (etc) does a two-phase sweep that can be separated out. Phase one is called a "symbolic factorization", which works out the sparsity pattern and does all the appropriate allocation. This is 100% a thing that can be used to infer that sparsity structure.
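
For reference, a minimal sketch of that two-phase API in Eigen (matrix values invented for illustration; the point is just that `analyzePattern` depends only on the sparsity structure):

```cpp
#include <Eigen/SparseCholesky>
#include <Eigen/SparseCore>
#include <vector>

int main() {
  // Small SPD example matrix built from triplets (values are arbitrary).
  std::vector<Eigen::Triplet<double>> trips
      = {{0, 0, 4.0}, {0, 1, 1.0}, {1, 0, 1.0}, {1, 1, 5.0}, {2, 2, 6.0}};
  Eigen::SparseMatrix<double> A(3, 3);
  A.setFromTriplets(trips.begin(), trips.end());

  Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> solver;
  solver.analyzePattern(A);  // symbolic phase: pattern + allocation only
  solver.factorize(A);       // numeric phase: reusable for any matrix
                             // with the same sparsity pattern as A
}
```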

Some possible problems would be if someone wanted something like (apologies for the messy pseudocode)

```stan
sparse_matrix A; // structure given
sparse_matrix B; // structure given
sparse_matrix L; // structure to be inferred
real c; // data
L = chol(A + c*B);  // A problem if c = 0!
```

> That brings up another awkward point - do we need to check the sparsity structure if someone creates a full declaration but then assigns to some function output? Or do we only allow the full sparsity declaration in the data and parameters blocks?

Yeah. This is the issue here. I had a shot at it here: https://aws1.discourse-cdn.com/standard14/uploads/mc_stan/original/2X/1/13fda4102c8f48e5aadbf0fbe75d3641a187d0a3.pdf

Essentially it would allow for two different declarations of sparse matrices:

```stan
sparse_matrix[N, M, nz_foo_row_index, nz_foo_col_index] A;
sparse_matrix[foo(A)] L;
```

where foo is a function that returns a sparse matrix. Steve's version doesn't have the explicit [foo(A)] bit and just has sparse_matrix L = foo(A), which is probably more in line with the idea of bringing declaration closer to use.

This will work if we have a requirement that every function foo_sparse in math that returns a sparse matrix has a foo_sparse_pattern_ variant that takes only data. (This is the same pattern Eigen uses for sparse matrix factorizations.)

@seantalts Would it be possible for the new compiler to find these things and just execute them once even though it would be in the model block? Because when this is declared in the parameter block, the pattern is data, while the "values" are vars (Does this question make sense?)

A thing that would need to be avoided is

```stan
// Declare B
sparse_matrix A = foo_sparse(B);
A = bar_sparse(B);
```

if foo and bar return different patterns. This means that the compound declaration has to be "special" (i.e., call the foo_sparse_pattern_ function to declare the sparse matrix and then call foo_sparse to get the values), while the ordinary bar_sparse call doesn't call bar_sparse_pattern_.
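
A hedged sketch of what that "special" handling could compile to (all names here are the placeholders from this thread; `check_pattern` is invented, and a real check would compare the index arrays rather than just counts):

```cpp
#include <Eigen/SparseCore>
#include <stdexcept>

// Hypothetical runtime guard for plain assignments to an already-declared
// sparse matrix: rejects a right-hand side whose shape or non-zero count
// differs from the fixed pattern (sketch only).
inline void check_pattern(const Eigen::SparseMatrix<double>& fixed,
                          const Eigen::SparseMatrix<double>& candidate) {
  if (fixed.rows() != candidate.rows() || fixed.cols() != candidate.cols()
      || fixed.nonZeros() != candidate.nonZeros())
    throw std::domain_error("assignment would change the sparsity pattern");
}

// Compound declaration: foo_sparse_pattern_(B) fixes the structure once,
// then foo_sparse(B) fills the values.
// Plain assignment: bar_sparse(B) is computed, checked with check_pattern
// against the declared structure, and only then copied in.
```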

### Keeping Permutation Matrix from Cholesky

`SimplicialCholeskyBase` keeps the permutation matrix; when the user does a Cholesky decomposition we can pull out this permutation matrix and keep it to use in the next iteration. We do this through `EIGEN_SPARSEMATRIXBASE_PLUGIN`, adding the permutation matrix to the input matrix. This adds a bit of state, but assuming the sparse matrices are fixed in size and sparsity this should be fine.


@seantalts

Simplicial?
Also we might want to think about the Math library as being stateless and thread state through everywhere with the compiler going forward... can talk this over in person, not sure it's fully formed in my head yet.

@SteveBronder

> Simplicial?

That's the word Eigen uses.

> Also we might want to think about the Math library as being stateless and thread state through everywhere with the compiler going forward... can talk this over in person, not sure it's fully formed in my head yet.

Yeah, I worry about adding state like this. Another option is to do the sorting immediately when a sparse matrix is created.

@dpsimpson

We could assume that every square sparse matrix is going to be decomposed and just keep a permutation around at all times. Eigen's AMD re-ordering works regardless of symmetry, so it's all good. (It's also cheap to compute relative to everything else.) This would make it a deterministic function of the data that is stored in the same object as the rest of the sparse matrix (assuming we make a simple container class around Eigen::SparseMatrix), so that should keep things stateless and thread safe.
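
A minimal sketch of that container idea (the wrapper name is hypothetical; `AMDOrdering` is Eigen's, and it symmetrizes the pattern internally, so non-symmetric input is fine):

```cpp
#include <Eigen/OrderingMethods>
#include <Eigen/SparseCore>

// Hypothetical wrapper: the fill-reducing permutation is computed once from
// the data's sparsity pattern and travels with the matrix, so the math
// library itself stays stateless and thread safe.
struct stan_sparse_matrix {
  Eigen::SparseMatrix<double> mat;
  Eigen::PermutationMatrix<Eigen::Dynamic, Eigen::Dynamic, int> perm;

  explicit stan_sparse_matrix(const Eigen::SparseMatrix<double>& m) : mat(m) {
    if (mat.rows() == mat.cols()) {
      Eigen::AMDOrdering<int> amd;  // works on the pattern of m + m^T
      amd(mat, perm);
    }
  }
};
```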


TensorFlow uses the same data storage schema inside of [`tf.sparse.SparseTensor`](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/sparse/SparseTensor) with a limited set of specific [methods](https://www.tensorflow.org/api_docs/python/tf/sparse). It does not seem that they have [sparse Cholesky support](https://github.com/tensorflow/tensorflow/issues/15910).

It seems like OpenAI has methods like matrix multiplication for block-sparse matrices (sparse matrices with dense sub-blocks) in TensorFlow, available on [GitHub](https://github.com/openai/blocksparse).

@seantalts

Raises a question about our need (or lack thereof) for blocked operations too maybe?

@SteveBronder

Yes, @dpsimpson has brought this up, as has another person in my lab.

@dpsimpson

We don't need blocked operations for sparse matrices. They're only efficient for sparse matrices with dense sub-blocks, and we don't have any algorithms that take advantage of that structure. It's a real speed-up in some sense (replacing level 1 with level 2 and 3 BLAS operations lets you be more cache efficient, for example), but it would be a weird thing to target on the first go.


The UGA animal breeding group has a sparse-dense matrix package, YAMS - it's Fortran though. See slides at http://nce.ads.uga.edu/wiki/lib/exe/fetch.php?media=uga2018_yams.pdf


Neat, thanks! I'll try to find their code online.

```stan
(@sparse(nz_rows=nz_rows_ind, nz_cols=nz_cols_ind), @opencl) Matrix[N, M] A;
```

Another alternative would be to extend the `csr_*` functions, though this requires a lot of recomputation.

@dpsimpson

The csr_* functions are not good and should go away. Also, it's inconsistent to have row-oriented sparse matrices and column-oriented dense matrices.


# Appendix: (Eigen Sparse Matrix formats)

From the Eigen [Sparse Matrix Docs](https://eigen.tuxfamily.org/dox/group__TutorialSparse.html), Eigen matrices use a Compressed Column Scheme (CCS) to represent sparse matrices.
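
For concreteness, a small worked example of the compressed-column layout the tutorial describes (values invented for illustration):

```cpp
//     [ 3 0 0 ]
// A = [ 0 0 7 ]     values:       3 4 5 7   non-zeros, stored column by column
//     [ 4 5 0 ]     inner index:  0 2 2 1   row index of each stored value
//                   outer starts: 0 2 3 4   offset where each column begins,
//                                           plus a final entry equal to nnz
```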

@dpsimpson

One very important but minimally documented part of Eigen's sparse libraries is that it tries to remove "lucky" zeros (aka the symbolic step computes the structural non-zeros of A*B, but some of the elements that could be non-zero end up being zero when the product is done; these are the "lucky" zeros that Eigen removes). This is very bad for us as it changes the dimension of the resulting std::vector. The key function in Eigen is prune, and a quick grep through the library looks like it's only used for matrix-matrix products. So this is a danger zone!

@SteveBronder

I'm looking at godbolt and it looks like Eigen keeps the sparse matrix values for multiplication even when they are zero?

https://godbolt.org/z/JJsyp0

Which is odd, because looking at their unary_evaluator it looks like you are right and they do have a pruning mechanism?

https://github.com/eigenteam/eigen-git-mirror/blob/c399f06d3f3a77b5bd2a74c11e635e4952b72a4b/Eigen/src/SparseCore/SparseProduct.h#L139

It looks like the pruning happens here. Maybe we can fiddle the value of epsilon to turn off pruning?
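
A small check one could compile to see the difference (a sketch; the default behavior should be verified against the Eigen version in use):

```cpp
#include <Eigen/SparseCore>
#include <iostream>
#include <vector>

int main() {
  // A = [1 -1; 0 0], B = [1 0; 1 0], so (A*B)(0,0) = 1 - 1 = 0: a "lucky" zero.
  std::vector<Eigen::Triplet<double>> ta = {{0, 0, 1.0}, {0, 1, -1.0}};
  std::vector<Eigen::Triplet<double>> tb = {{0, 0, 1.0}, {1, 0, 1.0}};
  Eigen::SparseMatrix<double> A(2, 2), B(2, 2);
  A.setFromTriplets(ta.begin(), ta.end());
  B.setFromTriplets(tb.begin(), tb.end());

  Eigen::SparseMatrix<double> C = A * B;             // keeps the explicit zero
  Eigen::SparseMatrix<double> D = (A * B).pruned();  // drops the lucky zero
  std::cout << C.nonZeros() << " vs " << D.nonZeros() << "\n";  // expect: 1 vs 0
}
```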

@SteveBronder changed the title from Spec/sparse matrices to Sparse Matrices on Aug 26, 2019
@seantalts (Member)

I think the doc could use a little discussion on whether sparse matrices deserve their own type in Stan. Some starter points off the top of my head:
pros:

  1. Can add functions dealing with sparsity in an ad hoc fashion to the math library slowly over time.
  2. Can prevent users from doing inefficient things with sparse matrices, like using one in a function that would just convert it to dense first.

cons:

  1. Users don't necessarily care that their matrices are stored in sparse or dense form; they want efficient inference. I think there's some wisdom in trying to keep a 1-to-1 correspondence between a Stan model and the mathematical representation of it, if we can keep computational issues behind the silk screen.
  2. I think everyone universally regrets adding both real[] and vector to Stan, which is a pretty similar use-case in that both are representing the same mathematical object just with slightly different computational properties. Keeping users from doing linear algebra operations on real[] doesn't seem to have served any real purpose.

The "what do users want" question difficult to test because anyone who is already aware of sparse matrices will probably consider them necessary as first-class objects, but there's a whole class of Stan user who would just like their model to run faster and wouldn't care if we switched to sparse under the hood. I wonder if @lauren can offer advice here?

@bob-carpenter (Collaborator) commented Aug 28, 2019 via email

@dpsimpson commented Aug 28, 2019 via email

@bob-carpenter (Collaborator) commented Aug 28, 2019

> Not sure why any function would cast a sparse matrix to a different type automatically rather than just throw a "not defined for type sparse" warning.

The motivation would be to make sure user programs with the right math don't break because they got the data types wrong. We see that all the time now because our users aren't programmers.

Of course, if we have a function that takes a sparse matrix in and then outputs a dense matrix, that's going to be considered broken by most people who use sparse matrices.

> No earthly idea how to "pretend" sparse matrices are just matrices that are different in hidden, under-the-hood ways. Neither MATLAB nor R does this.

No pretending necessary. Mathematically, a sparse matrix is just a dense matrix with a lot of zeros (no surprise to anyone there, I hope). So the user just treats everything as a matrix (except, perhaps, during I/O) and leaves the system to figure out where things should be dense or sparse under the hood. There's a question of feasibility, but we're not violating any laws of computation here.

R doesn't have a built-in sparse matrix type (though there is a contributed Matrix package with sparsity).

MATLAB does have built-in sparsity and uses a separate type system.

https://www.mathworks.com/help/matlab/sparse-matrices.html

Under the hood, it just represents dense matrix types as an array, and there are no specific vector or row-vector types, just matrix types.

https://www.mathworks.com/help/matlab/matrices-and-arrays.html

Maybe we should've done that. I really wanted to have row vector times vector to be a scalar and really wanted matrix times vector to be a vector. In MATLAB, everything's a matrix and they just use vector constructors. In MATLAB, [1 2 3] is a 1 x 3 matrix (i.e., a row vector) whereas [1; 2; 3] is a 3 x 1 matrix and [1 2; 3 4] is a 2 x 2 matrix.

Consider what R does for integer vs. real vs. boolean distinctions:

```r
> is.integer(1)
[1] FALSE

> is.integer(as.integer(1))
[1] TRUE
```

I find this kind of behavior in R very confusing. Same with transposing a vector, which turns it into a matrix.

```r
> c(1, 2, 3)
[1] 1 2 3

> t(c(1, 2, 3))
     [,1] [,2] [,3]
[1,]    1    2    3
```

It gets really fun trying to predict the result of mixed operations

```r
> t(t(c(1, 2, 3))) == c(1, 2, 3)
     [,1]
[1,] TRUE
[2,] TRUE
[3,] TRUE
```

I really don't like that t(t(x)) != x.

@betanalpha commented Aug 28, 2019 via email

@dpsimpson commented Aug 28, 2019 via email

@dpsimpson

This would be extremely bad design. It should throw an informative error. Otherwise we'll spend our whole time explaining to people why they added one line that had an implicit cast and their code no longer worked because it ran out of memory.

Just to be concrete, the model Mitzi used in the ICAR case study uses an extremely small sparse matrix. A cast to dense would go from a vector of 2055 elements to a vector (flattened column-major matrix) of 709 * 709 = 502681 elements. So this is very different from casting from real[] to vector, which are roughly the same size in memory (modulo possible locality stuff and maybe some small class overhead).

@dpsimpson

> One that comes to mind is incomplete implementation of sparse functionality. For example, if I wanted to compute a matrix exponential but no one has yet implemented a sparse matrix exponential, then can I no longer proceed, or can I cast my sparse matrix to a dense one and proceed, albeit with the more expensive calculation?

There's no way to compute a sparse matrix exponential and get a sparse matrix out (because maths). So that would be easy to specialize.

@betanalpha commented Aug 28, 2019 via email

@dpsimpson commented Aug 28, 2019 via email

@seantalts (Member)

Very cool, thank you all for the discussion! My one request for this PR would be to capture a summary of this in the "Rationale and Alternatives" heading.

Another possible point for discussion - I think adding GPU matrix support to the Stan language is almost exactly the same, modulo some degree of difference on the break-even matrix size, where converting to a dense [CPU] matrix is so inefficient as to be impossible and should be outlawed. Meaning that in GPU-calculation land I think it will actually be somewhat common to want to go back and forth as we flesh out the GPU-aware implementations. So I might propose that we try to treat those the same way, meaning if we go with the full new-types approach we'd have all new cl(_cholesky_factor_corr | _cov | cholesky_factor_cov | ... )_matrix types as well (luckily someone said you'd never want to put a sparse matrix on a GPU, or we'd have that combinatoric explosion as well).

@betanalpha commented Aug 28, 2019 via email

@seantalts (Member)

Also curious about the ICAR model - is that a good representative example? It seems to just want to store an adjacency matrix efficiently, but otherwise to not use any sparse computation at all...? I thought someone at StanCon had said we would want to completely disallow element-wise indexing into a sparse matrix, which would appear to disallow this use-case... I am assuming I'm confused here and misheard probably multiple parts of this so please correct me with exuberance.

> I understand the technical utility of keeping a sparse type completely encapsulated in its own world, but those not as familiar with sparse linear algebra as yourself will need something to guide them beyond the interpretational baggage they bring with them, for example very careful compiler error messages, extensive documentation, etc.

100% agreed - I still don't really get why you would disallow to_dense in the type system instead of just throwing a runtime warning like we do for deepcopies and other inefficient stuff. But I'm happy to defer to folks who actually write models that need sparse computation. I still dream of a world in which folks don't even need to learn about sparse matrix computations in order to experience their benefit, but perhaps that's a Stan 3 (or 4) thing.

```
{'col': 1, 'row': 20, 'val': 1.85},
{'col': 11, 'row': 3, 'val': 2.71},
...
}
```
@rok-cesnovar (Member) commented Aug 29, 2019

Not sure we need to be nit-picking here, but this should be written as

```
X = [
  ...
  {'col': 1, 'row': 20, 'val': 1.85},
  {'col': 11, 'row': 3, 'val': 2.71},
  ...
]
```

Currently, JSON objects are only supported at the top level (CmdStan at least), so we will need to remove that restriction.


## I/O

Data can be read in from JSON via a list of lists, and from Rdump via the equivalent.

Should we also support input in dense matrix form in the input file, with the parser & data handler automatically constructing the sparse matrix, ignoring zeroes? We would need to somehow inform the parser to ignore zeroes during parsing; otherwise we would first create the dense matrix and then transform it, which defeats the purpose of introducing sparsity.
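
A sketch of that idea (the helper name and interface are hypothetical, not CmdStan's actual reader; a real implementation would consume the parser's token stream rather than a materialized vector):

```cpp
#include <Eigen/SparseCore>
#include <vector>

// Hypothetical helper: stream dense column-major entries and keep only the
// non-zeros as triplets, so no dense Eigen matrix is ever built.
inline Eigen::SparseMatrix<double> read_dense_as_sparse(
    int rows, int cols, const std::vector<double>& column_major_values) {
  std::vector<Eigen::Triplet<double>> trips;
  for (int j = 0; j < cols; ++j)
    for (int i = 0; i < rows; ++i) {
      double v = column_major_values[j * rows + i];
      if (v != 0.0) trips.emplace_back(i, j, v);
    }
  Eigen::SparseMatrix<double> out(rows, cols);
  out.setFromTriplets(trips.begin(), trips.end());
  return out;
}
```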

@dpsimpson commented Aug 29, 2019 via email

@bob-carpenter (Collaborator) commented Aug 30, 2019 via email

@seantalts (Member)

Okay, to try to corral some of the conversation back: it seems like folks basically want to go with what our potential sparse matrix users would like to see in Stan's type system here, and so far the only person like that we have commenting is @dpsimpson. Dan's proposal is that we should have a separate type for every kind of matrix that could also be sparse, and that we would only allow efficient operations on them in the type system. I think we want to add some clarification for which types and operations those actually are, concretely. Is it enough to add a single sparse_matrix type to Stan with the operations listed here? That's what's in this proposal now, just double checking that's still kosher.

If so, I think we need to change the design doc to add a different model as an exemplar model; the ICAR model would not benefit from sparse matrices as described in this proposal as we wouldn't be allowed to do element-wise access. Does anyone have another model that would benefit from the design in this proposal they can contribute?

@seantalts (Member)

Or I may be wrong about ICAR being a bad fit - if there's a more efficient version possible once we have sparse matrices, relevant lines from that would be good to include in this design doc to show the before-and-after.

@bob-carpenter (Collaborator) commented Sep 1, 2019 via email

@dpsimpson commented Sep 1, 2019 via email

@dpsimpson commented Sep 1, 2019 via email

@bob-carpenter (Collaborator) commented Sep 1, 2019 via email

@bob-carpenter (Collaborator) commented Sep 1, 2019 via email

@betanalpha commented Sep 1, 2019 via email

@gregorgorjanc

The point of a Gaussian model with a sparse precision matrix is efficiency - we can fit models that we couldn't if the "model" were dense. Think millions of parameters. This is why Dan was so adamant about avoiding casting sparse to dense.

The critical point is conditional independence, which gives sparse structure. Examples are time-series (banded precision), some spatial (CAR with neighbourhood structure, SPDE with some meshing magic), phylogeny (banded precision), pedigree (expanded band) and some other models.

@seantalts (Member)

Thanks all, and especially @gregorgorjanc for joining the conversation and explaining more of the background for me :)

So we will need to allow element-wise reads after all, is that right? I'm thinking that because that is the most obvious way to code up the ICAR example, but it's also possible we would still want to disallow that and only let users use some new distribution functions we'd be creating (which I guess would be a bunch of our multivariate functions but with sparse_ prepended to the name, to follow the convention; e.g. `sparse_multi_normal`).

I want to also think ahead and think about how we can define more of the Math library in Stan itself (if we can do that efficiently), so if we can think ahead and not put in any constraints that prevent a user from coding their own sparse multivariate distributions in Stan itself that's probably a good forward-looking move in my opinion.

@gregorgorjanc

For completeness of discussion, the BUGS "language" specifies the Gaussian distribution with a precision parameter. I believe this was due to the natural/canonical parametrisation of the Gaussian and the conditional (graph) model structure embedded implicitly in the "language".

@dpsimpson commented Sep 1, 2019 via email

@seantalts (Member)

Ok cool! If people can build their own I'm happy with the design. Would someone mind adding what the ICAR model would look like with sparse matrices to the design doc as an example?

@seantalts (Member)

(Or post it as a comment in this thread and I can add it to the PR for the doc)

@bob-carpenter (Collaborator) commented Sep 1, 2019 via email

@seantalts (Member)

So I think this is just waiting on @SteveBronder for:

  1. A before-and-after example illustrating what the new syntax will provide, and
  2. confirmation from Aki that we will need to represent the sparsity structure of a matrix in the Stan program (because parameters will otherwise have no way of knowing how to construct themselves).

I think this is ready to go as-is after that.

@dpsimpson commented Sep 16, 2019 via email

@seantalts (Member)

Are you guys all together there? Want to finish this design doc up? Just need that before-and-after example assuming we're not changing the syntax.

@SteveBronder (Collaborator, Author)

> Are you guys all together there? Want to finish this design doc up? Just need that before-and-after example assuming we're not changing the syntax.

Apologies! In Paris until Monday. I'll update this on Tuesday, ask Aki about (2), and get on the phone with Dan.

@dpsimpson commented Sep 26, 2019

```stan
// Define real a, b, c;
// Define sparse_matrix A, B, C, D; (different sparsity patterns)
// We need
sparse_matrix E = a*A + b*B + C*D;
E = 4*F;  // ERROR if F does not have the same sparsity

sparse_matrix G = rbind(A, B);
```

Notes:

- When we declare a sparse matrix the sparsity structure should be fixed.
- How to declare a sparse matrix without assigning a value
- Need to make our own sparse matrix that holds the sparsity pattern
- No explicit indexing for the WIP
- Sparse matrix in data block, not params or transformed params block
- Do ordering for all square matrices.
@SteveBronder (Collaborator, Author)

Sorry for my radio silence, I've been pretty deep in the compiler. I just updated this with @seantalts's notes above and a recent conversation with Aki and Dan. You can see those changes here and the notes from the meeting in the commit message of a938b45.

@bob-carpenter (Collaborator)

Overall

This doc should be self-contained and include any necessary material from earlier proposals rather than linking. Reference links at the end are really nice.

Summary

the matrices -> a matrix's

The exclusion of sparse vectors from the initial spec should be mentioned up front.

Motivation

ICAR does not use sparse matrices. A lot of big data regressions in fields like natural language processing are sparse.

fan-dangling -> finagling (?), index fiddling (?)

Can we really support all existing methods for matrix types? Does that mean all the mixed-type operations too, such as sparse matrix times dense matrix? Is there a useful way to roll this out more incrementally?

Do all existing methods extend to types? That is, will we have a sparse simplex or sparse positive-definite matrix (unfortunately called "covariance" because I was young and didn't know better---it should be called "spd" for symmetric-positive-definite)?

I think the list here should just be things that wouldn't be included under "all existing methods".

Given the spec to support all existing functions (methods are technically non-static class functions), this list should just be for new things not already in the library.

Is return always sparse unless one of the arguments is dense? If so, we can just say that.

And there needs to be more description for the new operations:

  • fill-reducing reordering
  • required elements of the inverse
  • operations to move from sparse to dense and vice versa
  • specializations for some log pdfs and pmfs

likelihood -> density (our lpdf and lpmf can also be used in other contexts than just likelihoods, such as in priors)

Notes on implementation (like the reverse-mode derivative of Cholesky) can go in an implementation notes section below the spec. I think this spec should be more functional in terms of what things will look like for the user.

Guide-level

Move citation of previous specs to the references section at the end.

What does no explicit indexing within a sparse matrix mean? If a is a sparse matrix, does that mean a[i, j] isn't legal as either an rvalue or lvalue?

Sparse matrices dimensions -> Sparse matrix size specifications

Note: Stan separates dimension and size specifications in that function arguments are dimensioned but not sized; try to use "rows" and "columns" or "size" rather than "dimension" because "dimension" is ambiguous in the context of a spec (a vector is a one-dimensional data structure, but a vector of size K is used to represent a point in K-dimensional space).

Could you elaborate on what it means to order a square sparse matrix after initial construction? Is that a new function? Something done automatically under the hood?

The enumeration of blocks needs to include function arguments. In Stan, the function arguments are not sized, whereas block and local arguments are. Similarly, there are constrained types which only show up as block variables.

I'd be extra explicit in the example that nz_row_alt is different than nz_row, because it should be legal if both sparsity patterns are the same even if they're not specified by the same variable or expression.

Data

The overall description of coordinate list notation should be pulled up front, with the example. The col/row label is reversed.

This data section can then stick to how sparse matrices are used in the data block.

What is proper IO? There needs to be a spec for the data formats in both JSON and rdump at this point.

If vectors are not in the initial implementation, please pull them out into a section of things to do later or otherwise indicate how things are going to be staged.

We've thought about having sizes also be read from the data structure directly, but haven't done that anywhere. That's not to say we shouldn't do it here.

The example should have

```stan
int nz_row_ind[K] = { 2, 3, ..., 5 };
```

to be well-formed Stan code. But the example needs to move down to the transformed data section where it's relevant because no definitions are allowed in the data block.

What type are the values that are set? Are those in the form of a dense vector or an array of reals?

Transformed data

The Stan code for actually building up nz_row_ind, etc. should be moved down here.

The to_sparse_matrix function should be specified directly. You don't need to say much as it should be clear to everyone what it does.

Linear Algebra -> linear algebra (capitalize names, not concepts)

Does the linear algebra example imply that we can use sparse_matrix in the data block with no sizing whatsoever? Or should that be sparse_matrix[N, N]?

There's nothing in the bounds, just <>?

The sense in which the sparsity pattern is a matter for the math library is that the sparsity pattern checks have to be run time, not compile time. That still leaves them in scope for this spec, as it has to cover both the math lib part and the language.

Parameters, ...

Break these three into their own sections.

Parameters must specify sizes in their declarations---there's no way to deduce them. They must have the same sparsity pattern in each iteration, too. Mention that an alternative parameter declaration is just the values, with the sparse matrix being created elsewhere (model or transformed parameters).

Transformed parameters and generated quantities should presumably work the same way as transformed data, with the caveat noted that the size/sparsity-pattern not change iteration to iteration. The way to say that is that they have to be data-only variables, which is already a requirement for sizes in these (and the parameters) block.

If you want to add historical notes, put them later.

There are no temporary scopes in the parameters block. I don't see why we couldn't use them as parameters given your spec. If they are not going to be allowed as parameters, that needs to be clearly indicated. And I think it would cause a lot of user confusion.

Helper functions

These should be laid out explicitly. Propose the ones we'll start with and then we can add more later. The point is that this spec should be as specific as possible.

Reference-level explanation

The JSON I/O is user level.

It would be much more compact to use three parallel arrays in JSON rather than breaking it down by value this way. That also matches the way it's declared, so that's probably not such a hardship for users. Same in Rdump. So how about JSON like

```json
{
  ...
  "X" : {
    "row" : [ 20, 3, ... ],
    "col" : [ 1, 11, ... ],
    "val" : [ 1.85, 2.71, ... ]
  }
  ...
}
```

where the row and col values are integers and the val floats (this needs to be in the spec even though JSON does not distinguish numerical types). Numbers will be represented as elsewhere (there's some complication around NaN and inf, but that's not this spec's responsibility).

And then in Rdump:

```r
...
X <- list(row = c(20, 3, ...),
          col = c(1, 11, ...),
          val = c(1.85, 2.71, ...))
...
```

"Something like what Ben did [here]" needs to be unfolded so the spec is self-contained and precise about what's being proposed.

The implementation details aren't so important and can be saved for later. If they're included, the outerIndex, innerIndices, and nnz variables need to be explained.

This spec doesn't need implementation details down to the for-loop level.

Stan Math

Templating should be described.

In the hard way, the "coefficient access" error Dan saw needs to be described. Is the proposal to use A.coeff() everywhere in the code to make this work?

The simple way should be at the same heading level.

Those specializations for sparse matrices need to be careful to use Eigen's mixed types properly. I hadn't realized we were doing that and could use it for addition.

For reverse mode, these implementations should all move to the adjoint-partial approach, which will dramatically cut down on stack functions and virtual function calls compared to the way the examples are coded here, by delegating to operator+ in Eigen.

Keeping Permutation Matrix

What does "this adds a bit of state" mean? That sounds scary as the entire math library is assumed to be stateless as is the generated Stan model code.

Drawbacks

I'd remove "code smell" and just list the drawback---everything's going to get more complicated on the template and code duplication front as well as in the parser as this spec has us overloading more heavily.

Rationale

Why did this get consensus? Is it the close connection to Eigen? The simpler long-form inputs? Both are positives here, I think. We definitely don't want to be innovating in the sparse matrix space!

I strongly prefer the type name to attributes. Once you have sparsity, you are normally having to deal with it both computationally and statistically.

Prior art

The prior art I'd look for would be in R, Python (NumPy/pandas), and MATLAB. Any idea what those do?

Unresolved questions

That first bullet point should be removed---it's instructions, not an unresolved question.

Language design is not an unresolved question. What needs to be described more fully is how type inference is going to work and how assignment is going to work. Can we assign sparse matrices to dense matrices, for example? Is any indexing allowed? These absolutely must be answered and not left unresolved before going forward with this.

The new Stan compiler is just the implementation of this. I don't think we need a lot of details on that in this spec unless there's some potential pitfall that's not obvious.

The whole theory of MCMC changes when you allow different numbers of parameters per iteration. For example, what's it mean to converge? We can leave that can of worms closed for now.

Cost not being worth it is the overall question for the spec, not an unresolved question.

You can leave it to reviewers to chime in on prior art.

Eigen

So our constructors are going to require conversion back and forth to the inner representation of Eigen. That seems like it should be listed under drawbacks.

- Addition (returns sparse matrix, possibly with different sparsity structure)
- Sparse matrix transpose (returns sparse matrix, different sparsity structure)
- Sparse matrix-vector multiplication (returns dense vector)
- Sparse matrix-constant multiplication (returns sparse matrix, same sparsity)


Is "constant" correct work here? Should it be "real" or "scalar"?


- Addition (returns sparse matrix, possibly with different sparsity structure)
- Sparse matrix transpose (returns sparse matrix, different sparsity structure)
- Sparse matrix-vector multiplication (returns dense vector)


Later there is "Sparse matrix-dense matrix"; should this then also be "Sparse matrix-dense vector", or do we assume that if sparse is not mentioned then it is dense?

- Sparse inner product and quadratic form (returns scalars)
- Operations to move from sparse to dense matrices and vice versa.
- Fill-reducing reorderings
- A sparse Cholesky for a matrix of doubles.


Which models would benefit from additional sparse QR and sparse SVD?

```stan
int nz_row_ind[K]; // Non-empty row positions
int nz_col_ind[K]; // Non-empty col positions

sparse_matrix[nz_row_ind, nz_col_ind, N, M] A
```


missing ;

```stan
data {
int N; // Rows
int M; // Cols
sparse_matrix[N, M] A
```


missing ;

Parameters can be defined as above for data or deduced from the output of other functions.

```stan
parameters {
```


Add a comment that currently we don't know a use case where a sparse matrix would be defined as a parameter. It would be common to declare, in the transformed parameters block, a sparse matrix which depends on a few lower-dimensional parameters.

```stan
}
transformed parameters {
// Non-zero elements are deduced by the operation on x
sparse_matrix[N, N] K = cov_exp_quad(x, alpha, rho);
```


cov_exp_quad is a bad example, as it will always produce a dense matrix. There are some covariance functions which produce a sparse covariance, but they are not useful in Stan as the sparsity structure depends on the length-scale parameter. A better example could be, e.g.,

```stan
sparse_matrix[N, M] P = A * alpha;
```

```stan
matrix[K, 3] = get_nz_elements(x);
```

There can also be methods to bind sparse matrices together by rows or columns


There can also be methods -> There have to be methods
