Add arena_matrix specialization for sparse matrices #2971
Conversation
What is the "sparse matrix impl for vari"? Is this something that's exposed to users anywhere or used elsewhere in our code base? Is this work toward adding sparse matrices to Stan? If so, is there a design doc somewhere about the plans?

The sparse matrix impl for vari is the SoA pattern for autodiff on sparse matrices. There is a design doc below, but it needs to be updated and does not contain much info on the low-level implementation.

Why are you forking the design doc rather than updating it? We don't want to have to review two design docs. We've traditionally only reviewed functional specs. My overall suggestion is to work through what some example models would look like, such as a time series model with a sparse Cholesky precision matrix. If you add a pull request for an updated design doc, I'd be happy to comment. Some quick comments:

This is not a fork; this just changes the underlying vari implementation to use a new

Do you mean in the design doc?

Sure, do you mind if afterwards I copy and paste your comments about the design doc into the discussion page? That design doc needs to be heavily updated imo, but either way this PR should go through, as this fixes a current bug and I'm 99% sure this is how the impl would need to look for the design we would approve.
(Force-pushed from 2773df6 to 367ed98.)
@SteveBronder, I'm guessing this is still an issue. Do you remember what prevented the tests from passing?
Jenkins Console Log — machine information: Ubuntu 20.04.3 LTS (focal).
A few questions and suggestions; otherwise looks good.
 * @param other Eigen Sparse Matrix class
 */
template <typename T, require_same_t<T, PlainObject>* = nullptr>
arena_matrix(T&& other)  // NOLINT
It doesn't look like any of these constructors uses the innerNonZeroPtr member from the inputs — is that intentional?
Oh! I think I left them out because elsewhere in the Eigen docs it said it was just for compatibility with other packages. But I'll put them in just in case; no reason not to.
for (; static_cast<bool>(it) && static_cast<bool>(iz); ++it, ++iz) {
  f(it.valueRef(), iz.value());
}
Should this (and the other overloads) just loop over (*this).innerSize()? The compiler might be able to optimise better if it knows the length. Otherwise, I think this is more readable as a while loop:
while (static_cast<bool>(it) && static_cast<bool>(iz)) {
f(it.valueRef(), iz.value());
++it;
++iz;
}
Yes! Since we know that for the sparse arena matrix the pointer to the values is contiguous in our memory arena, we can just loop over the value pointer itself using the number of nonzero entries, at least for the scalar and arena-matrix inplace ops. For general sparse and dense inputs I don't think we have that guarantee, so we should still do something like the while loop you have above.
for (int k = 0; k < (*this).outerSize(); ++k) {
  typename Base::InnerIterator it(*this, k);
  for (; static_cast<bool>(it); ++it) {
    f(it.valueRef(), x(it.row(), it.col()));
  }
}
Are these operations something that could be implemented using Eigen's NullaryExpr framework? That could give Eigen more room to optimise/simplify ops.
I think NullaryExpr is only available for dense types :/

If we wanted to optimize this harder, we could write a specific operator+= for each of the sparse, dense, and scalar ops that under the hood makes an Eigen map to a dense Eigen vector of the values. The current inplace op for two sparse arena matrices is
template <typename F, typename Expr,
require_convertible_t<Expr&, MatrixType>* = nullptr,
require_same_t<Expr, arena_matrix<MatrixType>>* = nullptr>
inline void inplace_ops_impl(F&& f, Expr&& other) {
auto&& x = to_ref(other);
auto* val_ptr = (*this).valuePtr();
auto* x_val_ptr = x.valuePtr();
const auto non_zeros = (*this).nonZeros();
for (Eigen::Index i = 0; i < non_zeros; ++i) {
f(val_ptr[i], x_val_ptr[i]);
}
}
But we could instead, for +=, do
template <typename Expr,
          require_convertible_t<Expr&, MatrixType>* = nullptr,
          require_same_t<Expr, arena_matrix<MatrixType>>* = nullptr>
inline arena_matrix& operator+=(Expr&& other) {
  auto&& x = to_ref(other);
  auto* val_ptr = (*this).valuePtr();
  auto* x_val_ptr = x.valuePtr();
  const auto non_zeros = (*this).nonZeros();
  // Map the contiguous nonzero values as dense vectors and add in one expression.
  using DenseVec = Eigen::Matrix<typename MatrixType::Scalar, Eigen::Dynamic, 1>;
  Eigen::Map<DenseVec>(val_ptr, non_zeros)
      += Eigen::Map<const DenseVec>(x_val_ptr, non_zeros);
  return *this;
}
That would allow Eigen to optimize a lot more, but at the expense of losing a bit of generalization from just having inplace_ops_impl. If you'd like the above then I'm fine with the extra code imo.
stan/math/rev/core/arena_matrix.hpp
Outdated
 * @note Caution! Inplace operators assume that either
 * 1. The right hand side sparse matrix has the same sparcity pattern
 * 2. You only intend to add a scalar or dense matrix coefficients to the
 * nonzero values of `this` This is intended to be used within the reverse
 * pass for accumulation of the adjoint and is built as such. Any other use
 * case should be be sure the above assumptions are satisfied.
Suggested change:

 * @note Caution! Inplace operators assume that either
 * 1. The right hand side sparse matrix has the same sparsity pattern
 * 2. You only intend to add a scalar or dense matrix coefficients to the
 * nonzero values of `this`. This is intended to be used within the reverse
 * pass for accumulation of the adjoint and is built as such. Any other use
 * case should be sure that the above assumptions are satisfied.
stan/math/rev/core/arena_matrix.hpp
Outdated
 * @note Caution!! Inplace operators assume that either
 * 1. The right hand side sparse matrix has the same sparcity pattern
 * 2. You only intend to add a scalar or dense matrix coefficients to the
 * nonzero values of `this` This is intended to be used within the reverse
 * pass for accumulation of the adjoint and is built as such. Any other use
 * case should be be sure the above assumptions are satisfied.
Suggested change:

 * @note Caution! Inplace operators assume that either
 * 1. The right hand side sparse matrix has the same sparsity pattern
 * 2. You only intend to add a scalar or dense matrix coefficients to the
 * nonzero values of `this`. This is intended to be used within the reverse
 * pass for accumulation of the adjoint and is built as such. Any other use
 * case should be sure that the above assumptions are satisfied.
/**
 * Equivalent to `Eigen::Matrix`, except that the data is stored on AD stack.
 * That makes these objects triviali destructible and usable in `vari`s.
Suggested change:

 * That makes these objects trivially destructible and usable in `vari`s.
 * @tparam MatrixType Eigen matrix type this works as (`MatrixXd`, `VectorXd`
 * ...)
Suggested change:

 * @tparam MatrixType Eigen matrix type this works as (`MatrixXd`, `VectorXd`,
 * ...)
LGTM thanks!
Summary

While working on a branch to let Stan math use ctest and cmake, I found a memory leak in the sparse matrix impl for vari. It was a confusing bug that had to do with how Eigen's implementation manages dynamic memory for sparse matrices. This PR removes the use of `chainable_alloc` and directly holding sparse matrices in the `var_value<SparseMatrix>` class, and instead uses a new specialization of `arena_matrix` for sparse matrices.

Tests

Tests were added for the new `arena_matrix`'s constructors, assignment operators, and inplace operators, and can be run with

Side Effects
Yes! A big design decision here is to add the inplace operators `+=` and `-=` to the sparse implementation of `arena_matrix`. Normally, `+=` and `-=` do not really make sense for sparse matrices, since adding a scalar to all elements of a sparse matrix would just make it a dense matrix. But for Stan math, when we do the adjoint accumulation in reverse mode, we only want to accumulate over the nonzero values of the adjoints. Talking to @bob-carpenter about this, we could think of a sparse matrix as having a mix of `double` and `var` scalars: everywhere we see a 0 there would be a `double`, and everywhere else there would be a `var`. The little vector below is what I mean by this interpretation, where we have doubles for 0 values and vars for nonzero values.

If we do `cos(A)`, since `cos(0) = 1`, we end up with a dense matrix of vars like the following. We want a dense output of vars because later expressions may use those scalar values of the output matrix.
When we call grad for reverse mode, we are taking the adjoint (gradient) of the output with respect to the input. For the zero values of the sparse matrix our adjoint is with respect to a constant, which means we don't want to propagate the associated values in our return matrix upwards. So we want our `operator+=(Sparse this, Dense rhs)` to ignore any zero values when we do the adjoint accumulation. This means that when we use `arena_matrix<SparseMatrix>` in our vari implementation, we end up with code for the reverse pass that nicely looks like all the rest of our code.

So while it's nonstandard for a sparse matrix to have `+=` and `-=` defined on it, for our use case it actually makes a lot of sense.

Release notes

Adds a sparse matrix implementation for `arena_matrix`.

Checklist
Copyright holder: Simons Foundation
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
- the basic tests are passing (`./runTests.py test/unit`, `make test-headers`, `make test-math-dependencies`, `make doxygen`, `make cpplint`)
- the code is written in idiomatic C++ and changes are documented in the doxygen
- the new changes are tested