-
-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow arena_matrix to use move semantics #2928
Allow arena_matrix to use move semantics #2928
Conversation
…e-move-semantics' into feature/reverse-mode-move-semantics
…e-move-semantics' into feature/reverse-mode-move-semantics
…e-move-semantics' into feature/reverse-mode-move-semantics
…de-move-semantics
…-dev/math into feature/reverse-mode-move-semantics
@t4c1 sorry but would you mind taking a quick glance at this? @andrjohns would you feel comfortable reviewing this? |
stan/math/prim/prob/normal_log.hpp
Outdated
return normal_lpdf<propto, T_y, T_loc, T_scale>(y, mu, sigma); | ||
inline return_type_t<T_y, T_loc, T_scale> normal_log(T_y&& y, T_loc&& mu, | ||
T_scale&& sigma) { | ||
return normal_lpdf<propto>(y, mu, sigma); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming you want to use perfect forwarding here. That would be:
return normal_lpdf<propto>(y, mu, sigma); | |
return normal_lpdf<propto>(std::forward<T_y>(y), std::forward<T_loc>(mu), std::forward<T_scale>(sigma)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes thanks! The normal distribution changes are mostly just to show what the new signatures would look like. But I'd like to make them so that we can forward them into operands_and_partials
.
@andrjohns would you mind taking a look at this? |
@SteveBronder, this is rad! I'm concerned with the impact it has on our use of Just to be absolutely clear, this is a "concern" and it is not blocking in my mind. It seems like we should do this; I just want to be cognizant of the effort to get this properly done before embarking on it. |
Fix csr_matrix_times_vector linker error
@SteveBronder, I'm going to edit the PR to have the base branch be |
Add support for Windows ARM64
Thanks! I'll update the merge conflicts tmrw |
use a seperate class for csr_matrix adjoint
Don't set build and clean rules for SUNDIALS if external libs are used
…de-move-semantics
@syclik @andrjohns fixed the merge conflict, all good for me to merge to 5.0? |
Yep, all good from me |
I’m still reading it. I was on it and it was taking a bit more time to
really get it done.
I’ll make sure to read through it carefully.
…On Fri, Apr 26, 2024 at 2:47 PM Andrew Johnson ***@***.***> wrote:
@syclik <https://github.com/syclik> @andrjohns
<https://github.com/andrjohns> fixed the merge conflict, all good for me
to merge to 5.0?
Yep, all good from me
—
Reply to this email directly, view it on GitHub
<#2928 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AADH6F2C3GHORBAMX5GBDE3Y7KOMNAVCNFSM6AAAAAA3BYBQHSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZZHE2DQNBWGA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@@ -179,6 +177,93 @@ We can see in the above that the standard style of a move (the constructor takin | |||
But in Stan, particularly for reverse mode, we need to keep memory around even if it's only temporary for when we call the gradient calculations in the reverse pass. | |||
And since memory for reverse mode is stored in our arena allocator no copying happens in the first place. | |||
|
|||
Functions for Stan math's reverse mode autodiff should use [_perfect forwarding_](https://drewcampbell92.medium.com/understanding-move-semantics-and-perfect-forwarding-part-3-65575d523ff8) arguments. This arguments are a template parameter with `&&` next to them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar. Not sure exactly what should be used, but "This arguments" is off.
Maybe "Template arguments that allow perfect forwarding"?
} | ||
``` | ||
|
||
A perfect forwarding argument of a function accepts any reference type as it's input argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: "it's" --> "its"
|
||
The `std::forward<T>` in the in the code above tells the compiler that if `T` is deduced to be an rvalue reference (such as `Eigen::MatrixXd&&`), then it should be moved to `my_other_function`, where there it can possibly use another objects move constructor to reuse memory. | ||
|
||
In Stan, perfect forwarding is used in reverse mode functions which can accept an Eigen matrix type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest putting this paragraph above L190. That way the words describe the perfect forwarding and then the expansion of what it does is written out after it.
As written, this sentence doesn't quite make sense here: the code above doesn't have std::forward
.
|
||
The use of auto with the Stan Math library should be used with care, like in [Eigen](https://eigen.tuxfamily.org/dox/TopicPitfalls.html). | ||
Along with the cautions mentioned in the Eigen docs, there are also memory considerations when using reverse mode automatic differentiation. | ||
When returning from a function in the Stan math library with an Eigen matrix output with a scalar `var` type, the actual returned type will often be an `arena_matrix<Eigen::Matrix<...>>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize "Stan Math"
Along with the cautions mentioned in the Eigen docs, there are also memory considerations when using reverse mode automatic differentiation. | ||
When returning from a function in the Stan math library with an Eigen matrix output with a scalar `var` type, the actual returned type will often be an `arena_matrix<Eigen::Matrix<...>>`. | ||
The `arena_matrix` class is an Eigen matrix where the underlying array of memory is located in Stan's memory arena. | ||
The `arena_matrix` that is returned by Stan functions is normally the same one resting in the callback used to calculate gradients in the reverse pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change "Stan functions" to "Math functions"
The `arena_matrix` that is returned by Stan functions is normally the same one resting in the callback used to calculate gradients in the reverse pass. | ||
Directly changing the elements of this matrix would also change the memory the reverse pass callback sees which would result in incorrect calculations. | ||
|
||
The simple solution to this is that when you use a math library function that returns a matrix and then want to assign to any of the individual elements of the matrix, assign to an actual Eigen matrix type instead of using auto. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"math library" -> "Math library"
/*! @tparam T the type to check */ | ||
template <typename T> | ||
using require_not_arena_matrix_t | ||
= require_t<bool_constant<!is_arena_matrix<std::decay_t<T>>::value>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this defined with bool_constant
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require_t
wants back a struct with a bool value
member so we wrap the !struct<T>::value
in bool constant for that.
We should really just make a negate<>
type trait so the above can become
require_t<negate<is_arena_matrix<std::decay_t<T>>>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I'd like to do that in a separate PR
@@ -49,25 +49,91 @@ class arena_matrix<MatrixType, require_eigen_dense_base_t<MatrixType>> | |||
ChainableStack::instance_->memalloc_.alloc_array<Scalar>(size), | |||
size) {} | |||
|
|||
private: | |||
template <typename T> | |||
constexpr auto get_rows(const T& x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two functions are a bit hard to understand. Could you add doxygen doc to explain what the intent is here? It's not coming through directly through code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two function definitions are identical. Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two function definitions are identical. Is that intentional?
One gets the rows in case the conditional is true and the other gets the cols if the conditional is true. And vice versa for false
@@ -280,7 +330,7 @@ class arena_matrix<MatrixType, require_eigen_sparse_base_t<MatrixType>> | |||
template <typename Expr, | |||
require_not_same_t<Expr, arena_matrix<MatrixType>>* = nullptr> | |||
arena_matrix& operator=(Expr&& expr) { | |||
*this = arena_matrix(std::forward<Expr>(expr)); | |||
new (this) arena_matrix(std::forward<Expr>(expr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind walking me through this? I think I know what it's doing, but I want to double check.
(Nothing too long; even just a couple of sentences.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! This is a good question because it's kind of complicated why we use placement new here.
This signature is for assigning Eigen types that are not arena matrices. This is using placement new. The placement new is new
but instead of calling malloc
to assign new memory it is given a pointer as an argument to assign the memory for the newly constructed object. When we use an arena matrix we are doing something like
arena_matrix<Eigen::MatrixXd> my_arena = x;
So on the lhs the arean matrix is constructed using the constructor with no arguments arena_matrix()
. Then we call the assignment operator here. Since this
is already constructed, inherits from Eigen::Map
and an Eigen::Map
cannot replace the memory it's data points to we use placement new here for the assignment.
Eigen explicitly says to do this in it's documentation for map
Tip: to change the array of data mapped by a Map object, you can use the C++ placement new syntax:
Example:
int data[] = {1,2,3,4,5,6,7,8,9};
Map<RowVectorXi> v(data,4);
cout << "The mapped vector v is: " << v << "\n";
new (&v) Map<RowVectorXi>(data+4,5);
cout << "Now v is: " << v << "\n";
@@ -129,49 +129,49 @@ inline var multiply(const T1& A, const T2& B) { | |||
* @tparam T1 type of the scalar | |||
* @tparam T2 type of the matrix or expression | |||
* | |||
* @param A scalar | |||
* @param a scalar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Thanks!
Minor comments on the doc. I'm more curious about some of the code changes. Is there any place where we might have made a mistake by using auto internally? Do we need to sweep through the code base searching for that based on the changes here, or are we ok? |
I ctrl+f'd for |
@syclik and @andrjohns anymore Qs? Else I'm going to merge this to the 5.0 branch |
No, no more questions!! Please merge when you get a chance!!! Thank you! |
Summary
This PR does two things
arena_matrix
to use move semanticsarena_matrix
' s constructor is passed an rvalue, instead of allocating memory onto the stack allocator and then doing a copy,arena_matrix
now pushes that rvalue into achainable_ptr
that sits in the chainable stacksvar_alloc_stack_
and is deleted afterstan::math::recover_memory()
is called. This saves us the allocation and the copy, instead we just delay paying the deletion cost which we would have paid anyway when the temporary was deletedEigen::Matrix
type. The hard copy into the Eigen matrix is to make sure assignments done later to the matrix (i.e.X(1, 1) = a
) do not overwrite the memory necessary for the reverse pass.Instead, I suggest we just do what Eigen does and include docs that tell people that
auto
is unsafe to use recklessly with the stan math library. Instead users who want to do assignment to elements of the matrices should have the assigned object's type be the appropriate Eigen matrix type.For example, in the below code, using
auto
for the assignment from the function and then doing an assignment to an element of the matrix would mess up the reverse mode pass of the function the matrix was generated in. However, assigning the result of the function to an actual matrix type and then doing the element assignment is safe.This is a breaking change so I think we should do a major version bump. In the tests below I've found anywhere from a 5% to 15% speed increase from using this.
The graph below plots the relative % improvement of using move semantics relative to the current develop. We are testing the expression below, which just does a bunch of multiplies, adds, and then a sum.
The variables
x
andy
in the above are matrices of the same size. Only the forward pass of reverse mode autodiff is used in the benchmark since this should not have any effect on the reverse passes performance.We can see that for small sizes this is great. As we get to larger sizes it's still good, but at that point a lot of the actual computation and fetching data for the operations is taking up enough time that the memory allocation is not a huge concern.
You can run this performance benchmark yourself using the following
Tests
Tests are added to the
arena_matrix_test
file for checking moves work. This can be run withSide Effects
Yes
In order to utilize this we need to use perfect forwarding in our reverse mode functions. See the new
multiply
andadd
signatures in this PR for an example. I think I could do this for most of our functions that can use it with a day or two of work, though now our functions will be more "modern" but also harder to understand.Release notes
Allows
arena_matrix
to use move semantics along with the stack allocator.Checklist
Math issue #(issue number)
Copyright holder: Simons Foundation
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
./runTests.py test/unit
)make test-headers
)make test-math-dependencies
)make doxygen
)make cpplint
)the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested