Use Eigen::Map to replace arr functions #1425
The way I read this, there is a 30% reduction in execution time for LogSumExp? 15% for DotSelf and 5% for Sum? I would not call that minor. And even if the execution times were exactly the same and we "only" gained the code reduction, I would be all for it. So if I understand correctly, we can remove the
This idea has been brewing for some time already anyways for many reasons (most recently sparse matrices I think). This is just another example of why we really need to do this. Some latest discussions on this topic are here: https://discourse.mc-stan.org/t/refactoring-math-with-more-general-signatures/10157/6 What would be the best way forward here? A PR (or multiple PRs) to refactor the |
It does, but I think we need to wait until we can squash the prim/mat/arr
distinction. Right now, we do have the distinction in place and the code
base is consistent. We shouldn’t be adding (too many) special cases; it
becomes a cognitive burden to know when the rules apply and don’t.
I think this is a good idea and we should do it, not just here, but in lots
of places!
…On Mon, Oct 28, 2019 at 9:09 AM Andrew Johnson ***@***.***> wrote:
Description
As initially discussed in this PR
<#1373 (comment)>,
there is some code duplication in having the same functions implemented for
std::vector and Eigen separately. This could be avoided by using a
wrapper which compiles to a no-op for Eigen types and returns an
Eigen::Map for std::vector types prior to calling the mat definition:
template <typename Derived>
const auto& as_eigen(const Eigen::MatrixBase<Derived>& v) {
  return v;
}

template <typename T>
const auto as_eigen(const std::vector<T>& v) {
  return Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, 1>>(v.data(),
                                                               v.size());
}
This also means that the templating for the mat definitions needs to be
changed from Eigen::Matrix<double, R, C> to Eigen::MatrixBase<Derived> so
that it works with Eigen::Map inputs.
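As a rough illustration only (a sketch using dot_self, whose prim implementation is a squaredNorm call, not the actual refactor), the signature change would look something like this:

// Before: only binds to a concrete Eigen::Matrix, so an Eigen::Map argument
// either fails template deduction or forces a copy.
template <int R, int C>
inline double dot_self(const Eigen::Matrix<double, R, C>& v) {
  return v.squaredNorm();
}

// After: binds to any dense Eigen expression, including the Eigen::Map
// produced by as_eigen().
template <typename Derived>
inline double dot_self(const Eigen::MatrixBase<Derived>& v) {
  return v.squaredNorm();
}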
After some initial testing, the speed gains are minor:
2019-10-28 20:40:26
Running ./log_sum_exp_arr
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
-----------------------------------------------------
Benchmark               Time             CPU   Iterations
-----------------------------------------------------
LogSumExp_Old     8175394 ns      8175190 ns           92
LogSumExp_New     5511970 ns      5511907 ns          124
DotSelf_Old       5136511 ns      5136391 ns          132
DotSelf_New       4497041 ns      4496989 ns          157
Sum_Old           4868005 ns      4867939 ns          139
Sum_New           4608184 ns      4608127 ns          149
Benchmarking code here
<https://github.com/andrjohns/perf-math/blob/feature/tbb_test/log_sum_exp_arr.cpp>
The main benefits would be in code reduction for functions with existing
arr definitions, as well as making it easier to extend other functions to
take std::vector inputs (without requiring additional definitions).
Does this seem like a worthwhile change?
Current Version:
v3.0.0
|
Ha. @rok-cesnovar, I think our messages crossed. And you're right -- we can remove the |
Great! I'll go ahead then. I think Rok's workflow was a good idea:
After some testing, I've realised that the signatures need to be more general than
Then, once we're ready to collapse the
PS @SteveBronder your require generics are an absolute lifesaver here! |
This looks cool! Questions:
No pressure. Just asking because these seem related. Also it probably makes sense to just work this out for one or two representative functions first. No need to try to get rid of everything in array in one go (that'll just make the first review painful for whoever does it). |
Pretty much the same conceptually, in that it's taking different vector types and returning a single type. The main difference is that it needs to return an
That depends on what you mean, the
Tricky one. For a given function it would be easy to add another definition explicitly for nested vectors, but for a single function that handles both vectors and nested vectors, I could look into implementing the
Good call! |
@andrjohns, can you explain this in more detail? I want to make sure I understand. On first read, I think what you’ve written here is not correct: if the return type is always an Eigen type, then we can’t mix the arr and mat implementation because this will lose the type information that’s necessary to work with Stan. But maybe what you’ve written isn’t what you were envisioning? Also, starting with one function and seeing it all the way through is definitely the right way to go. |
Sure! It's not that the return type of the function (e.g. Using
What I'm proposing is something like:
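A minimal sketch of that shape (not the exact snippet; it assumes the as_eigen wrapper above, and the return line is the one quoted further down the thread):

// Sketch only: a prim log_sum_exp written against a generic Eigen expression,
// with std::vector inputs forwarded through as_eigen() (no copy).
template <typename T>
inline double log_sum_exp(const std::vector<T>& x) {
  return log_sum_exp(as_eigen(x));
}

template <typename Derived>
inline double log_sum_exp(const Eigen::MatrixBase<Derived>& x) {
  const double max = x.maxCoeff();
  return max + std::log((x.array() - max).exp().sum());
}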
So the return type ( |
You don't have to implement |
I did have a look at that; the only issue was that it was specifically for vectors, whereas I'm hoping to work with any Eigen object (including Map). I've put together a rough proof-of-concept here. I've implemented a view framework (similar to The templating for the nested views is all explicit at the moment, since I'm still figuring out how to navigate the specialisations. But this should give a good overview of what I had in mind. |
I like the idea. I think in that case the existing function can be reworked to work with any Eigen expression. We don't need two functions that do almost the same thing. |
If you want I can do that. |
I'm not sure the functions are aiming to do the same thing - since Eigen types are meant to be returned unchanged, not as a column vector. I don't think |
No, it can't be applied to a matrix. But that is not a problem, since a matrix can not be stored in a std::vector. But I see it could be a problem if you want to return row vectors for row vector inputs. |
Glad to hear the requires are useful! Are they missing anything conceptually that would be useful? Also for some of the above I think you want the |
Why do these need to directly call Eigen functions? Don't we usually want to call Stan functions instead to get more efficient derivatives?
Row vectors and column vectors are not substitutable in Eigen. So row vector times vector works, but vector times vector doesn't.
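For instance (a small illustrative snippet, not from the thread):

// Row-times-column is a valid 1x1 product; column-times-column doesn't compile.
Eigen::RowVectorXd r = Eigen::RowVectorXd::Ones(3);
Eigen::VectorXd c = Eigen::VectorXd::Ones(3);
double ok = (r * c).value();  // 1x3 * 3x1 -> 1x1
// auto bad = c * c;          // 3x1 * 3x1 -> dimension mismatch, compile error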
+1 to working it out for an example function first.
|
return max + std::log((x.array() - max).exp().sum());
This will not be an efficient way to write this if it's not calling our vectorized exp and specialized sum functions.
...
So the return type (double) doesn't change, it's just that std::vectors are mapped to Eigen types within the function.
I think this may be tricky to get return types back in a lot of cases. For instance, what happens to transpose as a function?
|
A matrix has contiguous memory, so it can be stored in a std::vector. If you store a row vector or column vector, you lose the type going to std::vector, and if you store a matrix, you lose the number of rows and columns.
|
I was referring to the matrix functions that have separate
They're calling Eigen's vectorised functions, which can be faster (see this old forum post for benchmarking I've done with exp)
I'm not sure what you mean here. There isn't an
I think this comes down to the taxonomy of different types of matrix functions:
The first two should be simple enough - the return type is known for the first, and the input for the second is modified in-place. The third type will need some work to deduce the return container type (similar to |
I've put together a very rough working example for this using
The results are stored in an The I probably didn't explain that very well, but here's the branch I'm working from which also includes tests for the new functionality in |
Getting over a cold so I'll post more tmrw. But why not have two methods? One that iterates on containers of scalars and the other that iterates on containers of containers. C of C the iterator of scalars
There's a non-zero chance I was trying to be too clever by squishing it all into one method. Putting it into two makes for much cleaner code:
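A simplified sketch of what that split looks like (illustrative only, not the branch code; it assumes log_sum_exp accepts arbitrary Eigen expressions, which is what this issue proposes):

// Sketch: one overload for vectors of scalars, one that recurses over
// nested vectors and delegates to the first.
inline std::vector<double> log_softmax(const std::vector<double>& v) {
  const auto m = as_eigen(v);                          // Eigen::Map, no copy
  const Eigen::ArrayXd out = m.array() - log_sum_exp(m);
  return std::vector<double>(out.data(), out.data() + out.size());
}

inline std::vector<std::vector<double>> log_softmax(
    const std::vector<std::vector<double>>& v) {
  std::vector<std::vector<double>> result(v.size());
  for (size_t i = 0; i < v.size(); ++i) {
    result[i] = log_softmax(v[i]);
  }
  return result;
}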
|
This is really interesting. Since it seems you were able to ratchet this up to std::vector, what are the chances of me ever writing type-agnostic C++ code and controlling what types go into a function at the interface level? This is almost definitely a Very Bad idea at some level, but the reason I'm asking is I recently worked on this bit of code (Gaussian process kernel stuff): https://github.com/stan-dev/math/blob/f573698ea3c6095a6d24de65915a501e9f3b84bc/stan/math/prim/mat/fun/gp_dot_prod_cov.hpp The x arguments to the function can be It'd be cool if the arguments could actually be any of std::vectorstd::vector, Also, when I've played with this stuff before there have been issues where super-generic templated functions can absorb Eigen intermediate expression types and give unhappy results (like things you would have regularly called eval on). How's that work here? Edits: Added some backticks so that template stuff would print correctly |
I may be going off the rails there too -- feel free to say so if you think this is out of scope. |
No problem, you're absolutely right that I was pretty bummed. |
I think the conversation has gone semi off the rails but in a pretty good direction! Ben I'll look over your code tmrw and see if we can do something like the above |
In a simple, proof-of-concept, kind of way, it's already done. I've currently got I'm not sure how well (or easily) this would extend to something like |
Should the second one be for vectors that contain vectors of scalars? Something like below in pseudocode:

template <typename Vec, require_vector_st<std::is_arithmetic, Vec>...,
          require_vector_vt<is_vector, Vec>...>
inline auto log_softmax(Vec&& v) {
  check_nonzero_size("log_softmax", "v", v);
  // return_container does not exist yet but gives back either eigen or std vector type
  return_container_t<Vec> result(v.size());
  for (int i = 0; i < v.size(); i++) {
    result[i] = log_softmax(std::forward<decltype(v[i])>(v[i]));
  }
  return result;
}
In the future we may also want the above to work over containers of eigen vs. std vectors so we can have the eigen one just return a unaryExpr template type that can be lazy evaluated later |
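To sketch the lazy variant (illustrative; exp_lazy is a made-up name, not an existing function):

// Returns the unevaluated Eigen expression; nothing is computed until the
// result is assigned to (or otherwise evaluated by) the caller.
template <typename Derived>
inline auto exp_lazy(const Eigen::ArrayBase<Derived>& v) {
  return v.derived().unaryExpr([](double x) { return std::exp(x); });
}

// Eigen::ArrayXd out = exp_lazy(a);  // evaluation happens here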
Well at some point copies would be necessary. I don't think I can write a function that takes all the Stan types
As another wrench, how do ragged arrays fit into this: stan-dev/design-docs#7? That wouldn't break what you have so far, but it would break any sort of matrix/real[,] equivalency we tried to set up, right? Or wrong? |
On Oct 29, 2019, at 9:52 PM, Andrew Johnson ***@***.***> wrote:
Why do these need to directly call Eigen functions? Don't we usually want to call Stan functions instead to get more efficient derivatives?
I was referring to the matrix functions that have separate prim and rev implementations (e.g. log_sum_exp), so they would just be working with matrices/vectors of doubles (i.e. no need for autodiff/derivatives). The example I used (v.squaredNorm()) is from prim/mat/fun/dot_self.
This will not be an efficient way to write this if it's not calling our vectorized exp and specialized sum functions.
They're calling Eigen's vectorised functions, which can be faster (see this old forum post for benchmarking I've done with exp)
That should be fine for primitives assuming it returns the same results and isn't just reducing precision in favor of speed. We might then want to replace the basic implementation of `exp()` itself if Eigen has a better one than in the standard library.
The vectorized functions won't vectorize over autodiff variables, as the values need to be peeled out. The best approach would be to use the adj-jac-apply where the whole thing could involve a single virtual function.
For instance, what happens to transpose as a function?
I'm not sure what you mean here. There isn't an arr definition of transpose, so I'm not sure why this would need to change?
I see---didn't realize this was just to deal with vectorized array functions. Those are less worrisome as all the computations are independent on the items (at least until we can vectorize to a single virtual function call).
...
• Functions that directly modify the input (e.g. sort_asc)
Yikes. Is that what we're calling in Stan? It's not supposed to be modifying its input in Stan, but I can see it does in the math lib.
|
Yeah that's why I mentioned the coefficient access stuff. I think the issue in the past was that coefficient access on an expression template will access the untransformed data member |
To promote any
That was determining whether the return type should be an Eigen type or an std::vector, but it might not be necessary (explained in the next comment)
To get this working I went back to three definitions:
The first definition works with general eigen types and expressions. The second definition is for std::vector<T> where T is an int or double, and the third is for nested containers. It's less concise than just having two definitions, but it makes the function much more flexible with input types. |
I can't find that. Can you link the explanation or explain it here? |
That's great if we can handle most of their template expressions this way.
... The expression has to be evaluated before it's returned, otherwise there's the issue that @SteveBronder mentioned, where the 'finished' value isn't returned.
Ideally, we'd want to return the expression template so we could chain calls together for as long as possible and let Eigen resolve when to eval() them. It feels like trying to keep operations on the GPU.
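E.g. (an illustrative sketch; assume log_softmax returns its Eigen expression rather than an evaluated vector):

Eigen::VectorXd v = Eigen::VectorXd::Random(10);
// The subtraction and exp fuse into a single pass at the assignment,
// with no intermediate vector materialized in between.
Eigen::ArrayXd softmax = log_softmax(v).exp();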
The second definition is for std::vector<T> where T is an int or double, and the third is for nested containers.
It's less concise than just having two definitions, but it makes the function much more flexible with input types.
I'm guessing this will be worth it for the flexibility. Thanks for figuring it out.
|
It was this comment:
I've got an example of that occurring here: https://godbolt.org/z/8k-zqz |
Thanks. I asked this for the same reason as @bob-carpenter nicely explained - for performance reasons it is better to avoid
I think we can work around this by requiring that functions that need coefficient access evaluate expressions. Then we don't need to eval every return value. I don't know why piping to ostream does not do it, but that should not bother us. |
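A sketch of that rule (illustrative only, not thread code):

// A function that needs coefficient access evaluates its input once up front;
// eval() is cheap (a const reference) when the input is already a plain matrix.
template <typename Derived>
inline double first_coeff(const Eigen::MatrixBase<Derived>& x) {
  const auto& x_eval = x.eval();
  return x_eval(0);
}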
Oh hold on, it looks like the problem is the cast statement. Everything returns correctly if there's no cast: https://godbolt.org/z/EywVwH Odd. |
I think I have a final framework ready for comment. I've taken a similar approach to
The function header itself looks like this:

struct log_softmax_fun {
  template <typename T>
  static inline auto fun(const T& v) {
    check_nonzero_size("log_softmax", "v", v);
    return v.array() - log_sum_exp(v);
  }
};

template <typename T>
inline auto log_softmax(const T& x) {
  return apply_vector_unary<log_softmax_fun, T>::apply(x);
}

And the apply_vector_unary struct:

template <typename F, typename T, typename Enable = void>
struct apply_vector_unary {};

template <typename F, typename T>
struct apply_vector_unary<F, T, require_eigen_st<std::is_floating_point, T>> {
  static inline auto apply(const T& x) {
    return F::fun(x);
  }
};

template <typename F, typename T>
struct apply_vector_unary<F, T, require_eigen_st<std::is_integral, T>> {
  static inline auto apply(const T& x) {
    return F::fun(x.template cast<double>());
  }
};

template <typename F, typename T>
struct apply_vector_unary<F, std::vector<T>, require_stan_scalar_t<T>> {
  using map_t = typename Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, 1>>;
  using return_t = typename std::vector<return_type_t<T>>;
  static inline return_t apply(const std::vector<T>& x) {
    return_t result(x.size());
    as_eigen(result) = apply_vector_unary<F, map_t>::apply(as_eigen(x));
    return result;
  }
};

template <typename F, typename T>
struct apply_vector_unary<F, T, require_vector_vt<is_vector, T>> {
  using return_t = typename std::vector<return_container_t<value_type_t<T>>>;
  static inline return_t apply(const T& x) {
    return_t result(x.size());
    for (int i = 0; i < x.size(); i++) {
      as_eigen(result[i])
          = apply_vector_unary<F, decltype(x[i])>::apply(as_eigen(x[i]));
    }
    return result;
  }
}; |
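For reference, a few hypothetical call sites showing how the dispatch above plays out (values are illustrative):

std::vector<double> v{1.0, 2.0, 3.0};
Eigen::VectorXd ev = Eigen::VectorXd::Random(3);
std::vector<std::vector<double>> vv{{1.0, 2.0}, {3.0, 4.0}};

auto r1 = log_softmax(v);             // std::vector specialization -> std::vector<double>
Eigen::ArrayXd r2 = log_softmax(ev);  // Eigen specialization -> expression, evaluated here
auto r3 = log_softmax(vv);            // nested specialization -> std::vector<std::vector<double>>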
The wrinkle is that this assumes a vector-type return. Functions that reduce a vector down to a scalar (e.g. |
I like it!
imo I think that's fine, idt we want to think about this in the R way of applying to matrix rows and columns. If we wanted to we could have a
I think scatter, broadcast, and reduce should be their own structs. With reduce we might want something about how to reduce it, i.e. do addition for a sum or product etc. If we are going to rely on more meta-programming I think we should try to be more formal about our template arg names, ala:

template <typename Func, typename VecVec>
struct apply_vector_unary<Func, VecVec, require_vector_vt<is_vector, VecVec>> {
  using return_t = typename std::vector<return_container_t<value_type_t<VecVec>>>;
  static inline return_t apply(VecVec&& x) {
    return_t result(x.size());
    for (int i = 0; i < x.size(); i++) {
      as_eigen(result[i])
          = apply_vector_unary<Func, decltype(x[i])>::apply(as_eigen(x[i]));
    }
    return result;
  }
}; |
Could we just replace the call to apply scalar unary for vectors etc. with the above stuff? That would be an easy plug and play |
I don't think we'll be able to get to that level of generality, since some functions have implementation-specific reductions to avoid over-/under-flow (e.g. For example, this is what I've got
With
You'll notice that the Eigen declarations are the same as in |
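For what it's worth, a guess at the shape of such a reduction, reusing the max-shift from the line quoted earlier (illustrative, not the branch code):

// log_sum_exp shifts by the max before exponentiating to avoid overflow,
// so it can't be expressed as a generic elementwise apply.
struct log_sum_exp_fun {
  template <typename T>
  static inline double fun(const T& v) {
    const double max = v.maxCoeff();
    return max + std::log((v.array() - max).exp().sum());
  }
};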
That's a bit trickier, since |
... idt we want to think about this in the R way of applying to matrix rows and columns.
Once we get closures, we'll probably want to add those map functions. We already have some specialized cross-product functions.
|
We definitely need specific-case reductions like log-sum-exp. General ones might also be useful. The algorithms std library has accumulators that Steve's been adding to the Stan math lib code. The accumulators are based on initialization and combination. These will be more useful when get more functional programming in Stan. I'm not going to be the first to say the m-word.
|
Anyone who mentions those things and forces me to google what they are for the 19th time will be reported for CoC violation |
When even Wikipedia's explanation is a mess, I don't hold out much hope that the 19th time will be the charm.
… On Nov 11, 2019, at 8:20 PM, Steve Bronder ***@***.***> wrote:
the m-word.
Anyone who mentions those things and forces me to google what they are for the 19th time will be reported for CoC violation
|
Just a random idea. Do we for some reason need to separate function implementations from function signatures into a separate struct? I think we could get code that is a bit easier to read if we used generic lambdas and template deduction. So instead of this:
Is there any reason not to do this:
Of course apply_vector_unary would have to be modified a bit to accept a functor instead of a template type with a fun member function. |
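A sketch of that modification for the Eigen case (the constraint name is assumed; illustrative only):

// The specialization takes the functor as an argument instead of a
// template parameter with a static fun() member.
template <typename T, typename Enable = void>
struct apply_vector_unary {};

template <typename T>
struct apply_vector_unary<T, require_eigen_t<T>> {
  template <typename F>
  static inline auto apply(const T& x, const F& f) {
    return f(x);
  }
};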
The one advantage to the way it is now is that the functions are all static so that instances never get constructed. That may not be very high overhead compared to everything else. In the testing framework, I've just been using lambdas everywhere.
… On Nov 12, 2019, at 2:36 AM, Tadej Ciglarič ***@***.***> wrote:
Just a random Idea. Do we for some reason need to separate function implementations from function signatures into a separate struct? I think we could get code that is a bit easier to read if we used generic lambdas and template deduction. So instead of this:
struct log_softmax_fun {
  template <typename T>
  static inline auto fun(const T& v) {
  }
};

template <typename T>
inline auto log_softmax(const T& x) {
  return apply_vector_unary<log_softmax_fun, T>::apply(x);
}

Is there any reason not to do this:

template <typename T>
inline auto log_softmax(const T& x) {
  return apply_vector_unary<T>::apply(x, [](auto& v) {
    check_nonzero_size("log_softmax", "v", v);
    return v.array() - log_sum_exp(v);
  });
}
Of course apply_vector_unary would have to be modified a bit to accept a functor instead of a template type with a fun member function.
|
They are trivially constructible tho' so idt the overhead would be that much. Could run both through godbolt to see what happens |
I think there should not be any overhead in optimized code. But it can't hurt to check. |
Uhm... this is a great thread and I am really looking forward to this template magic... Given the complexity and size of the discussion, would it make sense to write a brief design doc and file that in the design repo? (I never really like when people tell me to write design docs - lots of work - but after having done it, things gain in clarity.) Feel free to ignore my comment. |
^what does and doesn't need a design doc is still kind of vague, but I think it does fit here. Plus with the new site we can take the approved design doc and add it there so contributors have an easier way to get why we do things a certain way. |
I'm on board with the design doc, especially since the scope of this has changed quite a bit from my initial post (in a good way!). I'd just like to clarify the intended direction of this before I write up a doc about the wrong thing. In my mind, this issue is now concerned with extending the existing
(Let me know if I've missed a type in there) In terms of a design, I'm not sure whether it would be possible to have a flexible implementation that would handle more than one of these types of functions, or whether we need a separate one for each (e.g. |
I would very strongly prefer that we do *not* extend matrix operations to std::vector. I know there's been a lot of pressure to make Stan more like R in that it will guess what the shape of your vector is, but I very much want to keep Stan strongly typed. For instance, I do *not* want a std::vector dot product, or an Eigen row vector * std::vector to be a scalar.
You should be able to be flexible on the argument types in a meta-programming framework---it's a pretty easy generalization. At worst, you can just take the shape as a template parameter. There are many more signatures than listed when you get into matrix operations.
real(row-vector, vector)
matrix(vector, row-vector)
matrix(matrix, vector)
vector(real, vector)
....
But those are exactly the ones I don't want to generalize to std::vector.
There are also the distribution functions, which are already coded to be flexible about
this using views.
|
That sounds good to me; it was going to be a bit of a mess determining whether to treat them as column or row-vectors. |