Use Eigen::Map to replace arr functions #1425
The way I read this, there is a 30% reduction in execution time for LogSumExp? 15% for DotSelf and 5% for Sum? I would not call that minor. And even if the execution times were exactly the same and we "only" gained the code reduction, I would be all for it. So if I understand correctly, we can remove the
This idea has been brewing for some time already anyways for many reasons (most recently sparse matrices I think). This is just another example of why we really need to do this. Some latest discussions on this topic are here: https://discourse.mc-stan.org/t/refactoring-math-with-more-general-signatures/10157/6 What would be the best way forward here? A PR (or multiple PRs) to refactor the |
It does, but I think we need to wait until we can squash the prim/mat/arr
distinction. Right now, we do have the distinction in place and the code
base is consistent. We shouldn’t be adding (too many) special cases; it
becomes a cognitive burden to know when the rules apply and don’t.
I think this is a good idea and we should do it, not just here, but in lots
of places!
…On Mon, Oct 28, 2019 at 9:09 AM Andrew Johnson ***@***.***> wrote:
Description
As initially discussed in this PR
<#1373 (comment)>,
there is some code duplication in having the same functions implemented for
std::vector and Eigen separately. This could be avoided by using a
wrapper which compiles to a no-op for Eigen types and returns an
Eigen::Map for std::vector types prior to calling the mat definition:
template <typename Derived>
const auto& as_eigen(const Eigen::MatrixBase<Derived>& v) {
  return v;
}

template <typename T>
const auto as_eigen(const std::vector<T>& v) {
  return Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, 1>>(v.data(),
                                                               v.size());
}
This also means that the templating for the mat definitions needs to be
changed from Eigen::Matrix<double, R, C> to Eigen::MatrixBase<Derived> so
that it works with Eigen::Map inputs.
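As a rough illustration only (a sketch using dot_self, whose prim implementation is a squaredNorm call, not the actual refactor), the signature change would look something like this:

// Before: only binds to a concrete Eigen::Matrix, so an Eigen::Map argument
// either fails template deduction or forces a copy.
template <int R, int C>
inline double dot_self(const Eigen::Matrix<double, R, C>& v) {
  return v.squaredNorm();
}

// After: binds to any dense Eigen expression, including the Eigen::Map
// produced by as_eigen().
template <typename Derived>
inline double dot_self(const Eigen::MatrixBase<Derived>& v) {
  return v.squaredNorm();
}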
After some initial testing, the speed gains are minor:
2019-10-28 20:40:26
Running ./log_sum_exp_arr
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
-----------------------------------------------------
Benchmark               Time             CPU   Iterations
-----------------------------------------------------
LogSumExp_Old     8175394 ns      8175190 ns           92
LogSumExp_New     5511970 ns      5511907 ns          124
DotSelf_Old       5136511 ns      5136391 ns          132
DotSelf_New       4497041 ns      4496989 ns          157
Sum_Old           4868005 ns      4867939 ns          139
Sum_New           4608184 ns      4608127 ns          149
Benchmarking code here
<https://github.com/andrjohns/perf-math/blob/feature/tbb_test/log_sum_exp_arr.cpp>
The main benefits would be in code reduction for functions with existing
arr definitions, as well as making it easier to extend other functions to
take std::vector inputs (without requiring additional definitions).
Does this seem like a worthwhile change?
Current Version:
v3.0.0
|
Ha. @rok-cesnovar, I think our messages crossed. And you're right -- we can remove the |
Great! I'll go ahead then. I think Rok's workflow was a good idea:
After some testing, I've realised that the signatures need to be more general than
Then, once we're ready to collapse the
PS @SteveBronder your require generics are an absolute lifesaver here! |
This looks cool! Questions:
No pressure. Just asking because these seem related. Also it probably makes sense to just work this out for one or two representative functions first. No need to try to get rid of everything in array in one go (that'll just make the first review painful for whoever does it). |
Pretty much the same conceptually, in that it's taking different vector types and returning a single type. The main difference is that it needs to return an
That depends on what you mean, the
Tricky one. For a given function it would be easy to add another definition explicitly for nested vectors, but for a single function that handles both vectors and nested vectors, I could look into implementing the
Good call! |
@andrjohns, can you explain this in more detail? I want to make sure I understand. On first read, I think what you’ve written here is not correct: if the return type is always an Eigen type, then we can’t mix the arr and mat implementation because this will lose the type information that’s necessary to work with Stan. But maybe what you’ve written isn’t what you were envisioning? Also, starting with one function and seeing it all the way through is definitely the right way to go. |
Sure! It's not that the return type of the function (e.g. Using
What I'm proposing is something like:
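A minimal sketch of that shape (not the exact snippet; it assumes the as_eigen wrapper above, and the return line is the one quoted further down the thread):

// Sketch only: a prim log_sum_exp written against a generic Eigen expression,
// with std::vector inputs forwarded through as_eigen() (no copy).
template <typename T>
inline double log_sum_exp(const std::vector<T>& x) {
  return log_sum_exp(as_eigen(x));
}

template <typename Derived>
inline double log_sum_exp(const Eigen::MatrixBase<Derived>& x) {
  const double max = x.maxCoeff();
  return max + std::log((x.array() - max).exp().sum());
}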
So the return type ( |
You don't have to implement |
I did have a look at that; the only issue was that it was specifically for vectors, whereas I'm hoping to work with any Eigen object (including Map). I've put together a rough proof-of-concept here. I've implemented a view framework (similar to The templating for the nested views is all explicit at the moment, since I'm still figuring out how to navigate the specialisations. But this should give a good overview of what I had in mind. |
I like the idea. I think in that case the existing function can be reworked to work with any Eigen expression. We don't need two functions that do almost the same thing. |
If you want I can do that. |
I'm not sure the functions are aiming to do the same thing - since Eigen types are meant to be returned unchanged, not as a column vector. I don't think |
No, it can't be applied to a matrix. But that is not a problem, since a matrix can not be stored in a std::vector. But I see it could be a problem if you want to return row vectors for row vector inputs. |
Glad to hear the requires are useful! Are they missing anything conceptually that would be useful? Also for some of the above I think you want the |
Why do these need to directly call Eigen functions? Don't we usually want to call Stan functions instead to get more efficient derivatives?
Row vectors and column vectors are not substitutable in Eigen. So row vector times vector works, but vector times vector doesn't.
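For instance (a small illustrative snippet, not from the thread):

// Row-times-column is a valid 1x1 product; column-times-column doesn't compile.
Eigen::RowVectorXd r = Eigen::RowVectorXd::Ones(3);
Eigen::VectorXd c = Eigen::VectorXd::Ones(3);
double ok = (r * c).value();  // 1x3 * 3x1 -> 1x1
// auto bad = c * c;          // 3x1 * 3x1 -> dimension mismatch, compile error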
+1 to working it out for an example function first.
|
return max + std::log((x.array() - max).exp().sum());
This will not be an efficient way to write this if it's not calling our vectorized exp and specialized sum functions.
...
So the return type (double) doesn't change, it's just that std::vectors are mapped to Eigen types within the function.
I think this may be tricky to get return types back in a lot of cases. For instance, what happens to transpose as a function?
|
A matrix has contiguous memory, so it can be stored in a std::vector. If you store a row vector or column vector, you lose the type going to std::vector, and if you store a matrix, you lose the number of rows and columns.
|
I was referring to the matrix functions that have separate
They're calling Eigen's vectorised functions, which can be faster (see this old forum post for benchmarking I've done with exp)
I'm not sure what you mean here. There isn't an
I think this comes down to the taxonomy of different types of matrix functions:
The first two should be simple enough - the return type is known for the first, and the input for the second is modified in-place. The third type will need some work to deduce the return container type (similar to |
I've put together a very rough working example for this using
The results are stored in an The I probably didn't explain that very well, but here's the branch I'm working from which also includes tests for the new functionality in |
Getting over a cold so I'll post more tmrw. But why not have two methods? One that iterates on containers of scalars and the other that iterates on containers of containers. C of C the iterator of scalars
There's a non-zero chance I was trying to be too clever by squishing it all into one method. Putting it into two makes for much cleaner code:
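A simplified sketch of what that split looks like (illustrative only, not the branch code; it assumes log_sum_exp accepts arbitrary Eigen expressions, which is what this issue proposes):

// Sketch: one overload for vectors of scalars, one that recurses over
// nested vectors and delegates to the first.
inline std::vector<double> log_softmax(const std::vector<double>& v) {
  const auto m = as_eigen(v);                          // Eigen::Map, no copy
  const Eigen::ArrayXd out = m.array() - log_sum_exp(m);
  return std::vector<double>(out.data(), out.data() + out.size());
}

inline std::vector<std::vector<double>> log_softmax(
    const std::vector<std::vector<double>>& v) {
  std::vector<std::vector<double>> result(v.size());
  for (size_t i = 0; i < v.size(); ++i) {
    result[i] = log_softmax(v[i]);
  }
  return result;
}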
|
This is really interesting. Since it seems you were able to ratchet this up to std::vector, what are the chances of me ever writing type-agnostic C++ code and controlling what types go into a function at the interface level? This is almost definitely a Very Bad idea at some level, but the reason I'm asking is I recently worked on this bit of code (Gaussian process kernel stuff): https://github.com/stan-dev/math/blob/f573698ea3c6095a6d24de65915a501e9f3b84bc/stan/math/prim/mat/fun/gp_dot_prod_cov.hpp The x arguments to the function can be It'd be cool if the arguments could actually be any of std::vectorstd::vector, Also, when I've played with this stuff before there have been issues where super-generic templated functions can absorb Eigen intermediate expression types and give unhappy results (like things you would have regularly called eval on). How's that work here? Edits: Added some backticks so that template stuff would print correctly |
I may be going off the rails there too -- feel free to say so if you think this is out of scope. |
No problem, you're absolutely right that I was pretty bummed. |
I think the conversation has gone semi off the rails but in a pretty good direction! Ben I'll look over your code tmrw and see if we can do something like the above |
In a simple, proof-of-concept, kind of way, it's already done. I've currently got I'm not sure how well (or easily) this would extend to something like |
Should the second one be for vectors that contain vectors of scalars? Something like below in pseudocode:

template <typename Vec, require_vector_st<std::is_arithmetic, Vec>...,
          require_vector_vt<is_vector, Vec>...>
inline auto log_softmax(Vec&& v) {
  check_nonzero_size("log_softmax", "v", v);
  // return_container does not exist yet but gives back either eigen or std vector type
  return_container_t<Vec> result(v.size());
  for (int i = 0; i < v.size(); i++) {
    result[i] = log_softmax(std::forward<decltype(v[i])>(v[i]));
  }
  return result;
}
In the future we may also want the above to work over containers of eigen vs. std vectors so we can have the eigen one just return a unaryExpr template type that can be lazy evaluated later |
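To sketch the lazy variant (illustrative; exp_lazy is a made-up name, not an existing function):

// Returns the unevaluated Eigen expression; nothing is computed until the
// result is assigned to (or otherwise evaluated by) the caller.
template <typename Derived>
inline auto exp_lazy(const Eigen::ArrayBase<Derived>& v) {
  return v.derived().unaryExpr([](double x) { return std::exp(x); });
}

// Eigen::ArrayXd out = exp_lazy(a);  // evaluation happens here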
Well at some point copies would be necessary. I don't think I can write a function that takes all the Stan types
As another wrench, how do ragged arrays fit into this: stan-dev/design-docs#7? That wouldn't break what you have so far, but it would break any sort of matrix/real[,] equivalency we tried to set up, right? Or wrong? |
On Oct 29, 2019, at 9:52 PM, Andrew Johnson ***@***.***> wrote:
Why do these need to directly call Eigen functions? Don't we usually want to call Stan functions instead to get more efficient derivatives?
I was referring to the matrix functions that have separate prim and rev implementations (e.g. log_sum_exp), so they would just be working with matrices/vectors of doubles (i.e. no need for autodiff/derivatives). The example I used (v.squaredNorm()) is from prim/mat/fun/dot_self.
This will not be an efficient way to write this if it's not calling our vectorized exp and specialized sum functions.
They're calling Eigen's vectorised functions, which can be faster (see this old forum post for benchmarking I've done with exp)
That should be fine for primitives assuming it returns the same results and isn't just reducing precision in favor of speed. We might then want to replace the basic implementation of `exp()` itself if Eigen has a better one than in the standard library.
The vectorized functions won't vectorize over autodiff variables, as the values need to be peeled out. The best approach would be to use the adj-jac-apply where the whole thing could involve a single virtual function.
For instance, what happens to transpose as a function?
I'm not sure what you mean here. There isn't an arr definition of transpose, so I'm not sure why this would need to change?
I see---didn't realize this was just to deal with vectorized array functions. Those are less worrisome as all the computations are independent on the items (at least until we can vectorize to a single virtual function call).
...
• Functions that directly modify the input (e.g. sort_asc)
Yikes. Is that what we're calling in Stan? It's not supposed to be modifying its input in Stan, but I can see it does in the math lib.
|
Yeah that's why I mentioned the coefficient access stuff. I think the issue in the past was that coefficient access on an expression template will access the untransformed data member |
To promote any
That was determining whether the return type should be an Eigen type or an std::vector, but it might not be necessary (explained in the next comment)
To get this working I went back to three definitions:
The first definition works with general eigen types and expressions. The second definition is for std::vector<T> where T is an int or double, and the third is for nested containers. It's less concise than just having two definitions, but it makes the function much more flexible with input types. |
I can't find that. Can you link the explanation or explain it here? |
That's great if we can handle most of their template expressions this way.
... The expression has to be evaluated before it's returned, otherwise there's the issue that @SteveBronder mentioned, where the 'finished' value isn't returned.
Ideally, we'd want to return the expression template so we could chain calls together for as long as possible and let Eigen resolve when to eval() them. It feels like trying to keep operations on the GPU.
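E.g. (an illustrative sketch; assume log_softmax returns its Eigen expression rather than an evaluated vector):

Eigen::VectorXd v = Eigen::VectorXd::Random(10);
// The subtraction and exp fuse into a single pass at the assignment,
// with no intermediate vector materialized in between.
Eigen::ArrayXd softmax = log_softmax(v).exp();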
The second definition is for std::vector<T> where T is an int or double, and the third is for nested containers.
It's less concise than just having two definitions, but it makes the function much more flexible with input types.
I'm guessing this will be worth it for the flexibility. Thanks for figuring it out.
|
It was this comment:
I've got an example of that occurring here: https://godbolt.org/z/8k-zqz |
Thanks. I asked this for the same reason as @bob-carpenter nicely explained - for performance reasons it is better to avoid
I think we can work around this by requiring that functions that need coefficient access evaluate expressions. Then we don't need to eval every return value. I don't know why piping to ostream does not do it, but that should not bother us. |
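A sketch of that rule (illustrative only, not thread code):

// A function that needs coefficient access evaluates its input once up front;
// eval() is cheap (a const reference) when the input is already a plain matrix.
template <typename Derived>
inline double first_coeff(const Eigen::MatrixBase<Derived>& x) {
  const auto& x_eval = x.eval();
  return x_eval(0);
}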
Oh hold on, it looks like the problem is the cast statement. Everything returns correctly if there's no cast: https://godbolt.org/z/EywVwH Odd. |
I think I have a final framework ready for comment. I've taken a similar approach to
The function header itself looks like this:

struct log_softmax_fun {
  template <typename T>
  static inline auto fun(const T& v) {
    check_nonzero_size("log_softmax", "v", v);
    return v.array() - log_sum_exp(v);
  }
};

template <typename T>
inline auto log_softmax(const T& x) {
  return apply_vector_unary<log_softmax_fun, T>::apply(x);
}

And the apply_vector_unary struct:

template <typename F, typename T, typename Enable = void>
struct apply_vector_unary {};

template <typename F, typename T>
struct apply_vector_unary<F, T, require_eigen_st<std::is_floating_point, T>> {
  static inline auto apply(const T& x) {
    return F::fun(x);
  }
};

template <typename F, typename T>
struct apply_vector_unary<F, T, require_eigen_st<std::is_integral, T>> {
  static inline auto apply(const T& x) {
    return F::fun(x.template cast<double>());
  }
};

template <typename F, typename T>
struct apply_vector_unary<F, std::vector<T>, require_stan_scalar_t<T>> {
  using map_t = typename Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, 1>>;
  using return_t = typename std::vector<return_type_t<T>>;
  static inline return_t apply(const std::vector<T>& x) {
    return_t result(x.size());
    as_eigen(result) = apply_vector_unary<F, map_t>::apply(as_eigen(x));
    return result;
  }
};

template <typename F, typename T>
struct apply_vector_unary<F, T, require_vector_vt<is_vector, T>> {
  using return_t = typename std::vector<return_container_t<value_type_t<T>>>;
  static inline return_t apply(const T& x) {
    return_t result(x.size());
    for (int i = 0; i < x.size(); i++) {
      as_eigen(result[i])
          = apply_vector_unary<F, decltype(x[i])>::apply(as_eigen(x[i]));
    }
    return result;
  }
}; |
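For reference, a few hypothetical call sites showing how the dispatch above plays out (values are illustrative):

std::vector<double> v{1.0, 2.0, 3.0};
Eigen::VectorXd ev = Eigen::VectorXd::Random(3);
std::vector<std::vector<double>> vv{{1.0, 2.0}, {3.0, 4.0}};

auto r1 = log_softmax(v);             // std::vector specialization -> std::vector<double>
Eigen::ArrayXd r2 = log_softmax(ev);  // Eigen specialization -> expression, evaluated here
auto r3 = log_softmax(vv);            // nested specialization -> std::vector<std::vector<double>>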
The wrinkle is that this assumes a vector-type return. Functions that reduce a vector down to a scalar (e.g. |
I like it!
imo I think that's fine, idt we want to think about this in the R way of applying to matrix rows and columns. If we wanted to we could have a
I think scatter, broadcast, and reduce should be their own structs. With reduce we might want something about how to reduce it, i.e. do addition for a sum or product etc. If we are going to rely on more meta-programming I think we should try to be more formal about our template arg names, ala:

template <typename Func, typename VecVec>
struct apply_vector_unary<Func, VecVec, require_vector_vt<is_vector, VecVec>> {
  using return_t = typename std::vector<return_container_t<value_type_t<VecVec>>>;
  static inline return_t apply(VecVec&& x) {
    return_t result(x.size());
    for (int i = 0; i < x.size(); i++) {
      as_eigen(result[i])
          = apply_vector_unary<Func, decltype(x[i])>::apply(as_eigen(x[i]));
    }
    return result;
  }
}; |
Could we just replace the call to apply scalar unary for vectors etc. with the above stuff? That would be an easy plug and play |
I don't think we'll be able to get to that level of generality, since some functions have implementation-specific reductions to avoid over-/under-flow (e.g. For example, this is what I've got
With
You'll notice that the Eigen declarations are the same as in |
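For what it's worth, a guess at the shape of such a reduction, reusing the max-shift from the line quoted earlier (illustrative, not the branch code):

// log_sum_exp shifts by the max before exponentiating to avoid overflow,
// so it can't be expressed as a generic elementwise apply.
struct log_sum_exp_fun {
  template <typename T>
  static inline double fun(const T& v) {
    const double max = v.maxCoeff();
    return max + std::log((v.array() - max).exp().sum());
  }
};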
That's a bit trickier, since |
... idt we want to think about this in the R way of applying to matrix rows and columns.
Once we get closures, we'll probably want to add those map functions. We already have some specialized cross-product functions.
|
We definitely need specific-case reductions like log-sum-exp. General ones might also be useful. The algorithms std library has accumulators that Steve's been adding to the Stan math lib code. The accumulators are based on initialization and combination. These will be more useful when get more functional programming in Stan. I'm not going to be the first to say the m-word.
|
Anyone who mentions those things and forces me to google what they are for the 19th time will be reported for CoC violation |
When even Wikipedia's explanation is a mess, I don't hold out much hope that the 19th time will be the charm.
… On Nov 11, 2019, at 8:20 PM, Steve Bronder ***@***.***> wrote:
the m-word.
Anyone who mentions those things and forces me to google what they are for the 19th time will be reported for CoC violation
|
Just a random idea. Do we for some reason need to separate function implementations from function signatures into a separate struct? I think we could get code that is a bit easier to read if we used generic lambdas and template deduction. So instead of this:
Is there any reason not to do this:
Of course apply_vector_unary would have to be modified a bit to accept a functor instead of a template type with a fun member function. |
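A sketch of that modification for the Eigen case (the constraint name is assumed; illustrative only):

// The specialization takes the functor as an argument instead of a
// template parameter with a static fun() member.
template <typename T, typename Enable = void>
struct apply_vector_unary {};

template <typename T>
struct apply_vector_unary<T, require_eigen_t<T>> {
  template <typename F>
  static inline auto apply(const T& x, const F& f) {
    return f(x);
  }
};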
The one advantage to the way it is now is that the functions are all static so that instances never get constructed. That may not be very high overhead compared to everything else. In the testing framework, I've just been using lambdas everywhere.
… On Nov 12, 2019, at 2:36 AM, Tadej Ciglarič ***@***.***> wrote:
Just a random Idea. Do we for some reason need to separate function implementations from function signatures into a separate struct? I think we could get code that is a bit easier to read if we used generic lambdas and template deduction. So instead of this:
struct log_softmax_fun {
  template <typename T>
  static inline auto fun(const T& v) {
  }
};

template <typename T>
inline auto log_softmax(const T& x) {
  return apply_vector_unary<log_softmax_fun, T>::apply(x);
}

Is there any reason not to do this:

template <typename T>
inline auto log_softmax(const T& x) {
  return apply_vector_unary<T>::apply(x, [](auto& v) {
    check_nonzero_size("log_softmax", "v", v);
    return v.array() - log_sum_exp(v);
  });
}
Of course apply_vector_unary would have to be modified a bit to accept a functor instead of a template type with a fun member function.
|
They are trivially constructible tho' so idt the overhead would be that much. Could run both through godbolt to see what happens |
I think there should not be any overhead in optimized code. But it can't hurt to check. |
Uhm... this is a great thread and I am really looking forward to this template magic... Given the complexity and size of the discussion, would it make sense to write a brief design doc and file that in the design repo? (I never really like when people tell me to write design docs - lots of work - but after having done it, things gain in clarity.) Feel free to ignore my comment. |
^what does and doesn't need a design doc is still kind of vague, but I think it does fit here. Plus with the new site we can take the approved design doc and add it there so contributors have an easier way to get why we do things a certain way. |
I'm on board with the design doc, especially since the scope of this has changed quite a bit from my initial post (in a good way!). I'd just like to clarify the intended direction of this before I write up a doc about the wrong thing. In my mind, this issue is now concerned with extending the existing
(Let me know if I've missed a type in there) In terms of a design, I'm not sure whether it would be possible to have a flexible implementation that would handle more than one of these types of functions, or whether we need a separate one for each (e.g. |
I would very strongly prefer that we do *not* extend matrix operations to std::vector. I know there's been a lot of pressure to make Stan more like R in that it will guess what the shape of your vector is, but I very much want to keep Stan strongly typed. For instance, I do *not* want a std::vector dot product, or an Eigen row vector * std::vector to be a scalar.
You should be able to be flexible on the argument types in a meta-programming framework---it's a pretty easy generalization. At worst, you can just take the shape as a template parameter. There are many more signatures than listed when you get into matrix operations.
real(row-vector, vector)
matrix(vector, row-vector)
matrix(matrix, vector)
vector(real, vector)
....
But those are exactly the ones I don't want to generalize to std::vector.
There are also the distribution functions, which are already coded to be flexible about
this using views.
|
That sounds good to me; it was going to be a bit of a mess determining whether to treat them as column or row-vectors. |