Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let value_of and value_of_rec return expressions #1872

Merged
merged 8 commits into from
May 15, 2020

Conversation

t4c1
Copy link
Contributor

@t4c1 t4c1 commented May 8, 2020

Summary

value_of and value_of_rec now can now return Eigen expressions. So if their result is only used in a single expression it needs not be saved in memory. We can do that even before all functions are generalized as these two functions are not exposed to stan lang. value_of_rec now also uses perfect forwarding.

All functions using eigher of these two functions and acceptiong the result into a variable of type auto can only use such a variable once. Otherwise they would evaluate the expression multiple times. If they need to use it multiple times they can accept the result into Eigen::Ref. A helper fucntion to_ref has been added to simplify that. All existing cases that would evaluate result of one of these two functions have been fixed using to_ref.

Tests

Added tests for value_of and value_of_rec used on expressions.

Side Effects

See second paragraph of the Summary.

Release notes

value_of and value_of_rec now can now return Eigen expressions.

Checklist

  • Math issue Generalize matrix function signatures #1470

  • Copyright holder: Tadej Ciglarič

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@t4c1
Copy link
Contributor Author

t4c1 commented May 11, 2020

Oh, it seems this need to wait on #1848 to be merged. Possibly also on #1847 and #1844.

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 5.09 5.19 0.98 -1.85% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.01 0.67% faster
eight_schools/eight_schools.stan 0.09 0.09 1.01 0.94% faster
gp_regr/gp_regr.stan 0.23 0.23 1.0 -0.44% slower
irt_2pl/irt_2pl.stan 6.59 6.55 1.01 0.53% faster
performance.compilation 85.89 83.99 1.02 2.21% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.69 7.69 1.0 0.05% faster
pkpd/one_comp_mm_elim_abs.stan 21.22 21.15 1.0 0.33% faster
sir/sir.stan 95.42 96.34 0.99 -0.96% slower
gp_regr/gen_gp_data.stan 0.05 0.05 0.99 -1.38% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.01 3.0 1.0 0.3% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.32 0.34 0.95 -4.8% slower
arK/arK.stan 1.83 1.84 0.99 -0.71% slower
arma/arma.stan 0.83 0.83 1.0 0.34% faster
garch/garch.stan 0.56 0.56 1.0 0.03% faster
Mean result: 0.99708041898

Jenkins Console Log
Blue Ocean
Commit hash: 4aec6f1


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@t4c1 t4c1 requested a review from andrjohns May 14, 2020 07:59
Copy link
Collaborator

@andrjohns andrjohns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple queries, mostly around return types with forwarding references. Otherwise looking pretty good!

stan/math/prim/fun/to_ref.hpp Show resolved Hide resolved
stan/math/prim/fun/to_ref.hpp Show resolved Hide resolved
stan/math/prim/fun/value_of.hpp Show resolved Hide resolved
stan/math/prim/prob/hmm_marginal_lpdf.hpp Outdated Show resolved Hide resolved
test/unit/math/prim/fun/value_of_rec_test.cpp Outdated Show resolved Hide resolved
test/unit/math/prim/fun/value_of_rec_test.cpp Outdated Show resolved Hide resolved
@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 5.07 5.26 0.96 -3.72% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.98 -1.7% slower
eight_schools/eight_schools.stan 0.09 0.09 1.01 0.6% faster
gp_regr/gp_regr.stan 0.23 0.23 1.03 2.55% faster
irt_2pl/irt_2pl.stan 6.58 6.57 1.0 0.08% faster
performance.compilation 85.28 83.98 1.02 1.52% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.68 7.7 1.0 -0.24% slower
pkpd/one_comp_mm_elim_abs.stan 20.41 19.93 1.02 2.34% faster
sir/sir.stan 93.77 93.13 1.01 0.68% faster
gp_regr/gen_gp_data.stan 0.05 0.05 1.0 0.0% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.02 3.04 0.99 -0.56% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.32 0.34 0.95 -5.23% slower
arK/arK.stan 1.83 1.84 0.99 -0.51% slower
arma/arma.stan 0.82 0.82 0.99 -0.67% slower
garch/garch.stan 0.57 0.56 1.01 0.67% faster
Mean result: 0.997609594792

Jenkins Console Log
Blue Ocean
Commit hash: 5c9a64d


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Copy link
Collaborator

@andrjohns andrjohns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rok-cesnovar rok-cesnovar merged commit d06f5fd into stan-dev:develop May 15, 2020
@rok-cesnovar rok-cesnovar deleted the value_of_return_expr branch May 15, 2020 04:47
@bbbales2
Copy link
Member

@t4c1 Is there a design doc or anything on these expressions? Or if there isn't one, do you mind starting one?

I don't mind using these things, and I don't doubt that they can be useful, but I had a run-in with this pull request where I was banking on value_of giving me a regular Eigen type, and it gave me something else. This turned into a segfault in a weird place in the code so I had to revert the change in my branch (here).

The line in question is here. In that case I'm trying to save a copy of all the arguments in a tuple for the reverse pass. An expression won't work, cause the original memory will go out of scope.

I guess vaguely questions I have for this are:

  1. If my function takes in something as input, when should I evaluate it? If I use the values more than once? More than twice?

  2. If I get an input matrix, how much does an operator()(int i, int j) access cost?

  3. When should I return an expression and when should I eval to return a non-expression?

  4. If I have a variable a which is an expression, if f(a) evaluates a, would f(b) be able to take advantage of that evaluation or does it do its own?

  5. Is there a way I can still get copies of variables when I need them? (if not I think I need a way -- see above).

@andrjohns
Copy link
Collaborator

I'll leave the rest of the question to tadej, but this should fix your code:

std::tuple<decltype(value_of(plain_type_t<T_Args>()))...> value_of_args_tuple_;

Addingplain_type_t will give the type of the final Eigen object that the expression would evaluate to

@bbbales2
Copy link
Member

Oooo, right, and then the expression would copy. That makes sense thanks.

@t4c1
Copy link
Contributor Author

t4c1 commented May 22, 2020

There is no design doc. Since Eigen was used by Stan Math for a long time I assumed people were somewhat familiar with it. I guess I was wrong and design doc really would be helpful. Anyway I will try to answer your questions here:

  1. General rule of tumb is that you don't know how complex input expressions are so you should never evaluate them more than once.

  2. This one is a bit more tricky. In general you are not losing much if you always evaluate before using elementwise access (also you should prefere .coeff() and .coeffRef() functions to [] and (), as they do not do bounds checks).

  3. For now only the functions, not exposed to stan language are allowed to return expressions. Exposed functions will be allowed to return expressions once all of them can take expressions as inputs.

  4. You better evaluate an expression before calling two functions with it.

  5. Assigning to plain_type_t<T> is a sure way to make a copy. In most cases Eigen::Ref is preferable as is does not evaluate simple expressions such as .block(). Or wait for Generalize glms #1899 to be merged - it introduces ref_type_t<T>, which avoids some weird edge cases of Eigen::Ref (its copy constructor does not copy anything).

@bbbales2
Copy link
Member

I assumed people were somewhat familiar with it

Ha! Tricked you.

also you should prefere .coeff() and .coeffRef()

Yuck. Thanks for the explanation though. I saw some function calls like this the other day and wanted to change them to ().

For now only the functions, not exposed to stan language are allowed to return expressions.

Oh I see. Yes this would be good thing to talk about in a design doc, especially the rollout, since this leads to accidental breaks.

You better evaluate an expression before calling two functions with it.

Oooh, so I suppose part of this will include some stanc3 analysis to make sure this doesn't happen?

@t4c1
Copy link
Contributor Author

t4c1 commented May 22, 2020

Oooh, so I suppose part of this will include some stanc3 analysis to make sure this doesn't happen?

I intended this to be transparent to Stan language and compiler - it all stays within Math. But if compiler can be somehow used to check that, I am alll for it.

@bbbales2
Copy link
Member

Well since this could cause slowdowns if we accidentally use expressions more than once, we kinda have to push this up to the language level too to avoid getting silently bit by it.

@t4c1
Copy link
Contributor Author

t4c1 commented May 22, 2020

I really do not understand how do you propose to do it. AFAIK stanc parses stan lang not C++. And this changes happen in C++.

@rok-cesnovar
Copy link
Member

rok-cesnovar commented May 22, 2020

Just to clarifiy:

matrix A[5,5] = D*F + 2*C*(D+4);
matrix B[5,5] = A*A + A*A;

This isn't problematic as A will be evaluated once assigned in the first line. This is basically to reduce copies inside D*F + 2*C*(D+4);. Right?

Anyhow, yes. Lets make a doc.

@t4c1
Copy link
Contributor Author

t4c1 commented May 22, 2020

Right. Doc PR is up here: stan-dev/design-docs#23

@t4c1
Copy link
Contributor Author

t4c1 commented May 22, 2020

Actually not to reduce copies. To reduce number of times matrices are loaded from RAM to CPU.

@rok-cesnovar
Copy link
Member

Yeah I meant the "copy" from RAM to the CPU. I didn't realize there was a doc already. Thanks.

@bbbales2
Copy link
Member

This isn't problematic as A will be evaluated once assigned in the first line.

Ooooh, okay, I see. Yeah that was what I was worried about.

As long as matrix A[5,5] compiles to a non-expression type then that'll cause the copy to happen.

AFAIK stanc parses stan lang not C++

Well it uses the C++ and makes assumptions about the C++.

I vaguely remember at one point discussions about:

real a = 1.0;

Like should that be a var or should that be a double, and I thought there was discussion of leaning on auto to make that happen.

This would mean leaning on C++ auto for types is a no-go, but not sure if that problem still exists or whatever, but they are connected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants