-
-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Adds boost root finders with reverse mode specializations #2720
base: develop
Are you sure you want to change the base?
Conversation
(1) I imagine it will end up like the ODE solvers where we have slightly different versions with a mostly common set of arguments. But Halley is probably the one that would get used the most inside Stan Math. |
(2) It seems like it would be good to permit (or require) one functor that returns a tuple of: level, first derivative, second derivative. |
(3) If a root needs to be found for multiple parameter values, I would say that the caller is responsible for invoking |
I think we should default the
|
You could also easily test |
From your email, |
Also, @charlesm93 might have some thoughts. As I mentioned yesterday, the motivation for having a one-dimensional specialization of
I guess the main question is what is the fastest way of implicitly differentiating the root with respect to unknown parameters? A concrete example would be thinking about implementing the inverse CDF of the Beta distribution in C++ with possibly unknown shape parameters. Given a cumulative probability, |
We should probably also do the beta inverse CDF as a test case. Boost has a more advanced way of obtaining an initial guess in this case, but they still usually do Halley iteration
|
So I got the tests for beta up and running, for the beta lcdf it seems to work fine for values where lgamma(alpha + beta) > 1. Maybe I wrote the derivative of the pdf wrt x wrong? Or there could be a numerics trick I'm missing. For the test against boost's quantile function,
|
Also do we want to test lcdf or cdf functions? |
There are analytical derivatives for the beta quantile case at https://functions.wolfram.com/GammaBetaErf/InverseBetaRegularized/20/01/ |
@bgoodri so I got the rev test up and running with the explicit gradients for the beta cdf. I also wrote a lil' function that just checks the gradients and our function to make sure they are near'ish to each other. Though the test is failing for some odd values that I wouldn't expect them to. The failing ones all print their values after a failure so you can take a look if you like (though note after one failure it will just keep printing all the sets of values it is testing. This is just because I have not figured out how to stop that from happening in gtest). I wonder if it could be some numerically inaccuracy with our |
(2): if the function is going to be used internally, we should try and optimize as much as possible. I see two options:
If this is exposed to the user, we may want to trade some efficiency for a simpler API. |
Why is there a specialized rev method? The implicit function is valid for both forward and reverse mode autodiff. Since the solution is a scalar, the full Jacobian of derivatives reduces to a vector, so we don't gain much from doing contractions with either an initial tangent or a cotangent. I think it would make sense to explicitly compute the full Jacobian and then do the vector-vector product for either forward or reverse mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good to me but it still requires some additional work. Some of the C++ looks arcane to me, so I've asked for clarifications and I'm relying on the quality of the unit tests.
Here are the main requests:
- The unit tests need to check that the correct root is returned. You can work out the solution analytically; plug the root into the function being evaluated and check that it is close to 0 within the specified tolerance; or use
algebra_solver
as a benchmark. - When setting tuning parameters for the rootfinder, leave a comment to justify those choices. Are these arbitrary tuning parameters or are they required to make the root finder work?
- Need additional tests which prompt the error messages.
- Address comments on the file. A handful of them are about doxygen doc.
Some suggestions:
- Consider using the implicit function theorem for both forward and reverse mode autodiff. I don't think using a direct method for forward diff is justified.
- As mentioned in the discussion, consider passing only one functor rather than a tuple of functors.
namespace math { | ||
namespace internal { | ||
template <typename Tuple, typename... Args> | ||
inline auto func_with_derivs(Tuple&& f_tuple, Args&&... args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs some doxygen doc. It's not clear what this function does.
* initial lower bracket | ||
* @param max The maximum possible value for the result, this is used as an | ||
* initial upper bracket | ||
* @param digits The desired number of binary digits precision |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indicate that digits cannot exceed the precision of f_tuple
.
return boost::math::tools::halley_iterate(args...); | ||
}, | ||
std::forward<FTuple>(f_tuple), guess, min, max, digits, max_iter, | ||
std::forward<Types>(args)...); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This my lacking C++ skills speaking but could you describe how the call to root_finder_tol
works here? How does the [](auto&&... args)
argument work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return root_finder_tol(
[](auto&&... args) {
return boost::math::tools::halley_iterate(args...);
},
std::forward<FTuple>(f_tuple), guess, min, max, digits, max_iter,
std::forward<Types>(args)...);
The lambda is in this call to tell root_finder_tol
which root finder it is using. So in this case we pass a lambda expecting a parameter pack that calls the halley root finder. If we used a struct here it would look like
struct halley_iterator {
template <typename Types>
operator(Types&&... args) {
return boost::math::tools::halley_iterate(args...);
}
};
auto root_finder(SolverFun&& f_solver, FTuple&& f_tuple, | ||
const GuessScalar guess, const MinScalar min, | ||
const MaxScalar max, Types&&... args) { | ||
constexpr int digits = 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how was this default chosen? Maybe add one sentence about this choice in the doxygen doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to update this to be in line with what boost's docs say
The value of digits is crucial to good performance of these functions, if it is set too high then at best you will get one extra (unnecessary) iteration, and at worst the last few steps will proceed by bisection. Remember that the returned value can never be more accurate than f(x) can be evaluated, and that if f(x) suffers from cancellation errors as it tends to zero then the computed steps will be effectively random. The value of digits should be set so that iteration terminates before this point: remember that for second and third order methods the number of correct digits in the result is increasing quite substantially with each iteration, digits should be set by experiment so that the final iteration just takes the next value into the zone where f(x) becomes inaccurate. A good starting point for digits would be 0.6D for Newton and 0.4D for Halley or Shröder iteration, where D is std::numeric_limits::digits.
https://www.boost.org/doc/libs/1_62_0/libs/math/doc/html/math_toolkit/roots/roots_deriv.html
[&x_var, &f](auto&&... args) { return f(x_var, std::move(args)...); }, | ||
std::move(args_vals_tuple)); | ||
fx_var.grad(); | ||
Jf_x = x_var.adj(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a scalar. You might want to call it df_dx, since J usually suggest a Jacobian matrix (but this is a minor detail). Does nested_rev_autodiff
mean you're using reverse mode? For computing a scalar, I recommend using forward mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does nested_rev_autodiff mean you're using reverse mode?
Yes
For computing a scalar, I recommend using forward mode.
What is the intuition on fwd being faster in this case? I thought if it was many input single output then reverse is the best?
return beta_lpdf_deriv(x, alpha, beta, p); | ||
}; | ||
double guess = .3; | ||
double min = 0.0000000001; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would min = 0
not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not this should be indicated in a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I can update this
double p = 0.45; | ||
double alpha = .5; | ||
double beta = .4; | ||
stan::test::expect_ad(full_f, alpha, beta, p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder for me: this checks both reverse and forward mode AD, and higher-order AD as well, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does expect_ad
allow a tolerance parameter -- as a way to keep track of how closely finite-diff and autodiff agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in addition to testing derivatives, are you also checking that the root finder returns the correct value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder for me: this checks both reverse and forward mode AD, and higher-order AD as well, correct?
Yep!
Does expect_ad allow a tolerance parameter -- as a way to keep track of how closely finite-diff and autodiff agree?
Yes, though I'm not sure yet which level of tolerance we can / should expect
in addition to testing derivatives, are you also checking that the root finder returns the correct value?
It just checks finite diff vs autodiff of the function for correct gradients, if the primative and reverse mode specializations give back different answers it would throw an error, but I do need some test against like our current algebra solver to make sure things are as we expect.
double alpha = .4; | ||
double beta = .5; | ||
stan::test::expect_ad(full_f, alpha, beta, p); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about this unit test that I made for the previous test.
<< finit_grad_fx(1) | ||
<< "\n" | ||
"beta: " | ||
<< finit_grad_fx(2) << "\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this is here for the discussion in the PR, but in time, we'll remove the std::cout
.
double known_p_grad = deriv_p(p, a, b); | ||
double known_alpha_grad = deriv_a(p, a, b); | ||
double known_beta_grad = deriv_b(p, a, b); | ||
std::cout << "--- Mathematica Calculate Grad----\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If using another software as a benchmark, it would be good to have the code or a link to the code (or in the case of Mathematica, the prompt that you use)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes def agree would be good to copy/paste those into here
Thanks @charlesm93 |
Thanks @charlesm93! I'm going to rework the API rn, I'd like to make it a bit more efficient for reverse mode which would mean supplying two separate functors, one for calculating the function and its derivatives all at the same time for efficiency, and another functor for calculating just the function |
…eature/root-finder
@bgoodri alrighty so I got reverse mode to be adjoints to be right up to 1e-9. It turns out that we need more precision when doing the forward pass during reverse mode which I found kind of weird and interesting. For the beta cdf example it seemsl like we need about 23 (???) digits of precision which is very odd to me. Like we only work in doubles so I'm not sure why having anything over 16 would be a thing? But tests now pass for testing alpha,beta, and p from .1 to .9 |
The forward-mode values are often used as part of the derivative calculations, so I suspect that may be the problem.
As you point out, we only have about 16 digits of relative precision in double-precision, so we can't return results with that high precision. CPUs often do higher-precision internal arithmetic then truncate the return, but there's simply no way to get 23 digits of accuracy in a floating-point result at double precision. We can get to within 1e-23 of an answer, though in absolute precision. |
May I suggest that we adopt existing (closed-form) QF/CDF pairs for tests? Here are suggested QFs and CDFs (where missing in Stan).
All of these distributions have closed form CDF and QF so we always have analytical solution to check against. |
Summary
Adds a wrapper for boost's Halley root finding algorithm along with a reverse mode specialization similar to our algebra solver. Note these functions are only meant to be used at the Stan math level. Specifically for the quantile functions in stan-dev/design-docs#42
This also works for forward mode, but all it does is iterate through the boost functions so it is probably very slow.
Tests
Tests for forward and reverse mode using the cubed and 5th root examples from boost's docs. Tests can be run with
Side Effects
Note that I had to write specializations for
frexp
andsign
forfvar
andvar
types because boost would throw compiler errors if they did not exist.The WIP here is because of some questions for @bgoodri
Release notes
Adds boost root finders with reverse mode specializations for
Checklist
Math issue #
Copyright holder: Steve Bronder
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
./runTests.py test/unit
)make test-headers
)make test-math-dependencies
)make doxygen
)make cpplint
)the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested