Simplify the reverse mode by having just one DiffMode for it #964
base: master
Conversation
This is a big change, so having different opinions on the PR would be great. We discussed the idea with @vgvassilev.
Force-pushed from 8a643a5 to 67190a8.
clang-tidy made some suggestions
```cpp
    using type = void (C::*)(Args..., OutputParamType_t<Args, Args>...,      \
                             double&) cv vol ref noex;                        \
  };
#define GradientDerivedEstFnTraits_AddSPECS(var, cv, vol, ref, noex)          \
```
```
warning: function-like macro 'GradientDerivedEstFnTraits_AddSPECS' used; consider a 'constexpr' template function [cppcoreguidelines-macro-usage]
#define GradientDerivedEstFnTraits_AddSPECS(var, cv, vol, ref, noex) \
^
```
```cpp
for (const auto& dParam : DVI) {
  const clang::ValueDecl* arg = dParam.param;
  const auto* begin = request->param_begin();
  const auto* end = std::next(begin, numParams);
```
```
warning: narrowing conversion from 'std::size_t' (aka 'unsigned long') to signed type 'typename iterator_traits<ParmVarDecl *const *>::difference_type' (aka 'long') is implementation-defined [cppcoreguidelines-narrowing-conversions]
const auto* end = std::next(begin, numParams);
^
```
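One conventional way to address this warning, shown here only as a sketch (not part of this PR, reusing `begin` and `numParams` from the snippet above), is to make the conversion explicit:

```cpp
// Casting the unsigned count to the iterator's signed difference type
// makes the narrowing intentional and visible, which satisfies the check.
const auto* end =
    std::next(begin, static_cast<std::ptrdiff_t>(numParams));
```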
This looks really good 👍🏼
This PR also partially addresses #721.
```diff
@@ -482,8 +482,7 @@ namespace clad {
   // GradientDerivedEstFnTraits specializations for pure function pointer types
   template <class ReturnType, class... Args>
   struct GradientDerivedEstFnTraits<ReturnType (*)(Args...)> {
-    using type = void (*)(Args..., OutputParamType_t<Args, Args>...,
-                          double&);
+    using type = void (*)(Args..., OutputParamType_t<Args, void>..., void*);
```
Why did this become a `void*` from a `double&`? What is the benefit?
This is necessary to enable overloads in error estimation. `double&` cannot be converted to `void*`, so I replaced it with `void*` in the overload and `double*` in other places. This is the reason `_final_error` became a pointer type.
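A minimal standalone sketch of the underlying C++ rule (the `overload` function is hypothetical):

```cpp
// Hypothetical overload with a type-erased error parameter.
void overload(void* /*final_error*/) {}

int main() {
  double err = 0.0;
  double& err_ref = err;  // a reference cannot bind into a void* slot...
  // overload(err_ref);   // error: no conversion from 'double' to 'void *'
  overload(&err);         // ...but double* -> void* is an implicit conversion
  (void)err_ref;
}
```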
```diff
@@ -15,9 +15,9 @@ template <typename T> struct Experiment {
 // CHECK: void operator_call_hessian(double i, double j, double *hessianMatrix) {
 // CHECK-NEXT: Experiment<double> _d_this;
-// CHECK-NEXT: this->operator_call_darg0_grad(i, j, &_d_this, hessianMatrix + {{0U|0UL}}, hessianMatrix + {{1U|1UL}});
+// CHECK-NEXT: this->operator_call_darg0_pullback(i, j, 1, &_d_this, hessianMatrix + {{0U|0UL}}, hessianMatrix + {{1U|1UL}});
```
Passing a hardcoded `1` for `d_x` will probably cause some sort of bugs/confusion. Can we somehow make sure we require literals of either `0` or `1`? Maybe some enum constant...
It's always `1` here. Setting the `_d_y` parameter of a pullback to `1` essentially makes it equivalent to the corresponding gradient. This is the idea I described in the PR message.
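A sketch of why that holds, using hypothetical generated code for `f(x) = x * x`:

```cpp
// Pullback: accumulates _d_y * df/dx into *_d_x.
void f_pullback(double x, double _d_y, double* _d_x) {
  *_d_x += _d_y * (2 * x);  // df/dx = 2x for f(x) = x * x
}

// Seeding _d_y = 1 recovers df/dx exactly, i.e. the gradient.
void f_grad(double x, double* _d_x) { f_pullback(x, /*_d_y=*/1, _d_x); }
```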
In test/Misc/RunDemos.C:
```diff
 //CHECK_FLOAT_SUM: for (; _t0; _t0--) {
 //CHECK_FLOAT_SUM: i--;
 //CHECK_FLOAT_SUM: {
-//CHECK_FLOAT_SUM: _final_error += std::abs(_d_sum * sum * 1.1920928955078125E-7);
+//CHECK_FLOAT_SUM: *_final_error += std::abs(_d_sum * sum * 1.1920928955078125E-7);
```
Why do we pass this by pointer now?
To enable overloads in error estimation (I explained this in more detail above).
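A sketch of the resulting shape (these signatures are illustrative, not the exact generated code): with `_final_error` as a pointer, the type-erased overload can simply cast and forward.

```cpp
// Hypothetical typed error-estimation gradient: _final_error is a double*.
void f_grad(double x, double* _d_x, double* _final_error) {
  *_d_x += 1;            // illustrative derivative accumulation
  *_final_error += 0.0;  // error is accumulated through the pointer
}

// Type-erased overload: double* converts to void*; double& would not.
void f_grad(double x, void* _d_x, void* _final_error) {
  f_grad(x, static_cast<double*>(_d_x),
         static_cast<double*>(_final_error));
}
```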
Force-pushed from c37e97f to e9504a6.
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##           master     #964      +/-   ##
==========================================
- Coverage   93.80%   93.71%   -0.10%
==========================================
  Files          55       55
  Lines        8065     7997      -68
==========================================
- Hits         7565     7494      -71
- Misses        500      503       +3
```

... and 1 file with indirect coverage changes
…th pullbacks when necessary.
@PetroZarytskyi, can you rebase this PR, and then @kchristin22 can try it out in the context of CUDA and provide a minimal example that should work...
Currently, on master, we have two reverse diff modes: `DiffMode::reverse` for gradients and `DiffMode::experimental_pullback` for pullbacks. In this PR, they are essentially merged into `DiffMode::reverse`. This has been achieved by placing a pullback in the gradient overload instead of a gradient function. Let's consider an example:
On master:
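A sketch with hypothetical generated signatures for `double f(double a, double b)`:

```cpp
// Master: a dedicated gradient body is generated, the type-erased
// overload forwards to it, and pullbacks are produced by a separate
// mode (DiffMode::experimental_pullback).
void f_grad(double a, double b, double* _d_a, double* _d_b);  // gradient body
void f_grad(double a, double b, void* _d_a, void* _d_b) {     // overload
  f_grad(a, b, static_cast<double*>(_d_a), static_cast<double*>(_d_b));
}
void f_pullback(double a, double b, double _d_y, double* _d_a, double* _d_b);
```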
In this PR:
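A corresponding sketch of the new scheme, under the same assumptions:

```cpp
// This PR: only the pullback body is generated; the gradient overload
// seeds it with _d_y = 1, which makes the pullback act as the gradient.
void f_pullback(double a, double b, double _d_y, double* _d_a, double* _d_b);
void f_grad(double a, double b, void* _d_a, void* _d_b) {     // overload
  f_pullback(a, b, /*_d_y=*/1,
             static_cast<double*>(_d_a), static_cast<double*>(_d_b));
}
```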
Note: To make this system work with error estimation, I had to enable overloads there. To do that, I had to change the type of `_final_error` parameters from `double&` to `double*`.

Advantages:
- `Derive` and `DerivePullback` do almost the same job. This PR removes `DerivePullback` completely.

Disadvantages:
- Forward declarations of gradients now involve `void*` adjoint parameter types. E.g., for a function `double f(double a, double b)`, it doesn't make sense anymore to forward declare `void f_grad(double a, double b, double *_d_a, double *_d_b)`. The options `void f_grad(double a, double b, void *_d_a, void *_d_b)` and `void f_pullback(double a, double b, double _d_y, void *_d_a, void *_d_b)` still work. However, forward declarations don't seem to be that widely used. For example, when we changed all array_ref adjoint types to pointers in the gradient signature, this didn't break a single ROOT test. The main way to execute derivatives (with `CladFunction`) works as before (see the sketch after this list).
- Every gradient is now generated as a pullback with an extra `_d_y` parameter. This may make it harder to understand the derivative code. Moreover, every time the function has a parameter named `y`, the pullback parameter will be renamed to `_d_y0` to avoid name collisions. This could make the code even more confusing. However, we can fix the last problem by giving the pullback parameter a different name.
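For reference, a minimal sketch of that `CladFunction` path (standard clad usage, which this PR leaves unchanged; the function `f` is just an example):

```cpp
#include "clad/Differentiator/Differentiator.h"

double f(double a, double b) { return a * a + b; }

int main() {
  // clad::gradient returns a CladFunction wrapping the generated gradient.
  auto f_grad = clad::gradient(f);
  double d_a = 0, d_b = 0;
  f_grad.execute(3.0, 4.0, &d_a, &d_b);  // expect d_a == 6, d_b == 1
}
```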