
Simplify the reverse mode by having just one DiffMode for it #964

Open · wants to merge 6 commits into master from no-grad

Conversation

@PetroZarytskyi (Collaborator) commented Jul 2, 2024

Currently, on master, we have two reverse diff modes: DiffMode::reverse for gradients and DiffMode::experimental_pullback for pullbacks. This PR essentially merges them into DiffMode::reverse, by placing a pullback call in the gradient overload instead of a call to a dedicated gradient function.
Let's consider an example:

// Original code:
double f(double a, double b) {
    ...
}

int main() {
    auto df = clad::gradient(f);
    ...
}

On master, this generates:

// Generated derivative code:
void f_grad(double a, double b, double *_d_a, double *_d_b) {
    ...
}

// This overload is placed in the clad::gradient call:
void f_grad(double a, double b, void *_temp_d_a, void *_temp_d_b) {
    double *_d_a = (double *)_temp_d_a;
    double *_d_b = (double *)_temp_d_b;
    f_grad(a, b, _d_a, _d_b);
}

In this PR, it generates:

// Generated derivative code:
void f_pullback(double a, double b, double _d_y, double *_d_a, double *_d_b) {
    ...
}

// This is placed in the clad::gradient call:
void f_grad(double a, double b, void *_temp_d_a, void *_temp_d_b) {
    double *_d_a = (double *)_temp_d_a;
    double *_d_b = (double *)_temp_d_b;
    f_pullback(a, b, 1, _d_a, _d_b);
}
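
In both cases the user-facing call is unchanged. A minimal usage sketch, compiled with the clad plugin (the function body and numeric values here are illustrative, not from the PR):

#include "clad/Differentiator/Differentiator.h"

double f(double a, double b) { return a * b; } // illustrative body

int main() {
    auto df = clad::gradient(f);
    double d_a = 0, d_b = 0;
    df.execute(3.0, 4.0, &d_a, &d_b); // fills d_a = 4, d_b = 3
}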

Note: To make this system work with error estimation, I had to enable overloads there as well. That required changing the type of the _final_error parameter from double& to double*.
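
Concretely, the error-estimation signature changes roughly like this (a sketch inferred from the GradientDerivedEstFnTraits diff quoted further down; the parameter names are illustrative):

// Before: the error accumulator was a reference, which cannot pass through a void* overload parameter.
void f_grad(double a, double b, double *_d_a, double *_d_b, double &_final_error);

// After: pointers throughout, so a single type-erased overload can forward every argument.
void f_grad(double a, double b, void *_d_a, void *_d_b, void *_final_error);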

Advantages:

  1. On master, we have 11 DiffModes, many of which use the same visitors. Having a unified reverse DiffMode makes the system easier to understand.
  2. In the ReverseModeVisitor (RMV), Derive and DerivePullback do almost the same job. This PR removes DerivePullback completely.
  3. With this PR, clad no longer uses overloads for the reverse mode: there is just one gradient function and one pullback function. This is a great step towards supporting C, which does not have function overloading.
  4. Differentiating recursive functions used to generate both the gradient and the pullback. Now only the pullback is generated (see the sketch after this list).
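
To illustrate advantage 4, here is a hand-written sketch of the single pullback one could generate for a recursive function (the names and structure are illustrative, not clad's actual output):

// Original recursive function: computes x^n for n >= 0.
double powr(double x, int n) { return n == 0 ? 1.0 : x * powr(x, n - 1); }

// One pullback suffices: the recursive call is handled by the pullback
// calling itself, with the seed scaled by the chain rule.
void powr_pullback(double x, int n, double _d_y, double *_d_x) {
    if (n == 0)
        return;                              // derivative of the constant 1 is 0
    double rec = powr(x, n - 1);             // recompute the inner recursive value
    *_d_x += _d_y * rec;                     // d(x * rec)/dx contributes rec
    powr_pullback(x, n - 1, _d_y * x, _d_x); // recurse into d(rec)/dx
}

Calling powr_pullback(x, n, 1, &d_x) accumulates n * x^(n-1) into d_x, i.e. exactly the gradient, with no separate grad function needed.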

Disadvantages:

  1. Gradient forward declarations are now only supported with void* adjoint parameter types. E.g., for a function double f(double a, double b), it no longer makes sense to forward-declare void f_grad(double a, double b, double *_d_a, double *_d_b). The options void f_grad(double a, double b, void *_d_a, void *_d_b) and void f_pullback(double a, double b, double _d_y, void *_d_a, void *_d_b) still work.
    However, forward declarations don't seem to be widely used. For example, when we changed all array_ref adjoint types to pointers in the gradient signature, this didn't break a single ROOT test. The main way to execute derivatives (with CladFunction) works as before.
  2. All differentiated functions now have the pullback seed parameter _d_y. This may make the derivative code harder to understand. Moreover, whenever the original function has a parameter named y, the seed parameter is renamed to _d_y0 to avoid a name collision (see the sketch after this list), which could make the code even more confusing. We could fix the latter problem by giving the pullback parameter a different name.
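
A sketch of the naming collision from disadvantage 2 (hypothetical signatures, not clad output):

// The original function already has a parameter named y, so its adjoint
// claims the name _d_y and the seed parameter gets renamed to _d_y0:
double g(double x, double y) { return x * y; }
void g_pullback(double x, double y, double _d_y0, double *_d_x, double *_d_y);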

@PetroZarytskyi (Collaborator, Author):

This is a big change, so having different opinions on the PR would be great. We discussed the idea with @vgvassilev.
@vaithak I'd love to know your thoughts.

@github-actions bot (Contributor) left a comment:

clang-tidy made some suggestions

include/clad/Differentiator/ReverseModeVisitor.h: 5 review threads (outdated, resolved)
lib/Differentiator/ReverseModeVisitor.cpp: 3 review threads (1 outdated, resolved)
@PetroZarytskyi force-pushed the no-grad branch 3 times, most recently from 8a643a5 to 67190a8 on July 2, 2024 at 12:58
@github-actions bot (Contributor) left a comment:

clang-tidy made some suggestions

    using type = void (C::*)(Args..., OutputParamType_t<Args, Args>...,      \
                             double&) cv vol ref noex;                       \
  };
#define GradientDerivedEstFnTraits_AddSPECS(var, cv, vol, ref, noex)         \

warning: function-like macro 'GradientDerivedEstFnTraits_AddSPECS' used; consider a 'constexpr' template function [cppcoreguidelines-macro-usage]

#define GradientDerivedEstFnTraits_AddSPECS(var, cv, vol, ref, noex)         \
        ^

for (const auto& dParam : DVI) {
  const clang::ValueDecl* arg = dParam.param;
  const auto* begin = request->param_begin();
  const auto* end = std::next(begin, numParams);

warning: narrowing conversion from 'std::size_t' (aka 'unsigned long') to signed type 'typename iterator_traits<ParmVarDecl *const *>::difference_type' (aka 'long') is implementation-defined [cppcoreguidelines-narrowing-conversions]

          const auto* end = std::next(begin, numParams);
                                             ^
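
A typical fix for this warning is an explicit cast on the count (a sketch, not necessarily what was committed):

const auto* end = std::next(begin, static_cast<std::ptrdiff_t>(numParams));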

@vaithak (Collaborator) commented Jul 3, 2024

This looks really good 👍🏼
Thanks, @PetroZarytskyi, for improving this.

@vgvassilev (Owner):

This PR also partially addresses #721.

@@ -482,8 +482,7 @@ namespace clad {
   // GradientDerivedEstFnTraits specializations for pure function pointer types
   template <class ReturnType, class... Args>
   struct GradientDerivedEstFnTraits<ReturnType (*)(Args...)> {
-    using type = void (*)(Args..., OutputParamType_t<Args, Args>...,
-                          double&);
+    using type = void (*)(Args..., OutputParamType_t<Args, void>..., void*);
@vgvassilev (Owner):

Why did this become a void* instead of a double&? What is the benefit?

@PetroZarytskyi (Collaborator, Author):

This is necessary to enable overloads in error estimation. double& cannot be converted to void*, so I replaced it with void* in the overload and double* in other places. This is why _final_error became a pointer type.
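
A minimal sketch of the constraint (illustrative names, not clad's actual code): a double* converts implicitly to void*, so a pointer-typed _final_error can flow through the type-erased overload, while a double& could not.

// The overload erases all adjoint types to void* ...
void grad_overload(double a, void *_temp_d_a, void *_temp_final_error) {
    // ... and recovers the real types before doing any work.
    double *_d_a = (double *)_temp_d_a;
    double *_final_error = (double *)_temp_final_error;
    *_d_a += 1.0;         // stand-in for the real gradient body
    *_final_error += 0.0; // stand-in for the accumulated error
}

int main() {
    double d_a = 0, err = 0;
    grad_overload(2.0, &d_a, &err); // double* converts to void*; a double& would not
}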

@@ -15,9 +15,9 @@ template <typename T> struct Experiment {

 // CHECK: void operator_call_hessian(double i, double j, double *hessianMatrix) {
 // CHECK-NEXT: Experiment<double> _d_this;
-// CHECK-NEXT: this->operator_call_darg0_grad(i, j, &_d_this, hessianMatrix + {{0U|0UL}}, hessianMatrix + {{1U|1UL}});
+// CHECK-NEXT: this->operator_call_darg0_pullback(i, j, 1, &_d_this, hessianMatrix + {{0U|0UL}}, hessianMatrix + {{1U|1UL}});
@vgvassilev (Owner):

Passing a hardcoded 1 for d_x will probably become a source of bugs/confusion. Can we somehow make sure we require the literals 0 or 1? Maybe some enum constant...

@PetroZarytskyi (Collaborator, Author):

It's always 1 here. Setting the _d_y parameter of a pullback to 1 essentially makes it equivalent to the corresponding gradient. This is the idea I described in the PR message.
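
For intuition, a worked example (hand-written, not clad output): a pullback accumulates the seed times each partial derivative, so a seed of 1 yields the plain gradient.

// For f(a, b) = a * b:
void f_pullback(double a, double b, double _d_y, double *_d_a, double *_d_b) {
    *_d_a += _d_y * b; // df/da = b, scaled by the seed
    *_d_b += _d_y * a; // df/db = a, scaled by the seed
}
// f_pullback(a, b, 1, &d_a, &d_b) therefore computes exactly the gradient (b, a).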

 //CHECK_FLOAT_SUM: for (; _t0; _t0--) {
 //CHECK_FLOAT_SUM: i--;
 //CHECK_FLOAT_SUM: {
-//CHECK_FLOAT_SUM: _final_error += std::abs(_d_sum * sum * 1.1920928955078125E-7);
+//CHECK_FLOAT_SUM: *_final_error += std::abs(_d_sum * sum * 1.1920928955078125E-7);
@vgvassilev (Owner):

Why do we pass this by pointer now?

@PetroZarytskyi (Collaborator, Author):

To enable overloads in error estimation (I explained this in more detail above).

@PetroZarytskyi force-pushed the no-grad branch 3 times, most recently from c37e97f to e9504a6 on July 11, 2024 at 21:29
codecov bot commented Jul 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.71%. Comparing base (f03fc98) to head (5fb5536).

Additional details and impacted files:

@@            Coverage Diff             @@
##           master     #964      +/-   ##
==========================================
- Coverage   93.80%   93.71%   -0.10%     
==========================================
  Files          55       55              
  Lines        8065     7997      -68     
==========================================
- Hits         7565     7494      -71     
- Misses        500      503       +3     
Files Coverage Δ
include/clad/Differentiator/DiffMode.h 0.00% <ø> (ø)
include/clad/Differentiator/ReverseModeVisitor.h 98.18% <100.00%> (+0.30%) ⬆️
lib/Differentiator/DerivativeBuilder.cpp 100.00% <100.00%> (ø)
lib/Differentiator/DiffPlanner.cpp 98.71% <100.00%> (ø)
lib/Differentiator/ErrorEstimator.cpp 99.15% <100.00%> (+<0.01%) ⬆️
lib/Differentiator/HessianModeVisitor.cpp 99.54% <100.00%> (+0.01%) ⬆️
lib/Differentiator/ReverseModeVisitor.cpp 97.03% <100.00%> (-0.29%) ⬇️

... and 1 file with indirect coverage changes


@vgvassilev (Owner):

@PetroZarytskyi, can you rebase this PR? Then @kchristin22 can try it out in the context of CUDA and provide a minimal example that should work...
