
Simplify the reverse mode by having just one DiffMode for it #964

Open · wants to merge 6 commits into master from no-grad

Conversation

@PetroZarytskyi (Collaborator) commented Jul 2, 2024

Currently, on master, we have two reverse diff modes: DiffMode::reverse for gradients and DiffMode::experimental_pullback for pullbacks. This PR essentially merges them into DiffMode::reverse, by placing a pullback call in the gradient overload instead of a call to a dedicated gradient function.
Let's consider an example:

// Original code:
double f(double a, double b) {
    ...
}

int main() {
    auto df = clad::gradient(f);
    ...
}

On master, this generates:

// Generated derivative code:
void f_grad(double a, double b, double *_d_a, double *_d_b) {
    ...
}

// This overload is placed in the clad::gradient call:
void f_grad(double a, double b, void *_temp_d_a, void *_temp_d_b) {
    double *_d_a = (double *)_temp_d_a;
    double *_d_b = (double *)_temp_d_b;
    f_grad(a, b, _d_a, _d_b);
}

In this PR, it generates:

// Generated derivative code:
void f_pullback(double a, double b, double _d_y, double *_d_a, double *_d_b) {
    ...
}

// This is placed in the clad::gradient call:
void f_grad(double a, double b, void *_temp_d_a, void *_temp_d_b) {
    double *_d_a = (double *)_temp_d_a;
    double *_d_b = (double *)_temp_d_b;
    f_pullback(a, b, 1, _d_a, _d_b);
}
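
In both cases the user-facing call is unchanged. A minimal usage sketch, compiled with the clad plugin (the function body and numeric values here are illustrative, not from the PR):

#include "clad/Differentiator/Differentiator.h"

double f(double a, double b) { return a * b; } // illustrative body

int main() {
    auto df = clad::gradient(f);
    double d_a = 0, d_b = 0;
    df.execute(3.0, 4.0, &d_a, &d_b); // fills d_a = 4, d_b = 3
}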

Note: To make this system work with error estimation, I had to enable overloads there as well. That required changing the type of the _final_error parameter from double& to double*.
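
Concretely, the error-estimation signature changes roughly like this (a sketch inferred from the GradientDerivedEstFnTraits diff quoted further down; the parameter names are illustrative):

// Before: the error accumulator was a reference, which cannot pass through a void* overload parameter.
void f_grad(double a, double b, double *_d_a, double *_d_b, double &_final_error);

// After: pointers throughout, so a single type-erased overload can forward every argument.
void f_grad(double a, double b, void *_d_a, void *_d_b, void *_final_error);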

Advantages:

  1. On master, we have 11 DiffModes, many of which use the same visitors. Having a unified reverse DiffMode makes the system easier to understand.
  2. In the ReverseModeVisitor (RMV), Derive and DerivePullback do almost the same job. This PR removes DerivePullback completely.
  3. With this PR, clad no longer uses overloads for the reverse mode: there is just one gradient function and one pullback function. This is a great step towards supporting C, which does not have function overloading.
  4. Differentiating recursive functions used to generate both the gradient and the pullback. Now only the pullback is generated (see the sketch after this list).
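
To illustrate advantage 4, here is a hand-written sketch of the single pullback one could generate for a recursive function (the names and structure are illustrative, not clad's actual output):

// Original recursive function: computes x^n for n >= 0.
double powr(double x, int n) { return n == 0 ? 1.0 : x * powr(x, n - 1); }

// One pullback suffices: the recursive call is handled by the pullback
// calling itself, with the seed scaled by the chain rule.
void powr_pullback(double x, int n, double _d_y, double *_d_x) {
    if (n == 0)
        return;                              // derivative of the constant 1 is 0
    double rec = powr(x, n - 1);             // recompute the inner recursive value
    *_d_x += _d_y * rec;                     // d(x * rec)/dx contributes rec
    powr_pullback(x, n - 1, _d_y * x, _d_x); // recurse into d(rec)/dx
}

Calling powr_pullback(x, n, 1, &d_x) accumulates n * x^(n-1) into d_x, i.e. exactly the gradient, with no separate grad function needed.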

Disadvantages:

  1. Gradient forward declarations are now only supported with void* adjoint parameter types. E.g., for a function double f(double a, double b), it no longer makes sense to forward-declare void f_grad(double a, double b, double *_d_a, double *_d_b). The options void f_grad(double a, double b, void *_d_a, void *_d_b) and void f_pullback(double a, double b, double _d_y, void *_d_a, void *_d_b) still work.
    However, forward declarations don't seem to be widely used. For example, when we changed all array_ref adjoint types to pointers in the gradient signature, this didn't break a single ROOT test. The main way to execute derivatives (with CladFunction) works as before.
  2. All differentiated functions now have the pullback seed parameter _d_y. This may make the derivative code harder to understand. Moreover, whenever the original function has a parameter named y, the seed parameter is renamed to _d_y0 to avoid a name collision (see the sketch after this list), which could make the code even more confusing. We could fix the latter problem by giving the pullback parameter a different name.
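
A sketch of the naming collision from disadvantage 2 (hypothetical signatures, not clad output):

// The original function already has a parameter named y, so its adjoint
// claims the name _d_y and the seed parameter gets renamed to _d_y0:
double g(double x, double y) { return x * y; }
void g_pullback(double x, double y, double _d_y0, double *_d_x, double *_d_y);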

@PetroZarytskyi (Collaborator, Author):

This is a big change, so having different opinions on the PR would be great. We discussed the idea with @vgvassilev.
@vaithak I'd love to know your thoughts.

@github-actions bot (Contributor) left a comment:

clang-tidy made some suggestions

include/clad/Differentiator/ReverseModeVisitor.h: 5 review threads (outdated, resolved)
lib/Differentiator/ReverseModeVisitor.cpp: 3 review threads (1 outdated, resolved)
@PetroZarytskyi force-pushed the no-grad branch 3 times, most recently from 8a643a5 to 67190a8 on July 2, 2024 at 12:58
@github-actions bot (Contributor) left a comment:

clang-tidy made some suggestions

    using type = void (C::*)(Args..., OutputParamType_t<Args, Args>...,      \
                             double&) cv vol ref noex;                       \
  };
#define GradientDerivedEstFnTraits_AddSPECS(var, cv, vol, ref, noex)         \

warning: function-like macro 'GradientDerivedEstFnTraits_AddSPECS' used; consider a 'constexpr' template function [cppcoreguidelines-macro-usage]

#define GradientDerivedEstFnTraits_AddSPECS(var, cv, vol, ref, noex)         \
        ^

for (const auto& dParam : DVI) {
  const clang::ValueDecl* arg = dParam.param;
  const auto* begin = request->param_begin();
  const auto* end = std::next(begin, numParams);

warning: narrowing conversion from 'std::size_t' (aka 'unsigned long') to signed type 'typename iterator_traits<ParmVarDecl *const *>::difference_type' (aka 'long') is implementation-defined [cppcoreguidelines-narrowing-conversions]

          const auto* end = std::next(begin, numParams);
                                             ^
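
A typical fix for this warning is an explicit cast on the count (a sketch, not necessarily what was committed):

const auto* end = std::next(begin, static_cast<std::ptrdiff_t>(numParams));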

@vaithak (Collaborator) commented Jul 3, 2024

This looks really good 👍🏼
Thanks, @PetroZarytskyi, for improving this.

@vgvassilev (Owner):

This PR also partially addresses #721.

@@ -482,8 +482,7 @@ namespace clad {
   // GradientDerivedEstFnTraits specializations for pure function pointer types
   template <class ReturnType, class... Args>
   struct GradientDerivedEstFnTraits<ReturnType (*)(Args...)> {
-    using type = void (*)(Args..., OutputParamType_t<Args, Args>...,
-                          double&);
+    using type = void (*)(Args..., OutputParamType_t<Args, void>..., void*);
@vgvassilev (Owner):

Why did this become a void* instead of a double&? What is the benefit?

@PetroZarytskyi (Collaborator, Author):

This is necessary to enable overloads in error estimation. double& cannot be converted to void*, so I replaced it with void* in the overload and double* in other places. This is why _final_error became a pointer type.
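
A minimal sketch of the constraint (illustrative names, not clad's actual code): a double* converts implicitly to void*, so a pointer-typed _final_error can flow through the type-erased overload, while a double& could not.

// The overload erases all adjoint types to void* ...
void grad_overload(double a, void *_temp_d_a, void *_temp_final_error) {
    // ... and recovers the real types before doing any work.
    double *_d_a = (double *)_temp_d_a;
    double *_final_error = (double *)_temp_final_error;
    *_d_a += 1.0;         // stand-in for the real gradient body
    *_final_error += 0.0; // stand-in for the accumulated error
}

int main() {
    double d_a = 0, err = 0;
    grad_overload(2.0, &d_a, &err); // double* converts to void*; a double& would not
}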

@@ -15,9 +15,9 @@ template <typename T> struct Experiment {

 // CHECK: void operator_call_hessian(double i, double j, double *hessianMatrix) {
 // CHECK-NEXT: Experiment<double> _d_this;
-// CHECK-NEXT: this->operator_call_darg0_grad(i, j, &_d_this, hessianMatrix + {{0U|0UL}}, hessianMatrix + {{1U|1UL}});
+// CHECK-NEXT: this->operator_call_darg0_pullback(i, j, 1, &_d_this, hessianMatrix + {{0U|0UL}}, hessianMatrix + {{1U|1UL}});
@vgvassilev (Owner):

Passing a hardcoded 1 for d_x will probably become a source of bugs/confusion. Can we somehow make sure we require the literals 0 or 1? Maybe some enum constant...

@PetroZarytskyi (Collaborator, Author):

It's always 1 here. Setting the _d_y parameter of a pullback to 1 essentially makes it equivalent to the corresponding gradient. This is the idea I described in the PR message.
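
For intuition, a worked example (hand-written, not clad output): a pullback accumulates the seed times each partial derivative, so a seed of 1 yields the plain gradient.

// For f(a, b) = a * b:
void f_pullback(double a, double b, double _d_y, double *_d_a, double *_d_b) {
    *_d_a += _d_y * b; // df/da = b, scaled by the seed
    *_d_b += _d_y * a; // df/db = a, scaled by the seed
}
// f_pullback(a, b, 1, &d_a, &d_b) therefore computes exactly the gradient (b, a).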

 //CHECK_FLOAT_SUM: for (; _t0; _t0--) {
 //CHECK_FLOAT_SUM: i--;
 //CHECK_FLOAT_SUM: {
-//CHECK_FLOAT_SUM: _final_error += std::abs(_d_sum * sum * 1.1920928955078125E-7);
+//CHECK_FLOAT_SUM: *_final_error += std::abs(_d_sum * sum * 1.1920928955078125E-7);
@vgvassilev (Owner):

Why do we pass this by pointer now?

@PetroZarytskyi (Collaborator, Author):

To enable overloads in error estimation (I explained this in more detail above).

@PetroZarytskyi force-pushed the no-grad branch 3 times, most recently from c37e97f to e9504a6 on July 11, 2024 at 21:29
codecov bot commented Jul 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.71%. Comparing base (f03fc98) to head (5fb5536).

Additional details and impacted files:

@@            Coverage Diff             @@
##           master     #964      +/-   ##
==========================================
- Coverage   93.80%   93.71%   -0.10%     
==========================================
  Files          55       55              
  Lines        8065     7997      -68     
==========================================
- Hits         7565     7494      -71     
- Misses        500      503       +3     
Files Coverage Δ
include/clad/Differentiator/DiffMode.h 0.00% <ø> (ø)
include/clad/Differentiator/ReverseModeVisitor.h 98.18% <100.00%> (+0.30%) ⬆️
lib/Differentiator/DerivativeBuilder.cpp 100.00% <100.00%> (ø)
lib/Differentiator/DiffPlanner.cpp 98.71% <100.00%> (ø)
lib/Differentiator/ErrorEstimator.cpp 99.15% <100.00%> (+<0.01%) ⬆️
lib/Differentiator/HessianModeVisitor.cpp 99.54% <100.00%> (+0.01%) ⬆️
lib/Differentiator/ReverseModeVisitor.cpp 97.03% <100.00%> (-0.29%) ⬇️

... and 1 file with indirect coverage changes


@vgvassilev (Owner):

@PetroZarytskyi, can you rebase this PR? Then @kchristin22 can try it out in the context of CUDA and provide a minimal example that should work...
