Fix derivative initialization of void functions in reverse mode #823

kchristin22 · 2024-03-14T17:17:00Z

Fixes #822.

github-actions · 2024-03-14T17:22:57Z

clang-tidy review says "All clean, LGTM! 👍"

codecov · 2024-03-14T17:23:04Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.98%. Comparing base (d2df900) to head (0479442).
Report is 258 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #823   +/-   ##
=======================================
  Coverage   94.97%   94.98%           
=======================================
  Files          49       49           
  Lines        7543     7553   +10     
=======================================
+ Hits         7164     7174   +10     
  Misses        379      379

Files with missing lines	Coverage Δ
lib/Differentiator/ReverseModeVisitor.cpp	`96.61% <100.00%> (+0.01%)`	⬆️

Files with missing lines	Coverage Δ
lib/Differentiator/ReverseModeVisitor.cpp	`96.61% <100.00%> (+0.01%)`	⬆️

github-actions · 2024-03-14T20:52:16Z

clang-tidy review says "All clean, LGTM! 👍"

kchristin22 · 2024-03-14T21:10:04Z

Please do not merge yet, I'd like to see if I can add coverage of more cases (f.i. void foo(double *in, double *out); ). I will let you know by tomorrow.

Edit: Since I'm thinking of changing the gradient function to include another argument for this specific case, it could be a separate PR. Let me know what you think.

PetroZarytskyi · 2024-03-15T20:29:47Z

Hi, @kchristin22. Thank you for your work. However, I'd like to express my doubts on whether the behavior of gradients in this PR is intuitive. When the original function has only one reference/pointer type parameter, this makes sense to some extent. Like in your example:

void pointerArgOut(double* p) {
  *p *= *p;
}

void pointerArgOut_grad(double *p, clad::array_ref<double> _d_p) {
    * _d_p = 1;
    double _t0;
    _t0 = *p;
    *p *= *p;
    {
        *p = _t0;
        double _r_d0 = * _d_p;
        * _d_p -= _r_d0;
        * _d_p += _r_d0 * *p;
        * _d_p += *p * _r_d0;
    }
}

You basically consider *p an output parameter (i.e. *p represents the function value).
Even though I'm not sure such behavior should be default, let's consider the case when the number of parameters is bigger than 1:

void do_nothing(double x, double y, double z) {}

void do_nothing_grad(double x, double y, double z, clad::array_ref<double> _d_x, clad::array_ref<double> _d_y, clad::array_ref<double> _d_z) {
    * _d_x = 1;
    * _d_y = 1;
    * _d_z = 1;
}

What is the meaning of the gradient in this case? Why would a user expect the gradient to be {1, 1, 1}?
Mathematically, in this PR we consider "the value of a void function" to be the sum of its parameters. But why?
Could you give a bigger explanation of why the gradient of void functions should be computed this way?

kchristin22 · 2024-03-16T10:36:09Z

Hi @PetroZarytskyi!

So I was thinking that when we want to derive a function based on an argument, f.i. a, then da/da = 1 which is defined in the code as _d_a. For the rest of the parameters, f.i. db/da, it would be 0, which is already performed as a step in the code.

Now for the more technical part, this addition should be well used. Meaning that if the user wants to differentiate the function based on all the parameters, then their assignments in the code should be independent, otherwise the user should have used sth like the jacobian version in two different functions I suppose. But I'm not sure if "protecting" the user this way should be part of clad as it would be just bad usage of this API (you can't expect a memset when you call malloc). So, regarding your example, I think it falls into this category, of what the user wants to achieve. Without the initialization, the code produces a Segmentation Fault. Hence, the user must set the derivative initialization themselves before executing the gradient function. This PR aims to protect the user in case such an omission occurs, which is also guaranteed for functions with other return types. It is also an extra capability for the user as they can perform two otherwise independent function derivatives in a single function.

Some may wonder that you can always use a function with a return statement instead so no need to improve anything here. However, CUDA kernels are void functions so if we want to support kernel differentiation, this PR is worth it because it protects the user and overly I believe it does more good than harm. Although, since I want to dive into supporting the case of void foo(double *in, double *out) for CUDA kernels, there may be more safe and sophisticated ways to add this initialization in the future. So I would really like your opinion on this.

PetroZarytskyi · 2024-03-19T14:11:02Z

Hi, @kchristin22. I agree that it often makes sense to consider some parameters as output (like in your example with CUDA kernels). This does mean initializing the adjoint of the output parameter to 1. My biggest concern is what happens when the function has multiple parameters. In that case, we need to choose a parameter to consider output (we can't just initialize all parameter adjoints to 1). This decision should be given to the user and it's not obvious how to do this. Considering you can already achieve this by setting the adjoint to 1 by hand before passing it to the gradient, I'm not sure we should introduce new interfaces. At least, this is worth a bigger discussion. And yes, I think this concern is relevant to most of the use cases, even your void foo example has two parameters only one of which is output.

kchristin22 · 2024-03-19T17:40:18Z

Hello!

Yes, I can see how it can be problematic. The example I gave was to underline that more work needs to be done as there are loop holes. @parth-07 also gave a very nice example of why initializing in the derived function is not always efficient.

I had started working on a way to potentially support the case of void foo(double *in, double *out) (branch). So far, I have included an extra argument in the API (clad::gradient) and store its name in the ReverseModeVisitor.h in order to manipulate the derivation accordingly, though, due to our conversation and the one in the issue, I'm not sure if continuing working on this is worth it. What do you think?

parth-07 · 2024-03-24T06:44:16Z

Hi @kchristin22

Thank you for being so proactive in fixing issues.

I had started working on a way to potentially support the case of void foo(double *in, double *out) (branch). So far, I have included an extra argument in the API (clad::gradient) and store its name in the ReverseModeVisitor.h in order to manipulate the derivation accordingly, though, due to our conversation and the one in the issue, I'm not sure if continuing working on this is worth it. What do you think?

How are you passing the argument name to clad::gradient? This would be very difficult to achieve because the adjoint types and the function signature would differ for different argument name, and we need to compute the function signature at compile-time without any help from the Clang plugin infrastructure. Can you please briefly show how the user would use this extra argument of clad::gradient?

kchristin22 · 2024-03-25T16:32:58Z

No need to thank me, I really enjoy it.

This view of the changes really helps in pinpointing the additions. I basically add another argument in gradient that is initialized to nullptr and can only be assigned when the function to be derived is of void type. This argument is assigned to an expr stored as a class member of the request. When this request gets processed and the return argument has been specified, the function's parameters are scanned to ensure that the user gave a valid parameter name and the name is stored in ReverseMode.

I have some ideas on how the correct derivation will be achieved that I included in my proposal.

vgvassilev · 2024-04-08T20:10:11Z

@kchristin22, @PetroZarytskyi, what is the fate of this PR?

PetroZarytskyi · 2024-04-23T22:52:11Z

@vgvassilev Even though the PR itself looks good, I'm not convinced we need this. I think changing the interface this way will only make it less consistent and more confusing.

vgvassilev · 2024-09-09T06:18:58Z

@kchristin22, what is the fate of this PR?

kchristin22 and others added 5 commits March 7, 2024 12:08

Move gradient inside kernel and add result evaluation

ae7cc7d

Replace std vector with std array in GradientCuda

20be1f3

Add initialization of derived argument's derivative

fb1b8da

Expand void function support to pointer args

24e55e4

Merge branch 'vgvassilev:master' into void-function-support

80bbdae

vgvassilev requested a review from PetroZarytskyi March 14, 2024 17:18

kchristin22 added 3 commits March 14, 2024 22:40

Fix formatting and add comment

bafdfb7

Add Test of void function with pointer arg

02fa7d1

Fix comment format

0479442

parth-07 mentioned this pull request Mar 16, 2024

Fix derivative initialization in void functions in reverse mode #822

Closed

kchristin22 closed this Oct 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix derivative initialization of void functions in reverse mode #823

Fix derivative initialization of void functions in reverse mode #823

kchristin22 commented Mar 14, 2024

github-actions bot commented Mar 14, 2024

codecov bot commented Mar 14, 2024 •

edited

Loading

github-actions bot commented Mar 14, 2024

kchristin22 commented Mar 14, 2024 •

edited

Loading

PetroZarytskyi commented Mar 15, 2024

kchristin22 commented Mar 16, 2024

PetroZarytskyi commented Mar 19, 2024

kchristin22 commented Mar 19, 2024

parth-07 commented Mar 24, 2024

kchristin22 commented Mar 25, 2024

vgvassilev commented Apr 8, 2024

PetroZarytskyi commented Apr 23, 2024

vgvassilev commented Sep 9, 2024

Fix derivative initialization of void functions in reverse mode #823

Fix derivative initialization of void functions in reverse mode #823

Conversation

kchristin22 commented Mar 14, 2024

github-actions bot commented Mar 14, 2024

codecov bot commented Mar 14, 2024 • edited Loading

Codecov Report

github-actions bot commented Mar 14, 2024

kchristin22 commented Mar 14, 2024 • edited Loading

PetroZarytskyi commented Mar 15, 2024

kchristin22 commented Mar 16, 2024

PetroZarytskyi commented Mar 19, 2024

kchristin22 commented Mar 19, 2024

parth-07 commented Mar 24, 2024

kchristin22 commented Mar 25, 2024

vgvassilev commented Apr 8, 2024

PetroZarytskyi commented Apr 23, 2024

vgvassilev commented Sep 9, 2024

codecov bot commented Mar 14, 2024 •

edited

Loading

kchristin22 commented Mar 14, 2024 •

edited

Loading