Fix derivative initialization in void functions in reverse mode #822

Closed
kchristin22 opened this issue Mar 14, 2024 · 12 comments
@kchristin22
Collaborator

When there is no return statement and a value/pointer argument is modified inside the function we want to differentiate, the user has to initialize the derivative of that argument before passing it to the execute function; this is not necessary when there is a return statement.

Example:

void foo(int *a){
  *a *= *a;
}

Derived code:

  • Before:
void foo_grad(int *a, clad::array_ref<int> _d_a) {
    int _t0;
    _t0 = *a;
    *a *= *a;
    {
        *a = _t0;
        int _r_d0 = * _d_a;
        * _d_a -= _r_d0;
        * _d_a += _r_d0 * *a;
        * _d_a += *a * _r_d0;
    }
}
  • After fix:
void foo_grad(int *a, clad::array_ref<int> _d_a) {
    * _d_a = 1;
    int _t0;
    _t0 = *a;
    *a *= *a;
    {
        *a = _t0;
        int _r_d0 = * _d_a;
        * _d_a -= _r_d0;
        * _d_a += _r_d0 * *a;
        * _d_a += *a * _r_d0;
    }
}
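
For reference, this is roughly how a caller has to use the gradient today (a sketch; the exact execute call may differ across Clad versions):

#include "clad/Differentiator/Differentiator.h"

void foo(int *a) { *a *= *a; }

int main() {
  auto foo_grad = clad::gradient(foo);
  int a = 3;
  int d_a = 1; // currently the user must seed the adjoint to 1 by hand;
               // with the proposed fix the gradient sets the seed itself,
               // so initializing d_a to 0 would suffice
  foo_grad.execute(&a, &d_a);
  // d_a now holds d(*a_out)/d(*a_in) = 2 * a_in = 6
  return 0;
}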
@parth-07
Collaborator

When there is no return statement and a value/pointer argument is modified inside the function we want to differentiate, the user has to initialize the derivative of that argument before passing it to the execute function; this is not necessary when there is a return statement.

If there are multiple out parameters, how can we automatically determine the output variable for the differentiation?

@kchristin22
Collaborator Author

Hi @parth-07!
So we have two cases:

  • single output:
    I was thinking of adding an extra argument to the gradient function to declare the return argument, only for void functions. I'm not sure yet how I could handle it (one idea would be to add a fake return statement to do this, if that's possible).
  • all are outputs:
    In the PR, if multiple parameters are present and the user wants to differentiate with respect to all of them (meaning they are all output variables, in a way, and independent of each other), then the user can differentiate what would otherwise be multiple functions at once, using a single function. If the parameters are not independent of each other, then the user should have separated them into different functions, as they would have done on paper.

@parth-07
Collaborator

I was thinking of adding an extra argument to the gradient function to declare the return argument, only for void functions. I'm not sure yet how I could handle it (one idea would be to add a fake return statement to do this, if that's possible).

I am not quite sure what you mean by this. Is this what you have implemented in the PR #823?

In the PR, if multiple parameters are present and the user wants to differentiate with respect to all of them (meaning they are all output variables, in a way, and independent of each other), then the user can differentiate what would otherwise be multiple functions at once, using a single function.

Reverse mode is used for computing the derivatives of a single output parameter with respect to multiple input parameters; standard reverse mode cannot compute derivatives for multiple output parameters.

Clad expects users to correctly set the seed values for the derivative arguments. This is true even when there is a return statement. For example, we have to set the seed value to 0 for the derivatives of a function's input values. For a more general case, such as directional derivatives, the seed value would not always be 1 (which we typically have for the output value) or 0 (which we typically have for the input values).
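
For instance, even for a function with a return statement, the input adjoints have to be zero-initialized because the generated code accumulates into them (a rough sketch):

double sq(double x) { return x * x; }

auto sq_grad = clad::gradient(sq);
double x = 3.0, dx = 0.0; // the seed for the input derivative must be 0
sq_grad.execute(x, &dx);  // dx == 6.0 afterwards; a non-zero seed would simply be added to the result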

I don't think this issue needs fixing. @vgvassilev @PetroZarytskyi @vaithak can comment more.

@kchristin22
Collaborator Author

kchristin22 commented Mar 16, 2024

I am not quite sure what you mean by this. Is this what you have implemented in the PR #823?

No no, in that PR I just initialize the derivative (the seed) for the input arg which is treated as both an input and output in the void function. But I want to look into it as it will be needed for CUDA kernel differentiation. See this comment where I talk a bit about this.

Reverse mode is used for computing the derivatives of a single output parameter with respect to multiple input parameters; standard reverse mode cannot compute derivatives for multiple output parameters.

Yes, I agree. It is kind of unorthodox from a purely mathematical standpoint, but efficient. So with gradient you can, for instance, differentiate this function:

void foo1(int *i, int *j){
   *i *= *i;
   *j *= *j;
}

These variables are independent and their derivatives can be computed at once (one function instead of two) by using gradient without specifying the argument to differentiate these expressions with respect to. This is already supported by Clad; you just need to set the seeds (as in all void functions). It's true that on paper you wouldn't mix two functions, but in reality it can be more practical, I guess.
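
For example, the call would look roughly like this (a sketch; the exact generated signature and execute call may differ):

auto foo1_grad = clad::gradient(foo1); // differentiates w.r.t. both *i and *j
int i = 2, j = 3;
int d_i = 1, d_j = 1;                  // both adjoints seeded to 1, as for any void function
foo1_grad.execute(&i, &j, &d_i, &d_j);
// d_i == 2 * i_in == 4 and d_j == 2 * j_in == 6, i.e. both derivatives in one call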

Clad expects users to correctly set the seed values for the derivative arguments. This is true even when there is a return statement. For example, we have to set the seed value to 0 for the derivatives of a function's input values. For a more general case, such as directional derivatives, the seed value would not always be 1 (which we typically have for the output value) or 0 (which we typically have for the input values).

I see. If there are cases where, using gradient, the user would set the derivative to something other than 0 or 1, then indeed we should let the user have this freedom. I cannot think of a specific example where this would make sense, though, as I'm probably thinking about it too mathematically and my knowledge of derivative applications is limited. Do you have anything in mind?

@vgvassilev
Owner

@kchristin22, while we are discussing if this is an issue or not, maybe you can take a look at either of #696, #710 or #717.

@kchristin22
Collaborator Author

Sure!

@parth-07
Collaborator

parth-07 commented Mar 17, 2024

No no, in that PR I just initialize the derivative (the seed) for the input arg which is treated as both an input and output in the void function. But I want to look into it as it will be needed for CUDA kernel differentiation.

Oh, okay. Thank you for the information.

These variables are independent and their derivatives can be computed at once (one function instead of two) by using gradient without specifying the argument to differentiate these expressions with respect to. This is already supported by Clad; you just need to set the seeds (as in all void functions). It's true that on paper you wouldn't mix two functions, but in reality it can be more practical, I guess.

That was an interesting example. Can you please tell me more about the benefits of mixing the two functions as compared to having two distinct functions and differentiating them separately?

If there are cases where, using gradient, the user would set the derivative to something other than 0 or 1, then indeed we should let the user have this freedom. I cannot think of a specific example where this would make sense, though, as I'm probably thinking about it too mathematically and my knowledge of derivative applications is limited. Do you have anything in mind?

One typical example is computing the gradient of a specific function of the output vector, such as a weighted sum of the outputs. For example, consider a function with 3 inputs x1, x2, and x3 and 3 outputs out1, out2, and out3, as shown below:

void fn(double x1, double x2, double x3, double &out1, double &out2, double &out3) { 
  // ... 
}

// grad function of fn
void fn_grad(double x1, double x2, double x3, double &out1, double &out2, double &out3, 
             clad::array_ref<double> dx1, clad::array_ref<double> dx2, clad::array_ref<double> dx3,
             clad::array_ref<double> dout1, clad::array_ref<double> dout2, clad::array_ref<double> dout3);
// weighted sum of the outputs (a1, a2, a3 are fixed weights)
double y = a1 * out1 + a2 * out2 + a3 * out3;

To compute the gradient of a weighted sum of the outputs, as shown above, we can call the grad function as follows:

fn_grad(x1, x2, x3, out1, out2, out3, dx1, dx2, dx3, /*dout1=*/ a1, /*dout2=*/ a2, /*dout3=*/ a3);

Appropriately setting the seeds for dout1, dout2, and dout3 allows the user to compute the gradient for y (the weighted sum of the outputs) in one pass of the gradient function. This is more efficient than the alternative of separately computing gradients of out1, out2 and out3 and then using these gradients to compute the gradient of y.
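
Spelled out a bit more (a sketch with made-up values; the adjoints are plain doubles whose addresses are passed for the array_ref parameters):

double a1 = 0.5, a2 = 1.0, a3 = 2.0;       // weights of the outputs in y
double x1 = 1.0, x2 = 2.0, x3 = 3.0;
double out1 = 0, out2 = 0, out3 = 0;
double dx1 = 0, dx2 = 0, dx3 = 0;          // input seeds: 0
double dout1 = a1, dout2 = a2, dout3 = a3; // output seeds: the weights
fn_grad(x1, x2, x3, out1, out2, out3,
        &dx1, &dx2, &dx3, &dout1, &dout2, &dout3);
// dx1, dx2, dx3 now hold dy/dx1, dy/dx2, dy/dx3 for y = a1*out1 + a2*out2 + a3*out3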

@kchristin22
Collaborator Author

That was an interesting example. Can you please tell me more about the benefits of mixing the two functions as compared to having two distinct functions and differentiating them separately?

Actually your example is what I had in mind: computing many derivatives at once (in the example I gave, a1 and a2 were equal to 1).

Thank you very much for the clarification; it is indeed more convenient this way. Do you think that a "warning" to the user to initialize the derivatives should be included in the README?

@parth-07
Collaborator

Do you think that a "warning" to the user to initialize the derivatives should be included in the README?

Sure. However, I don't think it should be a warning per se. I think it should simply be added as a guide to correctly using Clad-generated functions.

@kchristin22
Collaborator Author

Before closing this issue, would you mind giving your opinion on continuing relevant work?

@vgvassilev
Owner

vgvassilev commented Apr 8, 2024

I was wondering what's the fate of this issue?

@parth-07
Collaborator

parth-07 commented Apr 9, 2024

Before closing this issue, would you mind giving your opinion on #823 (comment)?

There are still a few issues even if the user can specify the output parameter as an argument to clad::gradient:

  1. What's the benefit of explicitly specifying the output parameter as opposed to specifying the output parameter by setting its adjoint to 1?

  2. It will be difficult to deduce the derivative function type if we allow the user to specify the output parameter. Consider the below case for a concrete example:

double fn(double x, double *out1, float *out2) {
  // ...
}

auto fn_grad_out1 = clad::gradient(fn, "x", /*outParameter=*/ "out1");
auto fn_grad_out2 = clad::gradient(fn, "x", /*outParameter=*/ "out2");

The types of the derivative functions underlying fn_grad_out1 and fn_grad_out2 would now differ because we are changing the output parameter. However, due to the complexity of the Clang plugin infrastructure, we need to know the derivative function type at compile time, independently of Clad plugin-specific functionality. The FunctionTraits file handles the deduction of derivative function types. We cannot deduce the derivative function type without knowing the output parameter's type, and if the output parameter is specified as a string literal, we cannot know its type at compile time.

Please let me know if you think I misunderstood your solution, as in that case, the above analysis may not stand.

If you are satisfied with the reasoning, then can you please close this ticket?

vgvassilev modified the milestones: v1.5, v1.6 on Apr 24, 2024
parth-07 closed this as not planned on May 5, 2024