Fix derivative initialization in void functions in reverse mode #822

Closed
kchristin22 opened this issue Mar 14, 2024 · 12 comments
@kchristin22
Collaborator

When there is no return statement and a value/pointer argument is modified inside the function we want to differentiate, the user has to initialize the derivative of that argument before passing it to the execute function; this is not necessary when there is a return statement.

Example:

void foo(int *a){
  *a *= *a;
}

Derived code:

  • Before:
void foo_grad(int *a, clad::array_ref<int> _d_a) {
    int _t0;
    _t0 = *a;
    *a *= *a;
    {
        *a = _t0;
        int _r_d0 = * _d_a;
        * _d_a -= _r_d0;
        * _d_a += _r_d0 * *a;
        * _d_a += *a * _r_d0;
    }
}
  • After fix:
void foo_grad(int *a, clad::array_ref<int> _d_a) {
    * _d_a = 1;
    int _t0;
    _t0 = *a;
    *a *= *a;
    {
        *a = _t0;
        int _r_d0 = * _d_a;
        * _d_a -= _r_d0;
        * _d_a += _r_d0 * *a;
        * _d_a += *a * _r_d0;
    }
}
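
For reference, this is roughly how a caller has to use the gradient today (a sketch; the exact execute call may differ across Clad versions):

#include "clad/Differentiator/Differentiator.h"

void foo(int *a) { *a *= *a; }

int main() {
  auto foo_grad = clad::gradient(foo);
  int a = 3;
  int d_a = 1; // currently the user must seed the adjoint to 1 by hand;
               // with the proposed fix the gradient sets the seed itself,
               // so initializing d_a to 0 would suffice
  foo_grad.execute(&a, &d_a);
  // d_a now holds d(*a_out)/d(*a_in) = 2 * a_in = 6
  return 0;
}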
@parth-07
Collaborator

When there is no return statement and a value/pointer argument is modified inside the function we want to differentiate, the user has to initialize the derivative of that argument before passing it to the execute function; this is not necessary when there is a return statement.

If there are multiple out parameters, how can we automatically determine the output variable for the differentiation?

@kchristin22
Collaborator Author

Hi @parth-07!
So we have two cases:

  • single output:
    I was thinking of adding an extra argument to the gradient function to declare the return argument, only for void functions. I'm not sure yet how I could handle it (one idea would be to add a fake return statement to do this, if that's possible).
  • all are outputs:
    In the PR, if multiple parameters are present and the user wants to differentiate with respect to all of them (meaning they are all output variables, in a way, and independent of each other), then the user can differentiate what would otherwise be multiple functions at once, using a single function. If the parameters are not independent of each other, then the user should have separated them into different functions, as they would have done on paper.

@parth-07
Collaborator

I was thinking of adding an extra argument to the gradient function to declare the return argument, only for void functions. I'm not sure yet how I could handle it (one idea would be to add a fake return statement to do this, if that's possible).

I am not quite sure what you mean by this. Is this what you have implemented in the PR #823?

In the PR, if multiple parameters are present and the user wants to differentiate with respect to all of them (meaning they are all output variables, in a way, and independent of each other), then the user can differentiate what would otherwise be multiple functions at once, using a single function.

Reverse mode is used for computing the derivatives of a single output parameter with respect to multiple input parameters; standard reverse mode cannot compute derivatives for multiple output parameters.

Clad expects users to correctly set the seed values for the derivative arguments. This is true even when there is a return statement. For example, we have to set the seed value to 0 for the derivatives of a function's input values. For a more general case, such as directional derivatives, the seed value would not always be 1 (which we typically have for the output value) or 0 (which we typically have for the input values).
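
For instance, even for a function with a return statement, the input adjoints have to be zero-initialized because the generated code accumulates into them (a rough sketch):

double sq(double x) { return x * x; }

auto sq_grad = clad::gradient(sq);
double x = 3.0, dx = 0.0; // the seed for the input derivative must be 0
sq_grad.execute(x, &dx);  // dx == 6.0 afterwards; a non-zero seed would simply be added to the result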

I don't think this issue needs fixing. @vgvassilev @PetroZarytskyi @vaithak can comment more.

@kchristin22
Collaborator Author

kchristin22 commented Mar 16, 2024

I am not quite sure what you mean by this. Is this what you have implemented in the PR #823?

No no, in that PR I just initialize the derivative (the seed) for the input arg which is treated as both an input and output in the void function. But I want to look into it as it will be needed for CUDA kernel differentiation. See this comment where I talk a bit about this.

Reverse mode is used for computing the derivatives of a single output parameter with respect to multiple input parameters; standard reverse mode cannot compute derivatives for multiple output parameters.

Yes, I agree. It is kind of unorthodox from a purely mathematical standpoint, but efficient. So with gradient you can, for instance, differentiate this function:

void foo1(int *i, int *j){
   *i *= *i;
   *j *= *j;
}

These variables are independent and their derivatives can be computed at once (one function instead of two) by using gradient without specifying the argument to differentiate these expressions with respect to. This is already supported by Clad; you just need to set the seeds (as in all void functions). It's true that on paper you wouldn't mix two functions, but in reality it can be more practical, I guess.
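
For example, the call would look roughly like this (a sketch; the exact generated signature and execute call may differ):

auto foo1_grad = clad::gradient(foo1); // differentiates w.r.t. both *i and *j
int i = 2, j = 3;
int d_i = 1, d_j = 1;                  // both adjoints seeded to 1, as for any void function
foo1_grad.execute(&i, &j, &d_i, &d_j);
// d_i == 2 * i_in == 4 and d_j == 2 * j_in == 6, i.e. both derivatives in one call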

Clad expects users to correctly set the seed values for the derivative arguments. This is true even when there is a return statement. For example, we have to set the seed value to 0 for the derivatives of a function's input values. For a more general case, such as directional derivatives, the seed value would not always be 1 (which we typically have for the output value) or 0 (which we typically have for the input values).

I see. If there are cases where, using gradient, the user would set the derivative to something other than 0 or 1, then indeed we should let the user have this freedom. I cannot think of a specific example where this would make sense, though, as I'm probably thinking about it too mathematically and my knowledge of derivative applications is limited. Do you have anything in mind?

@vgvassilev
Owner

@kchristin22, while we are discussing if this is an issue or not, maybe you can take a look at either of #696, #710 or #717.

@kchristin22
Collaborator Author

Sure!

@parth-07
Collaborator

parth-07 commented Mar 17, 2024

No no, in that PR I just initialize the derivative (the seed) for the input arg which is treated as both an input and output in the void function. But I want to look into it as it will be needed for CUDA kernel differentiation.

Oh, okay. Thank you for the information.

These variables are independent and their derivatives can be computed at once (one function instead of two) by using gradient without specifying the argument to differentiate these expressions with respect to. This is already supported by Clad; you just need to set the seeds (as in all void functions). It's true that on paper you wouldn't mix two functions, but in reality it can be more practical, I guess.

That was an interesting example. Can you please tell me more about the benefits of mixing the two functions as compared to having two distinct functions and differentiating them separately?

If there are cases where, using gradient, the user would set the derivative to something other than 0 or 1, then indeed we should let the user have this freedom. I cannot think of a specific example where this would make sense, though, as I'm probably thinking about it too mathematically and my knowledge of derivative applications is limited. Do you have anything in mind?

One typical example is computing the gradient of a specific function of the output vector, such as a weighted sum of the outputs. For example, consider a function with 3 inputs x1, x2, and x3 and 3 outputs out1, out2, and out3, as shown below:

void fn(double x1, double x2, double x3, double &out1, double &out2, double &out3) { 
  // ... 
}

// grad function of fn
void fn_grad(double x1, double x2, double x3, double &out1, double &out2, double &out3, 
             clad::array_ref<double> dx1, clad::array_ref<double> dx2, clad::array_ref<double> dx3,
             clad::array_ref<double> dout1, clad::array_ref<double> dout2, clad::array_ref<double> dout3);
// weighted sum of the outputs (a1, a2, a3 are fixed weights)
double y = a1 * out1 + a2 * out2 + a3 * out3;

To compute the gradient of a weighted sum of the outputs, as shown above, we can call the grad function as follows:

fn_grad(x1, x2, x3, out1, out2, out3, dx1, dx2, dx3, /*dout1=*/ a1, /*dout2=*/ a2, /*dout3=*/ a3);

Appropriately setting the seeds for dout1, dout2, and dout3 allows the user to compute the gradient for y (the weighted sum of the outputs) in one pass of the gradient function. This is more efficient than the alternative of separately computing gradients of out1, out2 and out3 and then using these gradients to compute the gradient of y.
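
Spelled out a bit more (a sketch with made-up values; the adjoints are plain doubles whose addresses are passed for the array_ref parameters):

double a1 = 0.5, a2 = 1.0, a3 = 2.0;       // weights of the outputs in y
double x1 = 1.0, x2 = 2.0, x3 = 3.0;
double out1 = 0, out2 = 0, out3 = 0;
double dx1 = 0, dx2 = 0, dx3 = 0;          // input seeds: 0
double dout1 = a1, dout2 = a2, dout3 = a3; // output seeds: the weights
fn_grad(x1, x2, x3, out1, out2, out3,
        &dx1, &dx2, &dx3, &dout1, &dout2, &dout3);
// dx1, dx2, dx3 now hold dy/dx1, dy/dx2, dy/dx3 for y = a1*out1 + a2*out2 + a3*out3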

@kchristin22
Collaborator Author

That was an interesting example. Can you please tell me more about the benefits of mixing the two functions as compared to having two distinct functions and differentiating them separately?

Actually your example is what I had in mind: computing many derivatives at once (in the example I gave, a1 and a2 were equal to 1).

Thank you very much for the clarification; it is indeed more convenient this way. Do you think that a "warning" to the user to initialize the derivatives should be included in the README?

@parth-07
Collaborator

Do you think that a "warning" to the user to initialize the derivatives should be included in the README?

Sure. However, I don't think it should be a warning per se. I think it should simply be added as a guide to correctly using Clad-generated functions.

@kchristin22
Collaborator Author

Before closing this issue, would you mind giving your opinion on continuing relevant work?

@vgvassilev
Owner

vgvassilev commented Apr 8, 2024

I was wondering what's the fate of this issue?

@parth-07
Collaborator

parth-07 commented Apr 9, 2024

Before closing this issue, would you mind giving your opinion on #823 (comment)?

There are still a few issues even if the user can specify the output parameter as an argument to clad::gradient:

  1. What's the benefit of explicitly specifying the output parameter as opposed to specifying the output parameter by setting its adjoint to 1?

  2. It will be difficult to deduce the derivative function type if we allow the user to specify the output parameter. Consider the below case for a concrete example:

double fn(double x, double *out1, float *out2) {
  // ...
}

auto fn_grad_out1 = clad::gradient(fn, "x", /*outParameter=*/ "out1");
auto fn_grad_out2 = clad::gradient(fn, "x", /*outParameter=*/ "out2");

The types of the derivative functions underlying fn_grad_out1 and fn_grad_out2 would now differ because we are changing the output parameter. However, due to the complexity of the Clang plugin infrastructure, we need to know the derivative function type at compile time, independently of Clad plugin-specific functionality. The FunctionTraits file handles the deduction of derivative function types. We cannot deduce the derivative function type without knowing the output parameter's type, and if the output parameter is specified as a string literal, we cannot know its type at compile time.

Please let me know if you think I misunderstood your solution, as in that case, the above analysis may not stand.

If you are satisfied with the reasoning, then can you please close this ticket?

vgvassilev modified the milestones: v1.5, v1.6 on Apr 24, 2024
parth-07 closed this as not planned on May 5, 2024