# Fix derivative initialization in void functions in reverse mode #822
If there are multiple out parameters, how can we automatically determine the output variable for the differentiation?
Hi @parth-07!

I am not quite sure what you mean by this. Is this what you have implemented in PR #823?

Reverse mode cannot compute derivatives for multiple output parameters; it is used for computing the derivatives of a single output parameter with respect to multiple input parameters. Clad expects users to correctly set the seed values for the derivative arguments, and this is true even when there is a return statement: the derivative accumulators still have to be initialized before the call. I don't think this issue needs fixing. @vgvassilev @PetroZarytskyi @vaithak can comment more.
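A minimal sketch of what "setting the seeds" means in the return-statement case, assuming the usual `clad::gradient`/`execute` workflow (the function `mul` and its inputs are made up for illustration):

```cpp
#include "clad/Differentiator/Differentiator.h"

double mul(double a, double b) { return a * b; }

int main() {
  auto grad = clad::gradient(mul);
  // Even though mul has a return statement, the caller still
  // zero-initializes the derivative accumulators before executing
  // the generated gradient.
  double da = 0, db = 0;
  grad.execute(2.0, 3.0, &da, &db);
  // da == 3.0 (d mul / d a), db == 2.0 (d mul / d b)
}
```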
No no, in that PR I just initialize the derivative (the seed) for the input argument that is treated as both an input and an output in the void function. But I want to look into this, as it will be needed for CUDA kernel differentiation. See this comment, where I talk a bit about this.
Yes, I agree. It is kind of unorthodox in a purely mathematical sense, but efficient. In such a case the output variables are independent, and their derivatives can be computed at once (one function instead of two) by using gradient without specifying which argument the expressions should be differentiated with respect to. This is already supported in Clad; you just need to set the seeds (as in all void functions). It's true that on paper you wouldn't mix the two functions, but in practice I guess it would be more convenient.
I see. If there are cases where, using gradient, the user would set the derivative seed to something other than 0 or 1, then indeed we should give the user this freedom. I cannot think of a specific example where this would make sense, though; I'm probably thinking about it too mathematically, and my knowledge of derivative applications is limited. Do you have anything in mind?
@kchristin22, while we are discussing whether this is an issue or not, maybe you can take a look at one of #696, #710 or #717.
Sure!
Oh, okay. Thank you for the information.

That was an interesting example. Can you please tell me more about the benefits of mixing the two functions, compared to having two distinct functions and differentiating them separately?
One typical example is computing the gradient of a specific function of the output vector, such as a weighted sum of the outputs. For example, suppose there is a function with 3 inputs and 3 output parameters:

```cpp
void fn(double x1, double x2, double x3, double &out1, double &out2, double &out3) {
  // ...
}

// grad function of fn
void fn_grad(double x1, double x2, double x3, double &out1, double &out2, double &out3,
             clad::array_ref<double> dx1, clad::array_ref<double> dx2, clad::array_ref<double> dx3,
             clad::array_ref<double> dout1, clad::array_ref<double> dout2, clad::array_ref<double> dout3);
```
To compute the gradient of a weighted sum of the outputs, we can call the grad function as follows:

```cpp
fn_grad(x1, x2, x3, out1, out2, out3, dx1, dx2, dx3,
        /*dout1=*/ a1, /*dout2=*/ a2, /*dout3=*/ a3);
```

Appropriately setting the seeds for dout1, dout2, and dout3 thus yields, in a single reverse pass, the gradient of `a1*out1 + a2*out2 + a3*out3` with respect to x1, x2, and x3.
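A self-contained sketch of this pattern (the body of `fn`, the weights, and the inputs are made up; the `execute` argument order assumes the `array_ref`-style signature above and may differ across Clad versions):

```cpp
#include "clad/Differentiator/Differentiator.h"

void fn(double x1, double x2, double x3,
        double &out1, double &out2, double &out3) {
  out1 = x1 * x2;   // illustrative body
  out2 = x2 * x3;
  out3 = x1 + x3;
}

int main() {
  auto grad = clad::gradient(fn);
  double out1 = 0, out2 = 0, out3 = 0;
  double dx1 = 0, dx2 = 0, dx3 = 0;     // input adjoints start at 0
  double a1 = 1.0, a2 = 2.0, a3 = 0.5;  // seeds = weights of the sum
  grad.execute(1.0, 2.0, 3.0, out1, out2, out3,
               &dx1, &dx2, &dx3, &a1, &a2, &a3);
  // dx1, dx2, dx3 now hold the gradient of a1*out1 + a2*out2 + a3*out3
  // with respect to x1, x2, x3.
}
```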
Actually, your example is what I had in mind: computing many derivatives at once (in the example I gave, a1 and a2 were equal to 1). Thank you very much for the clarification; it is indeed more convenient this way. Do you think a "warning" for the user to initialize the derivatives should be included in the README?
Sure. However, I don't think it should be a warning per se; it should simply be added as a guide to correctly using Clad-generated functions.
Before closing this issue, would you mind giving your opinion on continuing the relevant work?
I was wondering: what's the fate of this issue?
There are still a few issues even if the user can specify the output parameter as an argument to clad::gradient. For example:

```cpp
double fn(double x, double *out1, float *out2) {
  // ...
}

auto fn_grad_out1 = clad::gradient(fn, "x", /*outParameter=*/ "out1");
auto fn_grad_out2 = clad::gradient(fn, "x", /*outParameter=*/ "out2");
```

The type of the derivative function underlying fn_grad_out1 would differ from that of fn_grad_out2, since out1 and out2 have different types (double* versus float*). Please let me know if you think I misunderstood your solution; in that case, the above analysis may not stand. If you are satisfied with the reasoning, then can you please close this ticket?
When there is no return statement and a value/pointer argument is modified inside the function we want to differentiate, the user has to initialize the derivative of that argument before passing it to the execute function; this is not necessary when there is a return statement.
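A minimal sketch of the situation described, assuming the `execute` interface used elsewhere in this thread (the function `add_to` and its inputs are made up for illustration):

```cpp
#include "clad/Differentiator/Differentiator.h"

// Void function: `out` acts as both an input and an output.
void add_to(double x, double &out) { out += x * x; }

int main() {
  auto grad = clad::gradient(add_to);
  double out = 0;
  double dx = 0;    // input adjoint, zero-initialized by the user
  double dout = 1;  // seed for the output: must be set by the user,
                    // since there is no return value to seed internally
  grad.execute(3.0, out, &dx, &dout);
  // dx == 2 * x == 6
}
```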