Understanding reverse mode automatic differentiation
Gradients are computed using reverse-mode autodiff. Every computation is represented as a graph of tensors, with each tensor holding a reference to a function that can compute the local gradient of that tensor. The partial derivative for each tensor is complete once the entire graph has been traversed.
The fundamental idea behind autodiff is that it calculates the local derivative of each variable rather than its partial derivative. This makes traversing the computational graph simple and modular, i.e., we can calculate the partial derivative of any variable with respect to the output in just one traversal, with a complexity of $O(n)$ for a graph with $n$ nodes.
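To make this concrete, here is a minimal sketch of that idea in plain Python. It is an illustrative toy, not the actual implementation in this repository, and the class name `Node` and its methods are invented for the example: each node records the local gradient with respect to each of its direct inputs, and a single reverse traversal of the graph accumulates the partial derivative of the output with respect to every node.

```python
# Illustrative sketch (not this repository's implementation): every node stores
# the local gradient of its value w.r.t. each direct input, and one reverse
# traversal accumulates the partial derivative of the output w.r.t. every node.
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = list(parents)  # list of (parent_node, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        # local derivative of (self + other) w.r.t. each input is 1
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # local derivative of (self * other) w.r.t. self is other.value, and vice versa
        return Node(self.value * other.value, [(self, other.value), (other, self.value)])

    def backward(self):
        # order the graph topologically so every node's gradient is fully
        # accumulated before it is propagated to its parents
        order, visited = [], set()

        def visit(node):
            if node not in visited:
                visited.add(node)
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(output)/d(output) = 1
        for node in reversed(order):
            for parent, local_grad in node.parents:
                parent.grad += node.grad * local_grad  # multiply edges, sum paths


a, b = Node(2.0), Node(3.0)
c = a * b
d = a + c + b
d.backward()
print(a.grad, b.grad)  # 4.0 3.0, i.e. 1 + b and 1 + a
```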
The difference between the partial and the local derivative is the way each variable is treated in the expression. When calculating the partial derivative of a function, the expression is broken down into its underlying variables: for example, with $c = a \cdot b$ and $d = a + c + b$, instead of using $c$ we write $a \cdot b$ in the expression for $d$. On the other hand, when calculating the local derivative, each element of the expression is treated as a variable in its own right. If this is not yet clear, refer to the following example:
```python
# Let's consider two variables `a` and `b`
a = 2
b = 3
# `c` is `a` multiplied by `b`
c = a * b
# `d` looks like a function of `a`, `b` and `c`, but since `c = a*b`
# it is in fact just a function of the two variables `a` and `b`
d = a + c + b
```
If we work out the partial derivative of $d$ with respect to $a$, we substitute $c = a \cdot b$ and differentiate: $\frac{\partial d}{\partial a} = \frac{\partial}{\partial a}(a + a \cdot b + b) = 1 + b$.
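As a quick sanity check of this substitution, the same result can be reproduced symbolically (this snippet assumes `sympy` is available; it is not a dependency of the project):

```python
# Symbolic check of the substitution c = a*b (assumes sympy is installed).
import sympy as sp

a, b = sp.symbols("a b")
d = a + a * b + b                        # d with c written out as a*b
print(sp.diff(d, a))                     # b + 1
print(sp.diff(d, a).subs({a: 2, b: 3}))  # 4
```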
The important part to note is that `c` is written as `a*b` when calculating the derivative. Now, let's try the same computation using the local derivative method.
The local derivative of $d$ w.r.t. $a$ is expressed as $\frac{\partial \bar{d}}{\partial a}$; note the bar over the $d$.
The local derivative treats $c$ as a variable instead of a function of $a$ and $b$. Therefore, when calculating $\frac{\partial \bar{d}}{\partial a}$, we treat $c$ as a constant rather than a function of $a$ and $b$. This gives us $\frac{\partial \bar{d}}{\partial a} = 1$.
Similarly, $\frac{\partial \bar{d}}{\partial b} = 1$, $\frac{\partial \bar{d}}{\partial c} = 1$, $\frac{\partial \bar{c}}{\partial a} = b$ and $\frac{\partial \bar{c}}{\partial b} = a$.
Therefore, to calculate the partial derivative $\frac{\partial d}{\partial a}$ we multiply the local derivatives along the edges of each path from $a$ to $d$ and add up the contributions of the different paths into the node: $\frac{\partial d}{\partial a} = \frac{\partial \bar{d}}{\partial a} + \frac{\partial \bar{d}}{\partial c} \cdot \frac{\partial \bar{c}}{\partial a} = 1 + b$.
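For the values used above ($a = 2$, $b = 3$), this path rule can be checked with a few lines of plain Python (a small illustration; the variable names are chosen here, not taken from the project):

```python
# Check of the "multiply along a path, sum over paths" rule for a = 2, b = 3.
b_val = 3

d_bar_a = 1      # local derivative of d w.r.t. a (c held constant)
d_bar_c = 1      # local derivative of d w.r.t. c
c_bar_a = b_val  # local derivative of c = a*b w.r.t. a

# Path 1: a -> d directly.  Path 2: a -> c -> d (multiply the edge values).
partial_d_a = d_bar_a + d_bar_c * c_bar_a
print(partial_d_a)  # 4, which matches 1 + b
```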