Commit

Update index.md
JacobReynolds authored Feb 17, 2024
1 parent 509612c commit 256fbc2
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions beep boop/foundation/gradient-descent/index.md
@@ -72,6 +72,8 @@ $$

If we increase $$p.data$$, that will lower the loss function. And just like the previous example, $$-0.1 * -0.41 = 0.041$$, which will end up increasing $$p.data$$ and lowering the resulting loss. The sign of $$p.data$$ actually has no effect here; only the sign of $$p.grad$$ matters, and we manage that by inverting it, multiplying by $$-0.1$$. If we were instead looking to maximize the loss function, we'd multiply by $$+0.1$$.
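The update step above can be sketched in a few lines. This is a minimal illustration, not the post's actual code: the `Parameter` class here is hypothetical, standing in for a micrograd-style value with `.data` and `.grad` attributes, and the starting values are made up for the example.

```python
class Parameter:
    """Hypothetical stand-in for a micrograd-style parameter."""
    def __init__(self, data, grad):
        self.data = data
        self.grad = grad

# Made-up starting values; the gradient matches the post's example.
p = Parameter(data=0.85, grad=-0.41)

learning_rate = 0.1
# Multiplying by -learning_rate inverts the sign of the gradient,
# so the step always moves against the gradient, lowering the loss.
step = -learning_rate * p.grad   # -0.1 * -0.41 = 0.041
p.data += step                   # p.data increases, loss decreases
```

Flipping the sign to `+learning_rate * p.grad` would instead step *up* the loss surface, which is the maximization case mentioned above.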

### How gradients relate to the loss

I got confused here for a bit trying to understand how we *know* that decreasing $$p.data$$ would decrease the loss function. What if the output is too low? Wouldn't we want to increase the data? It's important to remember that the loss function is effectively a continuation of the neural network: you take the outputs of the network and compute the loss from them. So the final node in the computation is the output of the loss function, not the output of the neural net. That means our gradients are tied directly to the loss function, not to the outputs of the NN, because backpropagation starts from the loss function.
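A tiny worked example may make this concrete. This is a sketch with made-up names and values (a one-weight "network" and a squared-error loss), not the post's code; the point is that the chain rule runs backwards from the *loss*, so every gradient answers "how does the loss change?", not "how does the output change?".

```python
def network(x, w):
    """A one-weight 'network': its output alone says nothing about loss."""
    return w * x

def loss_fn(pred, target):
    """Squared error: the loss is computed FROM the network's output."""
    return (pred - target) ** 2

x, w, target = 2.0, 0.5, 3.0
pred = network(x, w)           # forward pass: 1.0
loss = loss_fn(pred, target)   # continuation of the network: (1-3)^2 = 4.0

# Backprop starts at the loss, so gradients are d(loss)/d(w),
# not d(pred)/d(w):
dloss_dpred = 2 * (pred - target)   # -4.0
dpred_dw = x                        #  2.0
dloss_dw = dloss_dpred * dpred_dw   # -8.0, via the chain rule
# Negative gradient: increasing w decreases the loss, even though
# d(pred)/d(w) alone (2.0) would tell us nothing about the loss.
```

Here the output is "too low" (1.0 vs. a target of 3.0), and the gradient through the loss correctly says to increase `w`; reading the raw output gradient would not have told us that.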

### Zero grad
