Commit fe0ef92 — 1 parent f4ad963. Showing 15 changed files with 363 additions and 5 deletions.
---
title: Gradient Descent
parent: Foundation
grand_parent: beep boop
layout: default
math: katex
---
# Gradient Descent

Gradient descent is the process of fine-tuning the weights and biases of a neural network to minimize our [loss function](../loss/).
## Example

We do this by performing [back propagation](../back-propagation/) across something like a [multi-layer perceptron](../multi-layer-perceptron/) to [calculate](../derivatives/) the gradients of each [neuron](../neuron/). On a forward pass through the MLP we compare the expected outputs against the actual outputs using a [loss function](../loss/); gradient descent is then the process of adjusting the weights and biases of each neuron to get that loss as low as possible. The gradient of each neuron tells us whether to change its weights and biases in a positive or negative direction to achieve the output we want.

Building off the [multi-layer perceptron](../multi-layer-perceptron/) implementation, we can perform gradient descent with the following:
```python
n = MLP(3, [4, 4, 1])
xs = [
    [2.0, 3.0, -1.0],
    [3.0, -1.0, 0.5],
    [0.5, 1.0, 1.0],
    [1.0, 1.0, -1.0],
]
ys = [1.0, -1.0, -1.0, 1.0]

for k in range(20):

    # forward pass
    ypred = [n(x) for x in xs]
    loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))

    # backward pass
    for p in n.parameters():
        p.grad = 0.0
    loss.backward()

    # update
    for p in n.parameters():
        p.data += -0.1 * p.grad

    print(k, loss.data)
```
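Note the `p.grad = 0.0` reset before `loss.backward()`: assuming the `Value` objects here accumulate gradients with `+=` during `backward()`, as micrograd-style implementations do, skipping the reset would let gradients from the previous iteration leak into this one. With the `-0.1` step, the printed `loss.data` should generally shrink each iteration.
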
The reasoning for `-0.1` here is actually super important. Remember that the goal of gradient descent is to lower the value of the loss function as much as possible. So when tuning our weights, we want to tune them such that they _decrease_ the loss function. Luckily the gradient tells us exactly how much changing a parameter changes the loss. Let's look at some examples:

$$
p.grad = 0.41\newline
p.data = 0.88
$$

In this case, if we want to decrease the loss function, we want to decrease $$p.data$$, because $$p.grad$$ tells us that for every $$n$$ we increase $$p.data$$ by, the loss function changes by $$n \cdot 0.41$$. So it makes sense to do $$-0.1 \cdot p.grad$$ here, which nudges $$p.data$$ downward.

But what if the signs are different?

$$
p.grad = -0.41\newline
p.data = 0.88
$$

In this case, increasing $$p.data$$ decreases the loss function. If we do $$-0.1 \cdot -0.41$$ we get $$0.041$$, a positive update that increases $$p.data$$ and further decreases the loss function.

One more example:

$$
p.grad = -0.41\newline
p.data = -0.88
$$

If we increase $$p.data$$, that will lower the loss function. And just like the previous example, $$-0.1 \cdot -0.41 = 0.041$$, which increases $$p.data$$ and lowers the resulting loss. The sign of $$p.data$$ actually has no effect here; it's only the sign of $$p.grad$$ that matters, and we handle that by flipping it, multiplying by $$-0.1$$. If we were instead looking to maximize the loss function, we'd multiply by $$+0.1$$.
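To make that sign logic concrete, here's a tiny standalone sketch (plain Python, no autograd; the `(grad, data)` pairs are the made-up numbers from the examples above):

```python
# Sketch of the update rule used above: data += -learning_rate * grad
learning_rate = 0.1

cases = [
    (0.41, 0.88),    # positive grad: negative update, data moves down, loss goes down
    (-0.41, 0.88),   # negative grad: positive update, data moves up, loss goes down
    (-0.41, -0.88),  # the sign of data doesn't matter, only the sign of grad does
]

for grad, data in cases:
    update = -learning_rate * grad
    print(f"grad={grad:+.2f}  data={data:+.2f}  update={update:+.3f}  new data={data + update:+.3f}")
```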
*(draw.io source for the mean-squared-loss figure: it renders $$\frac{1}{N} \cdot \sum_{i=0}^{N} (actual_i - expected_i)^2$$ with curly brackets labeling the "Mean", "Squared", and "Loss" parts of the expression.)*
---
title: Loss
layout: default
parent: Foundation
grand_parent: beep boop
---
<h1><pre>
| ||
|| |_
</pre></h1>
The loss is a single number that helps us understand the performance of the neural network, and the loss function is how we calculate that number. A lot of the time spent training a neural network goes into minimizing this loss function.

## Mean-squared error loss

You calculate this by subtracting the expected output from the actual output of the neural network, squaring the differences, and then taking the mean across all the examples you tested. I _think_ this helps exaggerate values that are far from correct and shrink values that are closer to correct. But it also has the primary benefit of getting rid of the sign of the values, similar to $$abs$$.

The curious thing to me is that we don't actually take the mean of the summed squared losses, at least not in anything I've seen so far. It seems like the division by $$N$$ doesn't really matter: it only scales the loss by a constant, so minimizing the sum drives the parameters to the same place as minimizing the mean. It's the squaring of the loss values that actually gives us our metric; everything else is just syntactic sugar.
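As a quick sanity check on that, here's a small sketch (plain Python, with made-up prediction values) comparing the summed squared error against the "textbook" mean-squared error:

```python
# Made-up expected/actual values, purely to compare the two formulations
expected = [1.0, -1.0, -1.0, 1.0]
actual = [0.8, -0.2, -0.9, 0.3]

squared_errors = [(a - e) ** 2 for a, e in zip(actual, expected)]

sum_loss = sum(squared_errors)              # what the example below uses
mean_loss = sum_loss / len(squared_errors)  # divide by N to get the mean

# The two differ only by the constant factor 1/N, so pushing one toward
# zero pushes the other toward zero too.
print(sum_loss, mean_loss)
```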
![Mathematical expression of mean squared loss](./mean-squared-loss.png)
## Example

If we use our [multi-layer perceptron](../multi-layer-perceptron/), we can provide it with four input examples `xs` and their expected outputs `ys`, feed those through the MLP, and then calculate the loss.
```python
n = MLP(3, [4, 4, 1])
xs = [
    [2.0, 3.0, -1.0],
    [3.0, -1.0, 0.5],
    [0.5, 1.0, 1.0],
    [1.0, 1.0, -1.0],
]
ys = [1.0, -1.0, -1.0, 1.0]
ypred = [n(x) for x in xs]

loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))

# 7.817821598365237
```
---
title: Frameworks
has_children: true
layout: default
parent: beep boop
---
Notes on various frameworks available for machine learning.