ml and seo
JacobReynolds committed Feb 17, 2024
1 parent f4ad963 commit fe0ef92
Showing 15 changed files with 363 additions and 5 deletions.
2 changes: 2 additions & 0 deletions Gemfile
@@ -5,3 +5,5 @@ gem "jekyll", "~> 4.3.3" # installed by `gem jekyll`

gem "just-the-docs", "0.7.0" # pinned to the current release
# gem "just-the-docs" # always download the latest release

gem 'jekyll-sitemap'
3 changes: 3 additions & 0 deletions Gemfile.lock
@@ -39,6 +39,8 @@ GEM
sass-embedded (~> 1.54)
jekyll-seo-tag (2.8.0)
jekyll (>= 3.8, < 5.0)
jekyll-sitemap (1.4.0)
jekyll (>= 3.7, < 5.0)
jekyll-watch (2.2.1)
listen (~> 3.0)
just-the-docs (0.7.0)
@@ -83,6 +85,7 @@ PLATFORMS

DEPENDENCIES
jekyll (~> 4.3.3)
jekyll-sitemap
just-the-docs (= 0.7.0)

BUNDLED WITH
4 changes: 2 additions & 2 deletions _config.yml
@@ -9,8 +9,8 @@ aux_links:
color_scheme: custom

baseurl: /


plugins:
- jekyll-sitemap
callouts_level: loud # or loud
callouts:
note-title:
104 changes: 104 additions & 0 deletions beep boop/foundation/back-propagation/index.md
@@ -8,3 +8,107 @@ grand_parent: beep boop
# Back Propagation

Back propagation is the process of taking a series of nodes (equations), starting at the output, and calculating the effect each node has on the final result. We do this by calculating the gradient ([derivative](../derivatives/)) of the output with respect to each node, using the chain rule to pass gradients backwards from each node to the nodes that produced it.
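As a small worked example (the expression and the numbers are made up for illustration), take $$L = (a \cdot b + c) \cdot f$$. The sketch below applies the chain rule by hand, starting at $$L$$ and working backwards, then checks each gradient numerically by nudging one input by a tiny `h` and measuring how much $$L$$ moves, which is exactly what a [derivative](../derivatives/) measures.

```python
def forward(a, b, c, f):
    # a small chain of nodes: e = a*b, d = e + c, L = d*f
    e = a * b
    d = e + c
    return d * f


def manual_gradients(a, b, c, f):
    """Back propagation by hand: start at L and apply the chain rule backwards."""
    e = a * b
    d = e + c
    dL_dL = 1.0
    # L = d * f  ->  dL/dd = f, dL/df = d
    dL_dd = f * dL_dL
    dL_df = d * dL_dL
    # d = e + c  ->  addition passes the gradient through unchanged
    dL_de = 1.0 * dL_dd
    dL_dc = 1.0 * dL_dd
    # e = a * b  ->  each input gets the other input times the upstream gradient
    dL_da = b * dL_de
    dL_db = a * dL_de
    return {"a": dL_da, "b": dL_db, "c": dL_dc, "f": dL_df}


def numerical_gradients(a, b, c, f, h=1e-6):
    """Estimate each gradient by nudging one input at a time (finite differences)."""
    inputs = {"a": a, "b": b, "c": c, "f": f}
    base = forward(**inputs)
    grads = {}
    for name in inputs:
        nudged = dict(inputs)
        nudged[name] += h
        grads[name] = (forward(**nudged) - base) / h
    return grads


manual = manual_gradients(2.0, -3.0, 10.0, -2.0)
numeric = numerical_gradients(2.0, -3.0, 10.0, -2.0)
for name in manual:
    print(name, manual[name], round(numeric[name], 4))
```

The hand-derived and numerical gradients should agree; the `Value` class below automates the hand-derived half by giving every node its own `_backward` rule.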

## Code

Here's an example of a single value node that would live inside a chain of nodes, along with the functions it needs for back propagation:

```python
import math


class Value:
    """A single scalar node that remembers which nodes produced it."""

    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self.grad = 0.0                # d(output)/d(this node); filled in by backward()
        self._backward = lambda: None  # leaf nodes have nothing to propagate
        self._prev = set(_children)    # the nodes this value was computed from
        self._op = _op                 # the operation that produced this value
        self.label = label

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')

        def _backward():
            # addition passes the upstream gradient through unchanged
            self.grad += 1.0 * out.grad
            other.grad += 1.0 * out.grad
        out._backward = _backward

        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), '*')

        def _backward():
            # each input's gradient is the other input times the upstream gradient
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward

        return out

    def __pow__(self, other):
        assert isinstance(other, (int, float)), "only supporting int/float powers for now"
        out = Value(self.data**other, (self,), f'**{other}')

        def _backward():
            # power rule: d/dx x^n = n * x^(n-1)
            self.grad += other * (self.data ** (other - 1)) * out.grad
        out._backward = _backward

        return out

    def __rmul__(self, other):  # other * self
        return self * other

    def __truediv__(self, other):  # self / other
        return self * other**-1

    def __neg__(self):  # -self
        return self * -1

    def __sub__(self, other):  # self - other
        return self + (-other)

    def __radd__(self, other):  # other + self
        return self + other

    def tanh(self):
        x = self.data
        t = (math.exp(2*x) - 1)/(math.exp(2*x) + 1)
        out = Value(t, (self, ), 'tanh')

        def _backward():
            # d/dx tanh(x) = 1 - tanh(x)^2
            self.grad += (1 - t**2) * out.grad
        out._backward = _backward

        return out

    def exp(self):
        x = self.data
        out = Value(math.exp(x), (self, ), 'exp')

        def _backward():
            # d/dx e^x = e^x; NOTE: += (not =) so gradients accumulate when a node is reused
            self.grad += out.data * out.grad
        out._backward = _backward

        return out

    def backward(self):
        # order the nodes so each one comes after every node it depends on
        topo = []
        visited = set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topo.append(v)
        build_topo(self)

        # the output's gradient with respect to itself is 1; then walk backwards
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()
```
72 changes: 72 additions & 0 deletions beep boop/foundation/gradient-descent/index.md
@@ -0,0 +1,72 @@
---
title: Gradient Descent
parent: Foundation
grand_parent: beep boop
layout: default
math: katex
---

# Gradient Descent

Gradient descent is the process of fine-tuning the weights and biases of a neural network to minimize our [loss function](../loss/).

## Example

First we do a forward pass through something like a [multi-layer perceptron](../multi-layer-perceptron/) and compare the expected outputs against the actual outputs using a [loss function](../loss/). Then we perform [back propagation](../back-propagation/) from the loss to [calculate](../derivatives/) the gradients of each [neuron](../neuron/)'s weights and biases. Gradient descent is the process of adjusting those weights and biases to get the loss as low as possible. The gradient of each weight or bias tells us whether to change it in a positive or negative direction to lower the loss.
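The same loop works on a problem small enough to check by hand. This is a minimal sketch, assuming a made-up one-parameter loss $$(w - 3)^2$$ whose derivative we write out ourselves instead of getting it from back propagation; `loss`, `loss_grad`, and `learning_rate` are illustrative names, not part of the MLP code.

```python
def loss(w):
    # a toy loss whose minimum is at w = 3
    return (w - 3.0) ** 2


def loss_grad(w):
    # derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)


w = -1.0             # arbitrary starting "weight"
learning_rate = 0.1  # same step size as the MLP example
history = []
for step in range(50):
    history.append(loss(w))
    w += -learning_rate * loss_grad(w)  # step against the gradient

print(round(w, 4), loss(w))
```

Every step shrinks the distance to 3 by the same factor, so the loss drops on every iteration and `w` ends up very close to 3.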

Building off the [multi-layer perceptron](../multi-layer-perceptron/) implementation, we can perform gradient descent with the following:

```python
n = MLP(3, [4, 4, 1])
xs = [
    [2.0, 3.0, -1.0],
    [3.0, -1.0, 0.5],
    [0.5, 1.0, 1.0],
    [1.0, 1.0, -1.0],
]
ys = [1.0, -1.0, -1.0, 1.0]

for k in range(20):

    # forward pass
    ypred = [n(x) for x in xs]
    loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))

    # backward pass (reset first, since _backward accumulates with +=)
    for p in n.parameters():
        p.grad = 0.0
    loss.backward()

    # update
    for p in n.parameters():
        p.data += -0.1 * p.grad

    print(k, loss.data)
```

The reasoning for `-0.1` here is actually super important. Remember that the goal of gradient descent is to lower the value of the loss function as much as possible, so when tuning our weights we want to move them in whichever direction _decreases_ the loss. Luckily, the gradient tells us how much the loss changes when we change that parameter. Let's look at some examples:

$$
p.grad = 0.41\newline
p.data = 0.88
$$

In this case, if we want to decrease the loss function, we want to decrease $$p.data$$, because $$p.grad$$ tells us that for every small amount $$n$$ we increase $$p.data$$, the loss function changes by roughly $$n \cdot 0.41$$. So it makes sense to move against the gradient by adding $$-0.1 \cdot p.grad$$ to $$p.data$$.

But what if the signs are different?

$$
p.grad = -0.41\newline
p.data = 0.88
$$

In this case, increasing $$p.data$$ decreases the loss function. If we do $$-0.1 \cdot -0.41$$ we get $$0.041$$, which increases $$p.data$$ and therefore decreases the loss function.

One more:

$$
p.grad = -0.41\newline
p.data = -0.88
$$

If we increase $$p.data$$, that will lower the loss function. Just like the previous example, $$-0.1 \cdot -0.41 = 0.041$$, which increases $$p.data$$ and lowers the resulting loss function. The sign of $$p.data$$ actually has no effect here; only the sign of $$p.grad$$ matters, and multiplying by $$-0.1$$ flips it so we always step against the gradient. If we were instead looking to maximize the loss function, we'd multiply by $$+0.1$$.
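The three cases collapse into one line. As a first-order sketch (it only holds for small steps, where the gradient is a good local approximation), the change in loss from the update $$\Delta p = -0.1 \cdot p.grad$$ is roughly:

$$
\Delta loss \approx p.grad \cdot \Delta p = p.grad \cdot (-0.1 \cdot p.grad) = -0.1 \cdot (p.grad)^2 \le 0
$$

Since $$(p.grad)^2$$ is never negative, the update never increases the loss to first order, whatever the signs of $$p.grad$$ and $$p.data$$. For the first example, $$0.41 \cdot (-0.041) \approx -0.0168$$.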
22 changes: 22 additions & 0 deletions beep boop/foundation/loss/LossFunction.drawio
@@ -0,0 +1,22 @@
<mxfile host="app.diagrams.net" modified="2024-02-17T16:01:54.058Z" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36" etag="hwVI0leB8-ZAtJqbm0MF" version="23.1.5" type="device">
<diagram name="Page-1" id="_gvUFp_ucttC2tlLnHjO">
<mxGraphModel dx="855" dy="570" grid="0" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="1" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<mxCell id="TGnNtdPKK5nN1VZu4Hb8-1" value="$$&#xa;\frac{1}{N} \cdot \sum_{i=1}^{N} (actual_i - expected_i)^2&#xa;$$" style="text;strokeColor=none;align=center;fillColor=none;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontSize=25;fontColor=default;" vertex="1" parent="1">
<mxGeometry x="215" y="210" width="420" height="210" as="geometry" />
</mxCell>
<mxCell id="TGnNtdPKK5nN1VZu4Hb8-2" value="&lt;font style=&quot;font-size: 24px;&quot;&gt;Loss&lt;/font&gt;" style="shape=curlyBracket;whiteSpace=wrap;html=1;rounded=1;labelPosition=left;verticalLabelPosition=middle;align=right;verticalAlign=middle;rotation=-90;" vertex="1" parent="1">
<mxGeometry x="460" y="230" width="20" height="260" as="geometry" />
</mxCell>
<mxCell id="TGnNtdPKK5nN1VZu4Hb8-3" value="&lt;font style=&quot;font-size: 24px;&quot;&gt;Squared&lt;/font&gt;" style="shape=curlyBracket;whiteSpace=wrap;html=1;rounded=1;labelPosition=left;verticalLabelPosition=middle;align=right;verticalAlign=middle;rotation=90;" vertex="1" parent="1">
<mxGeometry x="602.5" y="257.5" width="20" height="45" as="geometry" />
</mxCell>
<mxCell id="TGnNtdPKK5nN1VZu4Hb8-8" value="&lt;font style=&quot;font-size: 24px;&quot;&gt;Mean&lt;/font&gt;" style="shape=curlyBracket;whiteSpace=wrap;html=1;rounded=1;labelPosition=left;verticalLabelPosition=middle;align=right;verticalAlign=middle;rotation=-90;" vertex="1" parent="1">
<mxGeometry x="425" y="230" width="20" height="415" as="geometry" />
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>
41 changes: 41 additions & 0 deletions beep boop/foundation/loss/index.md
@@ -0,0 +1,41 @@
---
title: Loss
layout: default
parent: Foundation
grand_parent: beep boop
---

<h1><pre>
| ||
|| |_
</pre></h1>

The loss is a single number that tells us how well the neural network is performing; lower is better. The loss function is how we calculate that number. Much of the time spent training a neural network goes into minimizing this loss function.

## Mean-squared error loss

You calculate this by subtracting the expected output from the actual output of the neural network, squaring each difference, and then taking the mean over all the examples you tested. Squaring makes errors that are far from correct count disproportionately more than errors that are close (an error of 2 contributes 4, while an error of 0.5 contributes only 0.25). It also has the benefit of getting rid of the sign of the errors, similar to taking the absolute value ($$\lvert x \rvert$$).

The curious thing to me is that the code I've seen so far doesn't actually take the mean; it just sums the squared errors (see the example below). As far as I can tell, the division by $$N$$ doesn't change much: it scales the loss, and its gradients, by the constant $$\frac{1}{N}$$, so the weights that minimize it are the same. What it does change is the effective step size during [gradient descent](../gradient-descent/).

![Mathematical expression of mean squared loss](./mean-squared-loss.png)
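To see why the $$\frac{1}{N}$$ barely matters, here is a small sketch comparing the summed squared error (what the example below computes) with the mean squared error on two made-up sets of predictions. The predictions and function names are hypothetical; the point is that the mean is always the sum divided by the same $$N$$, so both rank the predictions the same way.

```python
def sum_squared_error(actual, expected):
    # what the MLP example computes: no division by N
    return sum((a - e) ** 2 for a, e in zip(actual, expected))


def mean_squared_error(actual, expected):
    # the textbook definition: divide by the number of examples N
    return sum_squared_error(actual, expected) / len(actual)


expected = [1.0, -1.0, -1.0, 1.0]
prediction_a = [0.9, -0.8, -1.0, 0.5]  # hypothetical, fairly close outputs
prediction_b = [0.5, -0.5, 0.0, 0.0]   # hypothetical, worse outputs

for name, pred in [("a", prediction_a), ("b", prediction_b)]:
    sse = sum_squared_error(pred, expected)
    mse = mean_squared_error(pred, expected)
    print(name, round(sse, 4), round(mse, 4))
```

Notice how the two misses of exactly 1.0 in `prediction_b` dominate its total, which is the squaring at work.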

## Example

If we use our [multi-layer perceptron](../multi-layer-perceptron/), we can provide it with four example inputs `xs` and their expected outputs `ys`, feed the inputs through the MLP, and then calculate the loss.

```python
n = MLP(3, [4, 4, 1])
xs = [
    [2.0, 3.0, -1.0],
    [3.0, -1.0, 0.5],
    [0.5, 1.0, 1.0],
    [1.0, 1.0, -1.0],
]
ys = [1.0, -1.0, -1.0, 1.0]
ypred = [n(x) for x in xs]

# summed squared error (no division by N)
loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))

loss.data  # e.g. 7.817821598365237; varies with the random starting weights
```
Binary file added beep boop/foundation/loss/mean-squared-loss.png
55 changes: 54 additions & 1 deletion beep boop/foundation/multi-layer-perceptron/index.md
@@ -7,4 +7,57 @@ grand_parent: beep boop

# Multi-layer Perceptron (MLP)

An MLP consists of many layers of [neurons](../neuron/) lined up in order and feeding values between each other.

Since I'm very code inclined, here's the Python that implements the network in the following image:

![a multilayer perceptron](./mlp.jpeg)

The following code also uses the `Value` class from [Back Propagation](../back-propagation/).

```python
import random

# Value is the class defined on the Back Propagation page


class Neuron:

    def __init__(self, nin):
        # one random weight per input, plus a random bias
        self.w = [Value(random.uniform(-1,1)) for _ in range(nin)]
        self.b = Value(random.uniform(-1,1))

    def __call__(self, x):
        # w * x + b
        act = sum((wi*xi for wi, xi in zip(self.w, x)), self.b)
        out = act.tanh()
        return out

    def parameters(self):
        return self.w + [self.b]

class Layer:

    def __init__(self, nin, nout):
        # nout neurons, each reading all nin inputs
        self.neurons = [Neuron(nin) for _ in range(nout)]

    def __call__(self, x):
        outs = [n(x) for n in self.neurons]
        # unwrap a single output so the last layer returns a Value, not a list
        return outs[0] if len(outs) == 1 else outs

    def parameters(self):
        return [p for neuron in self.neurons for p in neuron.parameters()]

class MLP:

    def __init__(self, nin, nouts):
        # e.g. nin=3, nouts=[4, 4, 1] -> layers of size 3->4, 4->4, 4->1
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i+1]) for i in range(len(nouts))]

    def __call__(self, x):
        # feed each layer's output into the next
        for layer in self.layers:
            x = layer(x)
        return x

    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]

x = [2.0, 3.0, -1.0]
n = MLP(3, [4, 4, 1])
n(x)
```
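As a quick sanity check on the shapes, here's a small sketch that counts how many trainable values `MLP(3, [4, 4, 1])` should have, using the fact that each `Neuron(nin)` holds `nin` weights plus one bias. `count_parameters` is a hypothetical helper, not part of the code above; its result should match `len(n.parameters())`, which works out to $$4 \cdot (3+1) + 4 \cdot (4+1) + 1 \cdot (4+1) = 41$$ for this network.

```python
def count_parameters(nin, nouts):
    """Count weights and biases in an MLP shaped like MLP(nin, nouts)."""
    sizes = [nin] + nouts
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        # each of the fan_out neurons has fan_in weights plus one bias
        total += fan_out * (fan_in + 1)
    return total


print(count_parameters(3, [4, 4, 1]))
```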
8 changes: 8 additions & 0 deletions beep boop/foundation/neuron/index.md
@@ -11,3 +11,11 @@ Neurons are exactly what they sound like, the things in our brain!
![Diagram of a neuron](./neuron_model.jpeg)

In machine learning, we model these in neural networks to loosely simulate how the brain works. A neuron takes in a series of inputs (x), multiplies each one by its own weight (w), and adds up the results. The neuron fires by adding its bias (how trigger happy it is) to that sum and passing the result through an activation function, which squashes it into a fixed range: -1 to 1 for `tanh`, or 0 to 1 for a `sigmoid` function. The activation function is usually one of those two.
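Here's that computation for a single neuron as a minimal sketch, using plain Python floats instead of the `Value` class (so there are no gradients here). The inputs, weights, and bias are made-up numbers; the bias is picked so the pre-activation is about 0.8814, which makes `tanh` land on roughly 0.7071.

```python
import math


def neuron_forward(xs, ws, b):
    # multiply each input by its weight and add them up
    weighted_sum = sum(x * w for x, w in zip(xs, ws))
    # add the bias ("how trigger happy" the neuron is)
    activation = weighted_sum + b
    # squash into (-1, 1) with tanh
    return math.tanh(activation)


xs = [2.0, 0.0]          # hypothetical inputs
ws = [-3.0, 1.0]         # hypothetical weights
b = 6.8813735870195432   # hypothetical bias
print(neuron_forward(xs, ws, b))
```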

## Weights

The weights for each input of a neuron start out random. There's probably a whole field of mathematics that goes into choosing the best starting weights, but at this point, for me, they're random. Then, through the process of training, these weights get adjusted to minimize our loss function.

## Biases

Much like [weights](#weights), biases start out random and are updated throughout the training process, adjusting the activation of that neuron to minimize our loss function.
8 changes: 8 additions & 0 deletions beep boop/frameworks/index.md
@@ -0,0 +1,8 @@
---
title: Frameworks
has_children: true
layout: default
parent: beep boop
---

Notes on various frameworks available for machine learning.
Binary file added beep boop/frameworks/pytorch/equation.png