slides for week 3
akki2825 committed Oct 23, 2024
1 parent 316d955 commit b5726f5
Showing 9 changed files with 108 additions and 14 deletions.
3 changes: 3 additions & 0 deletions 2024/weeks/week02/build.sh
@@ -0,0 +1,3 @@
#!/bin/sh

clang -o nn nn.c
Binary file added 2024/weeks/week02/nn
Binary file not shown.
21 changes: 21 additions & 0 deletions 2024/weeks/week02/nn.c
@@ -0,0 +1,21 @@
#include <stdio.h>
#include <stdlib.h>   /* rand, srand, RAND_MAX */
#include <time.h>     /* time, used to seed the RNG */

/* Training data for the target function y = 2 * x */
float train[][2] = {
    {0, 0},
    {1, 2},
    {2, 4},
    {3, 6},
    {4, 8}
};

/* Random float in the range [0, 1] */
float rand_float(void) {
    return (float) rand() / (float) RAND_MAX;
}

int main(void) {
    srand(time(NULL));
    /* Model: y = x * w with a single randomly initialised weight */
    float w = rand_float();
    float x = train[1][0];
    float y = x * w;
    printf("x = %f, predicted y = %f\n", x, y);
    return 0;
}

41 changes: 41 additions & 0 deletions 2024/weeks/week02/nn.py
@@ -0,0 +1,41 @@
import random
import time

# Training data for the target function y = 2 * x
train = [
    [0, 0],
    [1, 2],
    [2, 4],
    [3, 6],
    [4, 8],
]
train_count = len(train)

def loss(w, b):
    # Mean squared error of the linear model y = x * w + b over the training data
    result = 0.0
    for i in range(train_count):
        x = train[i][0]
        y = x * w + b
        d = y - train[i][1]
        result = result + d * d
    result = result / train_count
    return result

if __name__ == "__main__":
    random.seed(int(time.time()))
    w = random.random() * 10.0
    b = random.random() * 5.0

    eps = 1e-3   # finite-difference step for approximating gradients
    rate = 1e-3  # learning rate

    print(f"{loss(w, b)}")
    for i in range(500):
        c = loss(w, b)
        # Approximate the partial derivatives of the loss with finite differences
        dw = (loss(w + eps, b) - c) / eps
        db = (loss(w, b + eps) - c) / eps
        w = w - rate * dw
        b = b - rate * db
        print(f"loss = {loss(w, b)}, w = {w}, b = {b}")

    print("------------------------------")
    print(f"w = {w}, b = {b}")
Binary file modified 2024/weeks/week03/gradient_descent_1.png
Binary file modified 2024/weeks/week03/gradient_descent_2.png
4 changes: 0 additions & 4 deletions 2024/weeks/week03/page.qmd
@@ -2,7 +2,6 @@
title: "🗓️ Week 01 - Admin, Architectures: Statistical and Probabilistic Language Models"
---

In this first week, we will cover what you can expect to learn from this course and the course logistics: all you need to know about the structure of the lectures, classes, assessments, and how we will interact throughout this course.

## 👨‍🏫 Lecture Slides

@@ -15,8 +14,5 @@ Use your keypad to navigate the slides.

## ✍️ Homework

<a href="https://llm4linguists.xyz/2024/homework/homework01.html">Week 01 - Homework</a>

## 📚 Recommended Reading

- Speech and Language Processing, Chapter 3: N-gram language models [PDF](https://web.stanford.edu/~jurafsky/slp3/3.pdf). Authors: Dan Jurafsky and James H. Martin.
47 changes: 38 additions & 9 deletions 2024/weeks/week03/slides.qmd
@@ -24,7 +24,9 @@ format:
css: /css/styles_slides.css
footer: 'LLMs in Linguistic Research WiSe 2024/25'
---
## Activation functions {.smaller}


# Activation functions {.smaller}

- Activation functions are used to introduce non-linearity to the output of a neuron.

@@ -34,6 +36,9 @@ f(x) = \frac{1}{1 + e^{-x}}
$$

Example: $f(0) = 0.5$

where:

- f(x): This represents the output of the sigmoid function for a given input x.
- e: This is Euler's number (approximately 2.71828).
- x: This is the input to the sigmoid function.
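
A minimal sketch of the sigmoid in plain Python (not part of the committed slides; shown only to make the formula concrete):

```python
import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))   # 0.5, matching the example above
print(sigmoid(5.0))   # close to 1
print(sigmoid(-5.0))  # close to 0
```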
@@ -53,6 +58,7 @@
Example: $f(2) = 2$

where:

- f(x): This represents the output of the ReLU function for a given input x.
- x: This is the input to the ReLU function.
- max: This function returns the maximum of the two values.
@@ -62,19 +68,24 @@ where:
- The output of the ReLU function is between 0 and infinity.
- It is a popular activation function used in deep learning models.
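
As with the sigmoid, a one-line sketch in plain Python (illustrative, not from the slides):

```python
def relu(x):
    # f(x) = max(0, x): negative inputs are clipped to zero
    return max(0.0, x)

print(relu(2.0))   # 2.0, matching the example above
print(relu(-3.0))  # 0.0
```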

## Loss functions {.smaller}
# Loss functions {.smaller}

- During forward pass, the neural network makes predictions based on input data.
- The loss function compares these predictions to the true values and calculates a loss score.
- The loss score is a measure of how well the network is performing.
- The goal of training is to minimize the loss function.
- For regression problems, use MSE or MAE.
- For classification problems, use cross-entropy loss.
- For multi-class classification problems, use categorical cross-entropy loss.
- You can use different loss functions for different types of tasks:
- For regression problems, use MSE or MAE.
- For classification problems, use cross-entropy loss.
- For multi-class classification problems, use categorical cross-entropy loss.
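
A minimal sketch of two of these loss functions in plain Python (the function names and example values are illustrative, not from the slides):

```python
import math

def mse(y_true, y_pred):
    # Mean squared error, a common choice for regression
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred):
    # Cross-entropy for binary classification; y_pred holds probabilities in (0, 1)
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([0, 2, 4], [0.1, 1.8, 4.3]))
print(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.7]))
```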

## Gradient descent {.smaller}
# Gradient descent {.smaller}

- **Gradient descent** is an optimization algorithm used in machine learning to minimize the loss function of a model.
- The algorithm works by iteratively adjusting the model parameters (weights and biases) to reduce the loss.
- The key idea behind gradient descent is to move in the direction of the negative gradient of the loss function.
- The negative gradient points in the direction of steepest descent, i.e., the direction in which the loss decreases the fastest.
- By following the gradient, the algorithm can find the optimal values of the model parameters that minimize the loss function.
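
The week 02 nn.py above applies exactly this idea with finite-difference gradients; here is an even smaller sketch on a one-parameter toy loss (my own example, assuming an analytic gradient):

```python
def f(w):
    # Toy loss with its minimum at w = 3
    return (w - 3.0) ** 2

def df(w):
    # Analytic gradient of the toy loss
    return 2.0 * (w - 3.0)

w = 0.0      # starting point
rate = 0.1   # learning rate
for step in range(50):
    w = w - rate * df(w)  # step against the gradient
print(w)  # close to 3.0
```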

## {.smaller}

@@ -108,14 +119,15 @@ where:
- The algorithm may not always reach the exact theoretical minimum due to factors like step size (learning rate) and the complexity of the loss landscape.
- But it typically converges to a point close enough to be practically useful for model optimization.

## Learning rate {.smaller}
# Learning rate {.smaller}

- The **learning rate** is a hyperparameter that controls how much the model parameters are adjusted during training.
- It is a critical parameter that can affect the convergence of the optimization algorithm.
- A hyperparameter is a parameter whose value is set before the learning process begins.
- Learning rate is a critical parameter that can affect the convergence of the optimization algorithm.
- A high learning rate can cause the model to overshoot the minimum, leading to instability and divergence.
- A low learning rate can slow down the training process and may get stuck in local minima.
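
A small sketch of these failure modes on the same toy loss (the rate values are chosen for illustration, not taken from the slides):

```python
def df(w):
    # Gradient of the toy loss (w - 3)^2
    return 2.0 * (w - 3.0)

for rate in (1.5, 1e-4, 0.1):
    w = 0.0
    for _ in range(50):
        w = w - rate * df(w)
    # 1.5 overshoots and diverges, 1e-4 barely moves, 0.1 lands near the minimum at 3
    print(rate, w)
```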

## Neural Network {.smaller}
# Single-layer Neural Network {.smaller}

- A neural network is a collection of interconnected nodes (neurons) that process input data to produce output predictions.
- The nodes are organized into layers, with each layer performing specific computations.
@@ -181,4 +193,21 @@ flowchart LR

## {.smaller}

- The input layer consists of three nodes (I1, I2, I3) representing the input features.
- The hidden layer consists of three nodes (H1, H2, H3) that process the input data.
- The output layer consists of two nodes (O1, O2) that produce the final predictions.
- The connections between nodes are represented by weights (w11, w12, ..., v32) and biases (b1, b2, ..., b5).
- The weights and biases are adjusted during training to optimize the model.
- The model makes predictions by passing the input data through the network and computing the output.
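
A minimal forward-pass sketch for the 3-3-2 network described above (the weight matrices, biases, and sigmoid activation are illustrative placeholders, not taken from the slides):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
W1 = [[random.random() for _ in range(3)] for _ in range(3)]  # input -> hidden weights (w11..w33)
b1 = [random.random() for _ in range(3)]                      # hidden-layer biases
W2 = [[random.random() for _ in range(3)] for _ in range(2)]  # hidden -> output weights (v11..v32)
b2 = [random.random() for _ in range(2)]                      # output-layer biases

def forward(x):
    # Hidden layer: weighted sum of the inputs plus bias, passed through the activation
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(3)) + b1[j]) for j in range(3)]
    # Output layer: weighted sum of the hidden activations plus bias
    return [sum(W2[k][j] * h[j] for j in range(3)) + b2[k] for k in range(2)]

print(forward([1.0, 0.5, -0.2]))  # two output values, O1 and O2
```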

## Training, development and test datasets {.smaller}

- The training dataset is used to optimize the model parameters (weights and biases) using gradient descent.
- The development dataset is used to tune the hyperparameters of the model, such as the learning rate and the number of hidden units.
- The test dataset is used to evaluate the performance of the model on unseen data.
- In order to avoid overfitting, it is important to have separate datasets for training, development, and testing.
- The training dataset is typically the largest, followed by the development and test datasets.
- The development and test datasets should be representative of the data the model will encounter in the real world.
- The datasets should be randomly sampled to avoid bias and ensure that the model generalizes well.
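
A minimal sketch of one common way to make such a split (the 80/10/10 ratio, the seed, and the function name are illustrative choices, not from the slides):

```python
import random

def split(data, train_frac=0.8, dev_frac=0.1, seed=42):
    # Shuffle once with a fixed seed, then cut into train / development / test portions
    data = list(data)
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train_frac)
    n_dev = int(len(data) * dev_frac)
    return (data[:n_train],
            data[n_train:n_train + n_dev],
            data[n_train + n_dev:])

train_set, dev_set, test_set = split(range(100))
print(len(train_set), len(dev_set), len(test_set))  # 80 10 10
```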

## Thank you! {.smaller}
6 changes: 5 additions & 1 deletion _quarto.yml
@@ -56,7 +56,11 @@ website:
contents:
- href: 2024/weeks/week02/page.qmd
text: 👨‍🏫 Lecture Material

- href: 2024/weeks/week03/page.qmd
text: Week 03 - NN basics II
contents:
- href: 2024/weeks/week03/page.qmd
text: 👨‍🏫 Lecture Material

- section: "🧩 Homework"
contents:
