slides for week 3
akki2825 committed Oct 23, 2024
1 parent 316d955 commit b5726f5
Showing 9 changed files with 108 additions and 14 deletions.
3 changes: 3 additions & 0 deletions 2024/weeks/week02/build.sh
@@ -0,0 +1,3 @@
#!/bin/sh

clang -o nn nn.c
Binary file added 2024/weeks/week02/nn
Binary file not shown.
21 changes: 21 additions & 0 deletions 2024/weeks/week02/nn.c
@@ -0,0 +1,21 @@
#include <stdio.h>
#include <stdlib.h>   /* rand, srand, RAND_MAX */
#include <time.h>     /* time, used to seed the RNG */

/* Training data for the target function y = 2 * x */
float train[][2] = {
    {0, 0},
    {1, 2},
    {2, 4},
    {3, 6},
    {4, 8}
};

/* Random float in the range [0, 1] */
float rand_float(void) {
    return (float) rand() / (float) RAND_MAX;
}

int main(void) {
    srand(time(NULL));
    /* Model: y = x * w with a single randomly initialised weight */
    float w = rand_float();
    float x = train[1][0];
    float y = x * w;
    printf("x = %f, predicted y = %f\n", x, y);
    return 0;
}

41 changes: 41 additions & 0 deletions 2024/weeks/week02/nn.py
@@ -0,0 +1,41 @@
import random
import time

# Training data for the target function y = 2 * x
train = [
    [0, 0],
    [1, 2],
    [2, 4],
    [3, 6],
    [4, 8],
]
train_count = len(train)

def loss(w, b):
    # Mean squared error of the linear model y = x * w + b over the training data
    result = 0.0
    for i in range(train_count):
        x = train[i][0]
        y = x * w + b
        d = y - train[i][1]
        result = result + d * d
    result = result / train_count
    return result

if __name__ == "__main__":
    random.seed(int(time.time()))
    w = random.random() * 10.0
    b = random.random() * 5.0

    eps = 1e-3   # finite-difference step for approximating gradients
    rate = 1e-3  # learning rate

    print(f"{loss(w, b)}")
    for i in range(500):
        c = loss(w, b)
        # Approximate the partial derivatives of the loss with finite differences
        dw = (loss(w + eps, b) - c) / eps
        db = (loss(w, b + eps) - c) / eps
        w = w - rate * dw
        b = b - rate * db
        print(f"loss = {loss(w, b)}, w = {w}, b = {b}")

    print("------------------------------")
    print(f"w = {w}, b = {b}")
Binary file modified 2024/weeks/week03/gradient_descent_1.png
Binary file modified 2024/weeks/week03/gradient_descent_2.png
4 changes: 0 additions & 4 deletions 2024/weeks/week03/page.qmd
@@ -2,7 +2,6 @@
title: "🗓️ Week 01 - Admin, Architectures: Statistical and Probabilistic Language Models"
---

In this first week, we will cover what you can expect to learn from this course and the course logistics: all you need to know about the structure of the lectures, classes, assessments, and how we will interact throughout this course.

## 👨‍🏫 Lecture Slides

@@ -15,8 +14,5 @@ Use your keypad to navigate the slides.

## ✍️ Homework

<a href="https://llm4linguists.xyz/2024/homework/homework01.html">Week 01 - Homework</a>

## 📚 Recommended Reading

- Speech and Language Processing, Chapter 3: N-gram language models [PDF](https://web.stanford.edu/~jurafsky/slp3/3.pdf). Authors: Dan Jurafsky and James H. Martin.
47 changes: 38 additions & 9 deletions 2024/weeks/week03/slides.qmd
@@ -24,7 +24,9 @@ format:
css: /css/styles_slides.css
footer: 'LLMs in Linguistic Research WiSe 2024/25'
---
## Activation functions {.smaller}


# Activation functions {.smaller}

- Activation functions are used to introduce non-linearity to the output of a neuron.

@@ -34,6 +36,9 @@ f(x) = \frac{1}{1 + e^{-x}}
$$

Example: $f(0) = 0.5$

where:

- f(x): This represents the output of the sigmoid function for a given input x.
- e: This is Euler's number (approximately 2.71828).
- x: This is the input to the sigmoid function.
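
A minimal sketch of the sigmoid in plain Python (not part of the committed slides; shown only to make the formula concrete):

```python
import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))   # 0.5, matching the example above
print(sigmoid(5.0))   # close to 1
print(sigmoid(-5.0))  # close to 0
```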
@@ -53,6 +58,7 @@
Example: $f(2) = 2$

where:

- f(x): This represents the output of the ReLU function for a given input x.
- x: This is the input to the ReLU function.
- max: This function returns the maximum of the two values.
@@ -62,19 +68,24 @@ where:
- The output of the ReLU function is between 0 and infinity.
- It is a popular activation function used in deep learning models.
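
As with the sigmoid, a one-line sketch in plain Python (illustrative, not from the slides):

```python
def relu(x):
    # f(x) = max(0, x): negative inputs are clipped to zero
    return max(0.0, x)

print(relu(2.0))   # 2.0, matching the example above
print(relu(-3.0))  # 0.0
```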

## Loss functions {.smaller}
# Loss functions {.smaller}

- During forward pass, the neural network makes predictions based on input data.
- The loss function compares these predictions to the true values and calculates a loss score.
- The loss score is a measure of how well the network is performing.
- The goal of training is to minimize the loss function.
- For regression problems, use MSE or MAE.
- For classification problems, use cross-entropy loss.
- For multi-class classification problems, use categorical cross-entropy loss.
- You can use different loss functions for different types of tasks:
- For regression problems, use MSE or MAE.
- For classification problems, use cross-entropy loss.
- For multi-class classification problems, use categorical cross-entropy loss.
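
A minimal sketch of two of these loss functions in plain Python (the function names and example values are illustrative, not from the slides):

```python
import math

def mse(y_true, y_pred):
    # Mean squared error, a common choice for regression
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred):
    # Cross-entropy for binary classification; y_pred holds probabilities in (0, 1)
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([0, 2, 4], [0.1, 1.8, 4.3]))
print(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.7]))
```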

## Gradient descent {.smaller}
# Gradient descent {.smaller}

- **Gradient descent** is an optimization algorithm used in machine learning to minimize the loss function of a model.
- The algorithm works by iteratively adjusting the model parameters (weights and biases) to reduce the loss.
- The key idea behind gradient descent is to move in the direction of the negative gradient of the loss function.
- The negative gradient points in the direction of steepest descent, i.e., the direction in which the loss decreases the fastest.
- By following the gradient, the algorithm can find the optimal values of the model parameters that minimize the loss function.
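
The week 02 nn.py above applies exactly this idea with finite-difference gradients; here is an even smaller sketch on a one-parameter toy loss (my own example, assuming an analytic gradient):

```python
def f(w):
    # Toy loss with its minimum at w = 3
    return (w - 3.0) ** 2

def df(w):
    # Analytic gradient of the toy loss
    return 2.0 * (w - 3.0)

w = 0.0      # starting point
rate = 0.1   # learning rate
for step in range(50):
    w = w - rate * df(w)  # step against the gradient
print(w)  # close to 3.0
```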

## {.smaller}

@@ -108,14 +119,15 @@ where:
- The algorithm may not always reach the exact theoretical minimum due to factors like step size (learning rate) and the complexity of the loss landscape.
- But it typically converges to a point close enough to be practically useful for model optimization.

## Learning rate {.smaller}
# Learning rate {.smaller}

- The **learning rate** is a hyperparameter that controls how much the model parameters are adjusted during training.
- It is a critical parameter that can affect the convergence of the optimization algorithm.
- A hyperparameter is a parameter whose value is set before the learning process begins.
- Learning rate is a critical parameter that can affect the convergence of the optimization algorithm.
- A high learning rate can cause the model to overshoot the minimum, leading to instability and divergence.
- A low learning rate can slow down the training process and may get stuck in local minima.
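
A small sketch of these failure modes on the same toy loss (the rate values are chosen for illustration, not taken from the slides):

```python
def df(w):
    # Gradient of the toy loss (w - 3)^2
    return 2.0 * (w - 3.0)

for rate in (1.5, 1e-4, 0.1):
    w = 0.0
    for _ in range(50):
        w = w - rate * df(w)
    # 1.5 overshoots and diverges, 1e-4 barely moves, 0.1 lands near the minimum at 3
    print(rate, w)
```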

## Neural Network {.smaller}
# Single-layer Neural Network {.smaller}

- A neural network is a collection of interconnected nodes (neurons) that process input data to produce output predictions.
- The nodes are organized into layers, with each layer performing specific computations.
@@ -181,4 +193,21 @@ flowchart LR

## {.smaller}

- The input layer consists of three nodes (I1, I2, I3) representing the input features.
- The hidden layer consists of three nodes (H1, H2, H3) that process the input data.
- The output layer consists of two nodes (O1, O2) that produce the final predictions.
- The connections between nodes are represented by weights (w11, w12, ..., v32) and biases (b1, b2, ..., b5).
- The weights and biases are adjusted during training to optimize the model.
- The model makes predictions by passing the input data through the network and computing the output.
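
A minimal forward-pass sketch for the 3-3-2 network described above (the weight matrices, biases, and sigmoid activation are illustrative placeholders, not taken from the slides):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
W1 = [[random.random() for _ in range(3)] for _ in range(3)]  # input -> hidden weights (w11..w33)
b1 = [random.random() for _ in range(3)]                      # hidden-layer biases
W2 = [[random.random() for _ in range(3)] for _ in range(2)]  # hidden -> output weights (v11..v32)
b2 = [random.random() for _ in range(2)]                      # output-layer biases

def forward(x):
    # Hidden layer: weighted sum of the inputs plus bias, passed through the activation
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(3)) + b1[j]) for j in range(3)]
    # Output layer: weighted sum of the hidden activations plus bias
    return [sum(W2[k][j] * h[j] for j in range(3)) + b2[k] for k in range(2)]

print(forward([1.0, 0.5, -0.2]))  # two output values, O1 and O2
```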

## Training, development and test datasets {.smaller}

- The training dataset is used to optimize the model parameters (weights and biases) using gradient descent.
- The development dataset is used to tune the hyperparameters of the model, such as the learning rate and the number of hidden units.
- The test dataset is used to evaluate the performance of the model on unseen data.
- In order to avoid overfitting, it is important to have separate datasets for training, development, and testing.
- The training dataset is typically the largest, followed by the development and test datasets.
- The development and test datasets should be representative of the data the model will encounter in the real world.
- The datasets should be randomly sampled to avoid bias and ensure that the model generalizes well.
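
A minimal sketch of one common way to make such a split (the 80/10/10 ratio, the seed, and the function name are illustrative choices, not from the slides):

```python
import random

def split(data, train_frac=0.8, dev_frac=0.1, seed=42):
    # Shuffle once with a fixed seed, then cut into train / development / test portions
    data = list(data)
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train_frac)
    n_dev = int(len(data) * dev_frac)
    return (data[:n_train],
            data[n_train:n_train + n_dev],
            data[n_train + n_dev:])

train_set, dev_set, test_set = split(range(100))
print(len(train_set), len(dev_set), len(test_set))  # 80 10 10
```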

## Thank you! {.smaller}
6 changes: 5 additions & 1 deletion _quarto.yml
@@ -56,7 +56,11 @@ website:
contents:
- href: 2024/weeks/week02/page.qmd
text: 👨‍🏫 Lecture Material

- href: 2024/weeks/week03/page.qmd
text: Week 03 - NN basics II
contents:
- href: 2024/weeks/week03/page.qmd
text: 👨‍🏫 Lecture Material

- section: "🧩 Homework"
contents:
