diff --git a/ESM_01_introduction_to_pytorch.ipynb b/ESM_01_introduction_to_pytorch.ipynb
new file mode 100644
index 0000000..e96863b
--- /dev/null
+++ b/ESM_01_introduction_to_pytorch.ipynb
@@ -0,0 +1,5884 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 5,
+ "metadata": {
+ "jupytext": {
+ "cell_metadata_filter": "id,colab,colab_type,-all",
+ "formats": "ipynb,py:percent",
+ "main_language": "python"
+ },
+ "papermill": {
+ "default_parameters": {},
+ "duration": 21.925345,
+ "end_time": "2021-09-16T12:33:06.344225",
+ "environment_variables": {},
+ "exception": null,
+ "input_path": "course_UvA-DL/01-introduction-to-pytorch/Introduction_to_PyTorch.ipynb",
+ "output_path": ".notebooks/course_UvA-DL/01-introduction-to-pytorch.ipynb",
+ "parameters": {},
+ "start_time": "2021-09-16T12:32:44.418880",
+ "version": "2.3.3"
+ },
+ "colab": {
+ "name": "ESM 01-introduction-to-pytorch.ipynb",
+ "provenance": [],
+ "include_colab_link": true
+ },
+ "language_info": {
+ "name": "python"
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.050411,
+ "end_time": "2021-09-16T12:32:45.750290",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:45.699879",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "eaced3d8"
+ },
+ "source": [
+ "\n",
+ "# Tutorial 1: Introduction to PyTorch\n",
+ "\n",
+ "* **Author:** Phillip Lippe\n",
+ "* **License:** CC BY-SA\n",
+ "* **Generated:** 2021-09-16T14:32:16.770882\n",
+ "\n",
+ "This tutorial will give a short introduction to PyTorch basics, and get you set up for writing your own neural networks.\n",
+ "This notebook is part of a lecture series on Deep Learning at the University of Amsterdam.\n",
+ "The full list of tutorials can be found at https://uvadlc-notebooks.rtfd.io.\n",
+ "\n",
+ "\n",
+ "---\n",
+ "[Open in Colab](https://colab.research.google.com/github/PytorchLightning/lightning-tutorials/blob/publication/.notebooks/course_UvA-DL/01-introduction-to-pytorch.ipynb)\n",
+ "\n",
+ "Give us a ⭐ [on Github](https://www.github.com/PytorchLightning/pytorch-lightning/)\n",
+ "| Check out [the documentation](https://pytorch-lightning.readthedocs.io/en/latest/)\n",
+ "| Join us [on Slack](https://join.slack.com/t/pytorch-lightning/shared_invite/zt-pw5v393p-qRaDgEk24~EjiZNBpSQFgQ)"
+ ],
+ "id": "eaced3d8"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.042559,
+ "end_time": "2021-09-16T12:32:45.835806",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:45.793247",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "fa480fc6"
+ },
+ "source": [
+ "## Setup\n",
+ "This notebook requires some packages besides pytorch-lightning."
+ ],
+ "id": "fa480fc6"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.043036,
+ "end_time": "2021-09-16T12:32:46.013859",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:45.970823",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "9473f942"
+ },
+ "source": [
+ "\n",
+ "Welcome to our PyTorch tutorial for the Deep Learning course 2020 at the University of Amsterdam!\n",
+ "The following notebook is meant to give a short introduction to PyTorch basics, and get you set up for writing your own neural networks.\n",
+ "PyTorch is an open source machine learning framework that allows you to write your own neural networks and optimize them efficiently.\n",
+ "However, PyTorch is not the only framework of its kind.\n",
+ "Alternatives to PyTorch include [TensorFlow](https://www.tensorflow.org/), [JAX](https://github.com/google/jax#quickstart-colab-in-the-cloud) and [Caffe](http://caffe.berkeleyvision.org/).\n",
+ "We choose to teach PyTorch at the University of Amsterdam because it is well established, has a huge developer community (it was originally developed by Facebook), is very flexible, and is especially popular in research.\n",
+ "Many current papers publish their code in PyTorch, and thus it is good to be familiar with PyTorch as well.\n",
+ "Meanwhile, TensorFlow (developed by Google) is usually known for being a production-grade deep learning library.\n",
+ "Still, if you know one machine learning framework in depth, it is very easy to learn another one because many of them use the same concepts and ideas.\n",
+ "For instance, TensorFlow's version 2 was heavily inspired by the most popular features of PyTorch, making the frameworks even more similar.\n",
+ "If you are already familiar with PyTorch and have created your own neural network projects, feel free to just skim this notebook.\n",
+ "\n",
+ "We are of course not the first ones to create a PyTorch tutorial.\n",
+ "There are many great tutorials online, including the [\"60-min blitz\"](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) on the official [PyTorch website](https://pytorch.org/tutorials/).\n",
+ "Yet, we chose to create our own tutorial, which is designed to give you the basics particularly necessary for the practicals while still helping you understand how PyTorch works under the hood.\n",
+ "Over the next few weeks, we will also keep exploring new PyTorch features in the series of Jupyter notebook tutorials about deep learning.\n",
+ "\n",
+ "We will use a set of standard libraries that are often used in machine learning projects.\n",
+ "If you are running this notebook on Google Colab, all libraries should be pre-installed.\n",
+ "If you are running this notebook locally, make sure you have installed our `dl2020` environment ([link](https://github.com/uvadlc/uvadlc_practicals_2020/blob/master/environment.yml)) and have activated it."
+ ],
+ "id": "9473f942"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:45.924885Z",
+ "iopub.status.busy": "2021-09-16T12:32:45.924409Z",
+ "iopub.status.idle": "2021-09-16T12:32:45.927196Z",
+ "shell.execute_reply": "2021-09-16T12:32:45.926697Z"
+ },
+ "id": "a1f58dc1",
+ "lines_to_next_cell": 0,
+ "papermill": {
+ "duration": 0.048784,
+ "end_time": "2021-09-16T12:32:45.927310",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:45.878526",
+ "status": "completed"
+ },
+ "tags": []
+ },
+ "source": [
+ "# ! pip install --quiet \"torchmetrics>=0.3\" \"matplotlib\" \"torch>=1.6, <1.9\" \"pytorch-lightning>=1.3\""
+ ],
+ "id": "a1f58dc1",
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:46.106552Z",
+ "iopub.status.busy": "2021-09-16T12:32:46.106081Z",
+ "iopub.status.idle": "2021-09-16T12:32:46.889364Z",
+ "shell.execute_reply": "2021-09-16T12:32:46.889833Z"
+ },
+ "papermill": {
+ "duration": 0.833305,
+ "end_time": "2021-09-16T12:32:46.889977",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:46.056672",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "edcfda76"
+ },
+ "source": [
+ "import time\n",
+ "\n",
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "import torch\n",
+ "import torch.nn as nn\n",
+ "import torch.utils.data as data\n",
+ "\n",
+ "# %matplotlib inline\n",
+ "from IPython.display import set_matplotlib_formats\n",
+ "from matplotlib.colors import to_rgba\n",
+ "from tqdm.notebook import tqdm # Progress bar\n",
+ "\n",
+ "# IPython.display.set_matplotlib_formats is deprecated, so we import matplotlib_inline and call its backend_inline.set_matplotlib_formats instead.\n",
+ "\n",
+ "import matplotlib_inline\n",
+ "import matplotlib_inline.backend_inline\n",
+ "\n",
+ "matplotlib_inline.backend_inline.set_matplotlib_formats(\"svg\",\"pdf\")\n",
+ "# Deprecated call, kept for reference: set_matplotlib_formats(\"svg\", \"pdf\")"
+ ],
+ "id": "edcfda76",
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.042985,
+ "end_time": "2021-09-16T12:32:46.977014",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:46.934029",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "d9625e8f"
+ },
+ "source": [
+ "## The Basics of PyTorch\n",
+ "\n",
+ "We will start with reviewing the very basic concepts of PyTorch.\n",
+ "As a prerequisite, we recommend being familiar with the `numpy` package, as most machine learning frameworks are based on very similar concepts.\n",
+ "If you are not familiar with numpy yet, don't worry: here is a [tutorial](https://numpy.org/devdocs/user/quickstart.html) to go through.\n",
+ "\n",
+ "So, let's start with importing PyTorch.\n",
+ "The package is called `torch`, based on its original framework [Torch](http://torch.ch/).\n",
+ "As a first step, we can check its version:"
+ ],
+ "id": "d9625e8f"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:47.065771Z",
+ "iopub.status.busy": "2021-09-16T12:32:47.065287Z",
+ "iopub.status.idle": "2021-09-16T12:32:47.067941Z",
+ "shell.execute_reply": "2021-09-16T12:32:47.067546Z"
+ },
+ "papermill": {
+ "duration": 0.048411,
+ "end_time": "2021-09-16T12:32:47.068040",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:47.019629",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "eb6179df",
+ "outputId": "b5feae1c-ec97-4692-81f8-2421ea095835"
+ },
+ "source": [
+ "print(\"Using torch\", torch.__version__)"
+ ],
+ "id": "eb6179df",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Using torch 1.8.1+cu102\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.04343,
+ "end_time": "2021-09-16T12:32:47.154839",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:47.111409",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "8a8171cd"
+ },
+ "source": [
+ "At the time of writing this tutorial (mid-August 2021), the current stable version is 1.9.\n",
+ "You should therefore see the output `Using torch 1.9.0`, possibly with an extension for the CUDA version on Colab.\n",
+ "In case you use the `dl2020` environment, you should see `Using torch 1.6.0` since the environment was provided in October 2020.\n",
+ "It is recommended to update the PyTorch version to the newest one.\n",
+ "If you see a lower version number than 1.6, make sure you have installed the correct environment, or ask one of your TAs.\n",
+ "In case PyTorch 1.10 or newer is published during the course, don't worry.\n",
+ "The interface between PyTorch versions doesn't change too much, and hence all code should also be runnable with newer versions.\n",
+ "\n",
+ "As in every machine learning framework, PyTorch provides functions that are stochastic, such as generating random numbers.\n",
+ "However, it is very good practice to set up your code to be reproducible with the exact same random numbers.\n",
+ "This is why we set a seed below."
+ ],
+ "id": "8a8171cd"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:47.245406Z",
+ "iopub.status.busy": "2021-09-16T12:32:47.244938Z",
+ "iopub.status.idle": "2021-09-16T12:32:47.249667Z",
+ "shell.execute_reply": "2021-09-16T12:32:47.250066Z"
+ },
+ "papermill": {
+ "duration": 0.050609,
+ "end_time": "2021-09-16T12:32:47.250178",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:47.199569",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "5d3e8fcb",
+ "outputId": "207fbd40-2db7-4673-9090-9531f80e1042"
+ },
+ "source": [
+ "torch.manual_seed(42) # Setting the seed"
+ ],
+ "id": "5d3e8fcb",
+ "execution_count": null,
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ]
+ },
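+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": [],
+ "id": "a7c1e0d2"
+ },
+ "source": [
+ "To make the effect of a seed concrete, here is a minimal sketch that is not part of the original tutorial.\n",
+ "It assumes a separate `torch.Generator` (the names `gen`, `first`, `second` and the seed value 0 are arbitrary choices for illustration), so the global random state set by `torch.manual_seed(42)` is left untouched.\n",
+ "\n",
+ "```python\n",
+ "import torch\n",
+ "\n",
+ "# A local generator keeps this demo independent of the global seed\n",
+ "gen = torch.Generator()\n",
+ "\n",
+ "gen.manual_seed(0)\n",
+ "first = torch.rand(3, generator=gen)\n",
+ "\n",
+ "gen.manual_seed(0)  # Re-seeding restarts the same pseudo-random sequence\n",
+ "second = torch.rand(3, generator=gen)\n",
+ "\n",
+ "print(torch.equal(first, second))  # Same seed, same numbers\n",
+ "```\n",
+ "\n",
+ "Both draws should be identical, which is exactly the property we rely on for reproducible experiments."
+ ],
+ "id": "a7c1e0d2"
+ },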
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.043829,
+ "end_time": "2021-09-16T12:32:47.337473",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:47.293644",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "6f28366d"
+ },
+ "source": [
+ "### Tensors\n",
+ "\n",
+ "Tensors are the PyTorch equivalent of Numpy arrays, with additional support for GPU acceleration (more on that later).\n",
+ "The name \"tensor\" is a generalization of concepts you already know.\n",
+ "For instance, a vector is a 1-D tensor, and a matrix a 2-D tensor.\n",
+ "When working with neural networks, we will use tensors of various shapes and number of dimensions.\n",
+ "\n",
+ "Most common functions you know from numpy can be used on tensors as well.\n",
+ "Actually, since numpy arrays are so similar to tensors, we can convert most tensors to numpy arrays (and back) but we don't need it too often.\n",
+ "\n",
+ "#### Initialization\n",
+ "\n",
+ "Let's first start by looking at different ways of creating a tensor.\n",
+ "There are many possible options; the simplest one is to call\n",
+ "`torch.Tensor`, passing the desired shape as input argument:"
+ ],
+ "id": "6f28366d"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:47.428074Z",
+ "iopub.status.busy": "2021-09-16T12:32:47.427607Z",
+ "iopub.status.idle": "2021-09-16T12:32:47.431411Z",
+ "shell.execute_reply": "2021-09-16T12:32:47.430883Z"
+ },
+ "papermill": {
+ "duration": 0.050551,
+ "end_time": "2021-09-16T12:32:47.431515",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:47.380964",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "7d6bab63",
+ "outputId": "c758273b-e053-4d33-b4f1-8972fe8cd99f"
+ },
+ "source": [
+ "x = torch.Tensor(2, 3, 4)\n",
+ "print(x)"
+ ],
+ "id": "7d6bab63",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "tensor([[[7.3697e+28, 2.7869e+29, 4.3059e+21, 6.9768e+22],\n",
+ " [6.8612e+22, 4.6114e+24, 3.0186e+32, 4.5434e+30],\n",
+ " [1.9519e-19, 7.4934e+28, 8.9068e-15, 5.6284e-14]],\n",
+ "\n",
+ " [[2.0618e-19, 1.0901e+27, 2.0532e-19, 1.7440e+28],\n",
+ " [1.2997e+34, 6.8608e+22, 4.7473e+27, 2.0532e-19],\n",
+ " [3.1771e+30, 7.2442e+22, 1.6931e+22, 1.1022e+24]]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.043882,
+ "end_time": "2021-09-16T12:32:47.519551",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:47.475669",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "2b19df67"
+ },
+ "source": [
+ "The function `torch.Tensor` allocates memory for the desired tensor but does not initialize it, so the tensor contains whatever values were already in that memory.\n",
+ "To directly assign values to the tensor during initialization, there are many alternatives including:\n",
+ "\n",
+ "* `torch.zeros`: Creates a tensor filled with zeros\n",
+ "* `torch.ones`: Creates a tensor filled with ones\n",
+ "* `torch.rand`: Creates a tensor with random values uniformly sampled between 0 and 1\n",
+ "* `torch.randn`: Creates a tensor with random values sampled from a normal distribution with mean 0 and variance 1\n",
+ "* `torch.arange`: Creates a tensor containing the values $N,N+1,N+2,...,M-1$ when called as `torch.arange(N, M)` (the end value $M$ is excluded)\n",
+ "* `torch.Tensor` (input list): Creates a tensor from the list elements you provide"
+ ],
+ "id": "2b19df67"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:47.611131Z",
+ "iopub.status.busy": "2021-09-16T12:32:47.610668Z",
+ "iopub.status.idle": "2021-09-16T12:32:47.623382Z",
+ "shell.execute_reply": "2021-09-16T12:32:47.622915Z"
+ },
+ "papermill": {
+ "duration": 0.060116,
+ "end_time": "2021-09-16T12:32:47.623485",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:47.563369",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "45fe1b7e",
+ "outputId": "f9c44389-af89-485f-9136-1993b41f98b4"
+ },
+ "source": [
+ "# Create a tensor from a (nested) list\n",
+ "x = torch.Tensor([[1, 2], [3, 4]])\n",
+ "print(x)"
+ ],
+ "id": "45fe1b7e",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "tensor([[1., 2.],\n",
+ " [3., 4.]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:47.717600Z",
+ "iopub.status.busy": "2021-09-16T12:32:47.717128Z",
+ "iopub.status.idle": "2021-09-16T12:32:47.720139Z",
+ "shell.execute_reply": "2021-09-16T12:32:47.719670Z"
+ },
+ "papermill": {
+ "duration": 0.052738,
+ "end_time": "2021-09-16T12:32:47.720244",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:47.667506",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "76f8e1f5",
+ "outputId": "5e507b04-8239-4761-ecdc-3209b6d26279"
+ },
+ "source": [
+ "# Create a tensor with random values between 0 and 1 with the shape [2, 3, 4]\n",
+ "x = torch.rand(2, 3, 4)\n",
+ "print(x)"
+ ],
+ "id": "76f8e1f5",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "tensor([[[0.8823, 0.9150, 0.3829, 0.9593],\n",
+ " [0.3904, 0.6009, 0.2566, 0.7936],\n",
+ " [0.9408, 0.1332, 0.9346, 0.5936]],\n",
+ "\n",
+ " [[0.8694, 0.5677, 0.7411, 0.4294],\n",
+ " [0.8854, 0.5739, 0.2666, 0.6274],\n",
+ " [0.2696, 0.4414, 0.2969, 0.8317]]])\n"
+ ]
+ }
+ ]
+ },
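+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": [],
+ "id": "b4d82f19"
+ },
+ "source": [
+ "The remaining initialization functions from the list above can be sketched in the same way.\n",
+ "This block is an illustrative addition: the variable names and shapes are arbitrary, and `torch.randn` is given a local `torch.Generator` (an assumption made here) so that the global random state of the notebook is not changed.\n",
+ "\n",
+ "```python\n",
+ "import torch\n",
+ "\n",
+ "zeros = torch.zeros(2, 3)   # All entries are 0\n",
+ "ones = torch.ones(2, 3)     # All entries are 1\n",
+ "steps = torch.arange(2, 6)  # Integers from 2 up to, but excluding, 6\n",
+ "\n",
+ "# Samples from a normal distribution with mean 0 and variance 1\n",
+ "gen = torch.Generator().manual_seed(0)\n",
+ "normal = torch.randn(2, 3, generator=gen)\n",
+ "\n",
+ "print(zeros)\n",
+ "print(ones)\n",
+ "print(steps)\n",
+ "print(normal.shape)\n",
+ "```"
+ ],
+ "id": "b4d82f19"
+ },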
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.044337,
+ "end_time": "2021-09-16T12:32:47.809374",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:47.765037",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "f2c84d7c"
+ },
+ "source": [
+ "You can obtain the shape of a tensor in the same way as in numpy (`x.shape`), or using the `.size` method:"
+ ],
+ "id": "f2c84d7c"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:47.902716Z",
+ "iopub.status.busy": "2021-09-16T12:32:47.900874Z",
+ "iopub.status.idle": "2021-09-16T12:32:47.906006Z",
+ "shell.execute_reply": "2021-09-16T12:32:47.905588Z"
+ },
+ "papermill": {
+ "duration": 0.05197,
+ "end_time": "2021-09-16T12:32:47.906110",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:47.854140",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "b9738fb0",
+ "outputId": "eccc9bc4-660a-40bd-d539-1757d4caef32"
+ },
+ "source": [
+ "shape = x.shape\n",
+ "print(\"Shape:\", x.shape)\n",
+ "\n",
+ "size = x.size()\n",
+ "print(\"Size:\", size)\n",
+ "\n",
+ "dim1, dim2, dim3 = x.size()\n",
+ "print(\"Size:\", dim1, dim2, dim3)"
+ ],
+ "id": "b9738fb0",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Shape: torch.Size([2, 3, 4])\n",
+ "Size: torch.Size([2, 3, 4])\n",
+ "Size: 2 3 4\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.045873,
+ "end_time": "2021-09-16T12:32:47.996974",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:47.951101",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "0e9401d0"
+ },
+ "source": [
+ "#### Tensor to Numpy, and Numpy to Tensor\n",
+ "\n",
+ "Tensors can be converted to numpy arrays, and numpy arrays back to tensors.\n",
+ "To transform a numpy array into a tensor, we can use the function `torch.from_numpy`:"
+ ],
+ "id": "0e9401d0"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:48.091606Z",
+ "iopub.status.busy": "2021-09-16T12:32:48.091139Z",
+ "iopub.status.idle": "2021-09-16T12:32:48.093695Z",
+ "shell.execute_reply": "2021-09-16T12:32:48.094094Z"
+ },
+ "papermill": {
+ "duration": 0.052501,
+ "end_time": "2021-09-16T12:32:48.094216",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:48.041715",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "e0670ce6",
+ "outputId": "02c97d56-0c83-460a-83bd-9f992491095d"
+ },
+ "source": [
+ "np_arr = np.array([[1, 2], [3, 4]])\n",
+ "tensor = torch.from_numpy(np_arr)\n",
+ "\n",
+ "print(\"Numpy array:\", np_arr)\n",
+ "print(\"PyTorch tensor:\", tensor)"
+ ],
+ "id": "e0670ce6",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Numpy array: [[1 2]\n",
+ " [3 4]]\n",
+ "PyTorch tensor: tensor([[1, 2],\n",
+ " [3, 4]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.045581,
+ "end_time": "2021-09-16T12:32:49.246779",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:49.201198",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "98f64f88"
+ },
+ "source": [
+ "To transform a PyTorch tensor back to a numpy array, we can use the function `.numpy()` on tensors:"
+ ],
+ "id": "98f64f88"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:49.340766Z",
+ "iopub.status.busy": "2021-09-16T12:32:49.340292Z",
+ "iopub.status.idle": "2021-09-16T12:32:49.343538Z",
+ "shell.execute_reply": "2021-09-16T12:32:49.343139Z"
+ },
+ "papermill": {
+ "duration": 0.05169,
+ "end_time": "2021-09-16T12:32:49.343640",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:49.291950",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "fb2c4b46",
+ "outputId": "f62d0660-5b03-4785-d05e-97500270a22d"
+ },
+ "source": [
+ "tensor = torch.arange(4)\n",
+ "np_arr = tensor.numpy()\n",
+ "\n",
+ "print(\"PyTorch tensor:\", tensor)\n",
+ "print(\"Numpy array:\", np_arr)"
+ ],
+ "id": "fb2c4b46",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "PyTorch tensor: tensor([0, 1, 2, 3])\n",
+ "Numpy array: [0 1 2 3]\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.051428,
+ "end_time": "2021-09-16T12:32:49.440442",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:49.389014",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "d31dc267"
+ },
+ "source": [
+ "The conversion of tensors to numpy requires the tensor to be on the CPU, and not the GPU (more on GPU support in a later section).\n",
+ "In case you have a tensor on GPU, you need to call `.cpu()` on the tensor beforehand.\n",
+ "Hence, you get a line like `np_arr = tensor.cpu().numpy()`."
+ ],
+ "id": "d31dc267"
+ },
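+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": [],
+ "id": "c95e3a70"
+ },
+ "source": [
+ "As a hedged sketch of that pattern (an illustrative addition, not from the original tutorial), the block below picks a device with `torch.cuda.is_available()` so that it also runs on machines without a GPU.\n",
+ "The name `device` is our own choice here.\n",
+ "\n",
+ "```python\n",
+ "import torch\n",
+ "\n",
+ "# Use the GPU only if one is available\n",
+ "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
+ "tensor = torch.arange(4, device=device)\n",
+ "\n",
+ "# .cpu() returns the tensor unchanged if it already lives on the CPU,\n",
+ "# so this line is safe in both cases\n",
+ "np_arr = tensor.cpu().numpy()\n",
+ "print(np_arr)\n",
+ "```"
+ ],
+ "id": "c95e3a70"
+ },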
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.045401,
+ "end_time": "2021-09-16T12:32:49.530975",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:49.485574",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "46a6ef01"
+ },
+ "source": [
+ "#### Operations\n",
+ "\n",
+ "Most operations that exist in numpy also exist in PyTorch.\n",
+ "A full list of operations can be found in the [PyTorch documentation](https://pytorch.org/docs/stable/tensors.html#), but we will review the most important ones here.\n",
+ "\n",
+ "The simplest operation is to add two tensors:"
+ ],
+ "id": "46a6ef01"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:49.625900Z",
+ "iopub.status.busy": "2021-09-16T12:32:49.625408Z",
+ "iopub.status.idle": "2021-09-16T12:32:49.629396Z",
+ "shell.execute_reply": "2021-09-16T12:32:49.629792Z"
+ },
+ "papermill": {
+ "duration": 0.053783,
+ "end_time": "2021-09-16T12:32:49.629915",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:49.576132",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "13f957c9",
+ "outputId": "503387cb-4c45-4a9b-ea9e-212268a017ff"
+ },
+ "source": [
+ "x1 = torch.rand(2, 3)\n",
+ "x2 = torch.rand(2, 3)\n",
+ "y = x1 + x2\n",
+ "\n",
+ "print(\"X1\", x1)\n",
+ "print(\"X2\", x2)\n",
+ "print(\"Y\", y)"
+ ],
+ "id": "13f957c9",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "X1 tensor([[0.1053, 0.2695, 0.3588],\n",
+ " [0.1994, 0.5472, 0.0062]])\n",
+ "X2 tensor([[0.9516, 0.0753, 0.8860],\n",
+ " [0.5832, 0.3376, 0.8090]])\n",
+ "Y tensor([[1.0569, 0.3448, 1.2448],\n",
+ " [0.7826, 0.8848, 0.8151]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.047584,
+ "end_time": "2021-09-16T12:32:49.724517",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:49.676933",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "4fbd2538"
+ },
+ "source": [
+ "Calling `x1 + x2` creates a new tensor containing the sum of the two inputs.\n",
+ "However, we can also use in-place operations that are applied directly on the memory of a tensor.\n",
+ "We therefore change the values of `x2` without being able to access its values from before the operation.\n",
+ "An example is shown below:"
+ ],
+ "id": "4fbd2538"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:49.823070Z",
+ "iopub.status.busy": "2021-09-16T12:32:49.822399Z",
+ "iopub.status.idle": "2021-09-16T12:32:49.828109Z",
+ "shell.execute_reply": "2021-09-16T12:32:49.827637Z"
+ },
+ "papermill": {
+ "duration": 0.055272,
+ "end_time": "2021-09-16T12:32:49.828214",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:49.772942",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "0e6a7497",
+ "outputId": "d0aab7ca-4ac1-42c9-94d7-214c4d52d30a"
+ },
+ "source": [
+ "x1 = torch.rand(2, 3)\n",
+ "x2 = torch.rand(2, 3)\n",
+ "print(\"X1 (before)\", x1)\n",
+ "print(\"X2 (before)\", x2)\n",
+ "\n",
+ "x2.add_(x1)\n",
+ "print(\"X1 (after)\", x1)\n",
+ "print(\"X2 (after)\", x2)"
+ ],
+ "id": "0e6a7497",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "X1 (before) tensor([[0.5779, 0.9040, 0.5547],\n",
+ " [0.3423, 0.6343, 0.3644]])\n",
+ "X2 (before) tensor([[0.7104, 0.9464, 0.7890],\n",
+ " [0.2814, 0.7886, 0.5895]])\n",
+ "X1 (after) tensor([[0.5779, 0.9040, 0.5547],\n",
+ " [0.3423, 0.6343, 0.3644]])\n",
+ "X2 (after) tensor([[1.2884, 1.8504, 1.3437],\n",
+ " [0.6237, 1.4230, 0.9539]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.046964,
+ "end_time": "2021-09-16T12:32:49.921617",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:49.874653",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "5cbaa92f"
+ },
+ "source": [
+ "In-place operations are usually marked with an underscore suffix (e.g. \"add_\" instead of \"add\").\n",
+ "\n",
+ "Another common operation aims at changing the shape of a tensor.\n",
+ "A tensor of size (2,3) can be re-organized to any other shape with the same number of elements (e.g. a tensor of size (6), or (3,2), ...).\n",
+ "In PyTorch, this operation is called `view`:"
+ ],
+ "id": "5cbaa92f"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:50.026878Z",
+ "iopub.status.busy": "2021-09-16T12:32:50.026353Z",
+ "iopub.status.idle": "2021-09-16T12:32:50.029094Z",
+ "shell.execute_reply": "2021-09-16T12:32:50.028625Z"
+ },
+ "papermill": {
+ "duration": 0.06096,
+ "end_time": "2021-09-16T12:32:50.029199",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:49.968239",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "6907851d",
+ "outputId": "06cd570d-7f0b-478f-c72a-6cf1eb1f4bc2"
+ },
+ "source": [
+ "x = torch.arange(6)\n",
+ "print(\"X\", x)"
+ ],
+ "id": "6907851d",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "X tensor([0, 1, 2, 3, 4, 5])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:50.136129Z",
+ "iopub.status.busy": "2021-09-16T12:32:50.135661Z",
+ "iopub.status.idle": "2021-09-16T12:32:50.138426Z",
+ "shell.execute_reply": "2021-09-16T12:32:50.137963Z"
+ },
+ "papermill": {
+ "duration": 0.054742,
+ "end_time": "2021-09-16T12:32:50.138528",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:50.083786",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "252fa33f",
+ "outputId": "223dccac-9220-47cb-88f2-0f5fd135618a"
+ },
+ "source": [
+ "x = x.view(2, 3)\n",
+ "print(\"X\", x)"
+ ],
+ "id": "252fa33f",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "X tensor([[0, 1, 2],\n",
+ " [3, 4, 5]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:50.239003Z",
+ "iopub.status.busy": "2021-09-16T12:32:50.238537Z",
+ "iopub.status.idle": "2021-09-16T12:32:50.241342Z",
+ "shell.execute_reply": "2021-09-16T12:32:50.240878Z"
+ },
+ "papermill": {
+ "duration": 0.053431,
+ "end_time": "2021-09-16T12:32:50.241439",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:50.188008",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "72e32ecb",
+ "outputId": "e9acaf48-2821-4d8e-b89b-2ae76d7f0d2e"
+ },
+ "source": [
+ "x = x.permute(1, 0) # Swapping dimension 0 and 1\n",
+ "print(\"X\", x)"
+ ],
+ "id": "72e32ecb",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "X tensor([[0, 3],\n",
+ " [1, 4],\n",
+ " [2, 5]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.04706,
+ "end_time": "2021-09-16T12:32:50.335409",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:50.288349",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "cde67e9c"
+ },
+ "source": [
+ "Other commonly used operations include matrix multiplications, which are essential for neural networks.\n",
+ "Quite often, we have an input vector $\\mathbf{x}$, which is transformed using a learned weight matrix $\\mathbf{W}$.\n",
+ "There are multiple ways and functions to perform matrix multiplication, some of which we list below:\n",
+ "\n",
+ "* `torch.matmul`: Performs the matrix product over two tensors, where the specific behavior depends on the dimensions.\n",
+ "If both inputs are matrices (2-dimensional tensors), it performs the standard matrix product.\n",
+ "For higher dimensional inputs, the function supports broadcasting (for details see the [documentation](https://pytorch.org/docs/stable/generated/torch.matmul.html?highlight=matmul#torch.matmul)).\n",
+ "Can also be written as `a @ b`, similar to numpy.\n",
+ "* `torch.mm`: Performs the matrix product over two matrices, but doesn't support broadcasting (see [documentation](https://pytorch.org/docs/stable/generated/torch.mm.html?highlight=torch%20mm#torch.mm))\n",
+ "* `torch.bmm`: Performs the matrix product with support for a batch dimension.\n",
+ "If the first tensor $T$ is of shape ($b\\times n\\times m$), and the second tensor $R$ ($b\\times m\\times p$), the output $O$ is of shape ($b\\times n\\times p$), and has been calculated by performing $b$ matrix multiplications of the submatrices of $T$ and $R$: $O_i = T_i @ R_i$\n",
+ "* `torch.einsum`: Performs matrix multiplications and more (i.e. sums of products) using the Einstein summation convention.\n",
+ "An explanation of the Einstein sum can be found in assignment 1.\n",
+ "\n",
+ "Usually, we use `torch.matmul` or `torch.bmm`. We can try a matrix multiplication with `torch.matmul` below."
+ ],
+ "id": "cde67e9c"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:50.432080Z",
+ "iopub.status.busy": "2021-09-16T12:32:50.431615Z",
+ "iopub.status.idle": "2021-09-16T12:32:50.434705Z",
+ "shell.execute_reply": "2021-09-16T12:32:50.434244Z"
+ },
+ "papermill": {
+ "duration": 0.052861,
+ "end_time": "2021-09-16T12:32:50.434804",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:50.381943",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "ff386c27",
+ "outputId": "5fc6dccf-55bd-4ed8-b06d-17f0567da5ac"
+ },
+ "source": [
+ "x = torch.arange(6)\n",
+ "x = x.view(2, 3)\n",
+ "print(\"X\", x)"
+ ],
+ "id": "ff386c27",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "X tensor([[0, 1, 2],\n",
+ " [3, 4, 5]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:50.533427Z",
+ "iopub.status.busy": "2021-09-16T12:32:50.532966Z",
+ "iopub.status.idle": "2021-09-16T12:32:50.535803Z",
+ "shell.execute_reply": "2021-09-16T12:32:50.535338Z"
+ },
+ "papermill": {
+ "duration": 0.054221,
+ "end_time": "2021-09-16T12:32:50.535901",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:50.481680",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "8c1795af",
+ "outputId": "0e937cff-487b-467b-890e-8b91cccf4913"
+ },
+ "source": [
+ "W = torch.arange(9).view(3, 3) # We can also stack multiple operations in a single line\n",
+ "print(\"W\", W)"
+ ],
+ "id": "8c1795af",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "W tensor([[0, 1, 2],\n",
+ " [3, 4, 5],\n",
+ " [6, 7, 8]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:50.633621Z",
+ "iopub.status.busy": "2021-09-16T12:32:50.633158Z",
+ "iopub.status.idle": "2021-09-16T12:32:50.635999Z",
+ "shell.execute_reply": "2021-09-16T12:32:50.635506Z"
+ },
+ "papermill": {
+ "duration": 0.052906,
+ "end_time": "2021-09-16T12:32:50.636097",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:50.583191",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "4dddd17e",
+ "outputId": "e0123356-0b94-487b-b600-d02f396d5075"
+ },
+ "source": [
+ "h = torch.matmul(x, W) # Verify the result by calculating it by hand too!\n",
+ "print(\"h\", h)"
+ ],
+ "id": "4dddd17e",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "h tensor([[15, 18, 21],\n",
+ " [42, 54, 66]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.048142,
+ "end_time": "2021-09-16T12:32:50.732093",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:50.683951",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "72d026f9"
+ },
+ "source": [
+ "#### Indexing\n",
+ "\n",
+ "We often have the situation where we need to select a part of a tensor.\n",
+ "Indexing works just like in numpy, so let's try it:"
+ ],
+ "id": "72d026f9"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:50.831818Z",
+ "iopub.status.busy": "2021-09-16T12:32:50.831358Z",
+ "iopub.status.idle": "2021-09-16T12:32:50.834223Z",
+ "shell.execute_reply": "2021-09-16T12:32:50.833827Z"
+ },
+ "papermill": {
+ "duration": 0.054078,
+ "end_time": "2021-09-16T12:32:50.834321",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:50.780243",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "b44382eb",
+ "outputId": "aed72501-e9e9-4996-dab0-66f2211f390c"
+ },
+ "source": [
+ "x = torch.arange(12).view(3, 4)\n",
+ "print(\"X\", x)"
+ ],
+ "id": "b44382eb",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "X tensor([[ 0, 1, 2, 3],\n",
+ " [ 4, 5, 6, 7],\n",
+ " [ 8, 9, 10, 11]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:50.933203Z",
+ "iopub.status.busy": "2021-09-16T12:32:50.932738Z",
+ "iopub.status.idle": "2021-09-16T12:32:50.935142Z",
+ "shell.execute_reply": "2021-09-16T12:32:50.934747Z"
+ },
+ "papermill": {
+ "duration": 0.05308,
+ "end_time": "2021-09-16T12:32:50.935240",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:50.882160",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "e797f3da",
+ "outputId": "c53f9120-9e75-4b30-907a-206679e82791"
+ },
+ "source": [
+ "print(x[:, 1]) # Second column"
+ ],
+ "id": "e797f3da",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "tensor([1, 5, 9])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:51.035597Z",
+ "iopub.status.busy": "2021-09-16T12:32:51.035133Z",
+ "iopub.status.idle": "2021-09-16T12:32:51.037860Z",
+ "shell.execute_reply": "2021-09-16T12:32:51.037378Z"
+ },
+ "papermill": {
+ "duration": 0.053815,
+ "end_time": "2021-09-16T12:32:51.037961",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:50.984146",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "832fa534",
+ "outputId": "578ac154-c19e-4acd-9e5d-8a2d2dd5b8a2"
+ },
+ "source": [
+ "print(x[0]) # First row"
+ ],
+ "id": "832fa534",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "tensor([0, 1, 2, 3])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:51.138719Z",
+ "iopub.status.busy": "2021-09-16T12:32:51.138254Z",
+ "iopub.status.idle": "2021-09-16T12:32:51.140664Z",
+ "shell.execute_reply": "2021-09-16T12:32:51.140201Z"
+ },
+ "papermill": {
+ "duration": 0.053829,
+ "end_time": "2021-09-16T12:32:51.140762",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:51.086933",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "554196e9",
+ "outputId": "10730a61-9499-4cb4-966b-7beb994a547d"
+ },
+ "source": [
+ "print(x[:2, -1]) # First two rows, last column"
+ ],
+ "id": "554196e9",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "tensor([3, 7])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:51.242836Z",
+ "iopub.status.busy": "2021-09-16T12:32:51.242376Z",
+ "iopub.status.idle": "2021-09-16T12:32:51.245113Z",
+ "shell.execute_reply": "2021-09-16T12:32:51.244657Z"
+ },
+ "papermill": {
+ "duration": 0.054275,
+ "end_time": "2021-09-16T12:32:51.245210",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:51.190935",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "2efaee3a",
+ "outputId": "4589b195-ef4d-4fb1-a51a-64c374e68484"
+ },
+ "source": [
+ "print(x[1:3, :]) # Middle two rows"
+ ],
+ "id": "2efaee3a",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "tensor([[ 4, 5, 6, 7],\n",
+ " [ 8, 9, 10, 11]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.049087,
+ "end_time": "2021-09-16T12:32:51.343540",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:51.294453",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "e1a9591f"
+ },
+ "source": [
+ "### Dynamic Computation Graph and Backpropagation\n",
+ "\n",
+ "One of the main reasons for using PyTorch in Deep Learning projects is that we can automatically get **gradients/derivatives** of functions that we define.\n",
+ "We will mainly use PyTorch for implementing neural networks, and they are just fancy functions.\n",
+ "If we use weight matrices in our function that we want to learn, then those are called the **parameters** or simply the **weights**.\n",
+ "\n",
+ "If our neural network would output a single scalar value, we would talk about taking the **derivative**, but you will see that quite often we will have **multiple** output variables (\"values\"); in that case we talk about **gradients**.\n",
+ "It's a more general term.\n",
+ "\n",
+ "Given an input $\\mathbf{x}$, we define our function by **manipulating** that input, usually by matrix-multiplications with weight matrices and additions with so-called bias vectors.\n",
+ "As we manipulate our input, we are automatically creating a **computational graph**.\n",
+ "This graph shows how to arrive at our output from our input.\n",
+ "PyTorch is a **define-by-run** framework; this means that we can just do our manipulations, and PyTorch will keep track of that graph for us.\n",
+ "Thus, we create a dynamic computation graph along the way.\n",
+ "\n",
+ "So, to recap: the only thing we have to do is to compute the **output**, and then we can ask PyTorch to automatically get the **gradients**.\n",
+ "\n",
+ "> **Note: Why do we want gradients?**\n",
+ "Consider that we have defined a function, a neural net, that is supposed to compute a certain output $y$ for an input vector $\\mathbf{x}$.\n",
+ "We then define an **error measure** that tells us how wrong our network is; how bad it is in predicting output $y$ from input $\\mathbf{x}$.\n",
+ "Based on this error measure, we can use the gradients to **update** the weights $\\mathbf{W}$ that were responsible for the output, so that the next time we present input $\\mathbf{x}$ to our network, the output will be closer to what we want.\n",
+ "\n",
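+ "As a rough sketch of what such an update can look like, plain gradient descent moves the weights a small step against the gradient of the error measure. The symbols $\\mathcal{L}$ (the error measure) and $\\eta$ (a small positive step size, often called the learning rate) are notation assumed here for illustration, and other update rules exist:\n",
+ "\n",
+ "$$\\mathbf{W} \\leftarrow \\mathbf{W} - \\eta \\frac{\\partial \\mathcal{L}}{\\partial \\mathbf{W}}$$\n",
+ "\n",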
+ "The first thing we have to do is to specify which tensors require gradients.\n",
+ "By default, when we create a tensor, it does not require gradients."
+ ],
+ "id": "e1a9591f"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:51.444570Z",
+ "iopub.status.busy": "2021-09-16T12:32:51.443320Z",
+ "iopub.status.idle": "2021-09-16T12:32:51.447126Z",
+ "shell.execute_reply": "2021-09-16T12:32:51.446665Z"
+ },
+ "papermill": {
+ "duration": 0.054607,
+ "end_time": "2021-09-16T12:32:51.447227",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:51.392620",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "9f94399d",
+ "outputId": "9a315d48-5385-4e97-953d-574848a420e4"
+ },
+ "source": [
+ "x = torch.ones((3,))\n",
+ "print(x.requires_grad)"
+ ],
+ "id": "9f94399d",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "False\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.049292,
+ "end_time": "2021-09-16T12:32:51.546032",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:51.496740",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "12697ab1"
+ },
+ "source": [
+ "We can change this for an existing tensor using the function `requires_grad_()` (the underscore indicates that this is an in-place operation).\n",
+ "Alternatively, when creating a tensor, you can pass the argument\n",
+ "`requires_grad=True` to most initializers we have seen above."
+ ],
+ "id": "12697ab1"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:51.647892Z",
+ "iopub.status.busy": "2021-09-16T12:32:51.647430Z",
+ "iopub.status.idle": "2021-09-16T12:32:51.649913Z",
+ "shell.execute_reply": "2021-09-16T12:32:51.649498Z"
+ },
+ "papermill": {
+ "duration": 0.05454,
+ "end_time": "2021-09-16T12:32:51.650014",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:51.595474",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "7d565264",
+ "outputId": "d1359e2c-57c8-4e76-9632-dedeb388e9cc"
+ },
+ "source": [
+ "x.requires_grad_(True)\n",
+ "print(x.requires_grad)"
+ ],
+ "id": "7d565264",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "True\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.050272,
+ "end_time": "2021-09-16T12:32:51.750030",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:51.699758",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "a16f495e"
+ },
+ "source": [
+ "In order to get familiar with the concept of a computation graph, we will create one for the following function:\n",
+ "\n",
+ "$$y = \\frac{1}{|x|}\\sum_i \\left[(x_i + 2)^2 + 3\\right]$$\n",
+ "\n",
+ "You could imagine that $x$ are our parameters, and we want to optimize (either maximize or minimize) the output $y$.\n",
+ "For this, we want to obtain the gradients $\\partial y / \\partial \\mathbf{x}$.\n",
+ "For our example, we'll use $\\mathbf{x}=[0,1,2]$ as our input."
+ ],
+ "id": "a16f495e"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:51.853091Z",
+ "iopub.status.busy": "2021-09-16T12:32:51.852629Z",
+ "iopub.status.idle": "2021-09-16T12:32:51.855635Z",
+ "shell.execute_reply": "2021-09-16T12:32:51.855175Z"
+ },
+ "papermill": {
+ "duration": 0.055874,
+ "end_time": "2021-09-16T12:32:51.855735",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:51.799861",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "abd9c738",
+ "outputId": "e7e68aea-7afb-4d35-9e83-eb98cbc0c1cc"
+ },
+ "source": [
+ "x = torch.arange(3, dtype=torch.float32, requires_grad=True) # Only float tensors can have gradients\n",
+ "print(\"X\", x)"
+ ],
+ "id": "abd9c738",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "X tensor([0., 1., 2.], requires_grad=True)\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.05566,
+ "end_time": "2021-09-16T12:32:51.961765",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:51.906105",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "548b420c"
+ },
+ "source": [
+ "Now let's build the computation graph step by step.\n",
+ "You can combine multiple operations in a single line, but we will\n",
+ "separate them here to get a better understanding of how each operation\n",
+ "is added to the computation graph."
+ ],
+ "id": "548b420c"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:52.072120Z",
+ "iopub.status.busy": "2021-09-16T12:32:52.071647Z",
+ "iopub.status.idle": "2021-09-16T12:32:52.074610Z",
+ "shell.execute_reply": "2021-09-16T12:32:52.074989Z"
+ },
+ "papermill": {
+ "duration": 0.056246,
+ "end_time": "2021-09-16T12:32:52.075114",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:52.018868",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "50d91bf7",
+ "outputId": "9851aea1-fb05-482b-b55f-1ab70bc0bb57"
+ },
+ "source": [
+ "a = x + 2\n",
+ "b = a ** 2\n",
+ "c = b + 3\n",
+ "y = c.mean()\n",
+ "print(\"Y\", y)"
+ ],
+ "id": "50d91bf7",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Y tensor(12.6667, grad_fn=<MeanBackward0>)\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.049612,
+ "end_time": "2021-09-16T12:32:52.175001",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:52.125389",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "e0d2f0ae"
+ },
+ "source": [
+ "Using the statements above, we have created a computation graph that looks similar to the figure below:\n",
+ "\n",
+ "*(Figure: computation graph leading from the input $x$ through $a$, $b$ and $c$ to the output $y$.)*\n",
+ "\n",
+ "We calculate $a$ based on the inputs $x$ and the constant $2$, $b$ is $a$ squared, and so on.\n",
+ "The visualization is an abstraction of the dependencies between inputs and outputs of the operations we have applied.\n",
+ "Each node of the computation graph has automatically defined a function for calculating the gradients with respect to its inputs, `grad_fn`.\n",
+ "You can see this when we printed the output tensor $y$.\n",
+ "This is why the computation graph is usually visualized in the reverse direction (arrows point from the result to the inputs).\n",
+ "We can perform backpropagation on the computation graph by calling the\n",
+ "function `backward()` on the last output, which effectively calculates\n",
+ "the gradients for each tensor that has the property\n",
+ "`requires_grad=True`:"
+ ],
+ "id": "e0d2f0ae"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:52.278438Z",
+ "iopub.status.busy": "2021-09-16T12:32:52.277977Z",
+ "iopub.status.idle": "2021-09-16T12:32:52.363356Z",
+ "shell.execute_reply": "2021-09-16T12:32:52.362899Z"
+ },
+ "papermill": {
+ "duration": 0.137892,
+ "end_time": "2021-09-16T12:32:52.363476",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:52.225584",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "d7c2de18"
+ },
+ "source": [
+ "y.backward()"
+ ],
+ "id": "d7c2de18",
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.050494,
+ "end_time": "2021-09-16T12:32:52.465208",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:52.414714",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "44d67068"
+ },
+ "source": [
+ "`x.grad` will now contain the gradient $\\partial y/ \\partial \\mathbf{x}$, and this gradient indicates how a change in $\\mathbf{x}$ will affect output $y$ given the current input $\\mathbf{x}=[0,1,2]$:"
+ ],
+ "id": "44d67068"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:52.569520Z",
+ "iopub.status.busy": "2021-09-16T12:32:52.569055Z",
+ "iopub.status.idle": "2021-09-16T12:32:52.572034Z",
+ "shell.execute_reply": "2021-09-16T12:32:52.571551Z"
+ },
+ "papermill": {
+ "duration": 0.056348,
+ "end_time": "2021-09-16T12:32:52.572135",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:52.515787",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "58d14f6c",
+ "outputId": "97e37bad-a05c-4028-d7b5-99cfc931f671"
+ },
+ "source": [
+ "print(x.grad)"
+ ],
+ "id": "58d14f6c",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "tensor([1.3333, 2.0000, 2.6667])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.050295,
+ "end_time": "2021-09-16T12:32:52.673692",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:52.623397",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "a180ccfc"
+ },
+ "source": [
+ "We can also verify these gradients by hand.\n",
+ "We will calculate the gradients using the chain rule, in the same way as PyTorch did it:\n",
+ "\n",
+ "$$\\frac{\\partial y}{\\partial x_i} = \\frac{\\partial y}{\\partial c_i}\\frac{\\partial c_i}{\\partial b_i}\\frac{\\partial b_i}{\\partial a_i}\\frac{\\partial a_i}{\\partial x_i}$$\n",
+ "\n",
+ "Note that we have written this equation in index notation, using the fact that all operations besides the mean do not combine the elements in the tensor.\n",
+ "The partial derivatives are:\n",
+ "\n",
+ "$$\n",
+ "\\frac{\\partial a_i}{\\partial x_i} = 1,\\hspace{1cm}\n",
+ "\\frac{\\partial b_i}{\\partial a_i} = 2\\cdot a_i,\\hspace{1cm}\n",
+ "\\frac{\\partial c_i}{\\partial b_i} = 1,\\hspace{1cm}\n",
+ "\\frac{\\partial y}{\\partial c_i} = \\frac{1}{3}\n",
+ "$$\n",
+ "\n",
+ "Hence, with the input being $\\mathbf{x}=[0,1,2]$, our gradients are $\\partial y/\\partial \\mathbf{x}=[4/3,2,8/3]$.\n",
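+ "\n",
+ "As a worked check, multiplying the four factors gives the per-element gradient, which we can evaluate using $a_i = x_i + 2$:\n",
+ "\n",
+ "$$\\frac{\\partial y}{\\partial x_i} = \\frac{1}{3}\\cdot 1\\cdot 2a_i\\cdot 1 = \\frac{2}{3}(x_i + 2), \\qquad \\frac{2}{3}\\cdot[2, 3, 4] = \\left[\\frac{4}{3}, 2, \\frac{8}{3}\\right] \\approx [1.3333, 2.0000, 2.6667]$$\n",
+ "\n",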
+ "The previous code cell should have printed the same result."
+ ],
+ "id": "a180ccfc"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.051077,
+ "end_time": "2021-09-16T12:32:52.777753",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:52.726676",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "804f38e2"
+ },
+ "source": [
+ "### GPU support\n",
+ "\n",
+ "A crucial feature of PyTorch is the support of GPUs, short for Graphics Processing Unit.\n",
+ "A GPU can perform many thousands of small operations in parallel, making it well suited for performing large matrix operations in neural networks.\n",
+ "When comparing GPUs to CPUs, we can list the following main differences (credit: [Kevin Krewell, 2009](https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/)):\n",
+ "\n",
+ "*(Figure: side-by-side comparison of CPU and GPU characteristics, from the credited post.)*\n",
+ "\n",
+ "CPUs and GPUs each have different advantages and disadvantages, which is why many computers contain both components and use them for different tasks.\n",
+ "In case you are not familiar with GPUs, you can read more details in this [NVIDIA blog post](https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/) or [here](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html).\n",
+ "\n",
+ "GPUs can accelerate the training of your network by up to a factor of $100$, which is essential for large neural networks.\n",
+ "PyTorch implements a lot of functionality for supporting GPUs (mostly those of NVIDIA due to the libraries [CUDA](https://developer.nvidia.com/cuda-zone) and [cuDNN](https://developer.nvidia.com/cudnn)).\n",
+ "First, let's check whether you have a GPU available:"
+ ],
+ "id": "804f38e2"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:52.886128Z",
+ "iopub.status.busy": "2021-09-16T12:32:52.885628Z",
+ "iopub.status.idle": "2021-09-16T12:32:52.888249Z",
+ "shell.execute_reply": "2021-09-16T12:32:52.887851Z"
+ },
+ "papermill": {
+ "duration": 0.059327,
+ "end_time": "2021-09-16T12:32:52.888348",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:52.829021",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "be576156",
+ "outputId": "35e4b3dd-c780-4b96-81e7-d3b590c6322d"
+ },
+ "source": [
+ "gpu_avail = torch.cuda.is_available()\n",
+ "print(f\"Is the GPU available? {gpu_avail}\")"
+ ],
+ "id": "be576156",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Is the GPU available? True\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.051392,
+ "end_time": "2021-09-16T12:32:52.990937",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:52.939545",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "44ba6d0f"
+ },
+ "source": [
+ "If you have a GPU on your computer but the command above returns False, make sure you have the correct CUDA-version installed.\n",
+ "The `dl2020` environment comes with the CUDA-toolkit 10.1, which is selected for the Lisa supercomputer.\n",
+ "Please change it if necessary (CUDA 10.2 is currently common).\n",
+ "On Google Colab, make sure that you have selected a GPU in your runtime setup (in the menu, check under `Runtime -> Change runtime type`).\n",
+ "\n",
+ "By default, all tensors you create are stored on the CPU.\n",
+ "We can push a tensor to the GPU by using the function `.to(...)`, or `.cuda()`.\n",
+ "However, it is often a good practice to define a `device` object in your code which points to the GPU if you have one, and otherwise to the CPU.\n",
+ "Then, you can write your code with respect to this device object, and it allows you to run the same code on both a CPU-only system, and one with a GPU.\n",
+ "Let's try it below.\n",
+ "We can specify the device as follows:"
+ ],
+ "id": "44ba6d0f"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:53.096938Z",
+ "iopub.status.busy": "2021-09-16T12:32:53.096464Z",
+ "iopub.status.idle": "2021-09-16T12:32:53.099120Z",
+ "shell.execute_reply": "2021-09-16T12:32:53.098658Z"
+ },
+ "papermill": {
+ "duration": 0.057283,
+ "end_time": "2021-09-16T12:32:53.099221",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:53.041938",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "c1821da3",
+ "outputId": "bc003b3f-93d1-486a-a580-dd094dd0df95"
+ },
+ "source": [
+ "device = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n",
+ "print(\"Device\", device)"
+ ],
+ "id": "c1821da3",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Device cuda\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.052772,
+ "end_time": "2021-09-16T12:32:53.204148",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:53.151376",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "c7b99a5d"
+ },
+ "source": [
+ "Now let's create a tensor and push it to the device:"
+ ],
+ "id": "c7b99a5d"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:53.312496Z",
+ "iopub.status.busy": "2021-09-16T12:32:53.312034Z",
+ "iopub.status.idle": "2021-09-16T12:32:55.885460Z",
+ "shell.execute_reply": "2021-09-16T12:32:55.884980Z"
+ },
+ "papermill": {
+ "duration": 2.629406,
+ "end_time": "2021-09-16T12:32:55.885574",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:53.256168",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "be1ce082",
+ "outputId": "dd451975-df98-4f52-8803-5310cca38bf6"
+ },
+ "source": [
+ "x = torch.zeros(2, 3)\n",
+ "x = x.to(device)\n",
+ "print(\"X\", x)"
+ ],
+ "id": "be1ce082",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "X tensor([[0., 0., 0.],\n",
+ " [0., 0., 0.]], device='cuda:0')\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.052118,
+ "end_time": "2021-09-16T12:32:55.989872",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:55.937754",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "e1bb4237"
+ },
+ "source": [
+ "In case you have a GPU, you should now see the attribute `device='cuda:0'` being printed next to your tensor.\n",
+ "The zero next to cuda indicates that this is the zero-th GPU device on your computer.\n",
+ "PyTorch also supports multi-GPU systems, but this you will only need once you have very big networks to train (if interested, see the [PyTorch documentation](https://pytorch.org/docs/stable/distributed.html#distributed-basics)).\n",
+ "We can also compare the runtime of a large matrix multiplication on the CPU with the same operation on the GPU:"
+ ],
+ "id": "e1bb4237"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:56.099380Z",
+ "iopub.status.busy": "2021-09-16T12:32:56.098905Z",
+ "iopub.status.idle": "2021-09-16T12:32:56.633065Z",
+ "shell.execute_reply": "2021-09-16T12:32:56.632672Z"
+ },
+ "papermill": {
+ "duration": 0.59052,
+ "end_time": "2021-09-16T12:32:56.633183",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:56.042663",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "097f28c5",
+ "outputId": "afb6477a-8658-4a72-adcf-83cc58bda39e"
+ },
+ "source": [
+ "x = torch.randn(5000, 5000)\n",
+ "\n",
+ "# CPU version\n",
+ "start_time = time.time()\n",
+ "_ = torch.matmul(x, x)\n",
+ "end_time = time.time()\n",
+ "print(f\"CPU time: {(end_time - start_time):6.5f}s\")\n",
+ "\n",
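+ "# GPU version\n",
+ "x = x.to(device)\n",
+ "# The first operation on a CUDA device can be slow as it has to establish a CPU-GPU communication first.\n",
+ "# Hence, we run an arbitrary command first without timing it for a fair comparison.\n",
+ "if torch.cuda.is_available():\n",
+ "    _ = torch.matmul(x * 0.0, x)\n",
+ "    torch.cuda.synchronize()  # CUDA kernels run asynchronously; wait for the warm-up to finish\n",
+ "start_time = time.time()\n",
+ "_ = torch.matmul(x, x)\n",
+ "if torch.cuda.is_available():\n",
+ "    torch.cuda.synchronize()  # Wait until the multiplication has finished before stopping the clock\n",
+ "end_time = time.time()\n",
+ "print(f\"GPU time: {(end_time - start_time):6.5f}s\")"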
+ ],
+ "id": "097f28c5",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "CPU time: 0.25468s\n",
+ "GPU time: 0.00011s\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.054603,
+ "end_time": "2021-09-16T12:32:56.740740",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:56.686137",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "e6502b2e"
+ },
+ "source": [
+ "Depending on the size of the operation and the CPU/GPU in your system, the speedup of this operation can be large (the run recorded above suggests >500x).\n",
+ "As `matmul` operations are very common in neural networks, we can already see the great benefit of training a NN on a GPU.\n",
+ "The time estimate can be relatively noisy here because we have only run it once, and because CUDA operations are launched asynchronously: unless `torch.cuda.synchronize()` is called before reading the clock, the measured GPU time mostly reflects the kernel launch rather than the full computation.\n",
+ "Feel free to extend this by averaging over multiple runs, but note that it also takes longer to run.\n",
+ "\n",
+ "When generating random numbers, the seed between CPU and GPU is not synchronized.\n",
+ "Hence, we need to set the seed on the GPU separately to ensure reproducible code.\n",
+ "Note that due to different GPU architectures, running the same code on different GPUs does not guarantee the same random numbers.\n",
+ "Still, we don't want our code to give us a different output every time we run it on the exact same hardware.\n",
+ "Hence, we also set the seed on the GPU:"
+ ],
+ "id": "e6502b2e"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:56.849316Z",
+ "iopub.status.busy": "2021-09-16T12:32:56.848847Z",
+ "iopub.status.idle": "2021-09-16T12:32:56.850935Z",
+ "shell.execute_reply": "2021-09-16T12:32:56.850475Z"
+ },
+ "papermill": {
+ "duration": 0.057334,
+ "end_time": "2021-09-16T12:32:56.851032",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:56.793698",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "5b767a95"
+ },
+ "source": [
+ "# GPU operations have a separate seed we also want to set\n",
+ "if torch.cuda.is_available():\n",
+ " torch.cuda.manual_seed(42)\n",
+ " torch.cuda.manual_seed_all(42)\n",
+ "\n",
+ "# Additionally, some operations on a GPU are implemented stochastically for efficiency\n",
+ "# We want to ensure that all operations are deterministic on GPU (if used) for reproducibility\n",
+ "torch.backends.cudnn.deterministic = True\n",
+ "torch.backends.cudnn.benchmark = False"
+ ],
+ "id": "5b767a95",
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.051866,
+ "end_time": "2021-09-16T12:32:56.955066",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:56.903200",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "f4ca3f5b"
+ },
+ "source": [
+ "## Learning by example: Continuous XOR\n",
+ "\n",
+ "If we want to build a neural network in PyTorch, we could specify all our parameters (weight matrices, bias vectors) using `Tensors` (with `requires_grad=True`), ask PyTorch to calculate the gradients and then adjust the parameters.\n",
+ "But things can quickly get cumbersome if we have a lot of parameters.\n",
+ "In PyTorch, there is a package called `torch.nn` that makes building neural networks more convenient.\n",
+ "\n",
+ "We will introduce the libraries and all additional parts you might need to train a neural network in PyTorch by training a simple classifier on a simple yet well-known example: XOR.\n",
+ "Given two binary inputs $x_1$ and $x_2$, the label to predict is $1$ if exactly one of $x_1$ and $x_2$ is $1$, and $0$ in all other cases.\n",
+ "The example became famous because a single neuron, i.e. a linear classifier, cannot learn this simple function.\n",
+ "Hence, we will learn how to build a small neural network that can learn this function.\n",
+ "To make it a little bit more interesting, we move the XOR into continuous space and introduce some Gaussian noise on the binary inputs.\n",
+ "Our desired separation of an XOR dataset could look as follows:\n",
+ "\n",
+ "*(Figure: continuous XOR dataset with the desired separation of the two classes.)*"
+ ],
+ "id": "f4ca3f5b"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.051714,
+ "end_time": "2021-09-16T12:32:57.058731",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:57.007017",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "e23f8eac"
+ },
+ "source": [
+ "### The model\n",
+ "\n",
+ "The package `torch.nn` defines a series of useful classes like linear network layers, activation functions, loss functions etc.\n",
+ "A full list can be found [here](https://pytorch.org/docs/stable/nn.html).\n",
+ "In case you need a certain network layer, check the documentation of the package first before writing the layer yourself as the package likely contains the code for it already.\n",
+ "We import it below:"
+ ],
+ "id": "e23f8eac"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "lines_to_next_cell": 0,
+ "papermill": {
+ "duration": 0.052216,
+ "end_time": "2021-09-16T12:32:57.162758",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:57.110542",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "8592c856"
+ },
+ "source": [
+ "import torch.nn as nn"
+ ],
+ "id": "8592c856",
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "papermill": {
+ "duration": 0.052415,
+ "end_time": "2021-09-16T12:32:57.268259",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:57.215844",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "bf8c706a"
+ },
+ "source": [
+ "import torch.nn.functional as F"
+ ],
+ "id": "bf8c706a",
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.051617,
+ "end_time": "2021-09-16T12:32:57.371727",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:57.320110",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "ba549835"
+ },
+ "source": [
+ "In addition to `torch.nn`, there is also `torch.nn.functional`.\n",
+ "It contains functions that are used in network layers.\n",
+ "This is in contrast to `torch.nn`, which defines them as `nn.Module` objects (more on this below); `torch.nn` itself uses a lot of functionality from `torch.nn.functional`.\n",
+ "Hence, the functional package is useful in many situations, and so we import it as well."
+ ],
+ "id": "ba549835"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 2,
+ "papermill": {
+ "duration": 0.052024,
+ "end_time": "2021-09-16T12:32:57.475971",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:57.423947",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "acc7d527"
+ },
+ "source": [
+ "#### nn.Module\n",
+ "\n",
+ "In PyTorch, a neural network is built up out of modules.\n",
+ "Modules can contain other modules, and a neural network is considered to be a module itself as well.\n",
+ "The basic template of a module is as follows:"
+ ],
+ "id": "acc7d527"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:57.583596Z",
+ "iopub.status.busy": "2021-09-16T12:32:57.583131Z",
+ "iopub.status.idle": "2021-09-16T12:32:57.585190Z",
+ "shell.execute_reply": "2021-09-16T12:32:57.584806Z"
+ },
+ "lines_to_next_cell": 2,
+ "papermill": {
+ "duration": 0.057057,
+ "end_time": "2021-09-16T12:32:57.585292",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:57.528235",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "34d1a3d7"
+ },
+ "source": [
+ "class MyModule(nn.Module):\n",
+ "    def __init__(self):\n",
+ "        super().__init__()\n",
+ "        # Some init for my module\n",
+ "\n",
+ "    def forward(self, x):\n",
+ "        # Function for performing the calculation of the module.\n",
+ "        pass"
+ ],
+ "id": "34d1a3d7",
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 2,
+ "papermill": {
+ "duration": 0.051843,
+ "end_time": "2021-09-16T12:32:57.689235",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:57.637392",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "04b470dc"
+ },
+ "source": [
+ "The forward function is where the computation of the module takes place, and it is executed when you call the module (`module = MyModule(); module(x)`).\n",
+ "In the init function, we usually create the parameters of the module, either directly with `nn.Parameter` or by defining other modules that are used in the forward function.\n",
+ "The backward calculation is done automatically, but can be overridden as well if needed.\n",
+ "\n",
+ "#### Simple classifier\n",
+ "We can now make use of the pre-defined modules in the `torch.nn` package, and define our own small neural network.\n",
+ "We will use a minimal network with an input layer, one hidden layer with tanh as activation function, and an output layer.\n",
+ "In other words, our network should look something like this:\n",
+ "\n",
+ "[Figure: the small network with two input neurons, one hidden layer, and one output neuron.]\n",
+ "\n",
+ "The input neurons are shown in blue, which represent the coordinates $x_1$ and $x_2$ of a data point.\n",
+ "The hidden neurons including a tanh activation are shown in white, and the output neuron in red.\n",
+ "In PyTorch, we can define this as follows:"
+ ],
+ "id": "04b470dc"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:57.800277Z",
+ "iopub.status.busy": "2021-09-16T12:32:57.799807Z",
+ "iopub.status.idle": "2021-09-16T12:32:57.801874Z",
+ "shell.execute_reply": "2021-09-16T12:32:57.801393Z"
+ },
+ "papermill": {
+ "duration": 0.057783,
+ "end_time": "2021-09-16T12:32:57.801973",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:57.744190",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "2606872c"
+ },
+ "source": [
+ "class SimpleClassifier(nn.Module):\n",
+ "    def __init__(self, num_inputs, num_hidden, num_outputs):\n",
+ "        super().__init__()\n",
+ "        # Initialize the modules we need to build the network\n",
+ "        self.linear1 = nn.Linear(num_inputs, num_hidden)\n",
+ "        self.act_fn = nn.Tanh()\n",
+ "        self.linear2 = nn.Linear(num_hidden, num_outputs)\n",
+ "\n",
+ "    def forward(self, x):\n",
+ "        # Perform the calculation of the model to determine the prediction\n",
+ "        x = self.linear1(x)\n",
+ "        x = self.act_fn(x)\n",
+ "        x = self.linear2(x)\n",
+ "        return x"
+ ],
+ "id": "2606872c",
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.051977,
+ "end_time": "2021-09-16T12:32:57.905865",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:57.853888",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "708b426d"
+ },
+ "source": [
+ "For the examples in this notebook, we will use a tiny neural network with two input neurons and four hidden neurons.\n",
+ "As we perform binary classification, we will use a single output neuron.\n",
+ "Note that we do not apply a sigmoid on the output yet.\n",
+ "This is because other functions, especially the loss, can be computed more efficiently and with better numerical precision on the raw outputs (logits) than on the sigmoid output.\n",
+ "We will discuss the detailed reason later."
+ ],
+ "id": "708b426d"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:58.014018Z",
+ "iopub.status.busy": "2021-09-16T12:32:58.013529Z",
+ "iopub.status.idle": "2021-09-16T12:32:58.016608Z",
+ "shell.execute_reply": "2021-09-16T12:32:58.016147Z"
+ },
+ "papermill": {
+ "duration": 0.058322,
+ "end_time": "2021-09-16T12:32:58.016706",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:57.958384",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "f8c99074",
+ "outputId": "472ac145-a50c-42e7-f25f-c05cc72be78c"
+ },
+ "source": [
+ "model = SimpleClassifier(num_inputs=2, num_hidden=4, num_outputs=1)\n",
+ "# Printing a module shows all its submodules\n",
+ "print(model)"
+ ],
+ "id": "f8c99074",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "SimpleClassifier(\n",
+ "  (linear1): Linear(in_features=2, out_features=4, bias=True)\n",
+ "  (act_fn): Tanh()\n",
+ "  (linear2): Linear(in_features=4, out_features=1, bias=True)\n",
+ ")\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.0523,
+ "end_time": "2021-09-16T12:32:58.121280",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:58.068980",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "7b432a95"
+ },
+ "source": [
+ "Printing the model lists all submodules it contains.\n",
+ "The parameters of a module can be obtained with its `parameters()` method, or with `named_parameters()` to also get the name of each parameter object.\n",
+ "For our small neural network, we have the following parameters:"
+ ],
+ "id": "7b432a95"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:58.230366Z",
+ "iopub.status.busy": "2021-09-16T12:32:58.229902Z",
+ "iopub.status.idle": "2021-09-16T12:32:58.232813Z",
+ "shell.execute_reply": "2021-09-16T12:32:58.232352Z"
+ },
+ "papermill": {
+ "duration": 0.059317,
+ "end_time": "2021-09-16T12:32:58.232913",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:58.173596",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "52c7230d",
+ "outputId": "2333aa10-d564-4873-a505-3f73dbf6f76e"
+ },
+ "source": [
+ "for name, param in model.named_parameters():\n",
+ "    print(f\"Parameter {name}, shape {param.shape}\")"
+ ],
+ "id": "52c7230d",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Parameter linear1.weight, shape torch.Size([4, 2])\n",
+ "Parameter linear1.bias, shape torch.Size([4])\n",
+ "Parameter linear2.weight, shape torch.Size([1, 4])\n",
+ "Parameter linear2.bias, shape torch.Size([1])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.053627,
+ "end_time": "2021-09-16T12:32:58.340801",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:58.287174",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "b2650ef9"
+ },
+ "source": [
+ "Each linear layer has a weight matrix of the shape `[output, input]`, and a bias of the shape `[output]`.\n",
+ "The tanh activation function does not have any parameters.\n",
+ "Note that parameters are only registered for `nn.Module` objects that are direct object attributes, i.e. `self.a = ...`.\n",
+ "If you store modules in a plain Python list, their parameters are not registered for the outer module, which can cause issues when you try to optimize your module.\n",
+ "There are alternatives, like `nn.ModuleList`, `nn.ModuleDict` and `nn.Sequential`, that allow you to have different data structures of modules.\n",
+ "We will use them in a few later tutorials and explain them there."
+ ],
+ "id": "b2650ef9"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.052923,
+ "end_time": "2021-09-16T12:32:58.446527",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:58.393604",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "463b7836"
+ },
+ "source": [
+ "### The data\n",
+ "\n",
+ "PyTorch also provides functionality to load the training and\n",
+ "test data efficiently, collected in the package `torch.utils.data`."
+ ],
+ "id": "463b7836"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "papermill": {
+ "duration": 0.052877,
+ "end_time": "2021-09-16T12:32:58.552525",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:58.499648",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "0ab84d11"
+ },
+ "source": [
+ "import torch.utils.data as data"
+ ],
+ "id": "0ab84d11",
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.052744,
+ "end_time": "2021-09-16T12:32:58.658400",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:58.605656",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "21c14544"
+ },
+ "source": [
+ "The data package defines two classes which are the standard interface for handling data in PyTorch: `data.Dataset`, and `data.DataLoader`.\n",
+ "The dataset class provides a uniform interface to access the\n",
+ "training/test data, while the data loader makes sure to efficiently load\n",
+ "and stack the data points from the dataset into batches during training."
+ ],
+ "id": "21c14544"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.055595,
+ "end_time": "2021-09-16T12:32:58.767233",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:58.711638",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "fb7ac0a3"
+ },
+ "source": [
+ "#### The dataset class\n",
+ "\n",
+ "The dataset class summarizes the basic functionality of a dataset in a natural way.\n",
+ "To define a dataset in PyTorch, we simply specify two methods: `__getitem__` and `__len__`.\n",
+ "The get-item method has to return the $i$-th data point in the dataset, while the len method returns the size of the dataset.\n",
+ "For the XOR dataset, we can define the dataset class as follows:"
+ ],
+ "id": "fb7ac0a3"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:58.880519Z",
+ "iopub.status.busy": "2021-09-16T12:32:58.880040Z",
+ "iopub.status.idle": "2021-09-16T12:32:58.881657Z",
+ "shell.execute_reply": "2021-09-16T12:32:58.882056Z"
+ },
+ "papermill": {
+ "duration": 0.061328,
+ "end_time": "2021-09-16T12:32:58.882175",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:58.820847",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "85adf0a4"
+ },
+ "source": [
+ "\n",
+ "\n",
+ "class XORDataset(data.Dataset):\n",
+ "    def __init__(self, size, std=0.1):\n",
+ "        \"\"\"\n",
+ "        Inputs:\n",
+ "            size - Number of data points we want to generate\n",
+ "            std - Standard deviation of the noise (see generate_continuous_xor function)\n",
+ "        \"\"\"\n",
+ "        super().__init__()\n",
+ "        self.size = size\n",
+ "        self.std = std\n",
+ "        self.generate_continuous_xor()\n",
+ "\n",
+ "    def generate_continuous_xor(self):\n",
+ "        # Each data point in the XOR dataset has two variables, x and y, that can be either 0 or 1\n",
+ "        # The label is their XOR combination, i.e. 1 if only x or only y is 1 while the other is 0.\n",
+ "        # If x=y, the label is 0.\n",
+ "        data = torch.randint(low=0, high=2, size=(self.size, 2), dtype=torch.float32)\n",
+ "        label = (data.sum(dim=1) == 1).to(torch.long)\n",
+ "        # To make it slightly more challenging, we add a bit of Gaussian noise to the data points.\n",
+ "        data += self.std * torch.randn(data.shape)\n",
+ "\n",
+ "        self.data = data\n",
+ "        self.label = label\n",
+ "\n",
+ "    def __len__(self):\n",
+ "        # Number of data points we have. Alternatively self.data.shape[0], or self.label.shape[0]\n",
+ "        return self.size\n",
+ "\n",
+ "    def __getitem__(self, idx):\n",
+ "        # Return the idx-th data point of the dataset\n",
+ "        # If we have multiple things to return (data point and label), we can return them as a tuple\n",
+ "        data_point = self.data[idx]\n",
+ "        data_label = self.label[idx]\n",
+ "        return data_point, data_label"
+ ],
+ "id": "85adf0a4",
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "papermill": {
+ "duration": 0.053132,
+ "end_time": "2021-09-16T12:32:58.988271",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:58.935139",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "82143473"
+ },
+ "source": [
+ "Let's try to create such a dataset and inspect it:"
+ ],
+ "id": "82143473"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:59.100298Z",
+ "iopub.status.busy": "2021-09-16T12:32:59.099829Z",
+ "iopub.status.idle": "2021-09-16T12:32:59.103186Z",
+ "shell.execute_reply": "2021-09-16T12:32:59.103565Z"
+ },
+ "papermill": {
+ "duration": 0.059959,
+ "end_time": "2021-09-16T12:32:59.103683",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:59.043724",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "d35a9331",
+ "outputId": "6af17e28-2bc7-4324-b13d-8a92bfd9508c"
+ },
+ "source": [
+ "dataset = XORDataset(size=200)\n",
+ "print(\"Size of dataset:\", len(dataset))\n",
+ "print(\"Data point 0:\", dataset[0])"
+ ],
+ "id": "d35a9331",
+ "execution_count": null,
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Size of dataset: 200\n",
+ "Data point 0: (tensor([0.9632, 0.1117]), tensor(1))\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lines_to_next_cell": 2,
+ "papermill": {
+ "duration": 0.053101,
+ "end_time": "2021-09-16T12:32:59.210237",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:59.157136",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "f8eeb814"
+ },
+ "source": [
+ "To get a better feel for the dataset, we visualize the samples below."
+ ],
+ "id": "f8eeb814"
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:59.324080Z",
+ "iopub.status.busy": "2021-09-16T12:32:59.323610Z",
+ "iopub.status.idle": "2021-09-16T12:32:59.325640Z",
+ "shell.execute_reply": "2021-09-16T12:32:59.325245Z"
+ },
+ "papermill": {
+ "duration": 0.060548,
+ "end_time": "2021-09-16T12:32:59.325755",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:59.265207",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "40b4cbff"
+ },
+ "source": [
+ "def visualize_samples(data, label):\n",
+ "    if isinstance(data, torch.Tensor):\n",
+ "        data = data.cpu().numpy()\n",
+ "    if isinstance(label, torch.Tensor):\n",
+ "        label = label.cpu().numpy()\n",
+ "    data_0 = data[label == 0]\n",
+ "    data_1 = data[label == 1]\n",
+ "\n",
+ "    plt.figure(figsize=(4, 4))\n",
+ "    plt.scatter(data_0[:, 0], data_0[:, 1], edgecolor=\"#333\", label=\"Class 0\")\n",
+ "    plt.scatter(data_1[:, 0], data_1[:, 1], edgecolor=\"#333\", label=\"Class 1\")\n",
+ "    plt.title(\"Dataset samples\")\n",
+ "    plt.ylabel(r\"$x_2$\")\n",
+ "    plt.xlabel(r\"$x_1$\")\n",
+ "    plt.legend()"
+ ],
+ "id": "40b4cbff",
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2021-09-16T12:32:59.448747Z",
+ "iopub.status.busy": "2021-09-16T12:32:59.448284Z",
+ "iopub.status.idle": "2021-09-16T12:32:59.938181Z",
+ "shell.execute_reply": "2021-09-16T12:32:59.938567Z"
+ },
+ "papermill": {
+ "duration": 0.560114,
+ "end_time": "2021-09-16T12:32:59.938710",
+ "exception": false,
+ "start_time": "2021-09-16T12:32:59.378596",
+ "status": "completed"
+ },
+ "tags": [],
+ "id": "44e7f18f",
+ "outputId": "f35672d9-f0e6-43e0-b163-e35e481a29a5"
+ },
+ "source": [
+ "visualize_samples(dataset.data, dataset.label)\n",
+ "plt.show()"
+ ],
+ "id": "44e7f18f",
+ "execution_count": null,
+ "outputs": [
+ {
+ "data": {
+ "application/pdf": "\n",
+ "image/svg+xml": [
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "