wandb · tcapelle · May 3, 2024 · May 3, 2024
diff --git a/colabs/torchtune/torchtune_and_wandb.ipynb b/colabs/torchtune/torchtune_and_wandb.ipynb
@@ -0,0 +1,257 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a href=\"https://colab.research.google.com/github/wandb/examples/blob/master/colabs/torchtune/torchtune_and_wandb.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
+    "<!--- @wandbcode{torchtune-colab} -->"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img src=\"http://wandb.me/logo-im-png\" width=\"400\" alt=\"Weights & Biases\" />\n",
+    "<!--- @wandbcode{torchtune-colab} -->"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Getting Started with torchtune and Weigths & Biases\n",
+    "\n",
+    "In this notebook you will learn how to use [torchtune](https://github.com/pytorch/torchtune) with [Weights & Biases](https://wandb.ai) to monitor your training runs."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> You need to select a machine a GPU, go to Runtime > Change runtime type > select a GPU (L40, A100 ideally)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Setup the libraries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!git clone https://github.com/pytorch/torchtune"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cd torchtune/"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!python -m pip install -qqq \".[dev]\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import wandb\n",
+    "wandb.login()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Download a Model\n",
+    "We will download a model from the Hugging Face Hub.\n",
+    "> you will need to provide an access token or call `huggingface-cli login`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Download a model checkpoint using the provided `tune download` CLI\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!tune download mistralai/Mistral-7B-v0.1 \\\n",
+    "    --output-dir /tmp/Mistral-7B-v0.1/ \\\n",
+    "    --hf-token <HF_TOKEN>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's create a torchtune config that enables W&B, to do so, we can grab the original Mistral 7B LoRA recipe and change the following lines to use W&B as our `metric_logger`:\n",
+    "```yaml\n",
+    "# Logging\n",
+    "metric_logger:\n",
+    "  _component_: torchtune.utils.metric_logging.WandBLogger # <---You only need this to enable W&B\n",
+    "  project: mistral_lora # <--- The W&B project to save our logs to\n",
+    "log_every_n_steps: 1\n",
+    "\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's save a modified version of the recipe using `%%writefile`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%writefile mistral_wandb_lora.yaml\n",
+    "tokenizer:\n",
+    "  _component_: torchtune.models.mistral.mistral_tokenizer\n",
+    "  path: /tmp/Mistral-7B-v0.1/tokenizer.model\n",
+    "\n",
+    "# Dataset\n",
+    "dataset:\n",
+    "  _component_: torchtune.datasets.alpaca_dataset\n",
+    "  train_on_input: True\n",
+    "seed: null\n",
+    "shuffle: True\n",
+    "\n",
+    "# Model Arguments\n",
+    "model:\n",
+    "  _component_: torchtune.models.mistral.lora_mistral_7b\n",
+    "  lora_attn_modules: ['q_proj', 'k_proj', 'v_proj']\n",
+    "  apply_lora_to_mlp: True\n",
+    "  apply_lora_to_output: True\n",
+    "  lora_rank: 64\n",
+    "  lora_alpha: 16\n",
+    "\n",
+    "checkpointer:\n",
+    "  _component_: torchtune.utils.FullModelHFCheckpointer\n",
+    "  checkpoint_dir: /tmp/Mistral-7B-v0.1\n",
+    "  checkpoint_files: [\n",
+    "    pytorch_model-00001-of-00002.bin,\n",
+    "    pytorch_model-00002-of-00002.bin\n",
+    "  ]\n",
+    "  recipe_checkpoint: null\n",
+    "  output_dir: /tmp/Mistral-7B-v0.1\n",
+    "  model_type: MISTRAL\n",
+    "resume_from_checkpoint: False\n",
+    "\n",
+    "optimizer:\n",
+    "  _component_: torch.optim.AdamW\n",
+    "  lr: 2e-5\n",
+    "\n",
+    "lr_scheduler:\n",
+    "  _component_: torchtune.modules.get_cosine_schedule_with_warmup\n",
+    "  num_warmup_steps: 100\n",
+    "\n",
+    "loss:\n",
+    "  _component_: torch.nn.CrossEntropyLoss\n",
+    "\n",
+    "# Fine-tuning arguments\n",
+    "batch_size: 2\n",
+    "epochs: 1\n",
+    "max_steps_per_epoch: 100\n",
+    "gradient_accumulation_steps: 2\n",
+    "compile: False\n",
+    "\n",
+    "# Training env\n",
+    "device: cuda\n",
+    "\n",
+    "# Memory management\n",
+    "enable_activation_checkpointing: True\n",
+    "\n",
+    "# Reduced precision\n",
+    "dtype: bf16\n",
+    "############################### Enable W&B #####################################\n",
+    "################################################################################\n",
+    "# Logging\n",
+    "metric_logger:\n",
+    "  _component_: torchtune.utils.metric_logging.WandBLogger # <---You only need this to enable W&B\n",
+    "  project: mistral_lora # <--- The W&B project to save our logs to\n",
+    "log_every_n_steps: 1\n",
+    "################################################################################\n",
+    "################################################################################\n",
+    "output_dir: /tmp/Mistral-7B-v0.1\n",
+    "log_peak_memory_stats: False\n",
+    "\n",
+    "# Profiler (disabled)\n",
+    "profiler:\n",
+    "  _component_: torchtune.utils.profiler\n",
+    "  enabled: False"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's run the recipe with this modified config with W&B enabled"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!tune run lora_finetune_single_device --config mistral_wandb_lora.yaml"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "That's it! Now you can click on the URL and continue monitoring your training on the Weights & Biases UI"
+   ]
+  }
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "include_colab_link": true,
+   "provenance": [],
+   "toc_visible": true
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}