
Commit

Fixes per comments (NVIDIA#11280)
* Fixes per comments

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

* Update README

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>

---------

Signed-off-by: Gomathy Venkata Krishnan <[email protected]>
gvenkatakris authored and HuiyingLi committed Nov 15, 2024
1 parent 23ec75c commit d8f3e9b
Showing 10 changed files with 65 additions and 69 deletions.
4 changes: 2 additions & 2 deletions tutorials/llm/llama-3/README.rst
@@ -2,7 +2,7 @@
Getting Started with Llama 3 and Llama 3.1
==========================================

This repository contains jupyter notebook tutorials using NeMo Framework for Llama-3 and Llama-3.1 models by Meta.
This repository contains Jupyter Notebook tutorials using the NeMo Framework for Llama-3 and Llama-3.1 models by Meta.

.. list-table::
:widths: 100 25 100
@@ -16,7 +16,7 @@ This repository contains jupyter notebook tutorials using NeMo Framework for Lla
- Perform LoRA PEFT on Llama 3 8B Instruct using a dataset for bio-medical domain question answering. Deploy multiple LoRA adapters with NVIDIA NIM.
* - `Llama 3.1 Law-Domain LoRA Fine-Tuning and Deployment with NeMo Framework and NVIDIA NIM <./sdg-law-title-generation>`_
- `Law StackExchange <https://huggingface.co/datasets/ymoslem/Law-StackExchange>`_
- Perform LoRA PEFT on Llama 3.1 8B Instruct using a synthetically augmented version of Law StackExchange with NeMo Framework, followed by deployment with NVIDIA NIM. As a pre-requisite, follow the tutorial for `data curation using NeMo Curator <https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/peft-curation-with-sdg>`__.
- Perform LoRA PEFT on Llama 3.1 8B Instruct using a synthetically augmented version of Law StackExchange with NeMo Framework, followed by deployment with NVIDIA NIM. As a prerequisite, follow the tutorial for `data curation using NeMo Curator <https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/peft-curation-with-sdg>`_.
* - `Llama 3.1 Pruning and Distillation with NeMo Framework <./pruning-distillation>`_
- `WikiText-103-v1 <https://huggingface.co/datasets/Salesforce/wikitext/viewer/wikitext-103-v1>`_
- Perform pruning and distillation on Llama 3.1 8B using the WikiText-103-v1 dataset with NeMo Framework.
@@ -9,7 +9,7 @@
"\n",
"The dataset has to be preprocessed using the [preprocess_data_for_megatron.py](https://github.com/NVIDIA/NeMo/blob/main/scripts/nlp_language_modeling/preprocess_data_for_megatron.py) script included in the NeMo Framework. This step will also tokenize data using the `meta-llama/Meta-Llama-3.1-8B` tokenizer model to convert the data into a memory map format.\n",
"\n",
"> `NOTE:` In the block of code below, pass the paths to your train, test and validation data files."
"> `NOTE:` In the block of code below, pass the paths to your train, test, and validation data files."
]
},
{
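For reference, the preprocessing described in this hunk boils down to one call per data split to `preprocess_data_for_megatron.py`. The sketch below is one way to drive it; the `/opt/NeMo` install location, the data file names, and the flag names are assumptions modeled on the Megatron-style preprocessing interface, so check the script's `--help` in your environment before relying on them.

```python
# Hedged sketch: drive NeMo's Megatron preprocessing script from Python.
# The /opt/NeMo location, the data file names, and the flag names are
# assumptions -- verify them against the script's --help in your setup.
import subprocess

DATA_FILES = ["train.jsonl", "test.jsonl", "val.jsonl"]  # hypothetical split files

for path in DATA_FILES:
    prefix = path.rsplit(".", 1)[0] + "_tokenized"  # e.g. train_tokenized
    subprocess.run(
        [
            "python",
            "/opt/NeMo/scripts/nlp_language_modeling/preprocess_data_for_megatron.py",
            f"--input={path}",
            "--json-keys=text",                               # assumed field name
            "--tokenizer-library=huggingface",
            "--tokenizer-type=meta-llama/Meta-Llama-3.1-8B",  # tokenizer named above
            f"--output-prefix={prefix}",
            "--append-eod",
            "--workers=8",
        ],
        check=True,  # stop immediately if preprocessing fails
    )
```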
@@ -6,15 +6,15 @@
"metadata": {},
"source": [
"\n",
"### Step 2: Finetune the teacher on the dataset\n",
"### Step 2: Fine-tune the teacher on the dataset\n",
"\n",
"NeMo framework includes a standard python script [megatron_gpt_pretraining.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_pretraining.py) for training a model. Once you have your model downloaded and the dataset ready, fine-tuning the teacher model with NeMo is essentially just running this script!\n",
"NeMo Framework includes a standard Python script, [megatron_gpt_pretraining.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_pretraining.py), for training a model. Once you have your model downloaded and the dataset ready, fine-tuning the teacher model with NeMo is essentially just running this script!\n",
"\n",
"We finetune the unpruned model on our dataset to correct the distribution shift across the original dataset the model was trained on. Per the [blog](https://developer.nvidia.com/blog/how-to-prune-and-distill-llama-3-1-8b-to-an-nvidia-llama-3-1-minitron-4b-model/) and [tech report](https://arxiv.org/pdf/2408.11796), experiments showed that, without correcting for the distribution shift, the teacher provides suboptimal guidance on the dataset when being distilled.\n",
"We fine-tune the unpruned model on our dataset to correct the distribution shift from the original dataset the model was trained on. According to the [blog](https://developer.nvidia.com/blog/how-to-prune-and-distill-llama-3-1-8b-to-an-nvidia-llama-3-1-minitron-4b-model/) and [tech report](https://arxiv.org/pdf/2408.11796), experiments showed that without correcting for this distribution shift, the teacher provides suboptimal guidance on the dataset during distillation.\n",
"\n",
"For this demonstration, this training run is capped by `STEPS`, and validation is carried out every `VAL_INTERVAL` steps.\n",
"\n",
"> `NOTE:` In the block of code below, pass the paths to your pre-processed train, test and validation data files as well as path to the teacher .nemo model."
"> `NOTE:` In the block of code below, pass the paths to your pre-processed train, test, and validation data files, as well as the path to the teacher .nemo model."
]
},
{
@@ -124,8 +124,8 @@
"id": "3040a993-8423-475f-8bc6-d1dd1ce16a83",
"metadata": {},
"source": [
"This will create a finetuned teacher model named `megatron_llama_ft.nemo` in `./distill_trainings/megatron_llama_ft/checkpoints/`. We'll use this later.\n",
"> `NOTE:`This script takes at least 20 minutes to run (depending on GPU) and will generate the finetuned teacher model."
"This will create a fine-tuned teacher model named `megatron_llama_ft.nemo` in `./distill_trainings/megatron_llama_ft/checkpoints/`. We'll use this later.\n",
"> `NOTE:`This script takes at least 20 minutes to run (depending on GPU) and will generate the fine-tuned teacher model."
]
}
],
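As a rough sketch of what the teacher fine-tuning launch looks like, the call below mirrors the shape of a `megatron_gpt_pretraining.py` run. Every Hydra override name, path, and value here is an illustrative assumption; the notebook cell this hunk describes and the script's config file are the source of truth.

```python
# Hedged sketch of the teacher fine-tuning launch with megatron_gpt_pretraining.py.
# Every Hydra override name, path, and value below is illustrative only; the
# notebook cell and the script's config file define the real interface.
import subprocess

STEPS = 30                 # demo cap on training steps
VAL_INTERVAL = 10          # assumed validation interval
TEACHER = "./Meta-Llama-3.1-8B.nemo"                  # hypothetical teacher checkpoint
DATA_PREFIX = "./wikitext_tokenized_text_document"    # hypothetical preprocessed prefix

subprocess.run(
    [
        "torchrun", "--nproc-per-node=8",
        "/opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py",
        f"trainer.max_steps={STEPS}",
        f"trainer.val_check_interval={VAL_INTERVAL}",
        "trainer.devices=8",
        f"+model.restore_from_path={TEACHER}",        # assumed override for warm-starting
        f"model.data.data_prefix=[1.0,{DATA_PREFIX}]",
        "exp_manager.explicit_log_dir=./distill_trainings/megatron_llama_ft",
    ],
    check=True,
)
```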
@@ -5,8 +5,8 @@
"id": "8bc99d2f-9ac6-40c2-b072-12b6cb7b9aca",
"metadata": {},
"source": [
"### Step 3: Prune the finetuned-teacher model to create a student\n",
"In this step, we will explore two methods to prune the finetuned teacher model. Refer to the ``NOTE`` in the **_step-by-step instructions_** section of [introduction.ipynb](./introduction.ipynb) to decide which pruning techniques you would like to explore.\n",
"### Step 3: Prune the fine-tuned teacher model to create a student\n",
"In this step, we will explore two methods to prune the fine-tuned teacher model. Refer to the ``NOTE`` in the **_step-by-step instructions_** section of [introduction.ipynb](./introduction.ipynb) to decide which pruning techniques you would like to explore.\n",
"\n",
"In the first method, depth-pruning, we trim the layers of the model."
]
@@ -21,7 +21,7 @@
"\n",
"Per the [blog](https://developer.nvidia.com/blog/how-to-prune-and-distill-llama-3-1-8b-to-an-nvidia-llama-3-1-minitron-4b-model/) and [tech report](https://arxiv.org/pdf/2408.11796), removing contiguous layers from the second last block (layers 16 to 31 continuously) yields the best overall results. \n",
"\n",
"> `NOTE:` In the block of code below, pass the paths to your finetuned teacher .nemo model."
"> `NOTE:` In the block of code below, pass the paths to your fine-tuned teacher .nemo model."
]
},
{
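As a purely conceptual illustration of the depth-pruning recipe in this hunk (not the NeMo pruning call itself, which the notebook issues for you), the layer selection it implies looks like this:

```python
# Conceptual sketch only: which decoder layers survive depth-pruning when the
# contiguous block 16-31 is removed from a 32-layer model. The actual pruning
# is performed by the NeMo script the notebook invokes, not by this snippet.
TOTAL_LAYERS = 32
dropped = set(range(16, 32))   # contiguous block reported to give the best results
kept = [i for i in range(TOTAL_LAYERS) if i not in dropped]
print(f"student keeps {len(kept)}/{TOTAL_LAYERS} layers: {kept}")
```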
@@ -5,8 +5,8 @@
"id": "8bc99d2f-9ac6-40c2-b072-12b6cb7b9aca",
"metadata": {},
"source": [
"### Step 3: Prune the finetuned-teacher model to create a student\n",
"In the second method, we will width-prune. In width-pruning, we trim the neurons, attention heads and embedding channels. \n",
"### Step 3: Step 3: Prune the fine-tuned teacher model to create a student\n",
"In the second method, we will width-prune. In width-pruning, we trim the neurons, attention heads, and embedding channels.\n",
"\n",
"Refer to the ``NOTE`` in the **_step-by-step instructions_** section of [introduction.ipynb](./introduction.ipynb) to decide which pruning techniques you would like to explore."
]
@@ -20,15 +20,15 @@
"source": [
"#### Step 3.b.: Using width-pruning\n",
"To width-prune the model, we do the following:\n",
"- prune (trim) the MLP intermediate dimension from 14336 to 9216.\n",
"- prune the hidden size from 4096 to 3072.\n",
"- and retrain the attention headcount and number of layers\n",
"- Prune (trim) the MLP intermediate dimension from 14336 to 9216.\n",
"- Prune the hidden size from 4096 to 3072.\n",
"- Retrain the attention headcount and number of layers\n",
"\n",
"For width-pruning we will use the [megatron_gpt_prune.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_prune.py) script in the NeMo Framework. To see the detailed list of parameters for width-pruning, you can view the [megatron_gpt_prune.yaml](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml) file.\n",
"For width-pruning, we will use the [megatron_gpt_prune.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_prune.py) script in the NeMo Framework. To see the detailed list of parameters for width-pruning, you can view the [megatron_gpt_prune.yaml](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml) file.\n",
"\n",
"We use the above parameters to get a competitive model for this demonstration. You can use other strategies or parameters from the [blog](https://developer.nvidia.com/blog/how-to-prune-and-distill-llama-3-1-8b-to-an-nvidia-llama-3-1-minitron-4b-model/) or the [tech report](https://arxiv.org/pdf/2408.11796) for your experiments. \n",
"\n",
"> `NOTE:` In the block of code below, pass the paths to your finetuned teacher .nemo model.\n",
"> `NOTE:` In the block of code below, pass the paths to your fine-tuned teacher .nemo model.\n",
"\n",
"> `TIP:` You can increase the ``batch_size`` (upto 1024) to speed up the width-pruning script execution."
]
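For orientation, a width-pruning launch along the lines below is what this hunk's notebook ultimately runs. The override names (`prune.ffn_hidden_size`, `prune.hidden_size`, `export.save_path`) and paths are assumptions modeled on `megatron_gpt_prune.yaml`; confirm them against that config before use.

```python
# Hedged sketch of a width-pruning launch with megatron_gpt_prune.py. The
# override names and paths are assumptions modeled on megatron_gpt_prune.yaml;
# confirm them against that config before running anything.
import subprocess

TEACHER_FT = "./distill_trainings/megatron_llama_ft/checkpoints/megatron_llama_ft.nemo"

subprocess.run(
    [
        "torchrun", "--nproc-per-node=8",
        "/opt/NeMo/examples/nlp/language_modeling/megatron_gpt_prune.py",
        f"model.restore_from_path={TEACHER_FT}",
        "prune.ffn_hidden_size=9216",                    # MLP intermediate dim: 14336 -> 9216
        "prune.hidden_size=3072",                        # hidden size: 4096 -> 3072
        "export.save_path=./4b_width_pruned_model.nemo",
        # The config also exposes a batch_size knob (see the TIP above) that can
        # be raised, up to 1024, to speed up the pruning calibration pass.
    ],
    check=True,
)
```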
@@ -6,9 +6,9 @@
"metadata": {},
"source": [
"### Step 4: Distill knowledge from teacher into student\n",
"Distillation of a model with NeMo Framework is also possible using a python script: [megatron_gpt_distillation.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_distillation.py). In this notebook, we will explore distillation with the depth-pruned model as the `STUDENT` model. \n",
"Distillation of a model with NeMo Framework is also possible using a Python script: [megatron_gpt_distillation.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_distillation.py). In this notebook, we will explore distillation with the depth-pruned model as the `STUDENT` model.\n",
"\n",
"For this demonstration, the `TEACHER` would be the finetuned teacher model `megatron_llama_ft.nemo` and the `STUDENT` model would be the pruned 4B model. This training run is capped by `STEPS`, and validation is carried out every `VAL_INTERVAL` steps."
"For this demonstration, the `TEACHER` would be the fine-tuned teacher model `megatron_llama_ft.nemo` and the `STUDENT` model would be the pruned 4B model. This training run is capped by `STEPS`, and validation is carried out every `VAL_INTERVAL` steps."
]
},
{
@@ -19,7 +19,7 @@
"#### Step 4.a.: Using depth-pruned student\n",
"While distilling knowledge from the teacher to depth-pruned model, the `STUDENT` model would be `4b_depth_pruned_model.nemo` as produced by the [depth-pruning](./03_a_depth_pruning.ipynb) notebook. This training run is capped by `STEPS`, and validation is carried out every `VAL_INTERVAL` steps.\n",
"\n",
"> `NOTE:` In the block of code below, pass the paths to your pre-processed train, test and validation data files as well as path to the teacher and student .nemo models."
"> `NOTE:` In the block of code below, pass the paths to your pre-processed train, test, and validation data files, as well as path to the teacher and student .nemo models."
]
},
{
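A distillation launch has roughly the shape sketched below, with the fine-tuned teacher and the depth-pruned student passed in as `.nemo` checkpoints. The override names for the teacher and student paths are assumptions; the notebook cell and the script's config define the real ones.

```python
# Hedged sketch of a distillation launch with megatron_gpt_distillation.py,
# using the fine-tuned teacher and the depth-pruned student. The override
# names for the two checkpoints are assumptions; the notebook cell and the
# script's config define the real ones.
import subprocess

TEACHER = "./distill_trainings/megatron_llama_ft/checkpoints/megatron_llama_ft.nemo"
STUDENT = "./4b_depth_pruned_model.nemo"              # output of the depth-pruning notebook
DATA_PREFIX = "./wikitext_tokenized_text_document"    # hypothetical preprocessed prefix
STEPS, VAL_INTERVAL = 30, 10

subprocess.run(
    [
        "torchrun", "--nproc-per-node=8",
        "/opt/NeMo/examples/nlp/language_modeling/megatron_gpt_distillation.py",
        f"model.restore_from_path={STUDENT}",             # assumed: student checkpoint
        f"model.kd_teacher_restore_from_path={TEACHER}",  # assumed: teacher checkpoint
        f"model.data.data_prefix=[1.0,{DATA_PREFIX}]",
        f"trainer.max_steps={STEPS}",
        f"trainer.val_check_interval={VAL_INTERVAL}",
    ],
    check=True,
)
```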
@@ -6,10 +6,10 @@
"metadata": {},
"source": [
"### Step 4: Distill knowledge from teacher into student\n",
"Distillation of a model with NeMo Framework is also possible using a python script: [megatron_gpt_distillation.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_distillation.py). \n",
"Distillation of a model with NeMo Framework is also possible using a Python script: [megatron_gpt_distillation.py](https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_gpt_distillation.py). \n",
"In this notebook, we will explore distillation with the width-pruned model as the `STUDENT` model.\n",
"\n",
"For this demonstration, the `TEACHER` would be the finetuned teacher model `megatron_llama_ft.nemo` and the `STUDENT` model would be the pruned 4B model. This training run is capped by `STEPS`, and validation is carried out every `VAL_INTERVAL` steps."
"For this demonstration, the `TEACHER` would be the fine-tuned teacher model `megatron_llama_ft.nemo` and the `STUDENT` model would be the pruned 4B model. This training run is capped by `STEPS`, and validation is carried out every `VAL_INTERVAL` steps."
]
},
{
@@ -20,7 +20,7 @@
"#### Step 4.b.: Using width-pruned student\n",
"While distilling knowledge from the teacher to width-pruned model, the `STUDENT` model would be `4b_width_pruned_model.nemo` as produced by the [width-pruning](./03_b_width_pruning.ipynb) notebook. This training run is capped by `STEPS`, and validation is carried out every `VAL_INTERVAL` steps.\n",
"\n",
"> `NOTE:` In the block of code below, pass the paths to your pre-processed train, test and validation data files as well as path to the teacher and student .nemo models."
"> `NOTE:` In the block of code below, pass the paths to your pre-processed train, test, and validation data files, as well as path to the teacher and student .nemo models."
]
},
{
31 changes: 12 additions & 19 deletions tutorials/llm/llama-3/pruning-distillation/05_display_results.ipynb
@@ -8,16 +8,17 @@
"### Step 5: Display the validation loss\n",
"\n",
"Now that the results are in, let's visualize the validation loss of the two distilled models using the `tensorboard` library. \n",
"> `NOTE:` This notebook demonstrates the use of the teacher finetuning, pruning and the distillation script. These scripts should ideally be run on a multi-node cluster with a larger `GLOBAL_BATCH_SIZE` and `STEPS` to see improvement in the validation loss."
"\n",
"> `NOTE:` This notebook demonstrates the use of the teacher fine-tuning, pruning, and the distillation script. These scripts should ideally be run on a multi-node cluster with a larger `GLOBAL_BATCH_SIZE` and `STEPS` to see improvement in the validation loss."
]
},
{
"cell_type": "markdown",
"id": "b5822d62-8131-4046-8c22-0bf0fce81df7",
"metadata": {},
"source": [
"#### Validation Loss using depth-pruned model as student in distillation script\n",
"Here is an image of the validation loss over 30 steps of running the training step in the distillation script when we distill the knowledge from the finetuned teacher model to the depth-pruned student."
"#### Validation Loss Using Depth-Pruned Model as Student in Distillation Script\n",
"Here is an image of the validation loss over 30 steps of running the training step in the distillation script, where we distill the knowledge from the fine-tuned teacher model to the depth-pruned student."
]
},
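The cells that follow display pre-rendered loss curves. If you would rather pull the validation-loss scalars out of your own TensorBoard event files, a sketch like the one below works; the log directory and the `val_loss` tag name are assumptions about how your run was logged.

```python
# Hedged sketch: read validation-loss scalars straight from a TensorBoard event
# directory and plot them. The log directory and the "val_loss" tag name are
# assumptions about how the distillation run was logged.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
import matplotlib.pyplot as plt

LOG_DIR = "./distill_trainings/megatron_llama_distill/"  # hypothetical run directory

acc = EventAccumulator(LOG_DIR)
acc.Reload()                                  # index the event files on disk
events = acc.Scalars("val_loss")              # assumed scalar tag name
steps = [e.step for e in events]
values = [e.value for e in events]

plt.plot(steps, values, marker="o")
plt.xlabel("training step")
plt.ylabel("validation loss")
plt.title("Validation loss during distillation")
plt.show()
```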
{
@@ -35,7 +36,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 1,
"id": "db6fcf26-8ae8-40e1-875a-0a10bf85be81",
"metadata": {
"tags": []
@@ -44,7 +45,7 @@
{
"data": {
"text/html": [
"<h5>Validation Loss over 30 Training Steps with Depth-Pruned model as Student</h5>"
"<h5>Validation Loss over 30 Training Steps with Depth-Pruned Model as Student</h5>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
@@ -68,7 +69,7 @@
],
"source": [
"from IPython.display import Image, display, HTML\n",
"title = \"Validation Loss over 30 Training Steps with Depth-Pruned model as Student\"\n",
"title = \"Validation Loss over 30 Training Steps with Depth-Pruned Model as Student\"\n",
"display(HTML(f\"<h5>{title}</h5>\"))\n",
"display(Image(url=\"https://github.com/NVIDIA/NeMo/releases/download/r2.0.0rc1/val_loss_depth_pruned_student_distillation.png\", width=400))"
]
@@ -78,8 +79,8 @@
"id": "f10041ae-6533-47de-9f76-f97d4469c27a",
"metadata": {},
"source": [
"#### Validation Loss using width-pruned model as student in distillation script\n",
"Here is an image of the validation loss over 30 steps of running the training step in the distillation script when we distill the knowledge from the finetuned teacher model to the width-pruned student."
"#### Validation Loss Using Width-Pruned Model as Student in Distillation Script\n",
"Here is an image of the validation loss over 30 steps of running the training step in the distillation script, where we distill the knowledge from the fine-tuned teacher model to the width-pruned student."
]
},
{
@@ -97,7 +98,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 2,
"id": "ecd79583-f662-40c6-a690-9f4bb847de4e",
"metadata": {
"tags": []
@@ -106,7 +107,7 @@
{
"data": {
"text/html": [
"<h5>Validation Loss over 30 Training Steps with Width-Pruned model as Student</h5>"
"<h5>Validation Loss over 30 Training Steps with Width-Pruned Model as Student</h5>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
@@ -130,18 +131,10 @@
],
"source": [
"from IPython.display import Image, display, HTML\n",
"title = \"Validation Loss over 30 Training Steps with Width-Pruned model as Student\"\n",
"title = \"Validation Loss over 30 Training Steps with Width-Pruned Model as Student\"\n",
"display(HTML(f\"<h5>{title}</h5>\"))\n",
"display(Image(url=\"https://github.com/NVIDIA/NeMo/releases/download/r2.0.0rc1/val_loss_width_pruned_student_distillation.png\", width=400))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ab6ed6f-8bc3-4188-919f-7cee842635ed",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {