MINI-GPT2

Personal repo for following along with Andrej Karpathy's nanoGPT GPT-2 reproduction. In this repo I have tried to implement all corrections from the errata as well as several elements from pull requests (e.g. shuffling FineWeb to avoid the periodicity issue).

In this repo we build nanoGPT together with Andrej Karpathy, following his video GPT2 - the movie. We managed to replicate the model and gain significant efficiency improvements during training.

We also addressed several of the issues Andrej ran into throughout the video, including:

  1. Seasonality (periodicity) in the training data, addressed by shuffling the FineWeb shards
  2. Being unable to use torch.compile while running HellaSwag evals in the training loop
  3. A more aggressive learning rate and schedule
  4. Resuming training from the last model checkpoint (see the sketch after this list)
  5. All the issues mentioned in the errata of Andrej's repo
  6. Several other improvements suggested in pull requests to Andrej's repo
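
For the checkpoint resume (item 4), the logic is roughly as follows. This is a minimal sketch with hypothetical file names and checkpoint keys, not the exact code in this repo:

```python
import os
import torch

CKPT_DIR = "log"  # hypothetical checkpoint directory

def save_checkpoint(model, optimizer, step):
    # Persist everything needed to continue training exactly where we left off.
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "step": step,
    }, os.path.join(CKPT_DIR, f"model_{step:05d}.pt"))

def try_resume(model, optimizer):
    # Load the most recent checkpoint if one exists; otherwise start at step 0.
    ckpts = sorted(f for f in os.listdir(CKPT_DIR) if f.endswith(".pt"))
    if not ckpts:
        return 0
    state = torch.load(os.path.join(CKPT_DIR, ckpts[-1]), map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"] + 1  # continue from the next step
```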

During training I got a dt of ~0.34 s per step and processed ~1.5M tokens per second. I trained the model for 1 epoch (~10B tokens) in under 2 hrs on 8 A100 (80 GB SXM4) GPUs. This gave me a minimum training loss of 2.84857, a minimum validation loss of 3.0383, and a max HellaSwag eval of 0.3101. Model checkpoints can be provided upon request.
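
As a quick sanity check, those throughput numbers are internally consistent, assuming the 2**19 = 524,288-token total batch size used in the video:

```python
# Back-of-the-envelope check of the reported training figures.
tokens_per_step = 2**19            # assumed total batch size (as in the video)
dt = 0.34                          # seconds per step (reported)
tokens_per_sec = tokens_per_step / dt
print(f"{tokens_per_sec:,.0f} tokens/s")   # ~1,542,000, matching the reported ~1.5M

epoch_tokens = 10e9                # one epoch of FineWeb, ~10B tokens
hours = epoch_tokens / tokens_per_sec / 3600
print(f"{hours:.2f} h per epoch")          # ~1.80 h, i.e. under 2 hours
```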

Learn from my mistakes: download the FineWeb data once and upload it to cloud storage (e.g. Google Cloud Storage). You can then download the training data from your bucket when you are on the GPU cluster instead of running the fineweb script (which can run for up to 1 hr depending on latency / internet speed). This would have saved me around $15. A sketch of pulling the shards from a bucket follows; the gcloud CLI setup is covered further down.
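
Here is a minimal sketch of that download using the google-cloud-storage Python client. The bucket name and shard prefix are hypothetical placeholders; it assumes `pip install google-cloud-storage` and authenticated credentials (see the gcloud section below):

```python
import os
from google.cloud import storage

BUCKET = "my-minigpt-bucket"   # hypothetical bucket name
PREFIX = "edu_fineweb10B/"     # hypothetical folder holding the tokenized shards
DEST = "edu_fineweb10B"

os.makedirs(DEST, exist_ok=True)
client = storage.Client()
for blob in client.list_blobs(BUCKET, prefix=PREFIX):
    filename = os.path.basename(blob.name)
    if filename:  # skip directory placeholder entries
        blob.download_to_filename(os.path.join(DEST, filename))
        print("downloaded", filename)
```

Once the gcloud CLI is installed, `gcloud storage cp -r gs://<bucket_name>/edu_fineweb10B .` does the same in one line.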


Instructions on setting up a Lambda cluster

  1. Make sure your conda environment config file (environment.yml) is up to date.
  2. Initialize a GPU cluster on the Lambda website.
  3. Hop on a terminal and SSH into the cluster (on Windows):
```bash
ssh -i C:/Users/<USERNAME>/.ssh/LAMBDA_SSH.pem ubuntu@<XXX.XXX.XXX.XXX>
```
  4. Clone this repo:
```bash
git clone https://github.com/victorbjorsvik/minigpt.git
```
  5. Install Miniconda:
```bash
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
```
  6. Create the environment:
```bash
conda env create -f environment.yml
```
  7. Activate it (conda activate <env_name>) and rock 'n' roll.

Instructions on setting up the Google Cloud CLI (for uploading model states and training data)

Link to Google Cloud Storage

  1. Check that Ubuntu is up to date and has the necessary packages:
```bash
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates gnupg curl
```
  2. Import the Google Cloud public key:
```bash
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg
```
  3. Add the gcloud CLI distribution URI as a package source:
```bash
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
```
  4. Update and install the gcloud CLI:
```bash
sudo apt-get update && sudo apt-get install google-cloud-cli
```
  5. Run gcloud init to get started:
```bash
gcloud init
```

This step will prompt you to authorize with your Google account via the browser. Upon verification you will receive a verification code that you can paste into the terminal. After this step you will be able to transfer files from your local directory to a Google Cloud Storage destination, e.g.:

```bash
gcloud storage cp <file_to_transfer> gs://<destination_bucket_name>
```
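
The same command works in the other direction (gs:// source, local destination), which is how you would pull the FineWeb shards onto a fresh cluster node as suggested in the tip above.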
