Reproduce-GPT2

This repo is for rebuilding the OpenAI GPT-2 (124M) model.

As a decoder-only Transformer model, GPT follows the decoder structure from the Attention Is All You Need paper (minus the encoder and cross-attention, and with LayerNorm moved to the front of each sublayer).
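For orientation, below is a minimal sketch of one such decoder block in PyTorch. The names `GPTConfig`, `CausalSelfAttention`, `MLP`, and `Block` are illustrative placeholders rather than the exact classes in this repo; the hyperparameters are the published GPT-2 (124M) values.

```python
import math
from dataclasses import dataclass

import torch
import torch.nn as nn
import torch.nn.functional as F

@dataclass
class GPTConfig:
    block_size: int = 1024   # maximum sequence length
    vocab_size: int = 50257  # GPT-2 BPE vocabulary size
    n_layer: int = 12        # number of decoder blocks
    n_head: int = 12         # attention heads per block
    n_embd: int = 768        # embedding dimension

class CausalSelfAttention(nn.Module):
    """Masked (causal) multi-head self-attention, as in the Transformer decoder."""
    def __init__(self, config):
        super().__init__()
        self.n_head = config.n_head
        self.n_embd = config.n_embd
        self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd)  # fused q, k, v projection
        self.c_proj = nn.Linear(config.n_embd, config.n_embd)      # output projection
        mask = torch.tril(torch.ones(config.block_size, config.block_size))
        self.register_buffer("mask", mask.view(1, 1, config.block_size, config.block_size))

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hd)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))          # (B, nh, T, T)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        y = F.softmax(att, dim=-1) @ v                                   # (B, nh, T, hd)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)

class MLP(nn.Module):
    """Position-wise feed-forward network with 4x hidden expansion."""
    def __init__(self, config):
        super().__init__()
        self.c_fc = nn.Linear(config.n_embd, 4 * config.n_embd)
        self.gelu = nn.GELU(approximate="tanh")  # GPT-2 uses the tanh GELU approximation
        self.c_proj = nn.Linear(4 * config.n_embd, config.n_embd)

    def forward(self, x):
        return self.c_proj(self.gelu(self.c_fc(x)))

class Block(nn.Module):
    """One decoder block: pre-LayerNorm attention and MLP, each with a residual connection."""
    def __init__(self, config):
        super().__init__()
        self.ln_1 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.ln_2 = nn.LayerNorm(config.n_embd)
        self.mlp = MLP(config)

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))
        x = x + self.mlp(self.ln_2(x))
        return x
```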

The original OpenAI blog post, Better language models and their implications, links to the paper Language Models are Unsupervised Multitask Learners and to the GitHub repo gpt-2.

Besides the GPT-2 paper, this repo also references the GPT-3 paper, Language Models are Few-Shot Learners.

The model training, optimization, and hyperparameter tuning follow both papers, and the implementation uses Flash Attention as described in FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness and FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning.
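A common way to get Flash Attention in a PyTorch GPT-2 reproduction (a sketch; the exact code path in this repo may differ) is `torch.nn.functional.scaled_dot_product_attention`, which dispatches to a FlashAttention kernel on supported GPUs and dtypes and never materializes the full (T, T) attention matrix that the naive version above builds:

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# q, k, v shaped (batch, n_head, seq_len, head_dim), as in the attention sketch above
B, n_head, T, head_dim = 4, 12, 1024, 64
q = torch.randn(B, n_head, T, head_dim, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Fused, IO-aware attention: on CUDA with fp16/bf16 inputs this can use a FlashAttention
# kernel; is_causal=True applies the decoder mask internally.
y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(y.shape)  # torch.Size([4, 12, 1024, 64])
```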

The dataset used to train the model is the 10BT sample of the Hugging Face FineWeb-Edu dataset.
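A hedged sketch of pulling that sample with the `datasets` library and tokenizing it with the GPT-2 BPE follows; the config name `"sample-10BT"` is taken from the FineWeb-Edu dataset card, and the repo's actual pipeline (e.g. pre-tokenized shards on disk) may look different.

```python
import tiktoken
from datasets import load_dataset

enc = tiktoken.get_encoding("gpt2")            # GPT-2 byte-pair encoding (50257 tokens)
ds = load_dataset("HuggingFaceFW/fineweb-edu",
                  name="sample-10BT",           # the ~10B-token sample
                  split="train",
                  streaming=True)               # stream instead of downloading everything up front

def tokenize(example):
    # prepend the end-of-text token as a document delimiter, a common GPT-2 convention
    return {"tokens": [enc.eot_token] + enc.encode_ordinary(example["text"])}

for example in ds.take(3):
    print(len(tokenize(example)["tokens"]), "tokens")
```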

The model is evaluated on the HellaSwag LLM benchmark and achieves 30.68% accuracy, 1.13% higher than the original GPT-2 (124M) model, despite being trained on only about 10% as much data.
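HellaSwag is typically scored by completion likelihood: the context is paired with each of the four candidate endings, the model's average cross-entropy over the ending tokens is computed, and the ending with the lowest loss is predicted. Below is a minimal sketch of that procedure, assuming a placeholder `model` that maps a (1, T) token tensor to (1, T, vocab) logits and a GPT-2 tokenizer `enc`; it is not the repo's actual evaluation code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def hellaswag_pick(model, enc, ctx: str, endings: list[str]) -> int:
    """Return the index of the ending with the lowest average loss over its tokens."""
    losses = []
    for ending in endings:
        ctx_tokens = enc.encode(ctx)
        end_tokens = enc.encode(" " + ending)
        tokens = torch.tensor(ctx_tokens + end_tokens).unsqueeze(0)   # (1, T)
        logits = model(tokens)                                        # (1, T, vocab)
        # shift so position t predicts token t+1, then keep only the ending positions
        shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
        shift_targets = tokens[:, 1:].reshape(-1)
        loss = F.cross_entropy(shift_logits, shift_targets, reduction="none")
        losses.append(loss[-len(end_tokens):].mean().item())          # average over ending tokens
    return losses.index(min(losses))
```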

Since this repo uses PyTorch, the Hugging Face GPT-2 implementation is also referenced.
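For example, the pretrained weights published through `transformers` can serve as a reference point, either to sanity-check the reproduced architecture against the original state dict or to compare generations. The snippet below is a sketch of that idea, not necessarily how this repo uses the library.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

hf_model = GPT2LMHeadModel.from_pretrained("gpt2")   # the 124M checkpoint
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Inspect the pretrained state dict, e.g. to copy weights into a from-scratch model.
sd = hf_model.state_dict()
print(sd["transformer.wte.weight"].shape)            # token embedding: (50257, 768)

# Or generate a quick sample as a reference baseline.
inputs = tokenizer("Hello, I'm a language model,", return_tensors="pt")
out = hf_model.generate(**inputs, max_new_tokens=20, do_sample=True, top_k=50)
print(tokenizer.decode(out[0]))
```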
