This repo rebuilds the OpenAI GPT-2 (124M) model.
GPT is a decoder-only Transformer, so its structure follows the decoder stack described in the Attention Is All You Need paper.
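The sketch below shows what one such decoder block looks like in PyTorch. The class names and sizes (n_embd=768, n_head=12 for the 124M config) are illustrative, not the exact code in this repo, and like GPT-2 it uses pre-LayerNorm (normalization before each sub-layer) rather than the original paper's post-LayerNorm.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd=768, n_head=12):
        super().__init__()
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # fused query/key/value projection
        self.c_proj = nn.Linear(n_embd, n_embd)      # output projection

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(C, dim=2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hd)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # masked ("causal") attention: each position attends only to itself and the past
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        mask = torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device))
        att = att.masked_fill(~mask, float("-inf"))
        y = F.softmax(att, dim=-1) @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)

class Block(nn.Module):
    """Pre-LayerNorm decoder block: attention and MLP, each wrapped in a residual."""
    def __init__(self, n_embd=768, n_head=12):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))
        x = x + self.mlp(self.ln_2(x))
        return x
```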
The original OpenAI blog post, Better language models and their implications, links to the paper Language Models are Unsupervised Multitask Learners and to the gpt2 GitHub repo.
Besides the GPT-2 paper, this repo also references the GPT-3 paper, Language Models are Few-Shot Learners.
The model training, optimization, and hyperparameter tuning follow both papers, and the attention computation uses Flash Attention as described in FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness and FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning.
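In PyTorch, the practical way to get these kernels is torch.nn.functional.scaled_dot_product_attention, which dispatches to a FlashAttention backend when the GPU and dtype allow it. The snippet below is a minimal sketch of that causal attention call with shapes matching the 124M config.

```python
import torch
import torch.nn.functional as F

def causal_attention(q, k, v):
    # q, k, v: (batch, n_head, seq_len, head_dim)
    # is_causal=True applies the autoregressive mask inside the fused kernel;
    # on supported GPUs/dtypes this avoids materializing the full (T, T) attention matrix.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Example shapes for the 124M config: 12 heads, head_dim 64, block size 1024.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32  # Flash kernels need half precision
q = k = v = torch.randn(1, 12, 1024, 64, device=device, dtype=dtype)
y = causal_attention(q, k, v)  # (1, 12, 1024, 64)
```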
The dataset used to train the model is the 10BT sample of FineWeb-Edu from Hugging Face.
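A minimal sketch of streaming that sample with the Hugging Face datasets library is shown below; the dataset id HuggingFaceFW/fineweb-edu and config name sample-10BT are the ones published on the Hub and are assumed here, not copied from this repo's data pipeline.

```python
from datasets import load_dataset

# Stream the 10BT sample so the full dataset never has to fit on disk at once.
fw = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                  split="train", streaming=True)

for doc in fw:
    text = doc["text"]  # raw document text, to be tokenized into training shards
    break
```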
Model evaluation uses the HellaSwag LLM benchmark; the model achieves 30.68% accuracy, 1.13% higher than the original GPT-2 (124M), while training on a dataset only about 10% the size.
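For a base (non-finetuned) model, HellaSwag is commonly scored by taking each example's four candidate endings and picking the one whose completion tokens receive the lowest average cross-entropy loss under the model. The sketch below illustrates that rule with placeholder model and tokenizer objects; it is an assumption about the evaluation procedure, not this repo's exact code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def pick_ending(model, tokenizer, context, endings, device="cpu"):
    """Return the index of the candidate ending with the lowest average per-token loss."""
    ctx_ids = tokenizer.encode(context)          # placeholder tokenizer (e.g. GPT-2 BPE)
    avg_losses = []
    for ending in endings:
        end_ids = tokenizer.encode(" " + ending)
        ids = torch.tensor([ctx_ids + end_ids], device=device)
        logits = model(ids)                      # assumed shape: (1, T, vocab_size)
        # shift so that position t predicts token t+1
        shift_logits = logits[:, :-1, :]
        shift_targets = ids[:, 1:]
        loss = F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_targets.reshape(-1),
            reduction="none",
        ).view(1, -1)
        # only score the tokens belonging to the candidate ending
        ending_loss = loss[:, len(ctx_ids) - 1:]
        avg_losses.append(ending_loss.mean().item())
    return min(range(len(endings)), key=lambda i: avg_losses[i])
```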
Since this repo uses PyTorch, the Hugging Face GPT-2 implementation is also referenced.
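For example, the pretrained GPT-2 weights can be pulled through the transformers library and compared against the from-scratch module layout as a sanity check; the snippet below is a small sketch of that comparison.

```python
from transformers import GPT2LMHeadModel

# Load the 124M checkpoint published by OpenAI via Hugging Face.
hf_model = GPT2LMHeadModel.from_pretrained("gpt2")
sd_hf = hf_model.state_dict()

# Inspect parameter names/shapes to match them against a local implementation.
# Note: the original GPT-2 weights use Conv1D layers, so some weight matrices
# are stored transposed relative to nn.Linear and must be transposed when copied.
for name, tensor in list(sd_hf.items())[:5]:
    print(name, tuple(tensor.shape))
```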