Building GPT-2 from Scratch: A Hands-On Guide

Introduction

Welcome, Xapien Engineers! We're excited to announce a hands-on seminar series aimed at building a working version of GPT-2 from scratch using PyTorch. This seminar will be the longest and most challenging project we've tackled, but it promises to be the most rewarding and insightful experience for our reading group.

Project Overview

Over the next several weeks, we will:

  1. Learn how to implement self-attention (Weeks 1-2; a sketch follows this list)
  2. Write layer normalization functions and a multi-layer perceptron (Week 3)
  3. Develop a simple byte-pair encoding tokenizer (Week 4)
  4. Create a PyTorch training loop to pre-train our model on a text corpus (Weeks 5-6)
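
To give a flavour of where Weeks 1-2 are heading, below is a minimal sketch of a GPT-2-style causal self-attention module in PyTorch. The hyperparameter names and defaults (n_embd, n_head, block_size) are illustrative, not the exact interface the seminar code will use.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head causal self-attention, GPT-2 style (illustrative hyperparameters)."""

    def __init__(self, n_embd: int = 768, n_head: int = 12, block_size: int = 1024):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.n_embd = n_embd
        # One projection produces queries, keys, and values in a single matmul
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)
        self.c_proj = nn.Linear(n_embd, n_embd)
        # Lower-triangular mask: position i may only attend to positions <= i
        self.register_buffer(
            "mask",
            torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape  # batch, sequence length, embedding dim
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        # Reshape to (B, n_head, T, head_dim) so each head attends independently
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # Scaled dot-product attention with the causal mask applied before softmax
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v
        # Re-assemble the heads and project back to the embedding dimension
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)
```

The lower-triangular mask is what makes the attention causal: each position can only attend to itself and earlier positions, which is what lets the model be trained to predict the next token.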

Seminar Structure

Audience

This seminar is designed for engineers and researchers interested in understanding the inner workings of transformer models. While the material is complex, we will provide ample support and resources to help everyone succeed.

Format

  • Time: Sessions will be held weekly from 1 PM to 2 PM.
  • Method: Each session will involve cloning a tutorial repository and implementing code based on provided guidance.
  • Support: The seminar will be structured like a university lab, with reference guides and in-person assistance available.

Schedule

  • Weeks 1-2: Implementing causal self-attention (the most challenging part!)
  • Week 3: Layer normalization and multi-layer perceptron
  • Week 4: Byte-pair encoding tokenizer (see the sketch after this list)
  • Weeks 5-6: PyTorch training loop and integration
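
As a preview of Week 4, here is a toy sketch of the core byte-pair encoding training step: count adjacent token-id pairs and repeatedly merge the most frequent pair into a new token. The function names and the tiny example string are purely illustrative.

```python
from collections import Counter

def get_pair_counts(ids: list[int]) -> Counter:
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))

def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every occurrence of `pair` in `ids` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# Toy training loop: start from raw bytes and repeatedly merge the most frequent pair.
text = "hello hello world"
ids = list(text.encode("utf-8"))
merges = {}
for step in range(5):
    counts = get_pair_counts(ids)
    if not counts:
        break
    pair = counts.most_common(1)[0][0]
    new_id = 256 + step  # new token ids start after the 256 raw byte values
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id
print(merges)
```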

Goals

Our primary goal is to build a functional GPT-2 model from scratch and pre-train it on a small text corpus. This hands-on experience will deepen our understanding of transformer models and PyTorch.
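
As a rough sketch of what the Weeks 5-6 pre-training loop might look like, the snippet below assumes a model that maps token ids to logits and a get_batch helper that returns (input, target) tensors; both names are placeholders rather than part of the seminar code.

```python
import torch
import torch.nn.functional as F

# Placeholder names: `model` is an assembled GPT-2-style module returning logits of
# shape (batch, block_size, vocab_size); `get_batch` yields (input, target) id tensors.
def train(model, get_batch, steps: int = 1000, lr: float = 3e-4, device: str = "cpu"):
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for step in range(steps):
        x, y = get_batch()
        x, y = x.to(device), y.to(device)
        logits = model(x)
        # Flatten (batch, time) so every position contributes a next-token prediction
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
        if step % 100 == 0:
            print(f"step {step}: loss {loss.item():.4f}")
```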

Preparation

Recommended Reading

To get the most out of this seminar, we recommend familiarizing yourself with the following resources:

Prerequisites

  • Basic understanding of transformers and their components
  • Knowledge of causal attention
  • Familiarity with PyTorch and writing layers in PyTorch

Project Details

Implementation Strategy

We will adopt a "fill-in-the-blanks" approach: you will complete code snippets with provided references and support. This method ensures that everyone can participate, regardless of prior experience with PyTorch or the underlying mathematics.
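
As a purely illustrative example of the format (not the actual seminar material), a fill-in-the-blanks snippet might look like this: the module setup is provided, and you complete the forward pass.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Hypothetical skeleton of the kind you'll complete: the setup is given, the forward pass is yours."""

    def __init__(self, n_embd: int = 768):
        super().__init__()
        self.c_fc = nn.Linear(n_embd, 4 * n_embd)
        self.gelu = nn.GELU()
        self.c_proj = nn.Linear(4 * n_embd, n_embd)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # TODO: expand with c_fc, apply the non-linearity, then project back down with c_proj
        raise NotImplementedError
```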

Lab notes

The notes directory will contain a weekly uploaded markdown file with a set of pointers to accompany the in-person lab. Read them as you enter that part of the project!

Reference Implementation

Much of the implementation will be adapted from Karpathy's nanoGPT, with modifications to include unit tests and make the content more accessible.
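
As an example of the kind of unit test we have in mind (illustrative only, and assuming the causal self-attention sketch from the Weeks 1-2 preview above is in scope), a test can check that future tokens never influence earlier positions:

```python
import torch

# Assumes the CausalSelfAttention module from the Weeks 1-2 sketch is importable here.
def test_causal_attention_ignores_future_tokens():
    """Changing a later token must not change the output at earlier positions."""
    torch.manual_seed(0)
    attn = CausalSelfAttention(n_embd=32, n_head=4, block_size=16)
    attn.eval()
    x = torch.randn(1, 8, 32)
    x_perturbed = x.clone()
    x_perturbed[:, -1, :] += 1.0  # perturb only the final position
    with torch.no_grad():
        out_a = attn(x)
        out_b = attn(x_perturbed)
    # Outputs at all positions before the perturbed one must be identical
    assert torch.allclose(out_a[:, :-1, :], out_b[:, :-1, :], atol=1e-6)
```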

Collaboration

Feel free to collaborate, ask questions, and share your progress. Missing a session is okay, but catching up will help you build a comprehensive understanding of the project.

Conclusion

We're excited to embark on this journey of building and understanding GPT-2 together. This seminar will be a significant learning opportunity, and we look forward to seeing the amazing work you'll produce.

Let's build something cool!


Contact: If you have any questions or need further assistance, please reach out to Charlie by raising an issue on this repository!

Happy coding!
