implements DiT ("Scalable Diffusion Models with Transformers", https://arxiv.org/pdf/2212.09748) in a simple, clean, and minimal way. uses the adaLN-Zero variant of the DiT transformer block. useful for practice, not for production use.
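
To make the adaLN-Zero idea concrete, here is a minimal NumPy sketch rather than this repo's code. It assumes a single residual branch with a plain linear layer standing in for attention/MLP; `AdaLNZeroBlock`, `modulate`, and all shapes are hypothetical names chosen for illustration. The conditioning vector `c` (for example, a timestep plus class embedding) is projected to a shift, scale, and gate, and that projection is zero-initialized so the block starts as the identity.

```python
import numpy as np

def silu(x):
    """SiLU activation applied to the conditioning vector before projection."""
    return x / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-6):
    """LayerNorm over the last axis, without learnable scale or shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def modulate(x, shift, scale):
    """Apply per-sample shift and scale to normalized tokens."""
    return x * (1.0 + scale[:, None, :]) + shift[:, None, :]

class AdaLNZeroBlock:
    """Hypothetical single-branch adaLN-Zero block (illustration only)."""

    def __init__(self, dim, rng):
        # Projection from conditioning c to (shift, scale, gate).
        # Zero initialization is the "Zero" in adaLN-Zero.
        self.w_mod = np.zeros((dim, 3 * dim))
        self.b_mod = np.zeros(3 * dim)
        # Stand-in for the attention or MLP sub-layer (assumption).
        self.w = rng.normal(0.0, 0.02, size=(dim, dim))

    def __call__(self, x, c):
        # x: (batch, tokens, dim), c: (batch, dim)
        params = silu(c) @ self.w_mod + self.b_mod
        shift, scale, gate = np.split(params, 3, axis=-1)
        h = modulate(layer_norm(x), shift, scale) @ self.w
        # Gated residual: with gate == 0 the block returns x unchanged.
        return x + gate[:, None, :] * h
```

Because the gate starts at zero, every block initially passes its input through untouched, and the residual branches are switched on gradually during training.
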
- patchify: image is split into patches
- position embedding: a learnable position embedding is added to the patch tokens
- transformer: patch tokens are passed through a transformer encoder
- decoder: reconstructs the image from the output patch tokens
- diffusion: noise is added to the image and the model learns to denoise it at each step
- train.py: contains training loop for the DiT model
- model.py: implements DiT (Diffusion Transformer) model
- transformer.py: defines TransformerBlock, SelfAttention, and LayerNorm
- diffusion.py: defines diffusion process for the model
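
The patchify, decoder, and diffusion steps listed earlier can be sketched in a few lines of NumPy. This is an illustrative sketch, not the code in `model.py` or `diffusion.py`: the function names, the `(C, H, W)` layout, and the linear beta schedule (1e-4 to 0.02 over 1000 steps, as in DDPM) are all assumptions.

```python
import numpy as np

def patchify(img, p):
    """Split a (C, H, W) image into (num_patches, C*p*p) flattened patches."""
    c, h, w = img.shape
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    x = img.reshape(c, h // p, p, w // p, p)
    x = x.transpose(1, 3, 0, 2, 4)  # (h/p, w/p, c, p, p)
    return x.reshape((h // p) * (w // p), c * p * p)

def unpatchify(tokens, c, h, w, p):
    """Inverse of patchify: rebuild the (C, H, W) image from patch tokens."""
    x = tokens.reshape(h // p, w // p, c, p, p)
    x = x.transpose(2, 0, 3, 1, 4)  # (c, h/p, p, w/p, p)
    return x.reshape(c, h, w)

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

# Usage: noise an image, then turn it into patch tokens for the transformer.
rng = np.random.default_rng(0)
img = rng.standard_normal((3, 32, 32))
noisy, eps = add_noise(img, t=500, alpha_bar=make_alpha_bar(), rng=rng)
tokens = patchify(noisy, p=4)
print(tokens.shape)  # (64, 48)
```

In training, the model would receive `tokens` together with the timestep and be asked to predict `eps`; `unpatchify` maps the output tokens back to image shape.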