
Converting PyTorch 2 Lightning Examples

This repository shows you how to:

  • Convert a pure PyTorch convolutional neural network (CNN) classifier trained on MNIST to PyTorch Lightning.
  • Trivially extend pure PyTorch with Lightning best-practice features.
  • Seamlessly scale your training in the cloud with Grid.ai, with no code changes.
  • Learn about Lightning Flash and its 15+ production-ready tasks.

Below, William Falcon and Thomas Chaton present this repository in "PyTorch Community Voices | PyTorch Lightning".


Bare MNIST Classifier
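In a bare CNN classifier, the number that usually needs explaining is the input size of the first fully connected layer. The plain-Python sketch below derives it, assuming the bare example follows the layer layout of the PyTorch examples' MNIST network (two 3×3 convolutions with 32 and 64 output channels, then a 2×2 max pool). The helper names are hypothetical, and the numbers should be adjusted if this repository's model differs.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a 2D convolution on a square input (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool2d_out(size: int, kernel: int) -> int:
    """Spatial output size of max pooling whose stride equals its kernel size."""
    return size // kernel


# Assumed layout (PyTorch examples' MNIST CNN): conv 3x3, conv 3x3, max pool 2x2.
side = 28                    # MNIST images are 28x28 with a single channel
side = conv2d_out(side, 3)   # conv1: 1 -> 32 channels, 28 -> 26
side = conv2d_out(side, 3)   # conv2: 32 -> 64 channels, 26 -> 24
side = pool2d_out(side, 2)   # max_pool2d(2): 24 -> 12
flat_features = 64 * side * side  # input size of the first Linear layer
print(side, flat_features)   # 12 9216
```

The same arithmetic holds after the conversion to Lightning, since a LightningModule keeps the plain torch.nn layers.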

MNIST Dataset

Add DDP Support
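With DDP, every GPU runs its own copy of the training script on a different slice of each epoch, and gradients are averaged across processes during each backward pass. The framework-free sketch below illustrates only the data-splitting half, following the idea behind PyTorch's DistributedSampler (wrap-around padding, then a strided slice per rank, without shuffling here). shard_indices is a hypothetical helper, not the sampler's actual code.

```python
import math
from typing import List


def shard_indices(dataset_len: int, rank: int, world_size: int) -> List[int]:
    """Indices that process `rank` would read in one epoch, without shuffling.

    Follows the idea behind torch.utils.data.DistributedSampler: pad the index
    list by wrapping around to the start so it divides evenly by world_size,
    then give each process every world_size-th index starting at its rank.
    """
    indices = list(range(dataset_len))
    total = math.ceil(dataset_len / world_size) * world_size
    # Wrap-around padding so every process gets the same number of samples.
    indices += [indices[i % dataset_len] for i in range(total - dataset_len)]
    return indices[rank:total:world_size]


# 10 samples split across 4 processes: each gets 3, and two samples repeat.
for rank in range(4):
    print(rank, shard_indices(10, rank, world_size=4))
# 0 [0, 4, 8]
# 1 [1, 5, 9]
# 2 [2, 6, 0]
# 3 [3, 7, 1]
```

In Lightning 1.4, the Trainer swaps in a DistributedSampler automatically when DDP is enabled (controlled by replace_sampler_ddp), so the converted script does not need to do this by hand.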

Add DDP Spawn Support

Add Accumulated Gradients Support
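Gradient accumulation sums gradients over several small batches before taking a single optimizer step, which mimics a larger batch without its memory cost. In Lightning this is a Trainer argument (accumulate_grad_batches). The sketch below shows the underlying arithmetic in plain Python for a hypothetical one-weight linear model with a mean-squared-error loss, checking that four micro-batches of two samples give the same gradient as one batch of eight.

```python
import math


def grad_mse(w: float, xs: list, ys: list) -> float:
    """Gradient d/dw of mean((w * x - y) ** 2) over one batch, for y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.2]
w, lr = 0.5, 0.01
accumulate = 4                  # analogous to Trainer(accumulate_grad_batches=4)
micro = len(xs) // accumulate   # 2 samples per micro-batch

# Reference: the gradient of one large batch of 8 samples.
full_grad = grad_mse(w, xs, ys)

# Accumulation: add up micro-batch gradients, each scaled by 1/accumulate,
# and only then take a single optimizer step.
acc_grad = 0.0
for i in range(accumulate):
    lo, hi = i * micro, (i + 1) * micro
    acc_grad += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accumulate
w = w - lr * acc_grad

print(math.isclose(full_grad, acc_grad, rel_tol=1e-12))  # True
```

The equality relies on equal-sized micro-batches and a mean-reduced loss; layers such as batch normalization still only see the smaller micro-batches.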

Add Profiling Support
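Profiling means timing each stage of the training loop to find the bottleneck. Lightning exposes built-in profilers through the Trainer's profiler argument (for example profiler="simple"). The hypothetical ActionTimer below is only a sketch of the idea: it records wall-clock durations per named action with a context manager and prints a per-action summary.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ActionTimer:
    """Hypothetical minimal profiler: wall-clock time per named action."""

    def __init__(self) -> None:
        self.durations = defaultdict(list)

    @contextmanager
    def profile(self, action: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the profiled block raises.
            self.durations[action].append(time.perf_counter() - start)

    def summary(self) -> str:
        rows = []
        for action, times in sorted(self.durations.items()):
            total = sum(times)
            rows.append(
                f"{action:<16} calls={len(times)} "
                f"mean={total / len(times):.6f}s total={total:.6f}s"
            )
        return "\n".join(rows)


timer = ActionTimer()
for _ in range(3):
    with timer.profile("training_step"):
        sum(i * i for i in range(10_000))  # stand-in for forward/backward
    with timer.profile("optimizer_step"):
        time.sleep(0.001)                  # stand-in for the parameter update
print(timer.summary())
```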

Add DeepSpeed, FSDP, Multiple Loggers, Multiple Profilers, TorchScript, Loop Customization, Fault-Tolerant Training, etc.

  • PyTorch | requires a huge number of additional lines. You definitely do not want to do that 😫
  • PyTorch Lightning | Still ~106 lines. Let's keep it simple. 🚀

Learn more with Lightning Docs.

PyTorch Lightning 1.4 is out! Here is our CHANGELOG.

Don't forget to ⭐ PyTorch Lightning.

Training on Grid.ai

Grid.ai is an ML platform from the creators of PyTorch Lightning that enables you to train machine learning code without worrying about infrastructure.

Learn more with Grid.ai Docs.

1. Install Lightning-Grid

pip install lightning-grid --upgrade

2. Seamlessly train 100s of machine learning models in the cloud from your laptop, with no code changes

grid run --instance_type 4_M60_8gb ddp_mnist_grid/lightning.py --trainer.max_epochs 2 --trainer.gpus 4 --trainer.accelerator ddp

With Grid Datastores, you get low-latency, highly scalable, auto-versioned datasets:

grid datastore create --name mnist --source data
grid run --instance_type 4_M60_8gb --datastore_name mnist --datastore_mount_dir data ddp_mnist_grid/lightning.py  --trainer.max_epochs 2 --trainer.gpus 4 --trainer.accelerator ddp

Pure PyTorch:

grid datastore create --name mnist --source data
grid run --instance_type g4dn.xlarge --gpus 2 ddp_mnist_grid/boring_pytorch.py

Add --use_spot to use interruptible machines.

Grid.ai makes scaling multi-node training easy 🚀 Train on 2+ nodes with 4 GPUs each using DDP Sharded 🔥

grid run --instance_type 4_M60_8gb --gpus 8 --datastore_name mnist --datastore_mount_dir data  ddp_mnist_grid/lightning.py  --trainer.max_epochs 2 --trainer.num_nodes 2 --trainer.gpus 4 --trainer.accelerator ddp_sharded
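The Lightning flags in the command above describe the cluster layout: --trainer.num_nodes 2 with --trainer.gpus 4 gives 2 nodes × 4 GPUs = 8 DDP processes. The sketch below shows the usual mapping from (node, local GPU) to a global rank, assuming every node has the same number of GPUs, and how the number of samples per optimizer step grows with the world size. The per-GPU batch size of 32 is a hypothetical value, not one taken from lightning.py.

```python
def global_rank(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    """Rank of one process across all nodes, assuming equal GPUs per node."""
    return node_rank * gpus_per_node + local_rank


num_nodes, gpus_per_node = 2, 4          # --trainer.num_nodes 2 --trainer.gpus 4
world_size = num_nodes * gpus_per_node   # 8 processes in total

for node in range(num_nodes):
    ranks = [global_rank(node, local, gpus_per_node) for local in range(gpus_per_node)]
    print(f"node {node}: global ranks {ranks}")
# node 0: global ranks [0, 1, 2, 3]
# node 1: global ranks [4, 5, 6, 7]

# Each rank processes its own batch, so one optimizer step sees world_size batches.
per_gpu_batch = 32                       # hypothetical per-GPU batch size
print(per_gpu_batch * world_size)        # 256
```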

Train Andrej Karpathy's minGPT, converted to PyTorch Lightning by @williamFalcon and benchmarked with DeepSpeed by @SeanNaren:

git clone https://github.com/SeanNaren/minGPT.git
cd minGPT
git checkout benchmark
grid run --instance_type g4dn.12xlarge --gpus 8 benchmark.py --n_layer 6 --n_head 16 --n_embd 2048 --gpus 4 --num_nodes 2 --precision 16 --batch_size 32 --plugins deepspeed_stage_3

Learn how to scale your scripts with PyTorch Lightning + DeepSpeed.
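The minGPT benchmark uses --plugins deepspeed_stage_3, which partitions parameters, gradients and optimizer states across all GPUs. A back-of-the-envelope estimate shows why that helps. The sketch assumes two common approximations: about 12 × n_embd² weights per GPT block (4 × n_embd² in the attention projections, 8 × n_embd² in the 4x-wide MLP) and about 16 bytes of model state per parameter for mixed-precision Adam, as described for DeepSpeed's ZeRO optimizer. It ignores embeddings, biases, layer norms and activations, so treat the results as rough.

```python
n_layer, n_embd = 6, 2048        # --n_layer 6 --n_embd 2048 from the benchmark command
world_size = 2 * 4               # --num_nodes 2, --gpus 4 per node

# Approximate weights in the transformer blocks only (no embeddings or biases).
block_params = 12 * n_embd ** 2 * n_layer

# Mixed-precision Adam: fp16 weights (2) + fp16 grads (2)
# + fp32 master weights, momentum and variance (4 + 4 + 4) = 16 bytes/param.
bytes_per_param = 16
total_gb = block_params * bytes_per_param / 1e9

# ZeRO stage 3 splits all of this model state evenly across the GPUs.
stage3_gb = total_gb / world_size

print(f"{block_params / 1e6:.0f}M parameters in the blocks")
print(f"{total_gb:.2f} GB of model state unsharded, {stage3_gb:.2f} GB per GPU with stage 3")
```

Activations and communication buffers come on top of the per-GPU figure, which is why the benchmark also uses --precision 16.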

Lightning Flash is a collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning, built on top of PyTorch Lightning.

Train a PyTorchVideo classifier with Lightning Flash. Check out the Grid.ai reproducible-run button for this example.

import os

import flash
from flash.core.data.utils import download_data
from flash.video import VideoClassificationData, VideoClassifier

# 1. Create the DataModule
# Find more datasets at https://pytorchvideo.readthedocs.io/en/latest/data.html
download_data("https://pl-flash-data.s3.amazonaws.com/kinetics.zip", "./data")

datamodule = VideoClassificationData.from_folders(
    train_folder=os.path.join(os.getcwd(), "data/kinetics/train"),
    val_folder=os.path.join(os.getcwd(), "data/kinetics/val"),
    clip_sampler="uniform",
    clip_duration=1,
    decode_audio=False,
)

# 2. Build the task
model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes, pretrained=False)

# 3. Create the trainer and finetune the model
trainer = flash.Trainer(max_epochs=3)
trainer.finetune(model, datamodule=datamodule, strategy="freeze")

# 4. Make a prediction
predictions = model.predict(os.path.join(os.getcwd(), "data/kinetics/predict"))
print(predictions)

# 5. Save the model!
trainer.save_checkpoint("video_classification.pt")

Credits

Credit to the PyTorch team for providing the bare MNIST example.

Credit to Andrej Karpathy for providing an implementation of minGPT.

Troubleshooting

Kill lingering DDP processes (note: this kills every process whose ps entry matches "ddp", case-insensitively):

sudo kill -9 $(ps -aef | grep -i 'ddp' | grep -v 'grep' | awk '{ print $2 }')