Train a custom CLIP with DeepSpeed CPU offload, 16 bit precision #388

Draft: wants to merge 1 commit into base: main


Contributor

@afiaka87 afiaka87 commented Nov 20, 2021

(Disclaimer) This is code for training the custom CLIP from this repository, not the one in the OpenAI repo; for the latter I recommend open_clip. There are valid concerns about the effectiveness of a CLIP trained with a low batch size, as the retrieval task has far less context (fewer in-batch negatives) to work with. Food for thought.
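To put a rough number on the batch-size concern: in a contrastive setup each image is scored against every caption in the batch, so a batch of N gives N - 1 negatives per example, and a model guessing uniformly has a loss of ln(N). A minimal sketch in plain Python; `contrastive_context` is a hypothetical helper, 128 is the batch size from the run script in this PR, 32768 is the batch size reported in the CLIP paper, and 1024 is only an illustrative midpoint:

```python
import math

def contrastive_context(batch_size):
    """Return (negatives per example, chance-level cross-entropy) for one batch."""
    negatives = batch_size - 1
    chance_loss = math.log(batch_size)  # loss of a uniform guess over the batch
    return negatives, chance_loss

for n in (128, 1024, 32768):
    negatives, chance_loss = contrastive_context(n)
    print(f"batch={n:>6}  negatives={negatives:>6}  chance loss={chance_loss:.3f}")
```

This only describes how hard the in-batch retrieval task is; it says nothing about how much quality is actually lost at a batch size of 128.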

There's plenty left to do to make this as robust as the other training scripts, but if you have DeepSpeed working, this should now work with far fewer caveats than DALL-E. I trained a small CLIP last night on COCO using 16-bit precision, DeepSpeed stage 3, and CPU offload for both the parameters and the optimizer. I haven't done many rigorous comparisons, but thanks to CPU offload I was able to actually use my computer while training, which was refreshing.
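For anyone unfamiliar with the setup described above, here is a rough sketch of what a matching DeepSpeed config could look like. The key names follow DeepSpeed's documented JSON config, but the values are assumptions taken from the run script rather than the config that `train_clip.py` actually builds, and key names may differ between DeepSpeed versions:

```python
import json

# Hypothetical config: illustrative values, not necessarily what train_clip.py uses.
deepspeed_config = {
    "train_batch_size": 128,
    "gradient_clipping": 1.0,
    "fp16": {"enabled": True},                   # 16-bit precision
    "zero_optimization": {
        "stage": 3,                              # partition params, grads and optimizer state
        "offload_optimizer": {"device": "cpu"},  # keep optimizer state in host memory
        "offload_param": {"device": "cpu"},      # keep parameters in host memory
    },
}

print(json.dumps(deepspeed_config, indent=2))
```

Offloading both parameters and optimizer state to the CPU trades step time for GPU memory, which fits the observation that the machine stayed usable during training.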

weights and biases workspace:
https://wandb.ai/dalle-pytorch-replicate/dalle_train_clip_report

I'll most likely be busy over the holidays, so I won't have time to implement everything else, but it's mostly a matter of copying from the work done in previous contributions to train_dalle.py/train_vae.py. I suspect @janEbert was responsible for ensuring external parameters were flagged for DeepSpeed in @lucidrains' CLIP implementation?

There are likely to be errors as well, and there are probably a few things missing from the CLIP paper. I think they clamped the learned temperature so that the logits are never scaled by more than 100 (a cap of ln(100) on the log-scale) - not sure if we're doing that.
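For reference, the clamp in the CLIP paper applies to the learned temperature: it is initialised to the equivalent of 0.07 and clipped so the logits are never multiplied by more than 100. A minimal sketch of that idea in plain Python, not the code in this PR; `clamped_scale` and `scaled_logits` are hypothetical names, and a real implementation would clamp a tensor parameter instead:

```python
import math

MAX_LOGIT_SCALE = 100.0  # cap from the CLIP paper: never scale logits by more than 100

def clamped_scale(log_scale):
    """Turn a learnable log-temperature into a logit multiplier capped at 100."""
    return min(math.exp(log_scale), MAX_LOGIT_SCALE)

def scaled_logits(similarities, log_scale):
    """Multiply a matrix of cosine similarities by the clamped temperature."""
    scale = clamped_scale(log_scale)
    return [[scale * s for s in row] for row in similarities]

init_log_scale = math.log(1 / 0.07)   # initial value from the CLIP paper
print(clamped_scale(init_log_scale))  # about 14.29
print(clamped_scale(10.0))            # exp(10) is far above the cap, so 100.0
```

Without the cap, the multiplier can grow without bound during training, which the paper reports as a source of instability.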

To run with DeepSpeed, bite the bullet and set up a Docker container targeting pytorch=1.7.1, cuda=10.2. Conda works too; make sure you set python=3.7, as there are issues with versions above 3.7. There's no guarantee that fused operations will run on any particular GPU, even in a Docker container, and indeed the only officially supported ones are the V100 and A100. If you see an error about failed JIT compilation, that may be the reason.

run_train_clip.sh

#!/bin/bash

deepspeed train_clip.py --dataset_dir=/mnt/evo_internal_1TB/DATASETS/COCO \
    --epochs=200 \
    --batch_size=128 \
    --learning_rate=0.004 \
    --clip_grad_norm=1.0 \
    --resize_ratio=0.8 \
    --truncate_captions=True \
    --save_every_n_steps=1000 \
    --log_frequency=10 \
    --clip_output_file_name=clip_latest.pt \
    --dim_text=128 \
    --dim_image=128 \
    --dim_latent=256 \
    --text_enc_depth=6 \
    --text_seq_len=128 \
    --text_heads=8 \
    --num_visual_tokens=256 \
    --visual_enc_depth=6 \
    --visual_heads=8 \
    --visual_image_size=128 \
    --visual_patch_size=16 \
    --channels=3 \
    --num_workers=24 \
    --fp16=True \
    --distributed_backend=deepspeed
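The flags above pass booleans as `--fp16=True`. If `train_clip.py` parses them with `argparse` (an assumption; the parser itself is not shown in this thread), note that `type=bool` treats any non-empty string, including `False`, as true. A sketch of an explicit converter, with `str2bool` as a hypothetical helper name:

```python
import argparse

def str2bool(value):
    """Parse 'True'/'False'-style strings; hypothetical helper, not from the PR."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

parser = argparse.ArgumentParser()
parser.add_argument("--fp16", type=str2bool, default=False)
parser.add_argument("--truncate_captions", type=str2bool, default=False)

# Parse an explicit argument list so the example does not depend on sys.argv.
args = parser.parse_args(["--fp16=True", "--truncate_captions=False"])
print(args.fp16, args.truncate_captions)
```

With a converter like this, `--truncate_captions=False` really disables the option instead of silently enabling it.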

After training has finished, you can create a 32-bit PyTorch checkpoint by running the conversion script from the checkpoint directory:

cd checkpoints 
cp globalstep_99999/convert_to_fp32.py . # desired step, usually the biggest number
python convert_to_fp32.py globalstep_99999 my_normal_pytorch_ckpt.bin

…eed stage 3 round robin/gradient accumulate/cpu offload, 16-bit precision, WarmupLRDecay init, wandb logging, argparsing
@afiaka87 afiaka87 changed the title (custom_clip) create train_clip.py - image text folder loader, deepsp… Train a DALLE-pytorch CLIP with CPU offload, 16 bit precision Nov 20, 2021
@afiaka87 afiaka87 changed the title Train a DALLE-pytorch CLIP with CPU offload, 16 bit precision Train a custom CLIP with DeepSpeed CPU offload, 16 bit precision Nov 20, 2021
@janEbert
Contributor

Hi! About the external parameters, I looked through the model and don't think anything needs to be registered, so that should be all good. :)

3 participants