OOM when using "stabilityai/stable-diffusion-2-1" with batch size of 2 #24

xilongzhou · 2024-05-01T05:52:02Z

Hi,

Thanks for sharing the code!

I am using your code to fine-tune stabilityai/stable-diffusion-2-1 with the aesthetic reward, and I have also set Lora=True. However, training is very memory-intensive: even on 80GB A100s, a batch size of 2 per GPU does not fit, and I always get an OOM error. Below are my settings:

config = compressibility()
config.project_name = "ddpo-aesthetic"
config.pretrained.model = "stabilityai/stable-diffusion-2-1"

config.num_epochs = 20000
config.reward_fn = "aesthetic_score"

# comment carried over from the default config (8 GPUs: 8 * 8 * 4 = 256 samples per epoch);
# with the values below, each GPU draws 2 * 1 = 2 samples per epoch.
config.sample.batch_size = 2
config.sample.num_batches_per_epoch = 1

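# comment carried over from the default config ((8 * 4) / (4 * 2) = 4 gradient updates per epoch);
# with the values below, this is (2 * 1) / (2 * 1) = 1 gradient update per epoch.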
config.train.batch_size = 2
config.train.gradient_accumulation_steps = 1

config.prompt_fn = "simple_animals"
config.per_prompt_stat_tracking = {
  "buffer_size": 32,
  "min_count": 16,
}
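For reference, here is a minimal sketch of the per-epoch arithmetic that the config comments describe (8 * 8 * 4 = 256 samples, (8 * 4) / (4 * 2) = 4 gradient updates). The helper name `epoch_stats` is hypothetical and not part of the repository, and the GPU count of 8 for my settings is only an assumption for illustration.

```python
def epoch_stats(num_gpus, sample_batch_size, num_batches_per_epoch,
                train_batch_size, gradient_accumulation_steps):
    """Return (samples_per_epoch, gradient_updates_per_epoch).

    Hypothetical helper: it only mirrors the arithmetic in the config comments.
    """
    # Each GPU samples `sample_batch_size` images, `num_batches_per_epoch` times.
    samples_per_gpu = sample_batch_size * num_batches_per_epoch
    samples_per_epoch = num_gpus * samples_per_gpu
    # One optimizer step consumes train_batch_size * accumulation steps samples per GPU.
    updates_per_epoch = samples_per_gpu // (train_batch_size * gradient_accumulation_steps)
    return samples_per_epoch, updates_per_epoch


# Values read off the default-config comments (assumed: 8 GPUs, sample batch 8,
# 4 batches per epoch, train batch 4, 2 accumulation steps).
print(epoch_stats(8, 8, 4, 4, 2))  # (256, 4)

# Values from the settings in this issue, assuming 8 GPUs.
print(epoch_stats(8, 2, 1, 2, 1))  # (16, 1)
```

So these settings are already close to the smallest possible workload per GPU, which is why the OOM is surprising.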

Any suggestions regarding this? I appreciate your help!

roywang021 · 2024-08-11T11:48:35Z

Have you solved this problem? If so, could you share your solution?
