
# Schedulers for Stable Diffusion Inference

## 1. Introduction

MindOne provides multiple scheduler functions for the diffusion process.

A scheduler takes the output of a pretrained model, the sample that the diffusion process is iterating on, and a timestep, and returns a denoised sample. Schedulers define the method for iteratively adding noise to an image, or for updating a sample based on model outputs (removing noise). A scheduler is typically defined by a noise schedule and an update rule for solving the underlying differential equation.
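To make the "update rule" concrete, here is a minimal numpy sketch of one deterministic DDIM-style step (eta = 0). This is purely illustrative, not MindOne's actual implementation; the function and argument names are made up:

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update: predict x0, then step toward t-1.

    x_t: current noisy sample; eps_pred: the model's noise prediction;
    alpha_bar_*: cumulative products of the noise schedule at t and t-1.
    """
    # Predict the clean sample implied by the noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise the prediction to the previous timestep's noise level (eta = 0).
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred
```

With a perfect noise prediction, stepping to `alpha_bar_prev = 1.0` recovers the clean sample exactly, which is a handy sanity check on the update rule.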

## 2. Summary of Schedulers

MindOne implements 5 different schedulers in addition to the DDPM scheduler. The following table summarizes them:

| Scheduler | Reference |
| --- | --- |
| DDPM | Denoising Diffusion Probabilistic Models |
| DDIM | Denoising Diffusion Implicit Models |
| PLMS | Pseudo Numerical Methods for Diffusion Models on Manifolds |
| DPM-Solver | DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps |
| DPM-Solver++ | DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models |
| UniPC | UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models |

## 3. Inference with SD2.0

### 3.1 Quick Start

Normally, you can test the stable diffusion model with the default DPM-Solver++ scheduler using the following command (refer to Stable Diffusion 2.0 Inference).

```shell
# Text-to-image generation with SD2.0
python text_to_image.py --prompt "A wolf in winter" --version 2.0
```

You can obtain diverse results for the given prompt. Here are 5 examples:

*(Images: DPM-Solver++ samples #1–#5)*

### 3.2 Inference with Different Schedulers

Since quantitative evaluation is usually insufficient to determine which scheduler is best, it is often recommended to simply try them out and compare the results visually. In this chapter, we demonstrate how to generate images with different schedulers and show a visual comparison of the outputs, along with the time required by each scheduler for reference.

#### 3.2.1 Commands of Inference with Different Schedulers

You can test the stable diffusion model with different schedulers using the following commands. `${scheduler}` can be one of ddim, plms, dpm_solver, and uni_pc (note: the default scheduler is DPM-Solver++). The optimal hyperparameters vary between schedulers. For example, the optimal `${sampling_steps}` is 50 for PLMS and DDIM, and 20 for UniPC, DPM-Solver, and DPM-Solver++.

```shell
# Scheduler PLMS: set ${scheduler} to plms and ${sampling_steps} to 50
# Scheduler DDIM: set ${scheduler} to ddim and ${sampling_steps} to 50
# Scheduler DPM-Solver: set ${scheduler} to dpm_solver and ${sampling_steps} to 20
# Scheduler DPM-Solver++: set ${sampling_steps} to 20 (note: no ${scheduler} argument is needed, as it is the default)
# Scheduler UniPC: set ${scheduler} to uni_pc and ${sampling_steps} to 20
# ${prompt} defaults to "A Van Gogh style oil painting of sunflower".
python text_to_image.py \
    --prompt ${prompt} \
    --config configs/v2-inference.yaml \
    --version 2.0 \
    --output_path ./output/ \
    --seed 42 \
    --${scheduler} \
    --n_iter 8 \
    --n_samples 1 \
    --W 512 \
    --H 512 \
    --sampling_steps ${sampling_steps}
```

Note that if you set too many sampling steps for the UniPC scheduler, the program reports a warning such as `The selected sampling timesteps are not appropriate for UniPC sampler`. The default `${prompt}` is "A Van Gogh style oil painting of sunflower".
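The scheduler-to-steps pairing above can be sketched as a small helper that assembles the command line for each scheduler. This is a hypothetical convenience snippet, not part of MindOne; note that DPM-Solver++ is the default, so no scheduler flag is passed for it:

```python
# Optimal sampling steps per scheduler, as listed above.
OPTIMAL_STEPS = {
    "plms": 50,
    "ddim": 50,
    "dpm_solver": 20,
    "dpm_solver_pp": 20,  # default scheduler: no flag is passed
    "uni_pc": 20,
}

def build_command(scheduler, prompt="A Van Gogh style oil painting of sunflower"):
    """Assemble the text_to_image.py invocation for a given scheduler."""
    args = [
        "python", "text_to_image.py",
        "--prompt", prompt,
        "--config", "configs/v2-inference.yaml",
        "--version", "2.0",
        "--seed", "42",
    ]
    if scheduler != "dpm_solver_pp":  # DPM-Solver++ is the default
        args.append(f"--{scheduler}")
    args += ["--n_iter", "8", "--n_samples", "1",
             "--sampling_steps", str(OPTIMAL_STEPS[scheduler])]
    return args
```

For example, `build_command("plms")` yields a command with `--plms` and `--sampling_steps 50`, while `build_command("dpm_solver_pp")` omits the scheduler flag and uses 20 steps.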

#### 3.2.2 Visual Comparison

Now, let's execute the command above to generate images with different values of `${prompt}` and `${scheduler}`. All images have the default resolution of 512x512.

```shell
# Set ${scheduler} to plms, ddim, dpm_solver, and uni_pc in turn (with their respective optimal ${sampling_steps})
# Note: no ${scheduler} argument is needed when using dpm_solver_pp (the default)
python text_to_image.py --prompt "A Van Gogh style oil painting of sunflower" --config configs/v2-inference.yaml --version 2.0 --seed 42 --${scheduler} --n_iter 8 --n_samples 1 --sampling_steps ${sampling_steps}
```

*(Images: PLMS, DDIM, DPM-Solver, DPM-Solver++, UniPC)*

```shell
# Set ${scheduler} to plms, ddim, dpm_solver, and uni_pc in turn (with their respective optimal ${sampling_steps})
# Note: no ${scheduler} argument is needed when using dpm_solver_pp (the default)
python text_to_image.py --prompt "A photo of an astronaut riding a horse on mars" --config configs/v2-inference.yaml --version 2.0 --seed 42 --${scheduler} --n_iter 8 --n_samples 1 --sampling_steps ${sampling_steps}
```

*(Images: PLMS, DDIM, DPM-Solver, DPM-Solver++, UniPC)*

```shell
# Set ${scheduler} to plms, ddim, dpm_solver, and uni_pc in turn (with their respective optimal ${sampling_steps})
# Note: no ${scheduler} argument is needed when using dpm_solver_pp (the default)
python text_to_image.py --prompt "A high tech solarpunk utopia in the Amazon rainforest" --config configs/v2-inference.yaml --version 2.0 --seed 42 --${scheduler} --n_iter 8 --n_samples 1 --sampling_steps ${sampling_steps}
```

*(Images: PLMS, DDIM, DPM-Solver, DPM-Solver++, UniPC)*

```shell
# Set ${scheduler} to plms, ddim, dpm_solver, and uni_pc in turn (with their respective optimal ${sampling_steps})
# Note: no ${scheduler} argument is needed when using dpm_solver_pp (the default)
python text_to_image.py --prompt "The beautiful night view of the city has various buildings, traffic flow, and lights" --config configs/v2-inference.yaml --version 2.0 --seed 42 --${scheduler} --n_iter 8 --n_samples 1 --sampling_steps ${sampling_steps}
```

*(Images: PLMS, DDIM, DPM-Solver, DPM-Solver++, UniPC)*

```shell
# Set ${scheduler} to plms, ddim, dpm_solver, and uni_pc in turn (with their respective optimal ${sampling_steps})
# Note: no ${scheduler} argument is needed when using dpm_solver_pp (the default)
python text_to_image.py --prompt "A pikachu fine dining with a view to the Eiffel Tower" --config configs/v2-inference.yaml --version 2.0 --seed 42 --${scheduler} --n_iter 8 --n_samples 1 --sampling_steps ${sampling_steps}
```

*(Images: PLMS, DDIM, DPM-Solver, DPM-Solver++, UniPC)*

#### 3.2.3 Time Comparison

The following table compares the time required by different schedulers (with their optimal sampling steps) to generate images, measured by executing the command above on Ascend 910. The image resolution is 512x512, `n_iter` is 8, and `n_samples` is 1 (we report the average over the last 7 iterations). Note that if `n_samples` is increased (e.g., to 8), the time required per sample decreases.

| scheduler | sampling_steps | time (seconds/image) |
| --- | --- | --- |
| ddim | 50 | 16.48 |
| plms | 50 | 16.77 |
| dpm_solver | 20 | 12.72 |
| dpm_solver_pp | 20 | 13.43 |
| uni_pc | 20 | 14.97 |
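Averaging over the last 7 of the 8 iterations presumably discards the first iteration's one-off warm-up overhead (e.g., graph compilation). The bookkeeping is simple; the timings below are illustrative, not measured values:

```python
def per_image_time(iteration_times, warmup=1):
    """Average per-iteration time, excluding the first `warmup` iterations."""
    steady = iteration_times[warmup:]
    return sum(steady) / len(steady)

# Illustrative: a slow first iteration followed by 7 steady iterations.
times = [60.0, 16.5, 16.4, 16.6, 16.5, 16.4, 16.5, 16.5]
avg = per_image_time(times)  # average of the last 7 only
```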

## 4. Inference (based on LoRA)

These schedulers can also be used for stable diffusion model + LoRA inference (see LoRA for more information). Users can specify schedulers in the same manner as described in chapter Inference with Different Schedulers. For further details, please refer to Use LoRA for Stable Diffusion Finetune. In this chapter, we provide visual, qualitative, and quantitative comparisons with Diffusers using different schedulers based on LoRA finetuning.

### 4.1 Visual Comparison

Based on the LoRA models trained on the pokemon and chinese_art datasets (see LoRA for more information), we run inference with different schedulers. The base model is Stable Diffusion 2.0.

- pokemon dataset:

  ```shell
  # Set ${scheduler} to plms, ddim, dpm_solver, and uni_pc in turn (with their respective optimal ${sampling_steps})
  # Note: no ${scheduler} argument is needed when using dpm_solver_pp (the default)
  # Set ${prompt} to "a drawing of a blue and white cat with big eyes"
  bash scripts/run_test_to_image_v2_lora.sh
  ```

  *(Images: PLMS, DDIM, DPM-Solver, DPM-Solver++, UniPC)*

  ```shell
  # Set ${scheduler} to plms, ddim, dpm_solver, and uni_pc in turn (with their respective optimal ${sampling_steps})
  # Note: no ${scheduler} argument is needed when using dpm_solver_pp (the default)
  # Set ${prompt} to "a cartoon of a black and white pokemon"
  bash scripts/run_test_to_image_v2_lora.sh
  ```

  *(Images: PLMS, DDIM, DPM-Solver, DPM-Solver++, UniPC)*

- chinese_art dataset:

  ```shell
  # Set ${scheduler} to plms, ddim, dpm_solver, and uni_pc in turn (with their respective optimal ${sampling_steps})
  # Note: no ${scheduler} argument is needed when using dpm_solver_pp (the default)
  # Set ${prompt} to "a painting of a group of people sitting on a hill with trees in the background and a stream of water"
  bash scripts/run_test_to_image_v2_lora.sh
  ```

  *(Images: PLMS, DDIM, DPM-Solver, DPM-Solver++, UniPC)*

  ```shell
  # Set ${scheduler} to plms, ddim, dpm_solver, and uni_pc in turn (with their respective optimal ${sampling_steps})
  # Note: no ${scheduler} argument is needed when using dpm_solver_pp (the default)
  # Set ${prompt} to "a drawing of a village with a boat and a house in the background with a red ribbon on the bottom of the picture"
  bash scripts/run_test_to_image_v2_lora.sh
  ```

  *(Images: PLMS, DDIM, DPM-Solver, DPM-Solver++, UniPC)*

### 4.2 Qualitative Comparison with Diffusers

We also show some text-to-image generation samples for the LoRA models trained by MindOne and Diffusers. The base model is Stable Diffusion 2.0.

- pokemon dataset:

  `${prompt}` = "a drawing of a black and gray dragon"

  *(Image grid: MindOne vs. Diffusers; columns PLMS, DDIM, DPM-Solver++, UniPC)*

  `${prompt}` = "a cartoon panda with a leaf in its mouth"

  *(Image grid: MindOne vs. Diffusers; columns PLMS, DDIM, DPM-Solver++, UniPC)*

- chinese_art dataset:

  `${prompt}` = "a painting of a landscape with a mountain in the background and a river running through it with a few people on it"

  *(Image grid: MindOne vs. Diffusers; columns PLMS, DDIM, DPM-Solver++, UniPC)*

### 4.3 Quantitative Comparison with Diffusers

Here are the evaluation results for our implementation.

| Pretrained Model | Dataset | Finetune Method | Sampling Algorithm | FID (MindOne) ↓ | FID (Diffusers) ↓ |
| --- | --- | --- | --- | --- | --- |
| stable_diffusion_2.0_base | pokemon_blip | LoRA | PLMS (scale: 9, steps: 50) | 103 | 105 |
| stable_diffusion_2.0_base | pokemon_blip | LoRA | DDIM (scale: 9, steps: 50) | 101 | 109 |
| stable_diffusion_2.0_base | pokemon_blip | LoRA | DPM-Solver++ (scale: 9, steps: 20) | 98 | 107 |
| stable_diffusion_2.0_base | pokemon_blip | LoRA | UniPC (scale: 9, steps: 20) | 104 | 107 |
| stable_diffusion_2.0_base | chinese_art_blip | LoRA | PLMS (scale: 9, steps: 50) | 279 | 260 |
| stable_diffusion_2.0_base | chinese_art_blip | LoRA | DDIM (scale: 9, steps: 50) | 277 | 250 |
| stable_diffusion_2.0_base | chinese_art_blip | LoRA | DPM-Solver++ (scale: 9, steps: 20) | 265 | 254 |
| stable_diffusion_2.0_base | chinese_art_blip | LoRA | UniPC (scale: 9, steps: 20) | 288 | 254 |

## 5. Inference with SD1.5

These schedulers are also suitable for SD1.5 (see Stable Diffusion 1.5 for more detail). Users can specify schedulers in the same manner as described in chapter Inference with Different Schedulers, switching from SD2.0 to SD1.5 via the --version (-v) argument. In this chapter, we provide a visual comparison of text-to-image generation with different schedulers based on SD1.5.

### 5.1 Visual Comparison

```shell
# Set ${scheduler} to plms, ddim, dpm_solver, and uni_pc in turn (with their respective optimal ${sampling_steps})
# Note: no ${scheduler} argument is needed when using dpm_solver_pp (the default)
# Note: for SD1.5, ${version} is 1.5 and the config file is "configs/v1-inference.yaml"
python text_to_image.py --prompt "A wolf in winter" --config configs/v1-inference.yaml --version 1.5 --seed 42 --${scheduler} --n_iter 8 --n_samples 1 --sampling_steps ${sampling_steps}
```

*(Images: PLMS, DDIM, DPM-Solver, DPM-Solver++, UniPC)*