classifier free guidance during training and inference #254

Lucky-Light-Sun · 2023-07-24T03:38:47Z

Hi, thanks for your awesome contributions.
I have recently started working with Stable Diffusion (SD), so I read through your source code carefully. I have a few questions about it and hope you can answer them.

classifier free guidance

According to the paper High-Resolution Image Synthesis with Latent Diffusion Models, Stable Diffusion uses classifier-free guidance to train the LDM model and produces fantastic images. Algorithm 1 and Algorithm 2 in Classifier-Free Diffusion Guidance describe how to use the guidance during training and sampling.

In lora_diffusion/cli_lora_pti.py, the loss_step function in the training code does not seem to use classifier-free guidance, because the text condition is never replaced with "" (the empty string) with some probability. Is that because classifier-free guidance is not needed during fine-tuning, or was it simply left out?
In addition, looking into StableDiffusionInpaintPipeline in the official Hugging Face diffusers library, I see that it uses classifier-free guidance during inference, as the code below shows:

# StableDiffusionInpaintPipeline, _encode_prompt function:
if do_classifier_free_guidance and negative_prompt_embeds is None:
    uncond_tokens: List[str]
    if negative_prompt is None:
        uncond_tokens = [""] * batch_size
...
...

# StableDiffusionInpaintPipeline, __call__ function:
# perform guidance
if do_classifier_free_guidance:
    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
    noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

Here is the code that handles the text input during training. The text is just the raw caption from the dataset and is never replaced with the empty string with some probability, such as 50%. So I guess classifier-free guidance is not used during fine-tuning.

lora/lora_diffusion/cli_lora_pti.py

Lines 315 to 324 in bdd51b0

    
    if mixed_precision:
        with torch.cuda.amp.autocast():
            encoder_hidden_states = text_encoder(
                batch["input_ids"].to(text_encoder.device)
            )[0]
            model_pred = unet(
                latent_model_input, timesteps, encoder_hidden_states
            ).sample
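
For reference, the conditioning dropout I have in mind (Algorithm 1 of Classifier-Free Diffusion Guidance) could be sketched at the caption level roughly as follows. This is only a minimal sketch and not code from this repository: `drop_captions` and `uncond_prob` are hypothetical names, and the default rate of 10% is an assumed value, not one taken from the training script.

```python
import random
from typing import List, Optional


def drop_captions(
    captions: List[str],
    uncond_prob: float = 0.1,  # assumed value, for illustration only
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Replace each caption with "" independently with probability uncond_prob."""
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so uncond_prob=0 keeps every caption
    # and uncond_prob=1 drops every caption.
    return ["" if rng.random() < uncond_prob else caption for caption in captions]
```

The resulting captions would then be tokenized and passed to text_encoder as usual, so that the model occasionally sees the unconditional input during fine-tuning.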

I'm also wondering whether we could treat the image condition like the text condition and apply classifier-free guidance to it. In the StableDiffusionInpaintPipeline __call__ function, I don't see classifier-free guidance applied to the image condition; only the text condition is used for it.

In the code below, latent_model_input is simply the concatenation of noisy_latents, mask, and masked_image_latents. Maybe classifier-free guidance could also be applied to the image information.

lora/lora_diffusion/cli_lora_pti.py

Lines 306 to 313 in bdd51b0

    
    noisy_latents = scheduler.add_noise(latents, noise, timesteps)
    if train_inpainting:
        latent_model_input = torch.cat(
            [noisy_latents, mask, masked_image_latents], dim=1
        )
    else:
        latent_model_input = noisy_latents
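
To make the idea concrete, one possible way to guide on both conditions is the two-scale combination used in the InstructPix2Pix paper. The sketch below is an illustration under assumptions, not something this repository or the diffusers inpainting pipeline does: `guide_two_conditions` and its argument names are hypothetical, plain lists of floats stand in for the noise-prediction tensors, and the model would need three forward passes per step and would have to be trained with the image condition dropped some of the time as well.

```python
from typing import List, Sequence


def guide_two_conditions(
    eps_uncond: Sequence[float],  # prediction with neither image nor text condition
    eps_image: Sequence[float],   # prediction with the image condition only
    eps_full: Sequence[float],    # prediction with both image and text conditions
    image_scale: float,
    text_scale: float,
) -> List[float]:
    """Combine three noise predictions with separate image and text guidance scales."""
    return [
        u + image_scale * (i - u) + text_scale * (f - i)
        for u, i, f in zip(eps_uncond, eps_image, eps_full)
    ]
```

With both scales set to 1 this reduces to the fully conditioned prediction, which is a quick sanity check on the formula.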

Thank you for reading the above content. Currently, I am also conducting relevant experiments, and I hope you can provide some valuable suggestions!

inpainting code about blur_amount parameter

In your code, I see that the blur_amount parameter is defined, but I don't see it used anywhere.

lora/lora_diffusion/cli_lora_pti.py

Line 853 in bdd51b0

train_dataset.blur_amount = 200

I think you may have forgotten to use it here:

lora/lora_diffusion/dataset.py

Lines 214 to 219 in bdd51b0

    
    masks = face_mask_google_mediapipe(
        [
            Image.open(f).convert("RGB")
            for f in self.instance_images_path
        ]
    )

because the function is defined as follows:

def face_mask_google_mediapipe(
    images: List[Image.Image], blur_amount: float = 80.0, bias: float = 0.05
) -> List[Image.Image]
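
The change I am suggesting amounts to forwarding the dataset's blur_amount to the mask function. The sketch below only illustrates that forwarding with stand-ins: `make_masks` and `MaskedDataset` are hypothetical placeholders for face_mask_google_mediapipe and the dataset class, and it assumes the masks are built after blur_amount has been set on the dataset.

```python
from typing import List


def make_masks(images: List[str], blur_amount: float = 80.0) -> List[str]:
    # Hypothetical stand-in for face_mask_google_mediapipe:
    # it only records which blur value it received.
    return [f"{image}|blur={blur_amount}" for image in images]


class MaskedDataset:
    """Hypothetical stand-in for the dataset class in dataset.py."""

    def __init__(self, instance_images_path: List[str], blur_amount: float = 80.0):
        self.instance_images_path = instance_images_path
        self.blur_amount = blur_amount

    def build_masks(self) -> List[str]:
        # Forward the configured value instead of relying on the function default.
        return make_masks(self.instance_images_path, blur_amount=self.blur_amount)
```

Without the keyword argument, the call would silently fall back to the function's default of 80.0 regardless of what is set on the dataset.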

My English writing is not good; I hope you can forgive me.

Finally, thank you for reading. I look forward to your reply!
