Skip to content

Stable Diffusion with Self-attention guidance and ControlNet with artistic style capabilities.

Notifications You must be signed in to change notification settings

Shashank-Holla/diffusion-controlnet-sag

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Stable Diffusion with Self-Guided Attention and ControlNet

1. Overview

This is Stable Diffusion built on pre-trained Stable Diffusion v1.5 weights with Self-Attention Guidelines (SAG) to enhance generated image's stability. It also uses ControlNet, a neural network model, to support additional input to control the image generation. Additionally, the model can add artistic features to the generated image by utilizing trained style weights.

This model is built on Hugging Face modules. It utilizes Tokenizer, Text Encoder, Variational Auto Encoder and Unet model from it.

  1. Tokenizer - creates tokens with padding to match required length.
  2. Text Encoder - Get token embedding from tokens and the positional embedding. It is then combined and fed to a transformer model to get the output embedding
  3. UNet - Takes in noisy latents and predicts the noise residual of the latent shape.
  4. Variational Autoencoder - Takes in the latents and decodes it into the image space.

2. Features

2.1. Self Attention Guidelines

Self attention guidelines helps stable diffusion to improve generated image. It uses the intermediate self-attention maps to adversially blur and guides the model. Parameter sag_scale controls the SAG influence on the model.

2.2. ControlNet support

ControlNet conditions the diffusion model to learn specific user input conditions (like edges, depth). This helps it generate images which are related to the desired spatial context. canny and openpose controlnets are supported in this application. Conditional input image such as edge map, keypoints are also provided along with the controlnet model for inference. controlnet_cond_scale parameter controls the scale to which the generated image are faithful to the conditional image.

2.3. Style

The application is trained on a novel art via Textual Inversion. In our case, images stylistically related to pop-art are trained in order to associate it with <pop-art> word within the text encoder embedding. Training images and the weights for style training are available here <pop-art>.

To use the style, add in the prompt. While running the model, enable style_flag to use the style.

simg_1 simg_2 simg_3 simg_4

3. Deploy and Run

Stable Diffusion can be run in the following two ways-

3.1. Clone Repository and execute

Clone repository and change directory-

git clone https://github.com/Shashank-Holla/diffusion-controlnet-sag.git

cd diffusion-controlnet-sag/

Install dependencies-

pip install -r requirements.txt

Run model

!python main.py --prompt "Margot Robbie as wonderwoman in style" --seed 3 --batch_size 1 --controlNet_image ./control_images/controlimage_1.jpg --controlNet_type canny --style_flag T --sag_scale 0.75 --controlnet_cond_scale 1.0

3.2. Install CLI application and run

This repository is also available as CLI application. Build files are available in dist folder in this repository. Control Image and style weights path must be absolute. Valid Control Image is required if controlnet model is provided.

Clone repository and change directory-

git clone https://github.com/Shashank-Holla/diffusion-controlnet-sag.git

cd diffusion-controlnet-sag/

Install distribution-

!pip install dist/diffusion-0.0.7-py3-none-any.whl

Run application generate. Provide input as prompted-

/usr/local/bin/generate

4. Results

Shared here are few run results by changing the various parameters.

4.1. By changing SAG scale and adding artistic style

These run results are by varying SAG scale and adding artistic style.

Prompt Type Prompt Generated Image
Generation with SAG; without ControlNet without Style addition SAG scale changes Prompt: "Margot Robbie as wonderwoman in polychrome, good anatomy, best and quality, extremely detailed" SAG_scale: 0.25 img_6
Generation with SAG; without ControlNet without Style addition SAG scale changes Prompt: "Margot Robbie as wonderwoman in polychrome, good anatomy, best and quality, extremely detailed" SAG_scale: 1.0 img_7
Generation with SAG; without ControlNet with Style Addition Prompt: "Margot Robbie as wonderwoman in <pop-art> style" SAG_scale: 0.9 img_8

4.2. By adding ControlNet- Canny conditioning

Below is the control image used. Edge map is fed as the conditioning image for stable diffusion.

Control Image Extracted features for spatial context
img_10 img_11

Images with pop-art style shows the style did not exist in the base Stable Diffusion model and is added with the new weights and the newly added word <pop-art>. These images also shows how the model performs when the scale of controlNet conditioning is varied.

The second image has controlnet_cond_scale of 1.0 and closely follows the edge structure of the conditioning image.

Prompt Type Prompt Generated Image
Generation with SAG; with Canny ControlNet without Style addition Prompt: "Margot Robbie as wonderwoman in polychrome, good anatomy, best and quality, extremely detailed" ControlNet with Canny controlnet_cond_scale: 0.5 img_9
Generation with SAG; with Canny ControlNet without Style addition Controlnet_cond_scale changes Prompt: Margot Robbie as wonderwoman in style" ControlNet with Canny controlnet_cond_scale: 1.0 img_1
Generation with SAG; with Canny ControlNet with Style addition "Margot Robbie as wonderwoman in <pop-art> style" controlnet_cond_scale: 0.25 img_2

4.3. By adding ControlNet- Openpose

These images have been generated by passing keypoint control image.

Control Image Extracted features for spatial context
img_12 img_13
Prompt Type Prompt Generated Image
Generation with SAG; with OpenPose ControlNet without Style addition Prompt: "Margot Robbie as wonderwoman in style" ControlNet with OpenPose controlnet_cond_scale: 1.0 img_5
Generation with SAG; with OpenPose ControlNet with Style addition Controlnet_cond_scale changes Prompt: Margot Robbie as wonderwoman in <pop-art> style" ControlNet with OpenPose controlnet_cond_scale: 1.0 img_4

About

Stable Diffusion with Self-attention guidance and ControlNet with artistic style capabilities.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages