Image generation with Latent Consistency Model and OpenVINO

LCMs are the next generation of generative models after Latent Diffusion Models (LDMs). LDMs have achieved remarkable results in synthesizing high-resolution images. However, their iterative sampling is computationally intensive and leads to slow generation.

Inspired by Consistency Models, Latent Consistency Models (LCMs) were proposed, enabling swift inference with minimal steps on any pre-trained LDM, including Stable Diffusion. The Consistency Model (CM) (Song et al., 2023) is a new family of generative models that enables one-step or few-step generation. The core idea of the CM is to learn the function that maps any point on a trajectory of the PF-ODE (probability flow ordinary differential equation) to that trajectory’s origin (i.e., the solution of the PF-ODE). By learning consistency mappings that maintain point consistency on the ODE trajectory, these models allow for single-step generation, eliminating the need for computation-intensive iterations.

However, CM is constrained to pixel-space image generation tasks, making it unsuitable for synthesizing high-resolution images. LCMs adopt a consistency model in the image latent space to generate high-resolution images. Viewing the guided reverse diffusion process as solving an augmented PF-ODE, LCMs are designed to directly predict the solution of this ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Utilizing the image latent space in large-scale diffusion models like Stable Diffusion (SD) has effectively enhanced image generation quality and reduced computational load.

The authors of LCMs provide a simple and efficient one-stage guided consistency distillation method named Latent Consistency Distillation (LCD) to distill SD for few-step (2∼4) or even 1-step sampling, and propose the SKIPPING-STEP technique to further accelerate convergence. More details about the proposed approach and models can be found using the following resources: project page, paper, original repository.
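To make the consistency mapping described above more concrete, here is a minimal toy sketch in plain Python. It is not the LCM code: it assumes a one-dimensional "dataset" consisting of a single point `MU`, for which the PF-ODE has the closed form dx/dt = (x - MU) / t. The names `MU`, `EPS`, `T_MAX`, `pf_ode_drift`, `euler_solve`, and `consistency_fn` are hypothetical and exist only for this illustration.

```python
import math

MU = 3.0       # toy "dataset": a single data point (assumption for illustration)
EPS = 0.002    # smallest time on a trajectory, treated as its origin
T_MAX = 80.0   # largest noise level, where sampling starts


def pf_ode_drift(x: float, t: float) -> float:
    """dx/dt of the toy PF-ODE: with p_t = N(MU, t^2) the score is -(x - MU) / t^2."""
    return (x - MU) / t


def euler_solve(x: float, t_start: float, steps: int) -> float:
    """Iterative sampling: integrate the ODE from t_start down to EPS, one drift call per step."""
    times = [t_start + (EPS - t_start) * k / steps for k in range(steps + 1)]
    for t_cur, t_next in zip(times, times[1:]):
        x += (t_next - t_cur) * pf_ode_drift(x, t_cur)
    return x


def consistency_fn(x: float, t: float) -> float:
    """Closed-form consistency mapping: sends any trajectory point (x, t) to the origin at EPS."""
    return MU + (x - MU) * EPS / t


# Two points on the same trajectory, at t = T_MAX and at t = 10.
x_T = MU + T_MAX * 1.5
x_mid = MU + (x_T - MU) * 10.0 / T_MAX

origin_from_T = consistency_fn(x_T, T_MAX)       # one evaluation
origin_from_mid = consistency_fn(x_mid, 10.0)    # one evaluation, same answer
origin_iterative = euler_solve(x_T, T_MAX, 50)   # fifty drift evaluations
```

In this toy case the mapping is known analytically, so one call to `consistency_fn` reaches the same origin that the iterative solver needs many drift evaluations to reach. In a real CM or LCM the mapping is a trained neural network, and the match with the ODE solution is only approximate.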

In this tutorial, we consider how to convert and run LCM using OpenVINO. We consider two examples of using LCM:

Using models prepared with Latent Consistency Distillation

This notebook is a demonstration in which you can type a text description and the pipeline will generate an image that reflects the context of the input text, using a model distilled with the Latent Consistency Distillation (LCD) approach. Step by step, the diffusion process iteratively denoises the latent image representation while being conditioned on the text embeddings provided by the text encoder.
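The few-step denoising loop can be sketched with the multistep consistency sampling procedure from the Consistency Models paper: predict a clean sample, add noise back at a lower level, and predict again. The sketch below is a toy under stated assumptions, not the notebook's pipeline. It works on a single scalar instead of a latent tensor, ignores text conditioning, and replaces the distilled network with a hypothetical closed-form stand-in `toy_consistency_fn`. The names `MU`, `EPS`, and `multistep_sample` are also made up for this example.

```python
import math
import random

EPS = 0.002   # minimum noise level
MU = 3.0      # toy data point standing in for a "clean latent" (assumption)


def toy_consistency_fn(x: float, t: float) -> float:
    """Stand-in for the distilled network: maps a noisy sample at level t to a clean estimate."""
    return MU + (x - MU) * EPS / t


def multistep_sample(noise_levels, rng: random.Random) -> float:
    """Multistep consistency sampling over a decreasing list of noise levels."""
    t_start = noise_levels[0]
    x = rng.gauss(0.0, t_start)            # start from pure noise
    x = toy_consistency_fn(x, t_start)     # first step: one-shot clean estimate
    for t in noise_levels[1:]:
        z = rng.gauss(0.0, 1.0)
        x_noisy = x + math.sqrt(t * t - EPS * EPS) * z   # re-noise to level t
        x = toy_consistency_fn(x_noisy, t)               # denoise again
    return x


# Four steps, matching the "few-step" regime described in the text.
sample = multistep_sample([80.0, 20.0, 5.0, 1.0], random.Random(0))
```

Each pass through the loop costs one model evaluation, so the list of noise levels directly sets the number of inference steps.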

The following image shows an example of the input sequence and corresponding predicted image.

Input text: a beautiful pink unicorn, 8k

Notebook Contents

This notebook demonstrates how to convert and run LCM_Dreamshaper_v7 using OpenVINO. An additional part demonstrates how to run quantization with NNCF to speed up the pipeline.

The notebook contains the following steps:

  1. Convert PyTorch models to OpenVINO Intermediate Representation using the OpenVINO Model Conversion API.
  2. Prepare Inference Pipeline.
  3. Run Inference pipeline with OpenVINO.
  4. Optimize LatentConsistencyModelPipeline with NNCF quantization.
  5. Compare results of original and optimized pipelines.
  6. Run Interactive demo.
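As background for the quantization step, the following toy sketch shows the basic idea of 8-bit quantization: floating-point values are stored as small integers plus a scale, at the cost of a bounded rounding error. It is only an illustration in plain Python and is not NNCF's implementation or API. The helpers `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> integers in [-127, 127] plus one scale."""
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale for all-zero input
    scale = max_abs / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the stored integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


weights = [0.82, -1.27, 0.004, 0.33, -0.5]   # made-up example weights
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# The round-trip error per value is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Comparing the original and optimized pipelines, as in step 5, is the practical way to check that this kind of rounding error does not visibly degrade the generated images.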

The notebook also provides an interactive interface for image generation based on user input.

Using models prepared using LCM LoRA

LCM LoRA allows training just a small number of adapters, known as LoRA layers, instead of the full model. The resulting LoRAs can then be applied to any fine-tuned version of the model without having to distill them separately. The benefit of this LCM LoRA distillation process is that it can be integrated into existing inference pipelines without changes to the main code, for example, into the ControlNet-guided Stable Diffusion pipeline. More details about LCM LoRA can be found in the technical report and blog post.
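The reason a LoRA can be dropped into an existing pipeline follows from the standard LoRA formulation: the adapter is a low-rank update B·A, scaled by alpha / rank, that can be added to a frozen weight matrix W once, after which the layer runs unchanged. The sketch below checks this with tiny hand-written matrices in plain Python. All values and the helpers `matmul` and `merge_lora` are made up for illustration and are not taken from the notebook.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w, lora_b, lora_a, alpha, rank):
    """Fold the low-rank update into the frozen weight: W + (alpha / rank) * B @ A."""
    delta = matmul(lora_b, lora_a)
    s = alpha / rank
    return [[w[i][j] + s * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]   # frozen 2x3 weight of the base model
B = [[0.5], [-1.0]]                       # 2x1 LoRA "up" matrix
A = [[1.0, 2.0, 0.0]]                     # 1x3 LoRA "down" matrix
ALPHA, RANK = 2.0, 1
x = [[1.0], [1.0], [1.0]]                 # 3x1 input column vector

# Path 1: merge once, then run the layer exactly as before.
y_merged = matmul(merge_lora(W, B, A, ALPHA, RANK), x)

# Path 2: keep the adapter separate and add its output to the base output.
base_out = matmul(W, x)
lora_out = matmul(B, matmul(A, x))
y_adapter = [[base_out[i][0] + (ALPHA / RANK) * lora_out[i][0]] for i in range(len(base_out))]
```

Both paths produce the same output, so the merged weights can be used wherever the original weights were, while only the small matrices `A` and `B` need to be trained.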

Notebook Contents

This notebook explores how to speed up the ControlNet pipeline using LCM LoRA and OpenVINO. It demonstrates how to apply LCM LoRA, then convert and run stable-diffusion-1.5 conditioned by ControlNet Normal Mapping using OpenVINO. An additional part demonstrates how to run quantization with NNCF to speed up the pipeline.

The following image shows an example of the input sequence, input image, condition image and corresponding predicted image.

Input text: A head full of roses

The notebook contains the following steps:

  1. Convert PyTorch models to OpenVINO Intermediate Representation using the OpenVINO Model Conversion API.
  2. Prepare Inference Pipeline.
  3. Run Inference pipeline with OpenVINO.
  4. Optimize OVControlNetStableDiffusionPipeline with NNCF quantization.
  5. Compare results of original and optimized pipelines.
  6. Run Interactive demo.

The notebook also provides an interactive interface for image generation based on user input.

Installation Instructions

This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to the Installation Guide.