Text-to-Image Generation and Infinite Zoom with Stable Diffusion v2 and OpenVINO™

Stable Diffusion v2 is the next generation of the Stable Diffusion model, a Text-to-Image latent diffusion model created by researchers and engineers from Stability AI and LAION.

In general, diffusion models are machine learning systems trained to denoise random Gaussian noise step by step until they arrive at a sample of interest, such as an image. Diffusion models have been shown to achieve state-of-the-art results for generating image data. One downside is that the reverse denoising process is slow. In addition, these models consume a lot of memory because they operate in pixel space, which becomes unreasonably expensive when generating high-resolution images. As a result, it is challenging both to train these models and to use them for inference. OpenVINO brings capabilities to run model inference on Intel hardware and opens the door to the fantastic world of diffusion models for everyone!

In previous notebooks, we discussed how to run Text-to-Image and Image-to-Image generation using Stable Diffusion v1, and how to control the generation process using ControlNet. Now, we have Stable Diffusion v2 as our latest showcase.

This notebook series demonstrates approaches to image generation using an AI method called diffusion:

  • Text-to-Image generation to create images from a text description as input.

  • Text-to-Image-demo is a shorter version of the original notebook for demo purposes, if you would like to get started right away and run the notebook more easily.

This is a demonstration in which you type a text description and the pipeline generates an image that reflects the content of the input text. Step by step, the diffusion process iteratively denoises the latent image representation while being conditioned on the text embeddings provided by the text encoder.

The following image shows an example of the input text and the corresponding predicted image.

Input text: valley in the Alps at sunset, epic vista, beautiful landscape, 4k, 8k
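The iterative, conditioned denoising described above can be illustrated with a toy loop. This is only a sketch and not Stable Diffusion itself: `predict_noise` is a hypothetical stand-in for the U-Net, the "latent" is a short list of floats, the text embeddings are replaced by a plain target vector, and the update rule is a fixed fraction of the predicted noise rather than a real scheduler.

```python
import random


def predict_noise(latent, condition):
    """Hypothetical stand-in for the U-Net: treat everything that separates
    the latent from the conditioned target as noise."""
    return [x - c for x, c in zip(latent, condition)]


def denoise(condition, num_steps=20, step_size=0.3, seed=0):
    """Start from Gaussian noise and remove part of the predicted noise at each step."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in condition]
    for _ in range(num_steps):
        noise = predict_noise(latent, condition)
        latent = [x - step_size * n for x, n in zip(latent, noise)]
    return latent


condition = [0.5, -1.0, 2.0]  # toy stand-in for the text embeddings
result = denoise(condition)
error = max(abs(x - c) for x, c in zip(result, condition))
print(f"max distance from the conditioned target: {error:.6f}")
```

Each pass removes a fraction of the remaining noise, so more steps bring the latent closer to the conditioned target. The real pipeline has the same trade-off between the number of steps and the time spent generating an image.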

  • Text-guided Inpainting generation to create an image from a text description and a masked image; the unmasked region is kept as part of the generated image.

This demonstration uses the Stable Diffusion v2 Inpainting model to generate a sequence of images for an infinite zoom video effect, extending each previous image beyond its borders.

The following image shows an example of the input text and corresponding video.

Input text: valley in the Alps at sunset, epic vista, beautiful landscape, 4k, 8k
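Extending an image beyond its borders can be framed as inpainting: the previous frame is shrunk and placed in the centre of the canvas, and a mask marks the surrounding border as the region to generate. The sketch below builds such a mask with plain Python lists. The function name `zoom_out_mask`, its parameters, and the square canvas are assumptions for illustration; the notebook's actual mask size and shrink factor are not stated here.

```python
def zoom_out_mask(size, border):
    """Return a size x size mask where 1 marks pixels to inpaint (the new
    border) and 0 marks pixels to keep (the shrunken previous frame)."""
    if not 0 < 2 * border < size:
        raise ValueError("border must leave a non-empty centre region")
    inner = range(border, size - border)
    return [
        [0 if (y in inner and x in inner) else 1 for x in range(size)]
        for y in range(size)
    ]


mask = zoom_out_mask(size=8, border=2)
for row in mask:
    print("".join(str(v) for v in row))
```

Repeating this step, with each generated frame becoming the shrunken centre of the next one, yields the sequence of frames that is played back as a zoom video.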

  • Text-to-Image with Optimum-Intel-OpenVINO can create images from a text description as input, using Optimum-Intel-OpenVINO. You can load optimized models from the Hugging Face Hub and create pipelines to run inference with OpenVINO Runtime without rewriting your APIs. You can run this notebook multiple times.

  • Text-to-Image with Optimum-Intel-OpenVINO in Multiple Hardware. This notebook lets you compare how models of different precision perform on different hardware. It was created to showcase the use of Optimum-Intel-OpenVINO and is not optimized for running multiple times.
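As a rough idea of what the Optimum-Intel route looks like, the sketch below wraps pipeline creation and inference in one function. It assumes the `optimum-intel` package with OpenVINO support is installed; the model ID, the default step count, and the function name `generate_with_optimum` are illustrative assumptions, not values taken from the notebooks.

```python
def generate_with_optimum(
    prompt,
    model_id="stabilityai/stable-diffusion-2-1-base",  # assumed model ID
    device="CPU",
    steps=20,
):
    """Generate one image with an OpenVINO-backed Stable Diffusion pipeline."""
    # Imported lazily so the sketch loads even without optimum-intel installed.
    from optimum.intel import OVStableDiffusionPipeline

    # export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
    pipe = OVStableDiffusionPipeline.from_pretrained(model_id, export=True)
    pipe.to(device)  # for example "CPU" or "GPU", depending on the hardware
    result = pipe(prompt, num_inference_steps=steps)
    return result.images[0]


# Example call (downloads the model on first use):
# image = generate_with_optimum("valley in the Alps at sunset, epic vista")
# image.save("result.png")
```

Changing the `device` argument is the only difference between targets in this sketch, which is the kind of comparison the multiple-hardware notebook is about.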

The notebooks demonstrate how to convert and run Stable Diffusion v2 models using OpenVINO.

Each notebook contains the following steps:

  1. Create a pipeline with PyTorch models using the Diffusers library.
  2. Convert the PyTorch models to OpenVINO IR format using the model conversion API.
  3. Run the Stable Diffusion v2 pipeline with OpenVINO.
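The three steps above can be sketched as follows. This is a minimal outline under stated assumptions, not the notebooks' code: it assumes a recent OpenVINO release in which `openvino.convert_model` and `openvino.save_model` are available (the notebooks may use a different conversion entry point), and the helper names, model ID, and file paths are hypothetical.

```python
from pathlib import Path


def load_pytorch_pipeline(model_id="stabilityai/stable-diffusion-2-1-base"):
    """Step 1: create the PyTorch pipeline with Diffusers (assumed model ID)."""
    from diffusers import StableDiffusionPipeline

    return StableDiffusionPipeline.from_pretrained(model_id)


def convert_to_ir(torch_module, example_input, xml_path):
    """Step 2: convert one PyTorch module to OpenVINO IR, unless it already exists."""
    xml_path = Path(xml_path)
    if not xml_path.exists():
        import openvino as ov  # only needed when a conversion actually runs

        ov_model = ov.convert_model(torch_module, example_input=example_input)
        ov.save_model(ov_model, xml_path)
    return xml_path


def compile_ir(xml_path, device="CPU"):
    """Step 3: load the IR file and compile it for the target device."""
    import openvino as ov

    core = ov.Core()
    return core.compile_model(str(xml_path), device)
```

In practice each part of the pipeline (text encoder, U-Net, and VAE) would be converted separately, each with its own example input, and the compiled models would then replace the PyTorch modules inside the generation loop.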

If you have not installed all required dependencies, follow the Installation Guide.