[CVPR 2024 Highlight] Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
Paper / Arxiv / Project Page
To run our code, please follow these steps:
Running the code may require a single GPU with more than 20 GB of memory. We tested the code in the pytorch/pytorch:1.8.1-cuda11.1-cudnn8-devel Docker image.
**You can also refer to "diffusers_implementation/" for a StyleID implementation based on the diffusers library.**
Our codebase is built on CompVis/stable-diffusion and MichalGeyer/plug-and-play, and shares their dependencies and model architecture.
conda env create -f environment.yaml
conda activate StyleID
Download the Stable Diffusion weights from the CompVis organization on Hugging Face (download the sd-v1-4.ckpt file), and link them:
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt
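If you prefer downloading from the command line, something like the following may work; the exact Hugging Face URL (and whether license acceptance or a login is required) is an assumption here, so fall back to downloading from the model page if it fails:
wget https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
ln -s $(pwd)/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt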
To run StyleID, use:
python run_styleid.py --cnt <content_img_dir> --sty <style_img_dir>
To run the default configuration on the provided sample images, run:
python run_styleid.py --cnt data/cnt --sty data/sty --gamma 0.75 --T 1.5 # default
python run_styleid.py --cnt data/cnt --sty data/sty --gamma 0.3 --T 1.5 # high style fidelity
To fine-tune the results, you can control the following aspects of the style transfer (see the example command after this list):
- Attention-based style injection is disabled with the `--without_attn_injection` flag.
- Query preservation is controlled by the `--gamma` parameter (a higher value enhances content fidelity but may reduce style fidelity).
- Attention temperature scaling is controlled through the `--T` parameter.
- Initial latent AdaIN is disabled with the `--without_init_adain` flag.
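For example, the options above can be combined in a single run; the parameter values below are only illustrative, not recommended defaults:
python run_styleid.py --cnt data/cnt --sty data/sty --gamma 0.5 --T 1.2 --without_init_adain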
By default, the script creates a "precomputed_feats" directory and saves the DDIM inversion features of each input image. This avoids recomputing the two DDIM inversions on repeated runs, but requires a significant amount of storage (over 3 GB per image). If you encounter a "no space left" error, please set the "precomputed" parameter as follows:
python run_styleid.py --precomputed "" # do not save DDIM inversion features
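If cached features have already filled your disk, the default cache directory mentioned above can simply be removed:
rm -rf precomputed_feats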
For quantitative evaluation, we include a set of randomly selected inputs from MS-COCO and WikiArt in the "./data" directory.
Before executing the evaluation code, please duplicate the content and style images so that they match the number of stylized images (40 styles, 20 contents -> 800 style images and 800 content images); the evaluation scripts below expect the duplicated sets in "data/cnt_eval" and "data/sty_eval".
run:
python util/copy_inputs.py --cnt data/cnt --sty data/sty
We largely employ matthias-wright/art-fid and mahmoudnafifi/HistoGAN for our evaluation.
To compute ArtFID, run:
cd evaluation;
python eval_artfid.py --sty ../data/sty_eval --cnt ../data/cnt_eval --tar ../output
For the histogram-based style evaluation (HistoGAN), run:
cd evaluation;
python eval_histogan.py --sty ../data/sty_eval --tar ../output
We additionally provide the style and content images used for qualitative comparison in the "./data_vis" directory.
If you find our work useful, please consider citing our paper and starring the repository:
@InProceedings{Chung_2024_CVPR,
author = {Chung, Jiwoo and Hyun, Sangeek and Heo, Jae-Pil},
title = {Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {8795-8805}
}