This is the official GitHub repo of "UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer". UPGPT is the first model that can combine all of these person image generation functions (generation, editing and pose transfer) and be conditioned on pose, text and visual prompts.
A follow-up paper has since been published with a large improvement in image quality; check out the ViscoNet project page.
ICCV Workshop 2023: Official page
arXiv: https://arxiv.org/abs/2304.08870
BibTeX:
@InProceedings{Cheong_2023_ICCV,
author = {Cheong, Soon Yau and Mustafa, Armin and Gilbert, Andrew},
title = {UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {October},
year = {2023},
pages = {4173-4182}
}
Simultaneous pose and camera view interpolation via SMPL parameter linear interpolation.
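The caption above describes interpolating SMPL parameters linearly. The sketch below shows what that means when a pose vector and a camera vector share a single interpolation weight t, so both change in the same frame. The parameter layout (72 axis-angle pose values, a 3-value scale/translation camera) and the function names are assumptions for illustration, not this repo's code.

```python
from typing import List, Sequence, Tuple


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Element-wise linear interpolation (1 - t) * a + t * b."""
    if len(a) != len(b):
        raise ValueError("parameter vectors must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolate_pose_and_camera(
    pose_a: Sequence[float], pose_b: Sequence[float],
    cam_a: Sequence[float], cam_b: Sequence[float],
    num_frames: int,
) -> List[Tuple[List[float], List[float]]]:
    """Return (pose, camera) pairs moving from A to B over num_frames frames.

    Pose and camera share the same t at every frame, so both change together.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    frames = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append((lerp(pose_a, pose_b, t), lerp(cam_a, cam_b, t)))
    return frames


# Hypothetical inputs: 72 SMPL pose values (24 joints x 3 axis-angle) and a
# 3-value camera (scale, tx, ty); real values would come from a pose estimator.
pose_a, pose_b = [0.0] * 72, [0.3] * 72
cam_a, cam_b = [1.0, 0.0, 0.0], [1.2, 0.1, -0.1]
frames = interpolate_pose_and_camera(pose_a, pose_b, cam_a, cam_b, num_frames=5)
```

Linearly interpolating axis-angle values is the simplest option; spherical interpolation of the joint rotations is a common alternative when poses are far apart.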
The code was adapted from https://github.com/Stability-AI/stablediffusion/.
A video demonstration of an earlier version of our app is available on YouTube.
[2023.12.09] A follow-up paper has been published with a large improvement in image quality; check out the ViscoNet project page.
[2023.09.25] Selected for live demo at ICCV 2023 on Friday Oct 6th. https://iccv2023.thecvf.com/demos-111.php
[2023.09.05] Featured in RSIP Vision Newsletter September 2023.
[2023.08.16] Accepted at 2023 IEEE/CVF International Conference on Computer Vision (ICCV) Workshops.
[2023.07.27] Updated arXiv paper.
[2023.06.05] Released training data and script for pose transfer with bounding box as RPM. This concludes all the planned releases.
[2023.06.01] I have updated the code for pose interpolation; however, you will need to download the new model file interp_256.zip (previously pt_256.zip). The app now also comes with pre-loaded style images and generated examples.
The ground-truth and generated images used in the paper can be downloaded from the repo's releases.
A suitable conda environment named upgpt can be created and activated with:
conda env create -f environment.yaml
conda activate upgpt
Model checkpoints and dataset can be downloaded from HuggingFace.
This demonstration uses pre-segmented style images from DeepFashion Multimodal dataset and does not support arbitrary images that you upload. We provide a few samples in the app for you to play with. If you want to try more style images, follow instructions in "Additional Data".
- Download the models interp_256.zip and upscale.zip (optional) and unzip them into ./models/upgpt
- Start the app by running the following in a terminal:
streamlit run app.py --server.fileWatcherType none
- Click "Image Styles->Browse files" to select images from ./fashion. Then use "select styles" and click "Show/Get Styles" to extract the style images. The model is trained for pose transfer, so providing a face style image is advised for good results.
- Entering "style text" overrides the corresponding style images, so remove the style text if you want to use a style image.
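The unzip step in the list above can also be scripted. This is a minimal sketch using Python's standard zipfile module; the function name is hypothetical, and it assumes the downloaded archives sit in the current folder. Whether the archives contain a top-level folder is not stated here, so check the resulting layout under ./models/upgpt.

```python
import zipfile
from pathlib import Path

MODEL_DIR = Path("models/upgpt")


def unzip_models(archives, target_dir=MODEL_DIR):
    """Extract each archive that exists into target_dir; skip missing ones.

    Missing archives are skipped rather than treated as errors because
    upscale.zip is optional.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in archives:
        archive = Path(archive)
        if not archive.is_file():
            print(f"skipping {archive}: not found")
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
        extracted.append(archive.name)
    return extracted
```

For example, `unzip_models(["interp_256.zip", "upscale.zip"])` extracts whichever of the two archives has been downloaded.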
- Download and unzip deepfashion_inshop.zip into datasets/deepfashion_inshop.
- You can try more style images from the DeepFashion Multimodal dataset by downloading and unzipping images.zip from that dataset. Use it in place of ./fashion when selecting fashion images. Also, run
rm -r app_cache/styles && ln -s deepfashion_inshop/styles app_cache/styles
to link the app cache to the full dataset's style images.
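The `rm -r ... && ln -s ...` command above can be mirrored in Python, for example on systems without those shell tools. This is a sketch under an assumption: the default dataset path `datasets/deepfashion_inshop/styles` is inferred from the step that unzips deepfashion_inshop.zip into datasets/deepfashion_inshop, and the function name is hypothetical; adjust the paths to your layout. Creating symlinks on Windows may require extra privileges.

```python
import os
import shutil
from pathlib import Path

# Hypothetical defaults: assumes deepfashion_inshop.zip was unzipped into
# datasets/deepfashion_inshop and contains a styles/ folder.
DATASET_STYLES = Path("datasets/deepfashion_inshop/styles")
CACHE_STYLES = Path("app_cache/styles")


def link_style_cache(cache_styles=CACHE_STYLES, dataset_styles=DATASET_STYLES):
    """Replace the app's style cache with a symlink to the full dataset's styles."""
    cache = Path(cache_styles)
    # Absolute target, so the link does not depend on where it is created.
    target = Path(dataset_styles).resolve()
    if not target.is_dir():
        # Refuse to delete the existing cache if there is nothing to link to.
        raise FileNotFoundError(f"style folder not found: {target}")
    if cache.is_symlink() or cache.is_file():
        cache.unlink()  # remove the link itself, not what it points to
    elif cache.is_dir():
        shutil.rmtree(cache)  # equivalent of rm -r
    cache.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(target, cache, target_is_directory=True)
    return cache
```

Unlike the shell one-liner, this version also works when app_cache/styles does not exist yet, and it leaves the cache untouched if the dataset folder is missing.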
Several configurations are proposed in the paper, but for simplicity we provide only one config (bounding box as RPM) that can perform both pose transfer and pose interpolation. If you want to compare against our results with silhouette mask as RPM, we suggest downloading the generated images (see section "Paper's Result" above).
- Download and unzip deepfashion_inshop.zip into datasets/deepfashion_inshop.
- Download deepfashion_256_v2.ckpt and place it in models/first_stage_models/kl-f8-deepfashion
- Run train.sh, or
python main.py -t --base configs/deepfashion/bbox.yaml --gpus 0, --scale_lr False --num_nodes 1
Checkpoints and generated images will be saved in ./logs.
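Before launching a long training run it can help to confirm that the files from the steps above are in place. The sketch below checks those paths and builds the same `python main.py ...` command as an argument list; the helper names are hypothetical, and it does not reproduce anything train.sh may do beyond that one command.

```python
import subprocess
from pathlib import Path

# Files and folders the training steps above expect, relative to the repo root.
REQUIRED_PATHS = [
    Path("datasets/deepfashion_inshop"),
    Path("models/first_stage_models/kl-f8-deepfashion/deepfashion_256_v2.ckpt"),
    Path("configs/deepfashion/bbox.yaml"),
]


def missing_paths(root=".", required=REQUIRED_PATHS):
    """Return the required paths (as POSIX-style strings) missing under root."""
    root = Path(root)
    return [p.as_posix() for p in required if not (root / p).exists()]


def training_command(gpus="0,"):
    """Build the training command from this README as an argument list."""
    return [
        "python", "main.py", "-t",
        "--base", "configs/deepfashion/bbox.yaml",
        "--gpus", gpus,  # keep the trailing comma exactly as in the command above
        "--scale_lr", "False",
        "--num_nodes", "1",
    ]


def run_training(root="."):
    """Check prerequisites, then launch training from root (the repo folder)."""
    missing = missing_paths(root)
    if missing:
        raise FileNotFoundError("missing before training: " + ", ".join(missing))
    subprocess.run(training_command(), cwd=root, check=True)
```

Calling `run_training()` from the repo folder either reports which prerequisites are missing or starts the same training run as the command above.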
The following SMPL pose estimator was used in this project: https://github.com/facebookresearch/phosa