Website | ArXiv | Get Started

Global-Flow-Local-Attention

The source code for our paper "Deep Image Spatial Transformation for Person Image Generation" (to appear in CVPR2020)

We propose a Global-Flow Local-Attention Model for deep image spatial transformation. Our model can be flexibly applied to tasks such as:

  • Pose-Guided Person Image Generation:

Left: Generated results of our model; Right: Input source images.

  • Pose-Guided Person Image Animation

Leftmost: Skeleton sequences. The others: Animation results.

  • Face Image Animation

Left: Input image; Right: Output results.

  • View Synthesis

From left to right: Input image, Results of Appearance Flow, Results of Ours, Ground-truth images.

News

  • 2020.4.30 Several demos are provided for quick exploration.

  • 2020.4.29 Code for Pose-Guided Person Image Animation is available now!

  • 2020.3.15 We uploaded the code and trained models for Face Animation and View Synthesis!

  • 2020.3.3 Project Website and Paper are available!

  • 2020.2.29 Code for PyTorch is available now!

Colab Demo

For a quick exploration of our model, try the online Colab demo.

Get Started

1) Installation

Requirements

  • Python 3
  • pytorch (1.0.0)
  • CUDA
  • visdom

Conda installation

# 1. Create a conda virtual environment.
conda create -n gfla python=3.6 -y
source activate gfla

# 2. Install dependencies
pip install -r requirement.txt

# 3. Build pytorch Custom CUDA Extensions
./setup.sh
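
After the steps above, a quick sanity check of the environment can save a failed build later. The following is a minimal sketch, not part of this repository: the helper names `missing_modules` and `python_ok` are hypothetical, and the module list simply mirrors the Requirements section (`torch` for pytorch, plus `visdom`).

```python
import importlib.util
import sys

# Module names taken from the Requirements list; adjust to your setup.
REQUIRED_MODULES = ["torch", "visdom"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(version_info=sys.version_info):
    """The README asks for Python 3; check only the major version."""
    return version_info[0] == 3


if __name__ == "__main__":
    print("Python 3:", python_ok())
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

Run it inside the activated `gfla` environment. It only locates the modules and does not import them, so it says nothing about whether CUDA itself is usable.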

Note: The current code is tested with a Tesla V100. If you use a different GPU, you may need to select the correct nvcc_args for your GPU when you build the Custom CUDA Extensions. Comment or uncomment the --gencode entries in block_extractor/setup.py, local_attn_reshape/setup.py, and resample2d_package/setup.py. Please check here for details.
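
To find out which `--gencode` entry matches your GPU, you can query its compute capability from PyTorch. The sketch below is an illustration only: the helper `gencode_args` is hypothetical, and the two-element `-gencode`, `arch=compute_XY,code=sm_XY` form is the common nvcc convention, assumed here rather than copied from the setup.py files. A Tesla V100 has compute capability 7.0, which is used as the fallback.

```python
def gencode_args(major, minor):
    """Build the nvcc -gencode pair for a compute capability (major, minor)."""
    arch = f"{major}{minor}"
    return ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # Compute capability of the first visible GPU, e.g. (7, 0) for a V100.
        major, minor = torch.cuda.get_device_capability(0)
    else:
        # No GPU visible: fall back to the V100 value the code was tested with.
        major, minor = 7, 0
    print(gencode_args(major, minor))
```

Compare the printed pair against the entries in the three setup.py files and enable the one that matches.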

2) Download Resources

We provide the pre-trained weights of our model. The resources are listed as follows:

Download the Pre-Trained Models and the Demo Images by running the following command:

./download.sh

3) Pose-Guided Person Image Generation

The Pose-Guided Person Image Generation task is to transfer a source person image to a target pose.

Run the demo of this task:

python demo.py \
--name=pose_fashion_checkpoints \
--model=pose \
--attn_layer=2,3 \
--kernel_size=2=5,3=3 \
--gpu_id=0 \
--dataset_mode=fashion \
--dataroot=./dataset/fashion \
--results_dir=./demo_results/fashion
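
The `--attn_layer=2,3` and `--kernel_size=2=5,3=3` values in the command above appear to pair each attention layer with a local kernel size (layer 2 with 5, layer 3 with 3). That reading is an assumption based on the flag names and is not confirmed here. The sketch below shows how such strings could be parsed and cross-checked. The function names are hypothetical and this is not the repository's own option parser.

```python
def parse_attn_layer(spec):
    """Parse a comma-separated layer list such as '2,3' into [2, 3]."""
    return [int(item) for item in spec.split(",")]


def parse_kernel_size(spec):
    """Parse 'LAYER=SIZE,LAYER=SIZE' such as '2=5,3=3' into {2: 5, 3: 3}."""
    sizes = {}
    for item in spec.split(","):
        layer, size = item.split("=")
        sizes[int(layer)] = int(size)
    return sizes


def layers_without_kernel(attn_layer, kernel_size):
    """Return attention layers that have no kernel size assigned."""
    sizes = parse_kernel_size(kernel_size)
    return [layer for layer in parse_attn_layer(attn_layer) if layer not in sizes]


if __name__ == "__main__":
    print(parse_attn_layer("2,3"))
    print(parse_kernel_size("2=5,3=3"))
    print(layers_without_kernel("2,3", "2=5,3=3"))
```

A check like `layers_without_kernel` is useful when changing one flag, since under this reading every layer listed in `--attn_layer` would need a matching entry in `--kernel_size`.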

For more training and testing details, please see PERSON_IMAGE_GENERATION.md.

4) Pose-Guided Person Image Animation

The Pose-Guided Person Image Animation task generates a video clip from a still source image according to a driving target sequence. We further model the temporal consistency for this task.

Run the demo of this task:

python demo.py \
--name=dance_fashion_checkpoints \
--model=dance \
--attn_layer=2,3 \
--kernel_size=2=5,3=3 \
--gpu_id=0 \
--dataset_mode=dance \
--sub_dataset=fashion \
--dataroot=./dataset/danceFashion \
--results_dir=./demo_results/dance_fashion \
--test_list=val_list.csv

For more training and testing details, please see PERSON_IMAGE_ANIMATION.md.

5) Face Image Animation

Given an input source image and a guidance video sequence depicting the structure movements, our model generates a video containing the specified movements.

Run the demo of this task:

python demo.py \
--name=face_checkpoints \
--model=face \
--attn_layer=2,3 \
--kernel_size=2=5,3=3 \
--gpu_id=0 \
--dataset_mode=face \
--dataroot=./dataset/FaceForensics \
--results_dir=./demo_results/face 

We use the real videos of the FaceForensics dataset. See FACE_IMAGE_ANIMATION.md for more details.

6) Novel View Synthesis

View synthesis requires generating novel views of objects or scenes based on arbitrary input views.

In this task, we use the car and chair categories of the ShapeNet dataset. See VIEW_SYNTHESIS.md for more details.

Citation

@article{ren2020deep,
  title={Deep Image Spatial Transformation for Person Image Generation},
  author={Ren, Yurui and Yu, Xiaoming and Chen, Junming and Li, Thomas H and Li, Ge},
  journal={arXiv preprint arXiv:2003.00696},
  year={2020}
}

Acknowledgement

We build our project based on Vid2Vid. Some dataset preprocessing methods are derived from Pose-Transfer.