
DM-VTON: Distilled Mobile Real-time Virtual Try-On

[Paper] [Colab Notebook] [Web Demo]


This is the official PyTorch implementation of DM-VTON: Distilled Mobile Real-time Virtual Try-On. DM-VTON is designed to be fast and lightweight while maintaining the quality of the try-on image. It achieves 40 frames per second on a single Nvidia Tesla T4 GPU and takes up only 37 MB of memory.

πŸ“ Documentation

Installation

This source code has been developed and tested with python==3.10, pytorch==1.13.1, and torchvision==0.14.1. We recommend using the conda package manager for installation.

  1. Clone this repo.
git clone https://github.com/KiseKloset/DM-VTON.git
  2. Install dependencies with conda (we provide the script scripts/install.sh).
conda create -n dm-vton python=3.10
conda activate dm-vton
bash scripts/install.sh
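
After installation, a quick sanity check like the following can confirm the pinned versions and GPU visibility (a minimal sketch, assuming the dm-vton environment above is active; not part of the repo):

```python
# Optional sanity check: confirm the pinned versions and that a GPU is visible.
import torch
import torchvision

print(f"torch {torch.__version__}, torchvision {torchvision.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```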

Data Preparation

VITON

Because of copyright issues with the original VITON dataset, we use a resized version provided by CP-VTON. We followed the work of Han et al. to filter out duplicates and ensure that no data leakage occurs (VITON-Clean). You can download the VITON-Clean dataset here.

|                | VITON | VITON-Clean |
|----------------|-------|-------------|
| Training pairs | 14221 | 6824        |
| Testing pairs  | 2032  | 416         |

Dataset folder structure:

β”œβ”€β”€ VTON-Clean
β”‚   β”œβ”€β”€ VITON_test
β”‚   β”‚   β”œβ”€β”€ test_pairs.txt
β”‚   β”‚   β”œβ”€β”€ test_img
β”‚   β”‚   β”œβ”€β”€ test_color
β”‚   β”‚   β”œβ”€β”€ test_edge
β”‚   β”œβ”€β”€ VITON_traindata
β”‚   β”‚   β”œβ”€β”€ train_pairs.txt
β”‚   β”‚   β”œβ”€β”€ train_img
β”‚   β”‚   β”‚   β”œβ”€β”€ [000003_0.jpg | ...]  # Person
β”‚   β”‚   β”œβ”€β”€ train_color
β”‚   β”‚   β”‚   β”œβ”€β”€ [000003_1.jpg | ...]  # Garment
β”‚   β”‚   β”œβ”€β”€ train_edge
β”‚   β”‚   β”‚   β”œβ”€β”€ [000003_1.jpg | ...]  # Garment mask
β”‚   β”‚   β”œβ”€β”€ train_label
β”‚   β”‚   β”‚   β”œβ”€β”€ [000003_0.jpg | ...]  # Parsing map
β”‚   β”‚   β”œβ”€β”€ train_densepose
β”‚   β”‚   β”‚   β”œβ”€β”€ [000003_0.npy | ...]  # DensePose
β”‚   β”‚   β”œβ”€β”€ train_pose
β”‚   β”‚   β”‚   β”œβ”€β”€ [000003_0.json | ...] # OpenPose
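
To verify the layout, a single training sample can be loaded as in this illustrative sketch (not repo code; the sample ID and file extensions follow the tree above):

```python
# Hypothetical check: load one training sample by ID to confirm the folder layout.
import json
import numpy as np
from PIL import Image

root = "VTON-Clean/VITON_traindata"
sample = "000003"

person = Image.open(f"{root}/train_img/{sample}_0.jpg")        # person photo
garment = Image.open(f"{root}/train_color/{sample}_1.jpg")     # in-shop garment
edge = Image.open(f"{root}/train_edge/{sample}_1.jpg")         # garment mask
parsing = Image.open(f"{root}/train_label/{sample}_0.jpg")     # parsing map
densepose = np.load(f"{root}/train_densepose/{sample}_0.npy")  # DensePose array
with open(f"{root}/train_pose/{sample}_0.json") as f:
    pose = json.load(f)                                        # OpenPose keypoints

print(person.size, garment.size, densepose.shape)
```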

Inference

test.py runs inference on image folders, evaluates FID, LPIPS, and runtime, and saves the results to runs/TEST_DIR. Check the sample script scripts/test.sh for how to run it. You can download the pretrained checkpoints here.

Note: to run and save separate results for each [person, garment] pair, set batch_size=1.
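
For reference, per-frame runtime numbers like the ones reported below can be measured with a generic timing loop such as this sketch (placeholder names; model and the input shape are assumptions, not the repo's API):

```python
# Generic GPU timing sketch for any image-to-image PyTorch model.
import time
import torch

@torch.no_grad()
def measure_fps(model, iters=100, shape=(1, 3, 256, 192), device="cuda"):
    model = model.to(device).eval()
    x = torch.randn(shape, device=device)
    for _ in range(10):              # warm-up iterations
        model(x)
    torch.cuda.synchronize()         # flush queued GPU work before timing
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)  # frames per second
```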

Training

For each dataset, you first need to train a Teacher network to guide the Student network. DM-VTON uses FS-VTON as the Teacher. Each model is trained in two stages: stage 1 trains only the warping module, and stage 2 trains the entire model (warping module + generator). Check the sample scripts for training both the Teacher network (scripts/train_pb_warp + scripts/train_pb_e2e) and the Student network (scripts/train_pf_warp + scripts/train_pf_e2e). We also provide a Colab notebook as a quick tutorial.
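
Conceptually, the trained Teacher provides supervision targets for the Student. The sketch below illustrates the general knowledge-distillation idea only; the function names and loss are placeholders, not DM-VTON's actual training objective:

```python
# Conceptual distillation step: the parser-based Teacher's try-on result
# serves as a pseudo ground truth for the parser-free Student.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, person, garment, parse_inputs):
    with torch.no_grad():
        pseudo_gt = teacher(person, garment, parse_inputs)  # Teacher uses parsing
    pred = student(person, garment)                         # Student does not
    return F.l1_loss(pred, pseudo_gt)                       # match Teacher output
```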

Training Settings

A full list of training settings can be found in opt/train_opt.py. Below are some important settings.

  • device: GPU device(s) used for training (e.g. 0,1,2). DM-VTON needs a GPU because it relies on cupy.
  • batch_size: Customize the batch size for each stage to suit your hardware.
  • lr: Learning rate.
  • Epochs = niter + niter_decay (see the schedule sketch after this list).
    • niter: Number of epochs at the starting learning rate.
    • niter_decay: Number of epochs over which the learning rate linearly decays to zero.
  • save_period: Save a checkpoint every save_period epochs.
  • resume: Use this to continue training from a previous run.
  • project and name: Results (checkpoints, logs, images, etc.) are saved in the project/name folder. If the folder already exists, the code creates a new one (e.g. project/name-1, project/name-2).
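
The niter + niter_decay schedule corresponds to a standard "constant, then linear-to-zero" learning-rate curve, which can be reproduced with a LambdaLR as in this standalone sketch (illustrative values; not code from opt/train_opt.py):

```python
# Keep the starting LR for `niter` epochs, then decay it linearly to zero
# over the next `niter_decay` epochs.
import torch

niter, niter_decay, lr = 50, 50, 5e-5            # illustrative values

model = torch.nn.Linear(10, 10)                  # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

def lr_lambda(epoch):
    # 1.0 for the first `niter` epochs, then a linear ramp down to 0.
    return 1.0 - max(0, epoch - niter) / float(niter_decay)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)

for epoch in range(niter + niter_decay):
    # ... train one epoch ...
    scheduler.step()
```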

πŸ“ˆ Results

Results on VITON

| Methods          | FID ↓ | Runtime (ms) ↓ | Memory (MB) ↓ |
|------------------|-------|----------------|---------------|
| ACGPN (CVPR20)   | 33.3  | 153.6          | 565.9         |
| PF-AFN (CVPR21)  | 27.3  | 35.8           | 293.3         |
| C-VTON (WACV22)  | 37.1  | 66.9           | 168.6         |
| SDAFN (ECCV22)   | 30.2  | 83.4           | 150.9         |
| FS-VTON (CVPR22) | 26.5  | 37.5           | 309.3         |
| OURS             | 28.2  | 23.3           | 37.8          |

😎 Supported Models

We also support several parser-free models that can be used as the Teacher and/or the Student. All of these methods share a two-stage architecture (warping module + generator). For more details, see here.

| Methods        | Source                                                               | Teacher | Student |
|----------------|----------------------------------------------------------------------|---------|---------|
| PF-AFN         | Parser-Free Virtual Try-on via Distilling Appearance Flows          | βœ…      | βœ…      |
| FS-VTON        | Style-Based Global Appearance Flow for Virtual Try-On               | βœ…      | βœ…      |
| RMGN           | RMGN: A Regional Mask Guided Network for Parser-free Virtual Try-on | ❌      | βœ…      |
| DM-VTON (Ours) | DM-VTON: Distilled Mobile Real-time Virtual Try-On                  | βœ…      | βœ…      |

β„Ή Citation

If our code or paper is helpful to your work, please consider citing:

@inproceedings{nguyen2023dm,
  title        = {DM-VTON: Distilled Mobile Real-time Virtual Try-On},
  author       = {Nguyen-Ngoc, Khoi-Nguyen and Phan-Nguyen, Thanh-Tung and Le, Khanh-Duy and Nguyen, Tam V and Tran, Minh-Triet and Le, Trung-Nghia},
  year         = 2023,
  booktitle    = {IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct)},
}

πŸ™ Acknowledgments

This code is based on PF-AFN.

πŸ“„ License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The use of this code is for academic purposes only.