DM-VTON: Distilled Mobile Real-time Virtual Try-On

<div align="center">

[Paper] [Colab Notebook] [Web Demo]

<img src="https://raw.githubusercontent.com/KiseKloset/DM-VTON/assets/promotion.png" width="35%"><br>

This is the official PyTorch implementation of DM-VTON: Distilled Mobile Real-time Virtual Try-On. DM-VTON is designed to be fast and lightweight while maintaining the quality of the try-on image. It can achieve 40 frames per second on a single Nvidia Tesla T4 GPU and takes up only 37 MB of memory.

<img src="https://raw.githubusercontent.com/KiseKloset/DM-VTON/assets/model_diagram.png" class="left" width='100%'> </div>

<div align="center"> šŸ“ Documentation </div>

Installation

This source code has been developed and tested with python==3.10, pytorch==1.13.1, and torchvision==0.14.1. We recommend using the conda package manager for installation.

1. Clone this repo:

```shell
git clone https://github.com/KiseKloset/DM-VTON.git
```

2. Install dependencies with conda (we provide the script scripts/install.sh):

```shell
conda create -n dm-vton python=3.10
conda activate dm-vton
bash scripts/install.sh
```
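After installation, a quick sanity check can confirm that the environment matches the tested versions. The helper below is a minimal sketch, not part of the repo:

```python
# Sanity-check that installed versions match the tested ones
# (python 3.10, pytorch 1.13.1, torchvision 0.14.1).
import importlib
import sys

def installed_version(package: str):
    """Return a package's __version__ string, or None if unavailable."""
    try:
        return getattr(importlib.import_module(package), "__version__", None)
    except ImportError:
        return None

print("python:", sys.version.split()[0])                 # expect 3.10.x
print("torch:", installed_version("torch"))              # expect 1.13.1 (possibly with a CUDA suffix)
print("torchvision:", installed_version("torchvision"))  # expect 0.14.1
```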

Data Preparation

VITON

Because of copyright issues with the original VITON dataset, we use the resized version provided by CP-VTON. We followed the work of Han et al. to filter out duplicates and ensure no data leakage happens (VITON-Clean). You can download the VITON-Clean dataset here.

|                | VITON | VITON-Clean |
| -------------- | ----- | ----------- |
| Training pairs | 14221 | 16824       |
| Testing pairs  | 2032  | 416         |

Dataset folder structure:

```
├── VTON-Clean
│   ├── VITON_test
│   │   ├── test_pairs.txt
│   │   ├── test_img
│   │   ├── test_color
│   │   ├── test_edge
│   ├── VITON_traindata
│   │   ├── train_pairs.txt
│   │   ├── train_img
│   │   │   ├── [000003_0.jpg | ...]  # Person
│   │   ├── train_color
│   │   │   ├── [000003_1.jpg | ...]  # Garment
│   │   ├── train_edge
│   │   │   ├── [000003_1.jpg | ...]  # Garment mask
│   │   ├── train_label
│   │   │   ├── [000003_0.jpg | ...]  # Parsing map
│   │   ├── train_densepose
│   │   │   ├── [000003_0.npy | ...]  # Densepose
│   │   ├── train_pose
│   │   │   ├── [000003_0.json | ...] # Openpose
```
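The expected layout above can be checked programmatically before training. The sketch below is not part of the repo, and the root path passed to it is a placeholder:

```python
# Minimal sketch: verify the dataset layout described above.
from pathlib import Path

EXPECTED = {
    "VITON_test": ["test_pairs.txt", "test_img", "test_color", "test_edge"],
    "VITON_traindata": [
        "train_pairs.txt", "train_img", "train_color", "train_edge",
        "train_label", "train_densepose", "train_pose",
    ],
}

def check_dataset(root: str) -> list[str]:
    """Return missing entries; an empty list means the layout is complete."""
    missing = []
    for split, entries in EXPECTED.items():
        for entry in entries:
            if not (Path(root) / split / entry).exists():
                missing.append(f"{split}/{entry}")
    return missing

# Example: print(check_dataset("VTON-Clean"))
```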

Inference

test.py runs inference on image folders, evaluates FID, LPIPS, and runtime, and saves the results to runs/TEST_DIR. Check the sample script scripts/test.sh for an example of how to run it. You can download the pretrained checkpoints here.

Note: to run and save separate results for each [person, garment] pair, set batch_size=1.

Training

For each dataset, you first need to train a Teacher network to guide the Student network. DM-VTON uses FS-VTON as the Teacher. Each model is trained in 2 stages: stage 1 trains only the warping module, and stage 2 trains the entire model (warping module + generator). Check the sample scripts for training both the Teacher network (scripts/train_pb_warp + scripts/train_pb_e2e) and the Student network (scripts/train_pf_warp + scripts/train_pf_e2e). We also provide a Colab notebook as a quick tutorial.

Training Settings

A full list of training settings can be found in opt/train_opt.py. Below are some important settings.

<div align="center"> šŸ“ˆ Result </div>

<div align="center"> <img src="https://raw.githubusercontent.com/KiseKloset/DM-VTON/assets/fps.png" class="left" width='60%'> </div>

Results on VITON

| Methods          | FID $\downarrow$ | Runtime (ms) $\downarrow$ | Memory (MB) $\downarrow$ |
| ---------------- | ---------------- | ------------------------- | ------------------------ |
| ACGPN (CVPR20)   | 33.3             | 153.6                     | 565.9                    |
| PF-AFN (CVPR21)  | 27.3             | 35.8                      | 293.3                    |
| C-VTON (WACV22)  | 37.1             | 66.9                      | 168.6                    |
| SDAFN (ECCV22)   | 30.2             | 83.4                      | 150.9                    |
| FS-VTON (CVPR22) | 26.5             | 37.5                      | 309.3                    |
| OURS             | 28.2             | 23.3                      | 37.8                     |
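As a quick consistency check, the runtime column converts to throughput via FPS = 1000 / runtime_ms. The sketch below, with numbers copied from the table, is illustrative only:

```python
# Convert per-frame runtime (ms) from the results table into frames per second.
runtimes_ms = {
    "ACGPN": 153.6,
    "PF-AFN": 35.8,
    "C-VTON": 66.9,
    "SDAFN": 83.4,
    "FS-VTON": 37.5,
    "DM-VTON (ours)": 23.3,
}

for method, ms in runtimes_ms.items():
    print(f"{method}: {1000 / ms:.1f} FPS")

# DM-VTON's 23.3 ms corresponds to ~42.9 FPS, consistent with the
# "40 frames per second on a Tesla T4" figure quoted in the introduction.
```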

<div align="center"> 😎 Supported Models </div>

We also support some parser-free models that can be used as Teacher and/or Student. The methods all have a 2-stage architecture (warping module and generator). For more details, see here.

| Methods        | Source                                                              | Teacher | Student |
| -------------- | ------------------------------------------------------------------- | ------- | ------- |
| PF-AFN         | Parser-Free Virtual Try-on via Distilling Appearance Flows           | ✅      | ✅      |
| FS-VTON        | Style-Based Global Appearance Flow for Virtual Try-On                | ✅      | ✅      |
| RMGN           | RMGN: A Regional Mask Guided Network for Parser-free Virtual Try-on  | āŒ      | ✅      |
| DM-VTON (Ours) | DM-VTON: Distilled Mobile Real-time Virtual Try-On                   | ✅      | ✅      |

<div align="center"> ℹ Citation </div>

If our code or paper is helpful to your work, please consider citing:

```bibtex
@inproceedings{nguyen2023dm,
  title        = {DM-VTON: Distilled Mobile Real-time Virtual Try-On},
  author       = {Nguyen-Ngoc, Khoi-Nguyen and Phan-Nguyen, Thanh-Tung and Le, Khanh-Duy and Nguyen, Tam V and Tran, Minh-Triet and Le, Trung-Nghia},
  year         = 2023,
  booktitle    = {IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct)},
}
```

<div align="center"> šŸ™ Acknowledgments </div>

This code is based on PF-AFN.

<div align="center"> šŸ“„ License </div>

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /> This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. The use of this code is for academic purposes only.