SD-VITON-Virtual-Try-On

This is the official repository for the following paper:

Towards Squeezing-Averse Virtual Try-On via Sequential Deformation [arxiv]

Sang-Heon Shim, Jiwoo Chung, Jae-Pil Heo
Accepted by AAAI 2024.

teaser 

Notice

This repository is currently built only for sharing the source code of an academic research paper.
It has several limitations; please see the Limitations section below.

Installation

Clone this repository:

git clone https://github.com/SHShim0513/SD-VITON.git
cd ./SD-VITON/

Install PyTorch and other dependencies:

conda create -n {env_name} python=3.8
conda activate {env_name}
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c nvidia
pip install opencv-python torchgeometry Pillow tqdm tensorboardX scikit-image scipy timm==0.4.12
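
After installation, a quick check can confirm that the packages above are importable. The following is a minimal sketch and not part of this repository: the helper name `find_missing` is hypothetical, and the module names are the usual import names for the listed packages (for example, `cv2` for opencv-python and `skimage` for scikit-image).

```python
import importlib.util

# Usual import names for the packages installed above (package -> module).
REQUIRED = {
    "pytorch": "torch",
    "torchvision": "torchvision",
    "opencv-python": "cv2",
    "torchgeometry": "torchgeometry",
    "Pillow": "PIL",
    "tqdm": "tqdm",
    "tensorboardX": "tensorboardX",
    "scikit-image": "skimage",
    "scipy": "scipy",
    "timm": "timm",
}


def find_missing(packages):
    """Return the package names whose module cannot be located (hypothetical helper)."""
    return [
        name
        for name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Running this inside the activated conda environment lists any package that still needs to be installed.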

Dataset

We train and evaluate our model using the dataset from the following link.
We assume that you have downloaded it into ./data.

Inference

Here are the download links for each model checkpoint:

| Dataset | Network Type | Output Resolution | Google Cloud |
|---|---|---|---|
| VITON-HD | Try-on condition generator | Appearance flows with 128 x 96 | Download |
| VITON-HD | Try-on image generator | Images with 1024 x 768 | Download |

python3 test_generator.py --occlusion --test_name {test_name} --tocg_checkpoint {condition generator ckpt} --gpu_ids {gpu_ids} --gen_checkpoint {image generator ckpt} --datasetting unpaired --dataroot {dataset_path} --data_list {pair_list_textfile} --composition_mask

Training

Try-on condition generator

python3 train_condition.py --gpu_ids {gpu_ids} --Ddownx2 --Ddropout --interflowloss --occlusion --tvlambda_tvob 2.0 --tvlambda_taco 2.0

Try-on image generator

python3 train_generator.py --name test -b 4 -j 8 --gpu_ids {gpu_ids} --fp16 --tocg_checkpoint {condition generator ckpt path} --occlusion --composition_mask

This stage takes approximately 4 days with two A6000 GPUs.

To use the "--fp16" option, you need to install the NVIDIA apex library.

Limitations

Our work still has several limitations which, to the best of our knowledge, are not unique to our method.

Issue #1: crack

Several samples suffer from a crack artifact.
To the best of our knowledge, the crack is amplified by the up-sizing of the last appearance flows (AFs).
For example, our network infers the last AFs at 128 x 96 resolution and then up-scales them to 1024 x 768.
As a result, the crack regions are enlarged.
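
The arithmetic behind this amplification can be sketched as follows. This is an illustrative toy, not the repository's code: it uses nearest-neighbour upsampling on a small binary mask as a simplifying assumption (the actual interpolation used by the network may differ), and the function names are hypothetical. It compares the default setting (128 x 96 to 1024 x 768) with the higher-resolution setting described later in this section (256 x 192 to 512 x 384).

```python
def upscale_factor(flow_hw, image_hw):
    """Return the (vertical, horizontal) enlargement from flow to image size."""
    (flow_h, flow_w), (image_h, image_w) = flow_hw, image_hw
    return image_h / flow_h, image_w / flow_w


def upsample_nearest(mask, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    out = []
    for row in mask:
        wide_row = [value for value in row for _ in range(factor)]
        out.extend(list(wide_row) for _ in range(factor))
    return out


# Default checkpoints: AFs at 128 x 96, image at 1024 x 768.
default_factor = upscale_factor((128, 96), (1024, 768))
# Higher-resolution AFs: 256 x 192, image at 512 x 384.
alt_factor = upscale_factor((256, 192), (512, 384))

# Toy example: a crack that covers a single pixel at flow resolution.
crack = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
area_default = sum(map(sum, upsample_nearest(crack, int(default_factor[0]))))
area_alt = sum(map(sum, upsample_nearest(crack, int(alt_factor[0]))))

print(default_factor, area_default)
print(alt_factor, area_alt)
```

Under these assumptions, a one-pixel crack at flow resolution covers an 8 x 8 block of the final image in the default setting, but only a 2 x 2 block when the flows are inferred at 256 x 192 for a 512 x 384 image.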

teaser 

One way to reduce the artifact somewhat is to infer the last AFs at a resolution closer to the image resolution (see "After").
We provide a checkpoint in which the networks infer the AFs at 256 x 192 and an image at 512 x 384 resolution.

| Dataset | Network Type | Output Resolution | Google Cloud |
|---|---|---|---|
| VITON-HD | Try-on condition generator | Appearance flows with 256 x 192 | Download |
| VITON-HD | Try-on image generator | Images with 512 x 384 | Download |

The corresponding script for inference is as follows:

python3 test_generator.py --occlusion --test_name {test_name} --tocg_checkpoint {condition generator ckpt} --gpu_ids {gpu_ids} --gen_checkpoint {image generator ckpt} --datasetting unpaired --dataroot {dataset_path} --data_list {pair_list_textfile} --fine_width 384 --fine_height 512 --num_upsampling_layers more --cond_G_ngf 48 --cond_G_input_width 384 --cond_G_input_height 512 --cond_G_num_layers 6
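
When switching between the two inference settings, it can help to assemble the command programmatically. The sketch below is a hypothetical helper, not part of this repository: it only reproduces the flags shown in the two inference commands in this README, and the default `gpu_ids="0"` and the file names in the example call are placeholders.

```python
import shlex


def build_test_command(test_name, tocg_ckpt, gen_ckpt, dataroot, data_list,
                       gpu_ids="0", high_res_flows=False):
    """Assemble the argument list for test_generator.py (hypothetical helper)."""
    cmd = [
        "python3", "test_generator.py", "--occlusion",
        "--test_name", test_name,
        "--tocg_checkpoint", tocg_ckpt,
        "--gpu_ids", gpu_ids,
        "--gen_checkpoint", gen_ckpt,
        "--datasetting", "unpaired",
        "--dataroot", dataroot,
        "--data_list", data_list,
    ]
    if high_res_flows:
        # Flags from the 256 x 192 flow / 512 x 384 image command.
        cmd += [
            "--fine_width", "384", "--fine_height", "512",
            "--num_upsampling_layers", "more",
            "--cond_G_ngf", "48",
            "--cond_G_input_width", "384", "--cond_G_input_height", "512",
            "--cond_G_num_layers", "6",
        ]
    else:
        # Flag from the default 1024 x 768 command.
        cmd += ["--composition_mask"]
    return cmd


# Placeholder names; replace them with your own checkpoints and pair list.
cmd = build_test_command("demo", "tocg.pth", "gen.pth", "./data", "pairs.txt",
                         high_res_flows=True)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch inference.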

Issue #2: clothes behind the neck

Like other methods, our network cannot fully remove the clothing textures behind the neck.
As a result, they remain in the generated samples.

A solution would be to mask out such regions when pre-processing the inputs.
We did not apply this additional step, since it is not part of the dataset.
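
The masking step itself could look like the sketch below. It is only an illustration under stated assumptions: it presumes that a binary mask of the region behind the neck is already available (this README does not specify how to obtain one), the function name `mask_out` is hypothetical, and the gray fill value of 128 is an arbitrary choice.

```python
import numpy as np


def mask_out(image, region_mask, fill_value=128):
    """Return a copy of an (H, W, 3) image with masked pixels set to fill_value."""
    image = np.asarray(image)
    region_mask = np.asarray(region_mask, dtype=bool)
    if image.shape[:2] != region_mask.shape:
        raise ValueError("image and mask must share height and width")
    out = image.copy()
    out[region_mask] = fill_value  # overwrite all channels of the masked pixels
    return out


# Toy input: a white 4 x 3 image with one pixel marked for removal.
image = np.full((4, 3, 3), 255, dtype=np.uint8)
mask = np.zeros((4, 3), dtype=bool)
mask[0, 1] = True
masked = mask_out(image, mask)
```

The original array is left untouched, so the unmasked input remains available for comparison.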

Acknowledgments

This repository is built on the HR-VITON repository. Thanks for the great work.

Citation

If you find this work useful for your research, please cite our paper:

@article{shim2023towards,
  title={Towards Squeezing-Averse Virtual Try-On via Sequential Deformation},
  author={Shim, Sang-Heon and Chung, Jiwoo and Heo, Jae-Pil},
  journal={arXiv preprint arXiv:2312.15861},
  year={2023}
}