Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content, CVPR'20.

This repository contains the rearranged code of the CVPR 2020 paper 'Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content', prepared for open-sourcing. We also rearranged the VITON dataset for easy access.

Notably, virtual try-on is a difficult research topic, and our solution is of course not perfect. Please refer to our failure cases and limitations before using this repo.

The code is not fully tested. If you encounter any bugs or want to improve the system, please feel free to open an issue so we can discuss. For email requests, please write to hanyang@ethz.ch

[Sample Try-on Video] [Checkpoints]

[Dataset_Test] [Dataset_Train]

[Paper]

Update

Inference

python test.py

Note that the results of our pretrained model are guaranteed only on the VITON dataset; you should re-train the pipeline to get good results on other datasets.

Inference using Colab [Open In Colab]

Thanks to Levin for contributing the Colab inference script.

Evaluation IS and SSIM

Note that the released checkpoints are different from the ones we used in the paper. They generate better visual results but may have different (lower or higher) quantitative statistics. The results of the paper can be reproduced by re-training with a different number of training epochs.

The results for computing IS and SSIM are same-clothes reconstructed results.

By default, the code generates random clothes-model pairs, so you need to modify ACGPN_inference/data/aligned_dataset.py to generate the reconstructed results.

Here, we also offer the reconstructed results on the test set of the VITON dataset, obtained by running inference with this GitHub repo: [Precomputed Evaluation Results]. These results can be used directly to compute the IS and SSIM evaluations. You can get identical results using this GitHub repo.

SSIM score

  1. Use the pytorch SSIM repo. https://github.com/Po-Hsun-Su/pytorch-ssim
  2. Normalize the image (img/255.0) and reshape correctly. If not normalized correctly, the results differ a lot.
  3. Compute the score with window size = 11. The SSIM score should be 0.8664, which is higher than the score reported in the paper because this is a better checkpoint.
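
The normalization and reshaping in step 2 could be sketched as follows. This is only an illustration under stated assumptions: the helper name `to_ssim_input` is hypothetical and not part of this repo, images are assumed to be loaded as H x W x C uint8 arrays, and the batch x channel x height x width layout is assumed because the SSIM module in the linked repo operates on 4-D tensors.

```python
import numpy as np

def to_ssim_input(img_uint8: np.ndarray) -> np.ndarray:
    """Convert an H x W x C uint8 image to a 1 x C x H x W float32 array in [0, 1]."""
    if img_uint8.dtype != np.uint8 or img_uint8.ndim != 3:
        raise ValueError("expected an H x W x C uint8 image")
    scaled = img_uint8.astype(np.float32) / 255.0  # step 2: img / 255.0
    chw = np.transpose(scaled, (2, 0, 1))          # H x W x C -> C x H x W
    return chw[np.newaxis, ...]                    # add the batch dimension

# Tiny synthetic image, used only to show the resulting shape and value range.
demo = np.zeros((4, 6, 3), dtype=np.uint8)
demo[..., 0] = 255  # first channel fully saturated
batch = to_ssim_input(demo)
```

The resulting array can then be converted with `torch.from_numpy` and passed, together with the reference image prepared the same way, to the SSIM module of the linked repo constructed with `window_size=11`.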

IS score

  1. Use the pytorch inception score repo. https://github.com/sbarratt/inception-score-pytorch
  2. Normalize the images ((img/255.0)*2-1) and reshape correctly. Please strictly follow the procedure given in this repo.
  3. Compute the score. The number of splits also changes the results; we use splits = 1 to compute the results.
  4. Note that the released checkpoints produce an IS score of 2.82, which is slightly lower (but still SOTA) than the score in the paper, since it is a different checkpoint with better SSIM performance.
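
To illustrate why the number of splits matters in step 3, here is a minimal pure-Python sketch of the standard Inception Score definition, exp of the mean KL divergence between p(y|x) and p(y), computed per split and then averaged. It is not the code of the linked repo: the function operates on already-computed class probabilities, and the toy probabilities below are invented for illustration only.

```python
import math

def inception_score(probs, splits=1):
    """Return (mean, std) of the Inception Score over `splits` chunks.

    probs: list of per-image class-probability lists, each summing to 1.
    """
    n = len(probs)
    scores = []
    for k in range(splits):
        part = probs[k * n // splits:(k + 1) * n // splits]
        num_classes = len(part[0])
        # Marginal class distribution p(y) inside this split.
        marginal = [sum(p[c] for p in part) / len(part) for c in range(num_classes)]
        # Sum of KL(p(y|x) || p(y)) over the images of this split.
        kl_sum = 0.0
        for p in part:
            kl_sum += sum(pc * math.log(pc / marginal[c])
                          for c, pc in enumerate(p) if pc > 0)
        scores.append(math.exp(kl_sum / len(part)))
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return mean, std

# Toy predictions: two images of class 0 followed by two images of class 1.
toy_probs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
score_one_split, _ = inception_score(toy_probs, splits=1)
score_two_splits, _ = inception_score(toy_probs, splits=2)
```

In this toy case the marginal distribution is uniform when all images are evaluated together, but degenerate inside each of the two splits, so the two settings give different scores. This is why the number of splits should be fixed when comparing results.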

The specific key points we choose to evaluate the try-on difficulty

image

The formula to compute the difficulty of a try-on reference image

image

where t is a certain key point, Mp' is the set of key points we take into consideration, and N is the size of the set.
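
Since the formula is only available as an image here, the following sketch rests on an assumption: that the difficulty is the mean L1 distance of the selected key points from their centroid, which is what the definitions of t, Mp', and N suggest. The function name `tryon_difficulty` and the toy coordinates are hypothetical and not taken from this repo.

```python
def tryon_difficulty(keypoints):
    """Mean L1 distance of the key points from their centroid (assumed formula).

    keypoints: list of (x, y) tuples, i.e. the points t in the set Mp'.
    """
    n = len(keypoints)  # N, the size of the set
    if n == 0:
        raise ValueError("at least one key point is required")
    cx = sum(x for x, _ in keypoints) / n  # centroid, x coordinate
    cy = sum(y for _, y in keypoints) / n  # centroid, y coordinate
    return sum(abs(x - cx) + abs(y - cy) for x, y in keypoints) / n

# Toy poses: key points spread on a square versus all collapsed onto one point.
spread_pose = [(0, 0), (2, 0), (0, 2), (2, 2)]
compact_pose = [(1, 1), (1, 1), (1, 1), (1, 1)]
spread_score = tryon_difficulty(spread_pose)
compact_score = tryon_difficulty(compact_pose)
```

Under this assumption, key points that are more widely scattered around their centroid yield a larger value than tightly clustered ones.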

Segmentation Label

0 -> Background
1 -> Hair
4 -> Upclothes
5 -> Left-shoe 
6 -> Right-shoe
7 -> Noise
8 -> Pants
9 -> Left_leg
10 -> Right_leg
11 -> Left_arm
12 -> Face
13 -> Right_arm
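
As an illustration of how these labels might be used, the sketch below stores the mapping in a dictionary and extracts a binary mask for chosen parts from a label map. The helper `binary_mask` and the toy label map are hypothetical and not part of this repo; ids 2 and 3 do not appear in the list above and are therefore left out.

```python
# Label ids and names exactly as listed above; ids 2 and 3 are not listed.
SEG_LABELS = {
    0: "Background",
    1: "Hair",
    4: "Upclothes",
    5: "Left-shoe",
    6: "Right-shoe",
    7: "Noise",
    8: "Pants",
    9: "Left_leg",
    10: "Right_leg",
    11: "Left_arm",
    12: "Face",
    13: "Right_arm",
}

def binary_mask(label_map, names):
    """Return a 0/1 mask that is 1 wherever label_map holds one of the named parts."""
    missing = set(names) - set(SEG_LABELS.values())
    if missing:
        raise KeyError(f"unknown part name(s): {sorted(missing)}")
    wanted = {idx for idx, name in SEG_LABELS.items() if name in names}
    return [[1 if value in wanted else 0 for value in row] for row in label_map]

# Toy 3 x 3 label map: face on top, arms beside the upper clothes, pants below.
toy_map = [[0, 12, 0],
           [11, 4, 13],
           [0, 8, 0]]
clothes_mask = binary_mask(toy_map, {"Upclothes"})
arms_mask = binary_mask(toy_map, {"Left_arm", "Right_arm"})
```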

Sample images from different difficulty levels

image

Sample Try-on Results

image

Limitations and Failure Cases

image

  1. Large transformations of the semantic layout are hard to handle, partly due to the agnostic input of the fused segmentation.
  2. The shape of the original clothes is not completely removed. This is the same problem as in VITON.
  3. Very difficult poses are hard to handle. Better solutions could be proposed.

Training Details

Due to some version differences of the code, and some updates for better quality, some implementation details may be different from the paper.

For better inference performance, models G and G2 should be trained for 200 epochs, while model G1 and the U net should be trained for 20 epochs.

License

The use of this software is RESTRICTED to non-commercial research and educational purposes.

Citation

If you use our code or models or the offered baseline results in your research, please cite with:

@InProceedings{Yang_2020_CVPR,
author = {Yang, Han and Zhang, Ruimao and Guo, Xiaobao and Liu, Wei and Zuo, Wangmeng and Luo, Ping},
title = {Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

@inproceedings{ge2021disentangled,
  title={Disentangled Cycle Consistency for Highly-realistic Virtual Try-On},
  author={Ge, Chongjian and Song, Yibing and Ge, Yuying and Yang, Han and Liu, Wei and Luo, Ping},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={16928--16937},
  year={2021}
}

@inproceedings{yang2022full,
title = {Full-Range Virtual Try-On With Recurrent Tri-Level Transform},
author = {Yang, Han and Yu, Xinrui and Liu, Ziwei},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages = {3460--3469},
year = {2022}
}

Dataset

VITON Dataset. This dataset was introduced in VITON and contains 19,000 image pairs, each consisting of a front-view woman image and a top clothing image. Removing the invalid image pairs yields 16,253 pairs, which are further split into a training set of 14,221 pairs and a testing set of 2,032 pairs.