PPE ✨
Repository for our CVPR'2022 paper:
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model. Zipeng Xu, Tianwei Lin, Hao Tang, Fu Li, Dongliang He, Nicu Sebe, Radu Timofte, Luc Van Gool, Errui Ding. To appear in CVPR 2022.
The PyTorch implementation is available at zipengxuc/PPE-Pytorch.
Updates
24 Mar 2022: Updated the arXiv version of our paper.
30 Mar 2022: Changed the code release plan; the PyTorch implementation now lives at zipengxuc/PPE-Pytorch.
14 Apr 2022: Updated the PaddlePaddle inference code in this repository.
To reproduce our results:
Setup:
- Install CLIP:

  ```shell
  conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION>
  pip install ftfy regex tqdm gdown
  pip install git+https://github.com/openai/CLIP.git
  ```
- Download pre-trained models:
  The code relies on PaddleGAN (the PaddlePaddle implementation of StyleGAN2). Download the pre-trained StyleGAN2 generator from here.
  We also provide several pre-trained PPE models here.
- Invert real images:
  The mapper is trained on latent vectors, so real images must first be inverted into the latent space. For editing human faces, StyleCLIP provides the CelebA-HQ test set inverted by e4e: test set.
Usage:
Please first put the downloaded pre-trained models and data in the ckpt folder.
Inference
In the PaddlePaddle version, we provide only the inference code for generating editing results:
```shell
python mapper/evaluate.py
```
Reference
@article{xu2022ppe,
author = {Zipeng Xu and Tianwei Lin and Hao Tang and Fu Li and Dongliang He and Nicu Sebe and Radu Timofte and Luc Van Gool and Errui Ding},
title = {Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model},
journal = {arXiv preprint arXiv:2111.13333},
year = {2021}
}
If you have any questions, please contact zipeng.xu@unitn.it. :)