# PPE ✨
PyTorch implementation of our CVPR 2022 paper:

> **Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model.**
> Zipeng Xu, Tianwei Lin, Hao Tang, Fu Li, Dongliang He, Nicu Sebe, Radu Timofte, Luc Van Gool, Errui Ding. To appear in CVPR 2022.
This code is reimplemented based on orpatashnik/StyleCLIP. We thank the authors for open-sourcing their code.
We also have a PaddlePaddle implementation here.
## Updates
- **24 Mar 2022**: Updated the arXiv version of the paper.
- **26 Mar 2022**: Released code for reproducing the experiments in the paper.
- **30 Mar 2022**: Created this repository for the PyTorch implementation.
To be continued...
## To Reproduce Our Results
### Setup

The setup is the same as for StyleCLIP:
1. **Install CLIP:**

   ```shell
   conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION>
   pip install ftfy regex tqdm gdown
   pip install git+https://github.com/openai/CLIP.git
   ```
2. **Download pre-trained models:**

   The code relies on the Rosinality PyTorch implementation of StyleGAN2. Download the pre-trained StyleGAN2 generator from here.

   Training also needs the weights of the facial recognition network used in the ID loss. Download the weights from here.
3. **Invert real images:**

   The mapper is trained on latent vectors, so real images must first be inverted into the latent space. To edit human faces, StyleCLIP provides the CelebA-HQ dataset inverted by e4e: train set, test set.
### Usage

All procedures are run from the `mapper` directory, so first:

```shell
cd mapper
mkdir preprocess
```
#### Predict
1. Aggregate the images that are most relevant to the text command:

   ```shell
   python scripts/randc.py --cmd "black hair"
   ```
2. Find the attributes that appear most frequently in the command-relevant images:

   ```shell
   python scripts/find_ancs.py --cmd "black hair"
   ```
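Conceptually, the two Predict steps above first rank images by CLIP similarity to the text command, then count which attributes occur most often among the top-ranked ones. A toy numpy sketch of that idea (the embeddings and per-image attribute labels below are made up for illustration; the real scripts use CLIP and the inverted CelebA-HQ images):

```python
from collections import Counter

import numpy as np

def rank_by_similarity(image_embs, text_emb, top_k=2):
    """Indices of the top_k images most similar to the text (toy sketch)."""
    # Normalize so dot products are cosine similarities, as CLIP does.
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb)
    return np.argsort(-(img @ txt))[:top_k]

def find_anchors(attrs_per_image, top_k=3):
    """Most frequent attributes across the command-relevant images."""
    counts = Counter(a for attrs in attrs_per_image for a in attrs)
    return [attr for attr, _ in counts.most_common(top_k)]

# Toy 4-dim "CLIP" embeddings: image 2 aligns best with the command direction.
images = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.5, 0.5, 0.0, 0.0],
                   [0.1, 0.9, 0.1, 0.0]])
command = np.array([0.0, 1.0, 0.0, 0.0])
top = rank_by_similarity(images, command)  # most relevant indices first

# Hypothetical attribute labels for the command-relevant images.
relevant_attrs = [["black eyes", "short hair"], ["black eyes", "with makeup"]]
print(find_anchors(relevant_attrs, top_k=2))
```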
#### Prevent
Train the mapper network with the Entanglement Loss based on the found attributes (which we colloquially call "anchors"):

```shell
python scripts/train.py --exp_dir ../results/black_hair_ppe --description "black hair" --anchors 'short eyebrows','with bangs','short hair','black eyes','narrow eyes','high cheekbones','with lipstick','pointy face','sideburns','with makeup' --tar_dist 0.1826171875
```
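As a rough intuition, the Entanglement Loss penalizes the edit for moving CLIP similarity on the anchor attributes, while `--tar_dist` sets the target CLIP distance for the command itself. A heavily simplified numpy sketch of that idea (not the paper's exact formulation; the embeddings are toy assumptions):

```python
import numpy as np

def entanglement_loss(orig_emb, edit_emb, anchor_embs):
    """Sum of absolute changes in cosine similarity to each anchor text
    between the original and edited image embeddings (simplified sketch)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sum(abs(cos(edit_emb, t) - cos(orig_emb, t)) for t in anchor_embs)

orig = np.array([1.0, 0.0, 0.0])
edited = np.array([0.9, 0.1, 0.0])       # a small edit along another axis
anchors = [np.array([0.0, 0.0, 1.0])]    # anchor direction left untouched
print(entanglement_loss(orig, edited, anchors))  # → 0.0 (no anchor drift)
```

An edit that drifts toward an anchor direction would instead yield a positive loss, which is exactly what training discourages.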
#### Evaluate
Evaluate the manipulation with our evaluation metric:

```shell
python scripts/evaluate.py --exp_dir ../results/black_hair_ppe --description "black hair" --anchors 'short eyebrows','with bangs','short hair','black eyes','narrow eyes','high cheekbones','with lipstick','pointy face','sideburns','with makeup' --tar_dist 0.1826171875
```
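For intuition only: a disentangled edit should change similarity to the target command while leaving similarity to the anchors untouched. The toy score below contrasts those two quantities; it is NOT the paper's metric, and all embeddings are made up:

```python
import numpy as np

def disentanglement_score(orig_emb, edit_emb, target_emb, anchor_embs):
    """Toy score: similarity gain on the target command minus the mean
    absolute similarity drift on the anchors (illustrative only)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    target_gain = cos(edit_emb, target_emb) - cos(orig_emb, target_emb)
    anchor_drift = np.mean([abs(cos(edit_emb, t) - cos(orig_emb, t))
                            for t in anchor_embs])
    return target_gain - anchor_drift

orig = np.array([1.0, 0.0, 0.0])
edited = np.array([0.0, 1.0, 0.0])     # edit fully realizes the command
target = np.array([0.0, 1.0, 0.0])
anchors = [np.array([0.0, 0.0, 1.0])]  # anchor direction untouched
print(disentanglement_score(orig, edited, target, anchors))  # → 1.0
```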
## Reference

```bibtex
@article{xu2022ppe,
  author  = {Zipeng Xu and Tianwei Lin and Hao Tang and Fu Li and Dongliang He and Nicu Sebe and Radu Timofte and Luc Van Gool and Errui Ding},
  title   = {Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model},
  journal = {arXiv preprint arXiv:2111.13333},
  year    = {2021}
}
```
Please contact zipeng.xu@unitn.it if you have any questions.