


PyTorch implementation of our CVPR 2022 paper:

Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model. Zipeng Xu, Tianwei Lin, Hao Tang, Fu Li, Dongliang He, Nicu Sebe, Radu Timofte, Luc Van Gool, Errui Ding. To appear in CVPR 2022.


This code is a reimplementation based on orpatashnik/StyleCLIP. We thank the authors for open-sourcing their work.

We also have a PaddlePaddle implementation here.


24 Mar 2022: Update the arXiv version of our paper.

26 Mar 2022: Release code for reimplementing the experiments in the paper.

30 Mar 2022: Create this new repository for the PyTorch implementation.

To be continued...

To reproduce our results:


The setup is the same as for StyleCLIP:


All procedures are conducted under the mapper directory, so please run:

cd mapper
mkdir preprocess



Train the mapper network with the Entanglement Loss, based on the found attributes (we call them "anchors" colloquially):

python scripts/train.py --exp_dir ../results/black_hair_ppe --description "black hair" --anchors 'short eyebrows','with bangs','short hair','black eyes','narrow eyes','high cheekbones','with lipstick','pointy face','sideburns','with makeup' --tar_dist 0.1826171875
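A note on the quoting in the command above: the shell joins the adjacent quoted pieces and the commas between them into one argument, so `--anchors` receives a single comma-separated string. The sketch below illustrates this with the standard `shlex` module, using a shortened anchor list. The `parse_anchors` helper is hypothetical and only assumes that the string is split on commas; it is not taken from `scripts/train.py`.

```python
import shlex

# The --anchors value as typed on the command line (shortened for brevity).
typed = "--anchors 'short eyebrows','with bangs','short hair'"

# shlex.split follows POSIX shell quoting rules: the adjacent quoted pieces
# and the commas between them collapse into one argument.
argv = shlex.split(typed)


def parse_anchors(value):
    """Hypothetical helper: split a comma-separated anchor string into a list."""
    return [a.strip() for a in value.split(",") if a.strip()]


anchors = parse_anchors(argv[1])
print(argv)
print(anchors)
```

Because the anchors are separated by commas only, adding a space after a comma on the command line would break the value into several shell arguments.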


Evaluate the manipulation with our evaluation metric:

python scripts/evaluate.py --exp_dir ../results/black_hair_ppe --description "black hair" --anchors 'short eyebrows','with bangs','short hair','black eyes','narrow eyes','high cheekbones','with lipstick','pointy face','sideburns','with makeup' --tar_dist 0.1826171875
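The training and evaluation commands take the same arguments, so it can help to build both from one shared set of values. The following is a minimal sketch of that idea; `build_command` and the `shared` dictionary are illustrative names, not part of this repository, while the script paths and flags are copied from the commands above.

```python
import shlex


def build_command(script, exp_dir, description, anchors, tar_dist):
    """Illustrative helper: assemble the argument list for one script."""
    return [
        "python", script,
        "--exp_dir", exp_dir,
        "--description", description,
        "--anchors", ",".join(anchors),  # single comma-separated argument
        "--tar_dist", tar_dist,          # kept as text to preserve the exact value
    ]


# Values shared by training and evaluation (taken from the commands above).
shared = dict(
    exp_dir="../results/black_hair_ppe",
    description="black hair",
    anchors=[
        "short eyebrows", "with bangs", "short hair", "black eyes",
        "narrow eyes", "high cheekbones", "with lipstick", "pointy face",
        "sideburns", "with makeup",
    ],
    tar_dist="0.1826171875",
)

train_cmd = build_command("scripts/train.py", **shared)
eval_cmd = build_command("scripts/evaluate.py", **shared)

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(train_cmd))
print(shlex.join(eval_cmd))
```

The printed lines are meant to be run from the mapper directory, like the commands above.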


author = {Zipeng Xu and Tianwei Lin and Hao Tang and Fu Li and Dongliang He and Nicu Sebe and Radu Timofte and Luc Van Gool and Errui Ding},
title = {Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model},
journal = {arXiv preprint arXiv:2111.13333},
year = {2021}

Please contact zipeng.xu@unitn.it if you have any questions.