Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint, CVPR2023 (Official PyTorch Implementation)

<a href="https://arxiv.org/abs/2211.11448"><img src="https://img.shields.io/badge/arXiv-2008.00951-b31b1b.svg" height=22.5></a> <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" height=22.5></a> <a href="https://www.youtube.com/watch?v=hsB9Wv50dm0"><img src="https://img.shields.io/static/v1?label=CVPR 2023&message=7 Minute Video&color=red" height=22.5></a>

Project

Hongyu Liu<sup>1</sup>, Yibing Song<sup>2</sup>, Qifeng Chen<sup>1</sup>

<sup>1</sup>HKUST, <sup>2</sup> AI3 Institute, Fudan University

<img src='doc/teaser.png'>

:sparkles: Pipeline

How do we find a suitable foundation latent code $w$? With contrastive learning!

<img src='doc/contrastive.png'>
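At its core, this alignment can be written as a symmetric InfoNCE objective over matched (image, $w$) pairs. The snippet below is a minimal sketch of that objective, assuming generic image/latent encoders and a placeholder temperature; it is an illustration of the idea, not the repository's exact loss implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb, latent_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (image, w) pairs.

    image_emb:  (B, D) embeddings from the image encoder.
    latent_emb: (B, D) embeddings from the latent encoder.
    Row i of each tensor comes from the same (image, w) pair, so the diagonal
    of the similarity matrix holds the positives and everything else negatives.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    latent_emb = F.normalize(latent_emb, dim=-1)
    logits = image_emb @ latent_emb.t() / temperature        # (B, B) cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Contrast in both directions: image -> latent and latent -> image.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```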

We then obtain the refined latent codes $w^+$ and $f$ based on the foundation latent code $w$!

<img src='doc/pipeline.png'>

Getting Started

Prerequisites

Installation

git clone https://github.com/KumapowerLIU/CLCAE.git
cd CLCAE

Pretrained Models

Please download the pre-trained models from the following links. Each CLCAE model contains the entire architecture, including the encoder and decoder weights.

| Path | Description |
| :--- | :---------- |
| FFHQ_Inversion | CLCAE trained with the FFHQ dataset for StyleGAN inversion. |
| Car_Inversion | CLCAE trained with the Car dataset for StyleGAN inversion. |

If you wish to use one of the pretrained models for training or inference, you may do so using the flag --checkpoint_path_af.

In addition, we provide various auxiliary models needed for training your own CLCAE model from scratch, as well as pretrained models needed for computing the ID metrics reported in the paper.

| Path | Description |
| :--- | :---------- |
| Contrastive model and data for FFHQ | Contrastive model for FFHQ as mentioned in our paper. |
| Contrastive model and data for Car | Contrastive model for Car as mentioned in our paper. |
| FFHQ StyleGAN | StyleGAN model pretrained on FFHQ taken from rosinality with 1024x1024 output resolution. |
| Car StyleGAN | StyleGAN model pretrained on the Car dataset taken from rosinality with 512x512 output resolution. |
| IR-SE50 Model | Pretrained IR-SE50 model taken from TreB1eN for use in our ID loss during training. |
| MoCo ResNet-50 | Pretrained ResNet-50 model trained using MOCOv2 for computing MoCo-based similarity loss on non-facial domains. The model is taken from the official implementation. |
| CurricularFace Backbone | Pretrained CurricularFace model taken from HuangYG123 for use in ID similarity metric computation. |
| MTCNN | Weights for the MTCNN model taken from TreB1eN for use in ID similarity metric computation. (Unpack the tar.gz to extract the 3 model weights.) |

By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models. However, you may use your own paths by changing the necessary values in configs/path_configs.py.
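For reference, configs/path_configs.py typically maps checkpoint names to file locations; a hypothetical edit might look like the following (the key names below are placeholders and should be matched to the ones the code actually reads):

```python
# configs/path_configs.py -- illustrative values only; keep the keys the repo expects.
model_paths = {
    'stylegan_ffhq': 'pretrained_models/stylegan2-ffhq-config-f.pt',
    'stylegan_car': 'pretrained_models/stylegan2-car-config-f.pt',
    'ir_se50': 'pretrained_models/model_ir_se50.pth',
    'moco': 'pretrained_models/moco_v2_800ep_pretrain.pt',
    'curricular_face': 'pretrained_models/CurricularFace_Backbone.pth',
    'mtcnn_pnet': 'pretrained_models/mtcnn/pnet.npy',
}
```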

Training

Preparing your Data

Training CLCAE (DDP is used as the default setting)

The main training script can be found in scripts/train.py.
Intermediate training results are saved to opts.exp_dir. This includes checkpoints, train outputs, and test outputs.
Additionally, if you have tensorboard installed, you can visualize tensorboard logs in opts.exp_dir/logs.

Training the image and latent encoders during contrastive learning

For contrastive learning, you first need to generate the latent-image pair data with the pre-trained StyleGAN model, as mentioned in our paper (see the sketch below).
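As a rough sketch of what this pair data looks like, the snippet below samples $w$ codes and their synthesized images with a pretrained StyleGAN2 generator (assuming the rosinality implementation referenced above); the file layout expected by the dataset classes in this repository may differ.

```python
import os
import torch
from torchvision.utils import save_image
from model import Generator  # rosinality StyleGAN2 generator

device = 'cuda'
g = Generator(1024, 512, 8).to(device).eval()
g.load_state_dict(torch.load('pretrained_models/stylegan2-ffhq-config-f.pt')['g_ema'])
os.makedirs('pairs', exist_ok=True)

with torch.no_grad():
    for i in range(10000):
        z = torch.randn(1, 512, device=device)
        w = g.style(z)                            # map z to the W space
        img, _ = g([w], input_is_latent=True)     # synthesize the paired image from w
        save_image(img.clamp(-1, 1) * 0.5 + 0.5, f'pairs/{i:06d}.png')
        torch.save(w.cpu(), f'pairs/{i:06d}_w.pt')
```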

For the FFHQ:

python3 -m torch.distributed.launch  --nproc_per_node GPU_NUM --use_env \
./scripts/train.py \
--exp_dir ./checkpoints/contrastive \
--use_norm --use_ddp --val_interval 2500 \
--save_interval 5000  --workers 8 --batch_size $batchsize_num --test_batch_size $batchsize_num \
--dataset_type ffhq_encode_contrastive --train_contrastive True

For the Car:

python3 -m torch.distributed.launch  --nproc_per_node GPU_NUM --use_env \
./scripts/train.py \
--exp_dir ./checkpoints/contrastive \
--use_norm --use_ddp --val_interval 2500 \
--save_interval 5000  --workers 8 --batch_size $batchsize_num --test_batch_size $batchsize_num \
--dataset_type car_encode_contrastive --train_contrastive True

Training the inversion model with contrastive learning

For the FFHQ:

python3 -m torch.distributed.launch  --nproc_per_node GPU_NUM --use_env \
./scripts/train.py    \
--exp_dir /checkpoints/ffhq_inversion  \
--use_norm --use_ddp   --val_interval 2500 --save_interval 5000  --workers 8 --batch_size 2 --test_batch_size 2 \
--lpips_lambda=0.2 --l2_lambda=1 --id_lambda=0.1 \
--feature_matching_lambda=0.01 --contrastive_lambda=0.1 --learn_in_w --output_size 1024 \
--dataset_type ffhq_encode_inversion --train_inversion True

For the Car:

python3 -m torch.distributed.launch  --nproc_per_node GPU_NUM --use_env \
./scripts/train.py    \
--exp_dir /checkpoints/car_inversion  \
--use_norm --use_ddp   --val_interval 2500 --save_interval 5000  --workers 8 --batch_size 2 --test_batch_size 2 \
--lpips_lambda=0.2 --l2_lambda=1 --id_lambda=0.1 \
--feature_matching_lambda=0.01 --contrastive_lambda=0.1 --learn_in_w --output_size 512 \
--dataset_type car_encode_inversion --train_inversion True --contrastive_model_image contrastive_car_image \
--contrastive_model_latent contrastive_car_latent

Testing

Inference of Inversion

Having trained your model, you can use scripts/inference_inversion.py to apply it to a set of images.
For example,

python3 scripts/inference_inversion.py \
--exp_dir=./results \
--checkpoint_path_af=/path/to/the/pretrained/model \
--data_path=/path/to/the/test/images/folder \
--test_batch_size=1 \
--test_workers=1 \
--couple_outputs \
--resize_outputs
 

Inference of Editing

Please see scripts/inference_edit.py and scripts/inference_edit_not_interface.py for image editing with the trained model.
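For orientation, editing in these scripts follows the usual latent-manipulation recipe: invert the image, shift the resulting code along a semantic direction, and decode again. The sketch below only illustrates that shift; the direction tensor and the generator call are placeholders rather than the repository's API.

```python
import torch

def edit_latent(w_plus, direction, strength=3.0):
    """Shift an inverted W+ code along a pre-computed semantic direction.

    w_plus:    (1, num_layers, 512) code from the inversion encoder.
    direction: (512,) attribute direction (e.g. age or pose), obtained with a
               method such as InterFaceGAN; hypothetical here.
    """
    return w_plus + strength * direction.view(1, 1, -1)

# The edited code is then decoded with the StyleGAN generator, e.g.
# edited_img, _ = generator([edit_latent(w_plus, age_direction)], input_is_latent=True)
```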

Acknowledgments

This code borrows heavily from pSp, e4e, and FeatureStyleEncoder.

Citation

If you find our work useful for your research, please consider citing the following paper :)

@InProceedings{Liu_2023_CVPR,
    author    = {Liu, Hongyu and Song, Yibing and Chen, Qifeng},
    title     = {Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {10072-10082}
}

License

The codes and the pretrained model in this repository are under the MIT license as specified by the LICENSE file.