PTI: Pivotal Tuning for Latent-based editing of Real Images (ACM TOG 2022)
<!-- > Recently, a surge of advanced facial editing techniques have been proposed that leverage the generative power of a pre-trained StyleGAN. To successfully edit an image this way, one must first project (or invert) the image into the pre-trained generator’s domain. As it turns out, however, StyleGAN’s latent space induces an inherent tradeoff between distortion and editability, i.e. between maintaining the original appearance and convincingly altering some of its attributes. Practically, this means it is still challenging to apply ID-preserving facial latent-space editing to faces which are out of the generator’s domain. In this paper, we present an approach to bridge this gap. Our technique slightly alters the generator, so that an out-of-domain image is faithfully mapped into an in-domain latent code. The key idea is pivotal tuning — a brief training process that preserves the editing quality of an in-domain latent region, while changing its portrayed identity and appearance. In Pivotal Tuning Inversion (PTI), an initial inverted latent code serves as a pivot, around which the generator is fined-tuned. At the same time, a regularization term keeps nearby identities intact, to locally contain the effect. This surgical training process ends up altering appearance features that represent mostly identity, without affecting editing capabilities. To supplement this, we further show that pivotal tuning can also adjust the generator to accommodate a multitude of faces, while introducing negligible distortion on the rest of the domain. We validate our technique through inversion and editing metrics, and show preferable scores to state-of-the-art methods. We further qualitatively demonstrate our technique by applying advanced edits (such as pose, age, or expression) to numerous images of well-known and recognizable identities. 
Finally, we demonstrate resilience to harder cases, including heavy make-up, elaborate hairstyles and/or headwear, which otherwise could not have been successfully inverted and edited by state-of-the-art methods. --><a href="https://arxiv.org/abs/2106.05744"><img src="https://img.shields.io/badge/arXiv-2008.00951-b31b1b.svg"></a>
<a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg"></a>
Inference Notebook: <a href="https://colab.research.google.com/github/danielroich/PTI/blob/main/notebooks/inference_playground.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=20></a>
Description
Official implementation of our PTI paper, plus code for the evaluation metrics. PTI introduces an optimization mechanism for solving the StyleGAN inversion task, providing near-perfect reconstruction results while maintaining the high editing abilities of the native StyleGAN latent space W. For more details, see <a href="https://arxiv.org/abs/2106.05744"><img src="https://img.shields.io/badge/arXiv-2008.00951-b31b1b.svg"></a>
Recent Updates
2021.07.01: Fixed the file download phase in the inference notebook, which might have caused the notebook not to run smoothly.
2021.06.29: Added support for CPU. To run PTI on CPU, change the device parameter under configs/global_config.py to "cpu" instead of "cuda".
2021.06.25: Added a mohawk edit using StyleCLIP+PTI to the inference notebook. Updated the inference notebook documentation because the Google Drive rate limit was reached. Currently, Google Drive does not allow the pretrained models to be downloaded automatically from Colab, so manual intervention might be needed.
Getting Started
Prerequisites
- Linux or macOS
- NVIDIA GPU + CUDA CuDNN (not mandatory but recommended)
- Python 3
Installation
- Dependencies:
- lpips
- wandb
- pytorch
- torchvision
- matplotlib
- dlib
- All dependencies can be installed using pip install followed by the package name (PyTorch is published on PyPI as torch)
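Before running anything, it can help to confirm which of the listed dependencies are already importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of this repository, and the mapping assumes that every package except pytorch (imported as `torch`) uses its package name as its import name.

```python
import importlib.util

# Dependency names from the list above, mapped to the module name used at import time.
# Assumption: only "pytorch" differs from its import name ("torch").
REQUIRED = {
    "lpips": "lpips",
    "wandb": "wandb",
    "pytorch": "torch",
    "torchvision": "torchvision",
    "matplotlib": "matplotlib",
    "dlib": "dlib",
}


def missing_packages(required=REQUIRED):
    """Return the dependency names whose module cannot be located."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Any name reported as missing can then be installed with pip.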
Pretrained Models
Please download the pretrained models from the following links.
Auxiliary Models
We provide various auxiliary models needed for the PTI inversion task.
This includes the StyleGAN generator and pre-trained models used for loss computation.
Path | Description |
---|---|
FFHQ StyleGAN | StyleGAN2-ada model trained on FFHQ with 1024x1024 output resolution. |
Dlib alignment | Dlib alignment used for image preprocessing. |
FFHQ e4e encoder | Pretrained e4e encoder. Used for StyleCLIP editing. |
Note: The StyleGAN model is used directly from the official stylegan2-ada-pytorch implementation. For StyleCLIP pretrained mappers, please see StyleCLIP's official routes
By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models.
However, you may use your own paths by changing the necessary values in configs/paths_config.py.
Inversion
Preparing your Data
In order to invert and edit a real image, you should first align and crop it to the correct size. To do so, perform one of the following steps:
- Run notebooks/align_data.ipynb and change the "images_path" variable to the raw images path
- Run utils/align_data.py and change the "images_path" variable to the raw images path
Weights And Biases
The project supports the Weights and Biases framework for experiment tracking. For the inversion task, it enables visualization of the loss progression and the generator's intermediate results during the initial inversion and the Pivotal Tuning (PT) procedure.
The log frequency can be adjusted using the parameters defined in configs/global_config.py under the "Logs" subsection.
There is no need to have an account. However, in order to use the features provided by Weights and Biases, you first have to register on their site.
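As a rough illustration of what a log-frequency parameter controls, the sketch below gates logging on the step counter. The names `LOG_EVERY` and `should_log` are hypothetical and are not the parameter names used in configs/global_config.py.

```python
LOG_EVERY = 100  # hypothetical stand-in for a log-frequency parameter


def should_log(step: int, every: int = LOG_EVERY) -> bool:
    """Return True on the optimization steps where a snapshot would be logged."""
    return every > 0 and step % every == 0


# Steps of a 350-step run on which something would be logged.
logged_steps = [step for step in range(350) if should_log(step)]
print(logged_steps)  # [0, 100, 200, 300]
```

A larger value means fewer intermediate images and loss points are sent to the tracker.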
Running PTI
The main training script is scripts/run_pti.py. The script receives aligned and cropped images from the paths configured in the "Input info" subsection of configs/paths_config.py.
Results are saved to the directories listed under "Dirs for output files" in configs/paths_config.py. This includes the inversion latent codes and the tuned generators.
The hyperparameters for the inversion task can be found in configs/hyperparameters.py. They are initialized to the default values used in the paper.
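The two phases of the inversion task, an initial inversion that yields a pivot latent code followed by pivotal tuning of the generator around that pivot, can be illustrated with a toy example. The sketch below is not the repository's training code: it assumes a linear "generator" G(w) = A·w written in plain Python, uses a plain squared error instead of perceptual losses, omits the regularization term, and every name in it is hypothetical.

```python
def generate(A, w):
    """Toy generator: returns the matrix-vector product A @ w as a list."""
    return [sum(a * w_j for a, w_j in zip(row, w)) for row in A]


def sq_error(A, w, x):
    """Squared reconstruction error ||A w - x||^2."""
    return sum((g - t) ** 2 for g, t in zip(generate(A, w), x))


def invert(A, x, steps=500, lr=0.05):
    """Phase 1: optimize the latent code w while the generator stays frozen."""
    w = [0.0] * len(A[0])
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(A, w), x)]
        # Gradient of the squared error with respect to w: 2 * A^T * residual
        grad = [
            2 * sum(A[i][j] * residual[i] for i in range(len(A)))
            for j in range(len(w))
        ]
        w = [w_j - lr * g_j for w_j, g_j in zip(w, grad)]
    return w


def pivotal_tune(A, w_pivot, x, steps=500, lr=0.05):
    """Phase 2: fine-tune the generator weights while the pivot code stays fixed."""
    A = [row[:] for row in A]  # copy, so the original generator is untouched
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(A, w_pivot), x)]
        for i in range(len(A)):
            for j in range(len(w_pivot)):
                # Gradient of the squared error with respect to A: 2 * residual * w^T
                A[i][j] -= lr * 2 * residual[i] * w_pivot[j]
    return A


# The initial generator can only produce outputs whose third coordinate is 0,
# so the target below is "out of domain" for it.
A0 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
x = [1.0, 2.0, 0.5]

w_pivot = invert(A0, x)
A_tuned = pivotal_tune(A0, w_pivot, x)

err_before = sq_error(A0, w_pivot, x)
err_after = sq_error(A_tuned, w_pivot, x)
print(f"error after inversion only: {err_before:.4f}")
print(f"error after pivotal tuning: {err_after:.4f}")
```

In this toy setting the inversion alone leaves a residual error of about 0.25, because no latent code can reach the target, while tuning the generator around the fixed pivot drives the error close to zero.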
Editing
By default, we assume that all auxiliary edit directions are downloaded and saved to the directory editings.
However, you may use your own paths by changing the necessary values in configs/paths_config.py under the "Edit directions" subsection.
An example of editing code can be found in scripts/latent_editor_wrapper.py.
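Conceptually, a direction-based edit moves the inverted latent code along an edit direction before it is passed to the tuned generator. The sketch below assumes the edit is a simple linear shift, as in InterfaceGAN-style editing; the function name, the three-dimensional codes, and the direction values are hypothetical and do not reflect the API of scripts/latent_editor_wrapper.py.

```python
def apply_edit(w, direction, strength):
    """Shift latent code w along an edit direction: w + strength * direction."""
    if len(w) != len(direction):
        raise ValueError("latent code and direction must have the same length")
    return [w_i + strength * d_i for w_i, d_i in zip(w, direction)]


w_pivot = [0.5, -1.0, 2.0]        # toy inverted latent code
age_direction = [1.0, 0.0, -0.5]  # toy edit direction

edited = apply_edit(w_pivot, age_direction, strength=2.0)
print(edited)  # [2.5, -1.0, 1.0]
```

A negative strength moves the code in the opposite direction, and a strength of zero leaves it unchanged.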
Inference Notebooks
To help visualize the results of PTI, we provide a Jupyter notebook found in notebooks/inference_playground.ipynb.
The notebook will download the pretrained models and run inference on a sample image found online or on images of your choosing. It is recommended to run this in Google Colab.
The notebook demonstrates how to:
- Invert an image using PTI
- Visualise the inversion and use the PTI output
- Edit the image after PTI using InterfaceGAN and StyleCLIP
- Compare to other inversion methods
Evaluation
Currently, the repository supports qualitative evaluation of reconstruction for PTI, SG2 (W space), e4e, and SG2Plus (W+ space), as well as editing using InterfaceGAN and GANSpace for the same inversion methods.
To run the evaluation, please see evaluation/qualitative_edit_comparison.py.
Coming Soon - Quantitative evaluation and StyleCLIP qualitative evaluation
Repository structure
Path | Description <img width=200> |
---|---|
├ configs | Folder containing configs defining Hyperparameters, paths and logging |
├ criteria | Folder containing various loss and regularization criteria for the optimization |
├ dnnlib | Folder containing internal utils for StyleGAN2-ada |
├ docs | Folder containing images displayed in the README |
├ editings | Folder containing the latent space edit directions |
├ environment | Folder containing Anaconda environment used in our experiments |
├ licenses | Folder containing licenses of the open source projects used in this repository |
├ models | Folder containing models used in different editing techniques and first phase inversion |
├ notebooks | Folder with jupyter notebooks to demonstrate the usage of PTI end-to-end |
├ scripts | Folder with running scripts for inversion, editing and metric computations |
├ torch_utils | Folder containing internal utils for StyleGAN2-ada |
├ training | Folder containing the core training logic of PTI |
├ utils | Folder with various utility functions |
Credits
StyleGAN2-ada model and implementation:
https://github.com/NVlabs/stylegan2-ada-pytorch
Copyright © 2021, NVIDIA Corporation.
Nvidia Source Code License https://nvlabs.github.io/stylegan2-ada-pytorch/license.html
LPIPS model and implementation:
https://github.com/richzhang/PerceptualSimilarity
Copyright (c) 2020, Sou Uchida
License (BSD 2-Clause) https://github.com/richzhang/PerceptualSimilarity/blob/master/LICENSE
e4e model and implementation:
https://github.com/omertov/encoder4editing
Copyright (c) 2021 omertov
License (MIT) https://github.com/omertov/encoder4editing/blob/main/LICENSE
StyleCLIP model and implementation:
https://github.com/orpatashnik/StyleCLIP
Copyright (c) 2021 orpatashnik
License (MIT) https://github.com/orpatashnik/StyleCLIP/blob/main/LICENSE
InterfaceGAN implementation:
https://github.com/genforce/interfacegan
Copyright (c) 2020 genforce
License (MIT) https://github.com/genforce/interfacegan/blob/master/LICENSE
GANSpace implementation:
https://github.com/harskish/ganspace
Copyright (c) 2020 harskish
License (Apache License 2.0) https://github.com/harskish/ganspace/blob/master/LICENSE
Acknowledgments
This repository's structure is based on the encoder4editing and ReStyle repositories.
Contact
For any inquiry please contact us at our email addresses: danielroich@gmail.com or ron.mokady@gmail.com
Citation
If you use this code for your research, please cite:
@article{roich2021pivotal,
title={Pivotal Tuning for Latent-based Editing of Real Images},
author={Roich, Daniel and Mokady, Ron and Bermano, Amit H and Cohen-Or, Daniel},
publisher = {Association for Computing Machinery},
journal={ACM Trans. Graph.},
year={2021}
}