Home

Awesome

Swapping Autoencoder for Deep Image Manipulation

Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei A. Efros, Richard Zhang

UC Berkeley and Adobe Research

NeurIPS 2020

teaser

<p float="left"> <img src="https://taesung.me/SwappingAutoencoder/index_files/church_style_swaps.gif" height="190" /> <img src="https://taesung.me/SwappingAutoencoder/index_files/tree_smaller.gif" height="190" /> <img src="https://taesung.me/SwappingAutoencoder/index_files/horseshoe_bend_evensmaller.gif" height="190" /> </p>

Project page | Paper | 3 Min Video

Overview

<img src='imgs/overview.jpg' width="1000px"/>

Swapping Autoencoder consists of autoencoding (top) and swapping (bottom) operation. Top: An encoder E embeds an input (Notre-Dame) into two codes. The structure code is a tensor with spatial dimensions; the texture code is a 2048-dimensional vector. Decoding with generator G should produce a realistic image (enforced by discriminator D matching the input (reconstruction loss). Bottom: Decoding with the texture code from a second image (Saint Basil's Cathedral) should look realistic (via D) and match the texture of the image, by training with a patch co-occurrence discriminator Dpatch that enforces the output and reference patches look indistinguishable.

Installation / Requirements

Testing and Evaluation.

We provide the pretrained models and also several images that reproduce the figures of the paper. Please download and unzip them here (2.1GB) (Note: this is a http (not https) address, and you may need to paste in the link URL directly in the address bar for some browsers like Chrome, or download the dataset using wget). The scripts assume that the checkpoints are at ./checkpoints/, and the test images at ./testphotos/, but they can be changed by modifying --checkpoints_dir and --dataroot options.

UPDATE: The pretrained model for the AFHQ dataset was added. Please download the models and samples images here (256MB) (Note: again, you may need to paste in the link URL directly in the address bar).

Swapping and Interpolation of the mountain model using sample images

<img src='imgs/interpolation.png' width="1000px"/>

To run simple swapping and interpolation, specify the two input reference images, change input_structure_image and input_texture_image fields of experiments/mountain_pretrained_launcher.py, and run

python -m experiments mountain_pretrained test simple_swapping
python -m experiments mountain_pretrained test simple_interpolation

The provided script, opt.tag("simple_swapping") and opt.tag("simple_interpolation") in particular of experiments/mountain_pretrained_launcher.py, invokes a terminal command that looks similar to the following one.

python test.py --evaluation_metrics simple_swapping \
--preprocess scale_shortside --load_size 512 \
--name mountain_pretrained  \
--input_structure_image [path_to_sample_image] \
--input_texture_image [path_to_sample_image] \
--texture_mix_alpha 0.0 0.25 0.5 0.75 1.0

In other words, feel free to use this command if that feels more straightforward.

The output images are saved at ./results/mountain_pretrained/simpleswapping/.

Texture Swapping

<img src='imgs/swapping.jpg' width="1000px"/> Our Swapping Autoencoder learns to disentangle texture from structure for image editing tasks such as texture swapping. Each row shows the result of combining the structure code of the leftmost image with the texture code of the top image.

To reproduce this image (Figure 4) as well as Figures 9 and 12 of the paper, run the following command:


# Reads options from ./experiments/church_pretrained_launcher.py
python -m experiments church_pretrained test swapping_grid

# Reads options from ./experiments/bedroom_pretrained_launcher.py
python -m experiments bedroom_pretrained test swapping_grid

# Reads options from ./experiments/mountain_pretrained_launcher.py
python -m experiments mountain_pretrained test swapping_grid

# Reads options from ./experiments/ffhq512_pretrained_launcher.py
python -m experiments ffhq512_pretrained test swapping_grid

Make sure the dataroot and checkpoints_dir paths are correctly set in the respective ./experiments/xx_pretrained_launcher.py script.

Quantitative Evaluations

To perform quantitative evaluation such as FID in Table 1, Fig 5, and Table 2, we first need to prepare image pairs of input structure and texture references images.

The reference images are randomly selected from the val set of LSUN, FFHQ, and the Waterfalls dataset. The pairs of input structure and texture images should be located at input_structure/ and input_style/ directory, with the same file name. For example, input_structure/001.png and input_style/001.png will be loaded together for swapping.

Replace the path to the test images at dataroot="./testphotos/church/fig5_tab2/" field of the script experiments/church_pretrained_launcher.py, and run

python -m experiments church_pretrained test swapping_for_eval
python -m experiments ffhq1024_pretrained test swapping_for_eval

The results can be viewed at ./results (that can be changed using --result_dir option).

The FID is then computed between the swapped images and the original structure images, using https://github.com/mseitzer/pytorch-fid.

Model Training.

Datasets

Training Scripts

The training configurations are specified using the scripts in experiments/*_launcher.py. Use the following commands to launch various trainings.

# Modify |dataroot| and |checkpoints_dir| at
# experiments/[church,bedroom,ffhq,mountain]_launcher.py
python -m experiments church train church_default
python -m experiments bedroom train bedroom_default
python -m experiments ffhq train ffhq512_default
python -m experiments ffhq train ffhq1024_default

# By default, the script uses GPUtil to look at available GPUs
# on the machine and sets appropriate GPU IDs. To specify specific set of GPUs,
# use the |--gpu| option. Be sure to also change |num_gpus| option in the corresponding script.
python -m experiments church train church_default --gpu 01234567

The training progress can be monitored using visdom at the port number specified by --display_port. The default is https://localhost:2004. For reference, the training takes 14 days on LSUN Church 256px, using 4 V100 GPUs.

Additionally, a few swapping grids are generated using random samples of the training set. They are saved as webpages at [checkpoints_dir]/[expr_name]/snapshots/. The frequency of the grid generation is controlled using --evaluation_freq.

All configurable parameters are printed at the beginning of training. These configurations are spreaded throughout the codes in def modify_commandline_options of relevant classes, such as models/swapping_autoencoder_model.py, util/iter_counter.py, or models/networks/encoder.py. To change these configuration, simply modify the corresponding option in opt.specify of the training script.

The code for parsing and configurations are at experiments/__init__.py, experiments/__main__.py, experiments/tmux_launcher.py.

Continuing training.

The training continues by default from the last checkpoint, because the --continue_train option is set True by default. To start from scratch, remove the checkpoint, or specify continue_train=False in the training script (e.g. experiments/church_launcher.py).

Code Structure (Main Functions)

Change Log

Bibtex

If you use this code for your research, please cite our paper:

@inproceedings{park2020swapping,
  title={Swapping Autoencoder for Deep Image Manipulation},
  author={Park, Taesung and Zhu, Jun-Yan and Wang, Oliver and Lu, Jingwan and Shechtman, Eli and Efros, Alexei A. and Zhang, Richard},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}

Acknowledgment

The StyleGAN2 layers heavily borrows (or rather, directly copies!) the PyTorch implementation of @rosinality. We thank Nicholas Kolkin for the helpful discussion on the automated content and style evaluation, Jeongo Seo and Yoseob Kim for advice on the user interface, and William T. Peebles, Tongzhou Wang, and Yu Sun for the discussion on disentanglement.