# Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration
This repository contains the code (in TensorFlow) for the paper:
Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration <br> Lu Sheng, Ziyi Lin, Jing Shao, Xiaogang Wang <br> CVPR 2018
## Overview
In this repository, we propose an efficient and effective Avatar-Net that enables visually plausible multi-scale style transfer for arbitrary styles in real time. The key ingredient is a style decorator that recomposes the content features with semantically aligned style features, which not only holistically matches their feature distributions but also preserves detailed style patterns in the decorated features. By embedding this module into an image reconstruction network that fuses multi-scale style abstractions, Avatar-Net renders multi-scale stylization for any style image in one feed-forward pass.
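To make the decorator concrete, the following is a minimal NumPy sketch of its patch-swap core: each content-feature patch is matched to the style-feature patch with the highest normalized cross-correlation and replaced by it, averaging overlaps. This is an illustration only, with hypothetical names and shapes; the repository's implementation additionally whitens and re-colors the features (see the `style_coding` option below) and runs inside the TensorFlow graph.

```python
import numpy as np

def style_swap(content_feat, style_feat, patch_size=5):
    """Replace each content patch with the most correlated style patch.
    content_feat, style_feat: feature maps of shape (H, W, C)."""
    h, w, c = content_feat.shape
    sh, sw, _ = style_feat.shape
    p = patch_size

    # Flatten and L2-normalize all style patches for cross-correlation.
    patches, normed = [], []
    for i in range(sh - p + 1):
        for j in range(sw - p + 1):
            patch = style_feat[i:i + p, j:j + p, :]
            patches.append(patch)
            v = patch.ravel()
            normed.append(v / (np.linalg.norm(v) + 1e-8))
    normed = np.stack(normed)                 # (N, p*p*c)

    # For each content patch, paste the best-matching style patch,
    # averaging where the strided patches overlap.
    out = np.zeros((h, w, c), dtype=np.float64)
    counts = np.zeros((h, w, 1))
    for i in range(h - p + 1):
        for j in range(w - p + 1):
            v = content_feat[i:i + p, j:j + p, :].ravel()
            v = v / (np.linalg.norm(v) + 1e-8)
            best = int(np.argmax(normed @ v))  # max normalized cross-correlation
            out[i:i + p, j:j + p, :] += patches[best]
            counts[i:i + p, j:j + p, :] += 1
    return out / np.maximum(counts, 1)
```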
## Examples

### Comparison with Prior Arts
<p align='center'><img src="./docs/figures/closed_ups.png" width="500"></p>

- The result by Avatar-Net captures concrete multi-scale style patterns (e.g. the color distribution, brush strokes and circular patterns in the candy image).
- WCT distorts the brush strokes and circular patterns. AdaIN fails to preserve even the color distribution, and Style-Swap fails entirely in this example.
### Execution Efficiency
Method | Gatys et al. | AdaIN | WCT | Style-Swap | Avatar-Net |
---|---|---|---|---|---|
256x256 (sec) | 12.18 | 0.053 | 0.62 | 0.064 | 0.071 |
512x512 (sec) | 43.25 | 0.11 | 0.93 | 0.23 | 0.28 |
- Avatar-Net has execution time comparable to AdaIN and GPU-accelerated Style-Swap, and is much faster than WCT and the optimization-based style transfer of Gatys et al.
- The reference methods and the proposed Avatar-Net are implemented on the same TensorFlow platform, with the same VGG network as the backbone.
## Dependencies
- TensorFlow (version >= 1.0; tested only on TensorFlow 1.0).
- Heavily depends on TF-Slim and its model repository.
## Download
- The trained model of Avatar-Net can be downloaded from Google Drive.
- Training the style transfer network requires a pretrained VGG network, which can be obtained from the TF-Slim model repository (see the example below). The encoding layers of Avatar-Net are also borrowed from the pretrained VGG model.
- The MSCOCO dataset is used to train the proposed image reconstruction network.
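As an illustration, the TF-Slim VGG-19 checkpoint can be fetched as follows (this assumes the repository expects VGG-19; verify the exact model against the configuration file):

```bash
wget http://download.tensorflow.org/models/vgg_19_2016_08_28.tar.gz
tar -xzf vgg_19_2016_08_28.tar.gz   # extracts vgg_19.ckpt
```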
## Usage

### Basic Usage
Simply use the bash script `./scripts/evaluate_style_transfer.sh` to apply Avatar-Net to all content images in `CONTENT_DIR` with any style image in `STYLE_DIR`. For example,

```bash
bash ./scripts/evaluate_style_transfer.sh gpu_id CONTENT_DIR STYLE_DIR EVAL_DIR
```

- `gpu_id`: the mounted GPU ID for the TensorFlow session.
- `CONTENT_DIR`: the directory of the content images. It can be `./data/contents/images` for multiple exemplar content images, or `./data/contents/sequences` for an exemplar content video.
- `STYLE_DIR`: the directory of the style images. It can be `./data/styles` for multiple exemplar style images.
- `EVAL_DIR`: the output directory, which will contain multiple subdirectories named after the style images.
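A concrete invocation with the bundled example data might look like the following (the GPU ID `0` and the `./results` output directory are placeholders):

```bash
bash ./scripts/evaluate_style_transfer.sh 0 ./data/contents/images ./data/styles ./results
```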
More detailed evaluation options can be found in `evaluate_style_transfer.py`, invoked as

```bash
python evaluate_style_transfer.py
```
### Configuration
The detailed configuration of Avatar-Net is listed in `configs/AvatarNet.yml`, including the training specifications and network hyper-parameters. The style decorator has three options:

- `patch_size`: the patch size for the normalized cross-correlation, `5` by default.
- `style_coding`: the projection and reconstruction method, either `ZCA` or `AdaIN`.
- `style_interp`: the interpolation option between the transferred features and the content features, either `normalized` or `biased`.
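For reference, the two `style_coding` choices correspond to the whitening-coloring transform (ZCA) and adaptive instance normalization (AdaIN). Below is a minimal NumPy sketch of both on flattened `(pixels, channels)` feature matrices; the function names are hypothetical, and the repository applies these transforms to VGG feature maps inside the TensorFlow graph.

```python
import numpy as np

def zca_project_reconstruct(content, style, eps=1e-5):
    """ZCA-style coding: whiten content features, re-color with style stats.
    content, style: (pixels, channels) matrices of flattened features."""
    c = content - content.mean(axis=0)
    s = style - style.mean(axis=0)

    # Whitening transform from the content covariance.
    vals, vecs = np.linalg.eigh(c.T @ c / (len(c) - 1))
    whiten = vecs @ np.diag((vals + eps) ** -0.5) @ vecs.T

    # Coloring transform from the style covariance.
    vals, vecs = np.linalg.eigh(s.T @ s / (len(s) - 1))
    color = vecs @ np.diag((vals + eps) ** 0.5) @ vecs.T

    return c @ whiten @ color + style.mean(axis=0)

def adain_coding(content, style, eps=1e-5):
    """AdaIN coding: match only per-channel mean and standard deviation."""
    c_mu, c_std = content.mean(axis=0), content.std(axis=0) + eps
    s_mu, s_std = style.mean(axis=0), style.std(axis=0) + eps
    return (content - c_mu) / c_std * s_std + s_mu
```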
The style transfer is actually performed in `AvatarNet.transfer_styles(self, inputs, styles, inter_weight, intra_weights)`, in which

- `inputs`: the content images.
- `styles`: a list of style images (`len(styles) > 1` for multiple style interpolation).
- `inter_weight`: the weight balancing the style and content images.
- `intra_weights`: a list of weights balancing the effects of the different styles.
Users may modify the evaluation script for multiple style interpolation or a content-style trade-off.
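For instance, a two-style interpolation might look like the following sketch (`model`, `content`, `style_a`, `style_b` and the weight semantics are assumptions, not taken from the repository):

```python
# Hypothetical snippet inside a modified evaluation script; `model` is an
# already-constructed AvatarNet, and `content`, `style_a`, `style_b` are
# preloaded image batches (all names here are placeholders).
stylized = model.transfer_styles(
    inputs=content,
    styles=[style_a, style_b],   # two styles to interpolate between
    inter_weight=0.8,            # content-style trade-off (assumed: higher = more stylized)
    intra_weights=[0.3, 0.7],    # relative blend of style_a vs. style_b
)
```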
### Training
- Download the MSCOCO dataset and convert the raw images into `tfexamples` using the python script `./datasets/convert_mscoco_to_tfexamples.py`.
- Use `bash ./scripts/train_image_reconstruction.sh gpu_id DATASET_DIR MODEL_DIR` to start training with the default hyper-parameters, as in the example below. `gpu_id` is the mounted GPU for the applied TensorFlow session. Replace `DATASET_DIR` with the path to the MSCOCO training images and `MODEL_DIR` with the Avatar-Net model directory.
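For example, a training run could look like the following (the GPU ID and both paths are placeholders):

```bash
bash ./scripts/train_image_reconstruction.sh 0 /path/to/mscoco_tfexamples ./models/avatar_net
```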
## Citation
If you find this code useful for your research, please cite the paper:
Lu Sheng, Ziyi Lin, Jing Shao and Xiaogang Wang, "Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [arXiv]
```bibtex
@inproceedings{sheng2018avatar,
  title     = {Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration},
  author    = {Sheng, Lu and Lin, Ziyi and Shao, Jing and Wang, Xiaogang},
  booktitle = {Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},
  pages     = {1--9},
  year      = {2018}
}
```
## Acknowledgement
This project is inspired by several style-agnostic style transfer methods, including AdaIN, WCT and Style-Swap, drawing on both their papers and code.
## Contact
If you have any questions or suggestions about this paper, feel free to contact me at lsheng@ee.cuhk.edu.hk.