# Anyres-GAN
Project Page | Paper | Bibtex
Any-resolution Training for High-resolution Image Synthesis.
ECCV 2022
Lucy Chai, Michaël Gharbi, Eli Shechtman, Phillip Isola, Richard Zhang
## Prerequisites
- Linux
- gcc-7
- Python 3
- NVIDIA GPU + CUDA CuDNN
**Table of Contents:**
- Colab - run it in your browser without installing anything locally
- Setup - download pretrained models and resources
- Pretrained Models - quickstart with pretrained models
- Notebooks - jupyter notebook for running inference with pretrained models
- Training - pipeline for training anyres-gan models
- Evaluation - evaluation script
## Colab

Interactive Demo: Try our interactive demo here! No local installation is required.
<a name="setup"/>Setup
- Clone this repo:
git clone https://github.com/chail/anyres-gan.git
- Install dependencies:
  - gcc-7 or above is required for installation. Update gcc following these steps.
  - We provide a Conda `environment.yml` file listing the dependencies. Create the Conda environment with:
    ```bash
    conda env create -f environment.yml
    ```
- Download resources: we provide a script for downloading the associated resources and pretrained models. Fetch them by running:
  ```bash
  bash download_resources.sh
  ```
<a name="pretrained"/>
Quickstart with pretrained models
Pretrained models are downloaded from the above download_resources.sh
script. Any-resolution images can be constructed by specifying the appropriate transformation matrices. The following code snippet provides a basic example; additional examples can be found in the notebook.
```python
import pickle
import torch
import numpy as np
from util import patch_util, renormalize

torch.set_grad_enabled(False)

# Load the pretrained patch-based generator (EMA weights).
PATH = 'pretrained/bird_pretrained_final.pkl'
with open(PATH, 'rb') as f:
    G_base = pickle.load(f)['G_ema'].cuda()  # torch.nn.Module

# Sample a latent and map it to W space with truncation.
full_size = 500  # target output resolution
seed = 0
rng = np.random.RandomState(seed)
z = torch.from_numpy(rng.standard_normal(G_base.z_dim)).float()
z = z[None].cuda()
c = None  # no class conditioning
ws = G_base.mapping(z, c, truncation_psi=0.5, truncation_cutoff=8)

# Synthesize the full image patch by patch and stitch the results together.
full = torch.zeros([1, 3, full_size, full_size])
patches = patch_util.generate_full_from_patches(full_size, G_base.img_resolution)
for bbox, transform in patches:
    img = patch_util.scale_condition_wrapper(G_base, ws, transform[None].cuda(),
                                             noise_mode='const', force_fp32=True)
    full[:, :, bbox[0]:bbox[1], bbox[2]:bbox[3]] = img
renormalize.as_image(full[0])
```
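Because generation is patch-based, the same `ws` can be re-rendered at a different target size. A minimal sketch reusing the snippet above (the 1000px size and output filename are arbitrary, and we assume `renormalize.as_image` returns a PIL image):

```python
# Re-render the same latent at a larger, arbitrary target size.
big_size = 1000
big = torch.zeros([1, 3, big_size, big_size])
for bbox, transform in patch_util.generate_full_from_patches(big_size, G_base.img_resolution):
    img = patch_util.scale_condition_wrapper(G_base, ws, transform[None].cuda(),
                                             noise_mode='const', force_fp32=True)
    big[:, :, bbox[0]:bbox[1], bbox[2]:bbox[3]] = img
renormalize.as_image(big[0]).save('bird_1000px.png')
```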
<a name="notebooks"/>
Notebooks
Note: remember to add the conda environment to jupyter kernels:
python -m ipykernel install --user --name anyres-gan
We provide example notebook notebook-demo.ipynb
for running inference on pretrained models.
## Training

See the script `train.sh` for training examples.
Training notes:
- Patch-based training is run in two stages: first global fixed-resolution pretraining, then patch training.
- The arguments `--batch-gpu` and `--gamma` are taken from the StyleGAN3 recommended configurations.
- The arguments `--random_crop=True` and `--patch_crop=True` perform random cropping on the fixed-resolution and variable-resolution datasets, respectively.
- `--scale_max` and `--scale_min` correspond to the largest and smallest sampled image scales for patch training (size = 1/scale * g_size; see the sketch after this list). `--scale_max` should correspond to the smallest image size in the patch dataset (for example, if the smallest image is 512px and the generator size is 256, then `--scale_max=0.5`). Omitting `--scale_min` will use the smallest possible scale as the minimum bound (the image's native size).
- `--scale_mapping_min` and `--scale_mapping_max` correspond to the normalization limits of the scale mapping branch; the min can be kept at 1 and the max can be set to an approximate zoom factor between the fixed-resolution dataset and the size of the HR images.
- For patch training, metrics are evaluated offline, hence `--metrics=none` should be specified during training. See below for more details on evaluation.
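For intuition on the scale flags, a minimal sketch of the arithmetic (the 256/512/2048 sizes below are illustrative assumptions, not values from the paper):

```python
# Illustrative arithmetic for the scale-related flags; not repo code.
g_size = 256           # generator resolution
smallest_image = 512   # smallest image in the patch dataset (assumed)
largest_image = 2048   # approximate largest HR image (assumed)
pretrain_size = 256    # fixed-resolution pretraining dataset size (assumed)

# A patch sampled at scale s covers (1/s) * g_size pixels of the full image,
# so the largest scale should make one patch cover the smallest image exactly.
scale_max = g_size / smallest_image         # 0.5 -> pass --scale_max=0.5
assert g_size / scale_max == smallest_image

# Scale mapping normalization: min stays at 1; max is roughly the zoom factor
# between the fixed-resolution dataset and the HR images.
scale_mapping_min = 1
scale_mapping_max = largest_image / pretrain_size   # 8.0
```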
Training progress can be visualized with:
```bash
tensorboard --logdir training-runs/
```
## Datasets

Beyond the standard FFHQ and LSUN Church datasets, we train on datasets scraped from Flickr. Due to licensing we cannot release these images directly; please see `datasets/download/download_dataset.sh` for examples of how to download the Flickr datasets. You will need to fill in a Flickr API key and secret, and `pip install flickr_api`.
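As a rough sketch of what such a download looks like with the `flickr_api` package (the credentials, search text, and demo limit below are placeholders; the actual query parameters live in the download script):

```python
import flickr_api

# Placeholders: substitute your own Flickr API credentials.
flickr_api.set_keys(api_key='YOUR_KEY', api_secret='YOUR_SECRET')

# Hypothetical query; see datasets/download/download_dataset.sh for the real one.
walker = flickr_api.Walker(flickr_api.Photo.search, text='church')
for i, photo in enumerate(walker):
    if i >= 10:  # small demo limit
        break
    photo.save('photo_%05d.jpg' % i, size_label='Original')
```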
For the LSUN Church dataset, you can follow the standard StyleGAN data preparation and use the resulting archive for training.
<a name="evaluation"/>Evaluations
See custom_metrics.sh
for an example on running FID variations and pFID on the patch models.
- pFID can be specified using a string such as `fid-patch256-min256max0`: this samples 50k patches of size 256, with a minimum image size of 256 and the maximum image size set to the largest size allowable by a given real image (see the parsing sketch after this list).
- The maximum sampled size can also be specified with a number, for example `fid-patch256-min256max1024`.
- For larger models (e.g. mountains), FID by default downsamples images to 299px width; we therefore use a variant that additionally takes a crop of the image: `fid-subpatch1024-min1024max0`.
- Note that these metrics are implemented to run on a single GPU.
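The metric strings follow a regular `fid-[sub]patch<P>-min<M>max<X>` pattern. As an illustration of how to read them, here is a hypothetical parser (not the repository's implementation):

```python
import re

# Hypothetical helper illustrating the pFID naming convention; not repo code.
def parse_pfid_name(name):
    m = re.fullmatch(r'fid-(sub)?patch(\d+)-min(\d+)max(\d+)', name)
    if m is None:
        raise ValueError('not a pFID metric string: %s' % name)
    sub, patch, min_size, max_size = m.groups()
    return {
        'extra_crop': sub is not None,      # 'subpatch' variant for large models
        'patch_size': int(patch),           # sampled patch resolution
        'min_image_size': int(min_size),    # smallest sampled image size
        # max0 means "the largest size allowable by each real image"
        'max_image_size': int(max_size) or None,
    }

print(parse_pfid_name('fid-patch256-min256max0'))
print(parse_pfid_name('fid-subpatch1024-min1024max0'))
```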
Note: the released pretrained models are reimplementations of the models used in the current paper version, so the evaluation numbers are slightly different.
## Acknowledgements

Our code is largely based on the StyleGAN3 repository (license). Changes to the StyleGAN3 code are documented in diff. Some additional utilities are from David Bau and Taesung Park, and we thank Assaf Shocher for proofreading. The remaining changes are covered under the Adobe Research License.
<a name="citation"/>Citation
If you use this code for your research, please cite our paper:
@inproceedings{chai2022anyresolution,
title={Any-resolution training for high-resolution image synthesis.},
author={Chai, Lucy and Gharbi, Michael and Shechtman, Eli and Isola, Phillip and Zhang, Richard},
booktitle={European Conference on Computer Vision},
year={2022}
}
<p align="center">
<img src='img/pano010-2.gif' width=600px>
<img src='img/pano010.png' width=600px>
</p>