# CLR-GAN: Improving GANs Stability and Quality via Consistent Latent Representation and Reconstruction (ECCV 2024)

Shengke Sun, Ziqian Luan, Zhanshan Zhao, Shijie Luo and Shuzhen Han <br> European Conference on Computer Vision (ECCV) 2024
This repository contains the code that implements the Consistent Latent Representation and Reconstruction method on StyleGAN2 and StyleGAN2-ADA, used to reproduce the experimental results reported in the paper.
## How to use this code

### Preparing datasets
Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file `dataset.json` for labels.

Custom datasets can be created from a folder containing images; see `python dataset_tool.py --help` for more information. Alternatively, the folder can also be used directly as a dataset, without running it through `dataset_tool.py` first, but doing so may lead to suboptimal performance.

Legacy TFRecords datasets are not supported; see below for instructions on how to convert them.
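To check what ended up in an archive, you can read `dataset.json` straight out of the ZIP. The snippet below is a minimal sketch, assuming the layout written by `dataset_tool.py` (a top-level `"labels"` list of `[filename, class_index]` pairs, or `null` for unlabeled data); the archive path is a hypothetical example:

```python
# Minimal sketch: inspect the labels stored in a dataset ZIP.
# Assumes dataset.json looks like {"labels": [[filename, class_idx], ...]}
# or {"labels": null} for unlabeled data; the path is hypothetical.
import json
import zipfile

with zipfile.ZipFile('datasets/mydataset.zip') as z:
    meta = json.loads(z.read('dataset.json'))

labels = meta.get('labels')
if labels is None:
    print('unlabeled dataset')
else:
    print(f'{len(labels)} labeled images; first entry: {labels[0]}')
```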
**FFHQ:**

**Step 1:** Download the Flickr-Faces-HQ dataset as TFRecords.

**Step 2:** Extract images from TFRecords using `dataset_tool.py` from the TensorFlow version of StyleGAN2-ADA:
```bash
# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python ../stylegan2-ada/dataset_tool.py unpack \
    --tfrecord_dir=~/ffhq-dataset/tfrecords/ffhq --output_dir=/tmp/ffhq-unpacked
```
**Step 3:** Create a ZIP archive using `dataset_tool.py` from this repository:
```bash
# Original 1024x1024 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq.zip

# Scaled down 256x256 resolution.
#
# Note: --resize-filter=box is required to reproduce FID scores shown in the
# paper. If you don't need to match exactly, it's better to leave this out
# and default to Lanczos. See https://github.com/NVlabs/stylegan2-ada-pytorch/issues/283#issuecomment-1731217782
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq256x256.zip \
    --width=256 --height=256 --resize-filter=box
```
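Before training, it can be worth sanity-checking the resulting archive. The following sketch (hypothetical path; requires Pillow) counts the images and verifies the resolution of the first one:

```python
# Minimal sketch: sanity-check a dataset ZIP produced by dataset_tool.py.
# The path is hypothetical; requires Pillow (pip install Pillow).
import io
import zipfile
from PIL import Image

with zipfile.ZipFile('datasets/ffhq256x256.zip') as z:
    names = sorted(n for n in z.namelist() if n.endswith('.png'))
    print(f'{len(names)} PNG images')
    img = Image.open(io.BytesIO(z.read(names[0])))
    print(img.size, img.mode)  # expect (256, 256) and RGB for this example
```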
**AFHQ:** Download the AFHQ dataset and create ZIP archives:

```bash
python dataset_tool.py --source=~/downloads/afhq/train/cat --dest=~/datasets/afhqcat.zip
python dataset_tool.py --source=~/downloads/afhq/train/dog --dest=~/datasets/afhqdog.zip
python dataset_tool.py --source=~/downloads/afhq/train/wild --dest=~/datasets/afhqwild.zip
```
**LSUN:** Download the desired categories from the LSUN project page and convert to ZIP archives:

```bash
python dataset_tool.py --source=~/downloads/lsun/raw/cat_lmdb --dest=~/datasets/lsuncat200k.zip \
    --transform=center-crop --width=256 --height=256 --max_images=200000
python dataset_tool.py --source=~/downloads/lsun/raw/car_lmdb --dest=~/datasets/lsuncar200k.zip \
    --transform=center-crop-wide --width=512 --height=384 --max_images=200000
```
### Training new networks
In its most basic form, training new networks boils down to:
```bash
python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1 --dry-run
python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1
```
The first command is optional; it validates the arguments, prints out the training configuration, and exits. The second command kicks off the actual training.
In this example, the results are saved to a newly created directory `~/training-runs/<ID>-mydataset-auto1`, controlled by `--outdir`. The training exports network pickles (`network-snapshot-<INT>.pkl`) and example images (`fakes<INT>.png`) at regular intervals (controlled by `--snap`). For each pickle, it also evaluates FID (controlled by `--metrics`) and logs the resulting scores in `metric-fid50k_full.jsonl` (as well as TFEvents if TensorBoard is installed).
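Once a run has produced snapshots, the logged scores and the pickles can be inspected directly in Python. The sketch below assumes the StyleGAN2-ADA conventions (one JSON object per line in the `.jsonl` log, and pickles storing the generator under `'G_ema'`); the run directory is a hypothetical example, and the repository root must be on `PYTHONPATH` so that `dnnlib` and `torch_utils` resolve during unpickling:

```python
# Minimal sketch, assuming StyleGAN2-ADA conventions: metric-fid50k_full.jsonl
# holds one JSON object per line, and each snapshot pickle stores 'G_ema'.
# Run with the repository root on PYTHONPATH so dnnlib/torch_utils resolve.
import json
import pickle
import torch

run_dir = 'training-runs/00000-mydataset-auto1'  # hypothetical run directory

# Print the FID recorded for each snapshot.
with open(f'{run_dir}/metric-fid50k_full.jsonl') as f:
    for line in f:
        rec = json.loads(line)
        print(rec['snapshot_pkl'], rec['results']['fid50k_full'])

# Load a snapshot and generate one image.
with open(f'{run_dir}/network-snapshot-000000.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()  # generator as a torch.nn.Module
z = torch.randn([1, G.z_dim]).cuda()    # random latent codes
c = None                                # class labels (None for unconditional)
img = G(z, c)                           # NCHW, float32, dynamic range [-1, +1]
```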
The name of the output directory reflects the training configuration. For example, `00000-mydataset-auto1` indicates that the base configuration was `auto1`, meaning that the hyperparameters were selected automatically for training on one GPU. The base configuration is controlled by `--cfg`:
Base config | Description |
---|---|
`auto` (default) | Automatically select reasonable defaults based on resolution and GPU count. Serves as a good starting point for new datasets but does not necessarily lead to optimal results. |
`paper256` | Reproduce results for FFHQ and LSUN Church at 256x256 using 1, 2, 4, or 8 GPUs. |
`paper512` | Reproduce results for AFHQ-Cat at 512x512 using 1, 2, 4, or 8 GPUs. |
The training configuration can be further customized with additional command line options:

* `--aug=noaug` disables ADA.
* `--cond=1` enables class-conditional training (requires a dataset with labels).
* `--mirror=1` amplifies the dataset with x-flips. Often beneficial, even with ADA.
* `--resume=ffhq1024 --snap=10` performs transfer learning from FFHQ trained at 1024x1024.
* `--resume=~/training-runs/<NAME>/network-snapshot-<INT>.pkl` resumes a previous training run.
* `--gamma=10` overrides R1 gamma. We recommend trying a couple of different values for each new dataset.
* `--aug=ada --target=0.7` adjusts the ADA target value (default: 0.6).
* `--augpipe=blit` enables pixel blitting but disables all other augmentations.
* `--augpipe=bgcfnc` enables all available augmentations (blit, geom, color, filter, noise, cutout).
Please refer to `python train.py --help` for the full list.
## Model Repository

The following table lists the pre-trained GAN models that can be used to reproduce the experimental results reported in the paper.
Model | Link |
---|---|
CIFAR-10 | Google Drive |
CelebA | Google Drive |
AFHQ-Cat | Google Drive |
LSUN-Church | Google Drive |
ImageNet (64x64) | Google Drive |
## Acknowledgement

Thanks to StyleGAN2-ADA for sharing the code.
## BibTeX

If you find our work helpful for your research, please consider citing:
```bibtex
@inproceedings{sun2024clrgan,
  title     = {CLR-GAN: Improving GANs Stability and Quality via Consistent Latent Representation and Reconstruction},
  author    = {Sun, Shengke and Luan, Ziqian and Zhao, Zhanshan and Luo, Shijie and Han, Shuzhen},
  booktitle = {European Conference on Computer Vision},
  year      = {2024}
}
```