<p align="center"> IC-GAN: Instance-Conditioned GAN </p>

Official PyTorch code of Instance-Conditioned GAN by Arantxa Casanova, Marlène Careil, Jakob Verbeek, Michał Drożdżal and Adriana Romero-Soriano.

Generate images with IC-GAN in a Colab Notebook

We provide a Google Colab notebook to generate images with IC-GAN and its class-conditional counterpart. We also invite users to check out the demo on Replicate.

The figure below depicts two instances, unseen during training and downloaded from Creative Commons search, and the generated images with IC-GAN and class-conditional IC-GAN when conditioning on the class "castle":

<p align="center"> <img src="./figures/icgan_transfer_all_github.png?raw=true"> </p>

Additionally, and inspired by this Colab, we provide the functionality in the same Colab notebook to guide generations with text captions, using the CLIP model. As an example, the figure below shows three instance conditionings and a text caption (top), followed by the resulting images generated with IC-GAN (bottom), when optimizing the noise vector along CLIP's gradient for 100 iterations; a minimal sketch of this optimization loop follows the figure credits below.

<p align="center"> <img src="./figures/icgan_clip.png?raw=true"> </p>

Credit for the three instance conditionings, from left to right, that were modified with a resize and central crop: 1: "Landscape in Bavaria" by shining.darkness, licensed under CC BY 2.0, 2: "Fantasy Landscape - slolsss" by Douglas Tofoli is marked with CC PDM 1.0, 3: "How to Draw Landscapes Simply" by Kuwagata Keisai is marked with CC0 1.0
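
For reference, here is a minimal sketch of this kind of CLIP-guided latent optimization, not the exact Colab code: the generator, feature dimensions, caption and learning rate are placeholder assumptions, and only the calls to the public `clip` package (`clip.load`, `clip.tokenize`, `encode_text`, `encode_image`) follow its documented API.

```python
import torch
import clip  # https://github.com/openai/CLIP

# Sketch of CLIP-guided noise optimization. The generator below is a tiny
# differentiable stand-in so the loop runs end-to-end; in the notebook it is
# the pretrained IC-GAN generator, and `instance_feats` comes from SwAV.
device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # keep everything in fp32 for simplicity

noise_dim, feat_dim = 128, 2048  # assumed dimensions; they depend on the backbone

class StandInGenerator(torch.nn.Module):
    """Differentiable placeholder for the real IC-GAN generator."""
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(noise_dim + feat_dim, 3 * 64 * 64)
    def forward(self, z, h):
        return torch.tanh(self.fc(torch.cat([z, h], dim=1))).view(-1, 3, 64, 64)

generator = StandInGenerator().to(device)
instance_feats = torch.randn(1, feat_dim, device=device)  # stand-in SwAV feature

with torch.no_grad():
    text = clip_model.encode_text(clip.tokenize(["a castle on a hill"]).to(device))
    text = text / text.norm(dim=-1, keepdim=True)

noise = torch.randn(1, noise_dim, device=device, requires_grad=True)
optimizer = torch.optim.Adam([noise], lr=0.01)

for _ in range(100):  # 100 iterations, as described above
    optimizer.zero_grad()
    image = generator(noise, instance_feats)
    # Resize to CLIP's input resolution (the real notebook also applies
    # CLIP's input normalization).
    image = torch.nn.functional.interpolate(image, size=224)
    img_emb = clip_model.encode_image(image)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    loss = -(img_emb * text).sum()  # maximize cosine similarity with the caption
    loss.backward()
    optimizer.step()
```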

Requirements

Overview

This repository consists of four main folders:

- data_utils: data preparation scripts (hdf5 packaging, feature extraction, k-NN and k-means utilities).
- inference: scripts to generate images and test the models.
- BigGAN_PyTorch: IC-GAN training and evaluation with the BigGAN backbone.
- stylegan2_ada_pytorch: IC-GAN training and evaluation with the StyleGAN2-ADA backbone.

(Python script) Generate images with IC-GAN

Alternatively, we can <b> generate images with IC-GAN models </b> directly from a Python script by following these steps:

  1. Download the desired pretrained models (links below) and the pre-computed 1000 instance features from ImageNet and extract them into a folder pretrained_models_path.
| model | backbone | class-conditional? | training dataset | resolution | url |
|---|---|---|---|---|---|
| IC-GAN | BigGAN | No | ImageNet | 256x256 | model |
| IC-GAN (half capacity) | BigGAN | No | ImageNet | 256x256 | model |
| IC-GAN | BigGAN | No | ImageNet | 128x128 | model |
| IC-GAN | BigGAN | No | ImageNet | 64x64 | model |
| IC-GAN | BigGAN | Yes | ImageNet | 256x256 | model |
| IC-GAN (half capacity) | BigGAN | Yes | ImageNet | 256x256 | model |
| IC-GAN | BigGAN | Yes | ImageNet | 128x128 | model |
| IC-GAN | BigGAN | Yes | ImageNet | 64x64 | model |
| IC-GAN | BigGAN | Yes | ImageNet-LT | 256x256 | model |
| IC-GAN | BigGAN | Yes | ImageNet-LT | 128x128 | model |
| IC-GAN | BigGAN | Yes | ImageNet-LT | 64x64 | model |
| IC-GAN | BigGAN | No | COCO-Stuff | 256x256 | model |
| IC-GAN | BigGAN | No | COCO-Stuff | 128x128 | model |
| IC-GAN | StyleGAN2 | No | COCO-Stuff | 256x256 | model |
| IC-GAN | StyleGAN2 | No | COCO-Stuff | 128x128 | model |
  2. Execute:
python inference/generate_images.py --root_path [pretrained_models_path] --model [model] --model_backbone [backbone] --resolution [res]

This script produces a .PNG file showing a grid of generated images, where each row is conditioned on a single instance feature and each position in the row corresponds to a different sampled noise vector.
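
The sketch below illustrates how such a grid is laid out (rows share an instance feature, columns differ in the noise vector). The generator and the instance features are random stand-ins here; the actual model loading is handled inside inference/generate_images.py.

```python
import torch
from torchvision.utils import save_image

# Illustration of the grid layout produced by generate_images.py: each row is
# conditioned on one instance feature, each column uses a different noise
# vector. Both the generator and the features are random stand-ins; in
# practice they come from the pretrained IC-GAN and the downloaded
# pre-computed ImageNet instance features.
num_instances, samples_per_instance, noise_dim, feat_dim = 4, 6, 128, 2048
generator = lambda z, h: torch.rand(z.shape[0], 3, 256, 256)  # stand-in model
instance_feats = torch.randn(num_instances, feat_dim)         # stand-in SwAV features

rows = []
for h in instance_feats:
    z = torch.randn(samples_per_instance, noise_dim)          # one noise vector per column
    h_rep = h.unsqueeze(0).expand(samples_per_instance, -1)   # same feature for the whole row
    rows.append(generator(z, h_rep))
grid = torch.cat(rows, dim=0)
save_image(grid, "icgan_samples.png", nrow=samples_per_instance, normalize=True)
```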

<b>Additional and optional parameters</b>:

Data preparation

<div id="data-preparation"> <details> <summary>ImageNet</summary> <br> <ol> <li>Download dataset from <a href="https://image-net.org/download.php"> here </a>. </li> <li>Download <a href="https://github.com/facebookresearch/swav"> SwAV </a> feature extractor weights from <a href="https://dl.fbaipublicfiles.com/deepcluster/swav_800ep_pretrain.pth.tar"> here </a>. </li> <li> Replace the paths in data_utils/prepare_data.sh: <code>out_path</code> by the path where hdf5 files will be stored, <code>path_imnet</code> by the path where ImageNet dataset is downloaded, and <code>path_swav</code> by the path where SwAV weights are stored. </li> <li> Execute <code>./data_utils/prepare_data.sh imagenet [resolution]</code>, where <code>[resolution]</code> can be an integer in {64,128,256}. This script will create several hdf5 files: <ul> <li> <code>ILSVRC[resolution]_xy.hdf5</code> and <code>ILSVRC[resolution]_val_xy.hdf5</code>, where images and labels are stored for the training and validation set respectively. </li> <li> <code>ILSVRC[resolution]_feats_[feature_extractor]_resnet50.hdf5</code> that contains the instance features for each image. </li> <li> <code>ILSVRC[resolution]_feats_[feature_extractor]_resnet50_nn_k[k_nn].hdf5</code> that contains the list of [k_nn] neighbors for each of the instance features. </li> </ul> </li> </ol> </br> </details> <details> <summary>ImageNet-LT</summary> <br> <ol> <li>Download ImageNet dataset from <a href="https://image-net.org/download.php"> here </a>. Following <a href="https://github.com/zhmiao/OpenLongTailRecognition-OLTR"> ImageNet-LT </a>, the file <code>ImageNet_LT_train.txt</code> can be downloaded from <a href="https://drive.google.com/drive/u/1/folders/1j7Nkfe6ZhzKFXePHdsseeeGI877Xu1yf" > this link </a> and later stored in the folder <code>./BigGAN_PyTorch/imagenet_lt</code>. </li> <li>Download the pre-trained weights of the ResNet on ImageNet-LT from <a href="https://dl.fbaipublicfiles.com/classifier-balancing/ImageNet_LT/models/resnet50_uniform_e90.pth"> this link</a>, provided by the <a href="https://github.com/facebookresearch/classifier-balancing"> classifier-balancing repository </a>. </li> <li> Replace the paths in data_utils/prepare_data.sh: <code>out_path</code> by the path where hdf5 files will be stored, <code>path_imnet</code> by the path where ImageNet dataset is downloaded, and <code>path_classifier_lt</code> by the path where the pre-trained ResNet50 weights are stored. </li> <li> Execute <code>./data_utils/prepare_data.sh imagenet_lt [resolution]</code>, where <code>[resolution]</code> can be an integer in {64,128,256}. This script will create several hdf5 files: <ul> <li> <code>ILSVRC[resolution]longtail_xy.hdf5</code>, where images and labels are stored for the training and validation set respectively. </li> <li> <code>ILSVRC[resolution]longtail_feats_[feature_extractor]_resnet50.hdf5</code> that contains the instance features for each image. </li> <li> <code>ILSVRC[resolution]longtail_feats_[feature_extractor]_resnet50_nn_k[k_nn].hdf5</code> that contains the list of [k_nn] neighbors for each of the instance features. </li> </ul> </li> </ol> </br> </details> <details> <summary>COCO-Stuff</summary> <br> <ol> <li>Download the dataset following the <a href="https://github.com/WillSuen/LostGANs/blob/master/INSTALL.md"> LostGANs' repository instructions </a>. 
</li> <li>Download <a href="https://github.com/facebookresearch/swav"> SwAV </a> feature extractor weights from <a href="https://dl.fbaipublicfiles.com/deepcluster/swav_800ep_pretrain.pth.tar"> here </a>. </li> <li> Replace the paths in data_utils/prepare_data.sh: <code>out_path</code> by the path where hdf5 files will be stored, <code>path_imnet</code> by the path where the COCO-Stuff dataset is downloaded, and <code>path_swav</code> by the path where SwAV weights are stored. </li> <li> Execute <code>./data_utils/prepare_data.sh coco [resolution]</code>, where <code>[resolution]</code> can be an integer in {128,256}. This script will create several hdf5 files: <ul> <li> <code>COCO[resolution]_xy.hdf5</code> and <code>COCO[resolution]_val_test_xy.hdf5</code>, where images and labels are stored for the training and evaluation set respectively. </li> <li> <code>COCO[resolution]_feats_[feature_extractor]_resnet50.hdf5</code> that contains the instance features for each image. </li> <li> <code>COCO[resolution]_feats_[feature_extractor]_resnet50_nn_k[k_nn].hdf5</code> that contains the list of [k_nn] neighbors for each of the instance features. </li> </ul> </li> </ol> </br> </details> <details> <summary>Other datasets</summary> <br> <ol> <li>Download the corresponding dataset and store it in a folder <code>dataset_path</code>. </li> <li>Download <a href="https://github.com/facebookresearch/swav"> SwAV </a> feature extractor weights from <a href="https://dl.fbaipublicfiles.com/deepcluster/swav_800ep_pretrain.pth.tar"> here </a>. </li> <li> Replace the paths in data_utils/prepare_data.sh: <code>out_path</code> by the path where hdf5 files will be stored and <code>path_swav</code> by the path where SwAV weights are stored. </li> <li> Execute <code>./data_utils/prepare_data.sh [dataset_name] [resolution] [dataset_path]</code>, where <code>[dataset_name]</code> is the dataset name, <code>[resolution]</code> can be an integer, for example 128 or 256, and <code>dataset_path</code> contains the dataset images. This script will create several hdf5 files: <ul> <li> <code>[dataset_name][resolution]_xy.hdf5</code>, where images and labels are stored for the training set. </li> <li> <code>[dataset_name][resolution]_feats_[feature_extractor]_resnet50.hdf5</code> that contains the instance features for each image. </li> <li> <code>[dataset_name][resolution]_feats_[feature_extractor]_resnet50_nn_k[k_nn].hdf5</code> that contains the list of <code>k_nn</code> neighbors for each of the instance features. </li> </ul> </li> </ol> </br> </details> <details> <summary>How to subsample an instance feature dataset with k-means</summary> <br> After preparing the data, we can subsample the instance feature dataset with the k-means algorithm: <code> python data_utils/store_kmeans_indexes.py --resolution [resolution] --which_dataset [dataset_name] --data_root [data_path] </code> <ul> <li> Adding <code>--gpu</code> allows the faiss library to compute k-means leveraging GPUs, resulting in faster execution. </li> <li> Adding the parameter <code>--feature_extractor [feature_extractor]</code> chooses which feature extractor to use, with <code>feature_extractor</code> in <code>['selfsupervised', 'classification']</code>, depending on whether we use SwAV as the feature extractor or a ResNet50 pretrained on the ImageNet classification task, respectively. </li> <li> The number of k-means clusters can be set with <code>--kmeans_subsampled [centers]</code>, where <code>centers</code> is an integer. </li> </ul> </br> </details> </div>
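
Conceptually, the k-means subsampling described above clusters the stored instance features and keeps, for each cluster, the index of the feature closest to the centroid. The hedged sketch below illustrates that idea with faiss; the hdf5 filename, the "feats" key and the output file are assumptions for illustration, and the actual logic lives in data_utils/store_kmeans_indexes.py.

```python
import faiss
import h5py
import numpy as np

# Conceptual sketch of the subsampling done by store_kmeans_indexes.py.
# The hdf5 filename and the "feats" key are assumptions for illustration.
with h5py.File("out_path/ILSVRC256_feats_selfsupervised_resnet50.hdf5", "r") as f:
    feats = np.ascontiguousarray(f["feats"][:], dtype=np.float32)

num_centers = 1000                          # e.g. --kmeans_subsampled 1000
kmeans = faiss.Kmeans(feats.shape[1], num_centers, niter=50, gpu=False)  # gpu=True mirrors --gpu
kmeans.train(feats)

# For each centroid, keep the closest stored feature vector.
index = faiss.IndexFlatL2(feats.shape[1])
index.add(feats)
_, nearest = index.search(kmeans.centroids, 1)
selected_indexes = np.unique(nearest.ravel())
np.save("kmeans_selected_indexes.npy", selected_indexes)  # hypothetical output file
```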

How to train the models

BigGAN or StyleGAN2 backbone

Training parameters are stored in JSON files in [backbone_folder]/config_files/[dataset]/*.json, where [backbone_folder] is either BigGAN_PyTorch or stylegan2_ada_pytorch and [dataset] can be either ImageNet, ImageNet-LT or COCO-Stuff.

cd BigGAN_PyTorch
python run.py --json_config config_files/<dataset>/<selected_config>.json --data_root [data_root] --base_root [base_root]

or

cd stylegan2_ada_pytorch
python run.py --json_config config_files/<dataset>/<selected_config>.json --data_root [data_root] --base_root [base_root]

where:

Note that one can create other JSON files to modify the training parameters.
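
Since the configs are plain JSON, a new training setup can be created by copying an existing file, editing a few fields, and passing the new file to run.py with --json_config. A small sketch of that pattern follows; the source filename and the keys shown are illustrative assumptions, not necessarily the exact names used in the provided config files.

```python
import json

# Illustrative only: derive a new config from an existing one. Check the
# provided JSON files in [backbone_folder]/config_files/[dataset]/ for the
# real parameter names.
with open("BigGAN_PyTorch/config_files/imagenet/base_config.json") as f:  # hypothetical file
    config = json.load(f)

config["batch_size"] = 128   # example override
config["num_epochs"] = 200   # example override

with open("BigGAN_PyTorch/config_files/imagenet/my_experiment.json", "w") as f:
    json.dump(config, f, indent=2)
```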

Other backbones

To run IC-GAN with other backbones, we provide some general guiding steps:

How to test the models

<b>To obtain the FID and IS metrics on ImageNet and ImageNet-LT</b>:

  1. Execute:
python inference/test.py --json_config [BigGAN_PyTorch or stylegan2_ada_pytorch]/config_files/<dataset>/<selected_config>.json --num_inception_images [num_imgs] --sample_num_npz [num_imgs] --eval_reference_set [ref_set] --sample_npz --base_root [base_root] --data_root [data_root] --kmeans_subsampled [kmeans_centers] --model_backbone [backbone]

To obtain the TensorFlow IS and FID metrics, use an environment with Python <3.7 and TensorFlow 1.15. Then:

  2. Obtain Inception Scores and pre-computed FID moments:
python ../data_utils/inception_tf13.py --experiment_name [exp_name] --experiment_root [base_root] --kmeans_subsampled [kmeans_centers] 

For stratified FIDs on the ImageNet-LT dataset, the following parameters can be added: --which_dataset 'imagenet_lt' --split 'val' --strat_name [stratified_split], where stratified_split can be one of [few, low, many].

  3. (Only needed once) Pre-compute reference moments with TensorFlow code:
python ../data_utils/inception_tf13.py --use_ground_truth_data --data_root [data_root] --split [ref_set] --resolution [res] --which_dataset [dataset]
  4. (Using this repository) FID can be computed using the pre-computed statistics obtained in 2) and the pre-computed ground-truth statistics obtained in 3). For example, to compute the FID with the ImageNet validation set as reference: python TTUR/fid.py [base_root]/[exp_name]/TF_pool_.npz [data_root]/imagenet_val_res[res]_tf_inception_moments_ground_truth.npz
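
For reference, the FID computed in step 4 is the Fréchet distance between two Gaussians summarized by the stored means and covariances. A minimal NumPy/SciPy sketch is shown below; the .npz key names ("mu", "sigma") are an assumption about the file layout and may differ from what inception_tf13.py actually stores.

```python
import numpy as np
from scipy import linalg

def fid_from_moments(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return diff.dot(diff) + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean)

# Key names are an assumption about how the moments are stored in the .npz files.
gen = np.load("TF_pool_.npz")
ref = np.load("imagenet_val_res256_tf_inception_moments_ground_truth.npz")
print(fid_from_moments(gen["mu"], gen["sigma"], ref["mu"], ref["sigma"]))
```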

<b>To obtain the FID metric on COCO-Stuff</b>:

  1. Obtain ground-truth JPEG images: python data_utils/store_coco_jpeg_images.py --resolution [res] --split [ref_set] --data_root [data_root] --out_path [gt_coco_images] --filter_hd [filter_hd]
  2. Store generated images as JPEG images: python sample.py --json_config ../[BigGAN_PyTorch or stylegan2_ada_pytorch]/config_files/<dataset>/<selected_config>.json --data_root [data_root] --base_root [base_root] --sample_num_npz [num_imgs] --which_dataset 'coco' --eval_instance_set [ref_set] --eval_reference_set [ref_set] --filter_hd [filter_hd] --model_backbone [backbone]
  3. Using this repository, compute the FID between the two folders of ground-truth and generated images (see the sketch below for one way to obtain the required Inception statistics).

where:
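
As one way to carry out step 3, the Inception statistics for each folder of JPEGs can be extracted roughly as sketched below and then compared with a Fréchet-distance function such as fid_from_moments from the previous section. This is an illustrative recipe under standard torchvision assumptions, not the exact evaluation pipeline used to report results; the folder names are placeholders.

```python
import numpy as np
import torch
from pathlib import Path
from PIL import Image
from torchvision import models, transforms

# Illustrative sketch: extract Inception-v3 pool features for every JPEG in a
# folder, then compute the mean/covariance needed for a moment-based FID.
transform = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

inception = models.inception_v3(weights="DEFAULT")
inception.fc = torch.nn.Identity()  # expose the 2048-d pool features
inception.eval()

def folder_moments(folder, batch_size=32):
    """Mean and covariance of Inception pool features for all JPEGs in `folder`."""
    paths = sorted(Path(folder).glob("*.jpg"))
    feats = []
    with torch.no_grad():
        for i in range(0, len(paths), batch_size):
            batch = torch.stack([transform(Image.open(p).convert("RGB"))
                                 for p in paths[i:i + batch_size]])
            feats.append(inception(batch).numpy())
    feats = np.concatenate(feats)
    return feats.mean(axis=0), np.cov(feats, rowvar=False)

mu_real, sigma_real = folder_moments("gt_coco_images")    # folder from step 1 (placeholder name)
mu_fake, sigma_fake = folder_moments("generated_images")  # folder from step 2 (placeholder name)
# Plug the moments into a Fréchet-distance function such as fid_from_moments above.
```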

Utilities for GAN backbones

We introduce changes and extra utilities to facilitate training in both the BigGAN and StyleGAN2 base repositories.

BigGAN change log

The following changes were made:

StyleGAN2 change log

<div id="stylegan-changelog"> <ul> <li> Multi-node DistributedDataParallel training. </li> <li> Added early stopping based on the training FID metric. </li> <li> Automatic checkpointing when jobs are automatically rescheduled on a cluster. </li> <li> Option to load dataset from hdf5 file. </li> <li> Replaced the usage of the Click Python package with an <code>ArgumentParser</code>. </li> <li> Only saving best and last model weights. </li> </ul> </div>

Acknowledgements

We would like to thank the authors of the PyTorch BigGAN repository and of StyleGAN2 PyTorch, as our model requires their repositories to train IC-GAN with the BigGAN or StyleGAN2 backbone, respectively. Moreover, we would like to further thank the authors of generative-evaluation-prdc, data-efficient-gans, faiss and sg2im, as some components were borrowed and modified from their code bases. Finally, we thank the author of WanderCLIP as well as the authors of the following repositories used in our Colab notebook: pytorch-pretrained-BigGAN and CLIP.

License

The majority of IC-GAN is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: BigGAN and PRDC are licensed under the MIT license; the COCO-Stuff loader is licensed under Apache License 2.0; DiffAugment is licensed under the BSD 2-Clause Simplified license; StyleGAN2 is licensed under an NVIDIA license, available here: https://github.com/NVlabs/stylegan2-ada-pytorch/blob/main/LICENSE.txt. In the Colab notebook, CLIP and pytorch-pretrained-BigGAN code is used, both licensed under the MIT license.

Disclaimers

THE DIFFAUGMENT SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

THE CLIP SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

THE PYTORCH-PRETRAINED-BIGGAN SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Cite the paper

If this repository, the paper or any of its content is useful for your research, please cite:

@inproceedings{casanova2021instanceconditioned,
      title={Instance-Conditioned GAN}, 
      author={Arantxa Casanova and Marlène Careil and Jakob Verbeek and Michal Drozdzal and Adriana Romero-Soriano},
      booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
      year={2021}
}