License: CC BY-NC-SA 4.0 | Python 3.9

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis (CVPR 2023)

<p align="center"> <img src="logo.jpeg" width="500px"/> </p>

A high-quality, fast, and efficient text-to-image model

Official PyTorch implementation for our paper GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis by Ming Tao, Bing-Kun Bao, Hao Tang, Changsheng Xu.

<p align="center"> <b>Generated Images</b> </p> <p align="center"> <img src="results.jpg"/> </p>

Requirements

GALIP is a small, fast generative model that can generate multiple images per second, even on a CPU.

Installation

Clone this repo.

git clone https://github.com/tobran/GALIP
pip install -r GALIP/requirements.txt

Install CLIP

Preparation (Same as DF-GAN)

Datasets

  1. Download the preprocessed metadata for birds and coco, and extract them to data/
  2. Download the birds image data. Extract them to data/birds/
  3. Download coco2014 dataset and extract the images to data/coco/images/
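Before training, it can help to confirm that the extraction steps above produced the expected folders. The following is a minimal sketch, not part of the repository: the helper name `missing_dataset_dirs` is hypothetical, and it only checks the two image directories named in the steps above, not the contents of the metadata archives.

```python
from pathlib import Path

# Sub-directories of data/ named in the preparation steps above.
EXPECTED_DIRS = ["birds", "coco/images"]


def missing_dataset_dirs(data_root):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs("data")
    if missing:
        print("Missing directories under data/:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

Run it from the directory that contains data/; an empty result means both folders are in place.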

Training

cd GALIP/code/

Train the GALIP model

Resume training process

If your training process is interrupted unexpectedly, set state_epoch, log_dir, and pretrained_model_path in train.sh to resume training.
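To find the values to put into train.sh, you need the most recent checkpoint and its epoch. The sketch below is one way to look them up; it assumes checkpoint files end in an epoch number followed by `.pth` (for example `state_epoch_120.pth`), which is an assumption about the file naming and not something stated here. The helper name `latest_checkpoint` is hypothetical.

```python
import re
from pathlib import Path

# Assumption: checkpoint names end in an epoch number, e.g. "state_epoch_120.pth".
EPOCH_PATTERN = re.compile(r"(\d+)\.pth$")


def latest_checkpoint(checkpoint_dir):
    """Return (epoch, path) for the highest-numbered .pth file, or None if none match."""
    best = None
    for path in Path(checkpoint_dir).glob("*.pth"):
        match = EPOCH_PATTERN.search(path.name)
        if match is None:
            continue  # skip .pth files without a trailing epoch number
        epoch = int(match.group(1))
        if best is None or epoch > best[0]:
            best = (epoch, path)
    return best
```

The returned epoch and path are candidates for state_epoch and pretrained_model_path; check them against your own log directory before editing train.sh.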

TensorBoard

Our code supports automatic FID evaluation during training; the results are stored in TensorBoard files under ./logs. You can change how often the evaluation runs by changing test_interval in the YAML file.
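To get a feel for how test_interval affects the number of FID evaluations, here is a small sketch. It assumes evaluation triggers whenever the epoch number is a multiple of test_interval; the actual trigger condition in the training code may differ. The function name and the `max_epoch` parameter are hypothetical.

```python
def evaluation_epochs(max_epoch, test_interval):
    """List the epochs in 1..max_epoch at which evaluation would run.

    Assumption: evaluation triggers when epoch % test_interval == 0.
    """
    if test_interval <= 0:
        raise ValueError("test_interval must be positive")
    return [epoch for epoch in range(1, max_epoch + 1) if epoch % test_interval == 0]


print(evaluation_epochs(20, 5))  # [5, 10, 15, 20]
```

A smaller interval gives a finer FID curve in TensorBoard at the cost of more evaluation time per training run.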

Evaluation

Download Pretrained Model

Evaluate GALIP models

cd GALIP/code/

Set pretrained_model in test.sh.

Performance

The released model improves on the paper version in COCO-FID and COCO-CS; CC12M-ZFID is unchanged.

| Model | COCO-FID↓ | COCO-CS↑ | CC12M-ZFID↓ |
| --- | --- | --- | --- |
| GALIP(paper) | 5.85 | 0.3338 | 12.54 |
| GALIP(released) | 5.01 | 0.3379 | 12.54 |
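As a worked example, the relative change between the paper and released models can be computed directly from the numbers in the table above. This is only arithmetic on the reported values; the helper name `relative_change` is introduced here for illustration.

```python
def relative_change(old, new):
    """Signed relative change from old to new, as a fraction of old."""
    return (new - old) / old


# Values copied from the table above.
fid_paper, fid_released = 5.85, 5.01
cs_paper, cs_released = 0.3338, 0.3379

fid_change = relative_change(fid_paper, fid_released)  # negative is better (lower FID)
cs_change = relative_change(cs_paper, cs_released)     # positive is better (higher CS)

print(f"COCO-FID: {fid_change:+.1%}")  # COCO-FID: -14.4%
print(f"COCO-CS:  {cs_change:+.1%}")   # COCO-CS:  +1.2%
```

So the released model lowers COCO-FID by about 14% relative to the paper version, while the COCO-CS gain is about 1%.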

Sampling

Synthesize images from your text descriptions


Citing GALIP

If you find GALIP useful in your research, please consider citing our paper:


@inproceedings{tao2023galip,
  title={GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis},
  author={Tao, Ming and Bao, Bing-Kun and Tang, Hao and Xu, Changsheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14214--14223},
  year={2023}
}

The code is released for academic research use only. For commercial use, please contact Ming Tao (陶明) (mingtao2000@126.com).

Reference