This project aims to unify the evaluation of generative text-to-image models and make it quick and easy to calculate the most popular metrics.

Goals of this benchmark:

Introduction

Generative text-to-image models have become popular and widely used. Many articles on text-to-image generation present new, more advanced models, yet there is still no uniform way to measure the quality of such models. To address this issue, we provide metric implementations and a dataset for comparing the quality of generative models.

We propose using MS-COCO FID-30K together with OpenAI's CLIP score, a combination that has already become a standard for measuring the quality of text2image models. We provide the MS-COCO validation subset and precalculated metrics for it. We also recorded the 30,000 descriptions that need to be used to generate images for MS-COCO FID-30K.

You can easily contribute your model to the benchmark and make FID results reproducible! See the Contribution section for details.

Main features

Installation

pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/boomb0om/text2image-benchmark

Getting started

Metrics: FID

Calculate FID for two sets of images:

from T2IBenchmark import calculate_fid

fid, _ = calculate_fid('assets/images/cats/', 'assets/images/dogs/')
print(fid)
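
For reference, FID is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception feature vectors of the two image sets. The formula below is the standard definition, not a description of this library's internals; the symbols (mean and covariance of the reference features, mean and covariance of the generated features) are notation introduced here for illustration.

```latex
\mathrm{FID}
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values mean the two feature distributions are closer, and two identical distributions give a value of 0.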

Calculate FID between model generations and MS-COCO validation subset:

from T2IBenchmark import calculate_fid
from T2IBenchmark.datasets import get_coco_fid_stats

fid, _ = calculate_fid(
    'path/to/your/generations/',
    get_coco_fid_stats()
)
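
Since MS-COCO FID-30K expects one generated image per recorded description, it can help to check the size of the generations folder before computing the metric. The following is a minimal sketch using only the standard library; the helper `count_images` and the list of file extensions are assumptions made for illustration and are not part of T2IBenchmark.

```python
from pathlib import Path

# Assumption: generations are stored as ordinary image files with these extensions.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(directory: str) -> int:
    """Count image files directly inside `directory` (non-recursive).

    Returns 0 if the directory does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


# Example usage (the path is a placeholder):
# n = count_images('path/to/your/generations/')
# if n != 30_000:
#     print(f"Expected 30,000 images for FID-30K, found {n}")
```

A mismatch usually means that generation was interrupted or that some files were saved elsewhere.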

Calculate MS-COCO FID-30K for a T2IModelWrapper. This example uses the Kandinsky 2.1 model:

pip install -r T2IBenchmark/models/kandinsky21/requirements.txt

from T2IBenchmark import calculate_coco_fid
from T2IBenchmark.models.kandinsky21 import Kandinsky21Wrapper

fid, fid_data = calculate_coco_fid(
    Kandinsky21Wrapper,
    device='cuda:0',
    save_generations_dir='coco_generations/'
)

Metrics: CLIP-score

Calculate CLIP-score for a set of images and a fixed prompt:

from T2IBenchmark import calculate_clip_score
from glob import glob

cat_paths = glob('assets/images/cats/*.jpg')
captions_mapping = {path: "a cat" for path in cat_paths}
clip_score = calculate_clip_score(cat_paths, captions_mapping=captions_mapping)
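
To give an intuition for what a CLIP-based score measures, here is a minimal sketch that computes a scaled cosine similarity between an image embedding and a text embedding. It is an illustration only: the vectors are toy values rather than real CLIP embeddings, the function names are hypothetical, and the scale of 100 with clipping at zero is a common convention assumed here, not necessarily what `calculate_clip_score` does internally.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def clip_style_score(
    image_emb: Sequence[float],
    text_emb: Sequence[float],
    scale: float = 100.0,  # assumed scaling convention
) -> float:
    """Scaled cosine similarity, with negative similarities clipped to zero."""
    return scale * max(cosine_similarity(image_emb, text_emb), 0.0)


# Toy embeddings standing in for CLIP outputs (hypothetical values).
image_embedding = [0.6, 0.8, 0.0]
text_embedding = [0.8, 0.6, 0.0]
score = clip_style_score(image_embedding, text_embedding)
print(f"CLIP-style score: {score:.2f}")
```

A higher score means the image embedding points in a direction closer to the caption embedding, which is read as better agreement between the image and the text.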

Project Structure

Examples

Usage examples are listed below in the recommended order of study:

Documentation

Contribution

If you want to contribute your model to this benchmark and publish metrics, follow these steps:

  1. Create a fork of this repository
  2. Create a wrapper for your model that inherits from the T2IModelWrapper class
  3. Generate images and calculate metrics using calculate_coco_fid. For more information, see this example
  4. Create a pull request with your model
  5. Congrats!

TO-DO

Contacts

Authors:

If you have any questions, please email jeartgle@gmail.com.

Citing

If you use this repository in your research, consider citing it using the following BibTeX entry:

@misc{boomb0omT2IBenchmark,
  author={Pavlov, I. and Ivanov, A. and Stafievskiy, S.},
  title={{Text-to-Image Benchmark: A benchmark for generative models}},
  howpublished={\url{https://github.com/boomb0om/text2image-benchmark}},
  month={September},
  year={2023},
  note={Version 0.1.0},
}

Acknowledgments

Thanks to: