
Table of contents
  1. Getting Started
  2. Evaluation Toolbox
  3. Text-to-Image Models
  4. Benchmark Results
  5. Acknowledgments
  6. Contacts

TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation

Tan M. Dinh, Rang Nguyen, Binh-Son Hua<br> VinAI Research, Vietnam

Abstract: In this paper, we conduct a study on the state-of-the-art methods for text-to-image synthesis and propose a framework to evaluate these methods. We consider syntheses where an image contains a single or multiple objects. Our study outlines several issues in the current evaluation pipeline: (i) for image quality assessment, a commonly used metric, e.g., Inception Score (IS), is often either miscalibrated for the single-object case or misused for the multi-object case; (ii) for text relevance and object accuracy assessment, there is an overfitting phenomenon in the existing R-precision (RP) and SOA metrics, respectively; (iii) for multi-object case, many vital factors for evaluation, e.g., object fidelity, positional alignment, counting alignment, are largely dismissed; (iv) the ranking of the methods based on current metrics is highly inconsistent with real images. To overcome these issues, we propose a combined set of existing and new metrics to systematically evaluate the methods. For existing metrics, we offer an improved version of IS named IS* by using temperature scaling to calibrate the confidence of the classifier used by IS; we also propose a solution to mitigate the overfitting issues of RP and SOA. For new metrics, we develop counting alignment, positional alignment, object-centric IS, and object-centric FID metrics for evaluating the multi-object case. We show that benchmark with our bag of metrics results in a highly consistent ranking among existing methods, being well-aligned to human evaluation. As a by-product, we create AttnGAN++, a simple but strong baseline for the benchmark by stabilizing the training of AttnGAN using spectral normalization. We also release our toolbox, so-called TISE, for advocating fair and consistent evaluation of text-to-image synthesis models.
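As described in the abstract, IS* calibrates the confidence of the IS classifier with temperature scaling. For intuition, here is a minimal sketch of a temperature-scaled softmax (a generic illustration with a hard-coded temperature; in practice the temperature is fitted on held-out data, and this is not the toolbox's exact code):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: T > 1 softens over-confident predictions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.5]                  # over-confident classifier output
p_raw = softmax(logits)                   # uncalibrated probabilities
p_cal = softmax(logits, temperature=2.0)  # calibrated (softer) probabilities
print(max(p_raw), max(p_cal))
```

With a temperature above 1, the maximum class probability drops, which is the calibration effect IS* relies on.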

Details of our evaluation framework and benchmark results can be found in our paper:

@inproceedings{dinh2021tise,
    title={TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation},
    author={Tan M. Dinh and Rang Nguyen and Binh-Son Hua},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
    year={2022}
}

Please CITE our paper when TISE is used to help produce published results or is incorporated into other software.

Getting Started

Installation

git clone https://github.com/VinAIResearch/tise-toolbox.git
cd tise-toolbox
conda create -p ./envs python=3.7.3
conda activate ./envs
pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git

Pre-trained models

Run the command below to download the necessary pre-trained models:

python download_scripts/download_pretrained_models.py

Data

CUB

Run the command below to download and prepare the CUB data:

python download_scripts/download_cub_data.py

MS-COCO

Run the commands below to download and prepare the MS-COCO (version 2014) data:

python download_scripts/download_ms_coco_metadata.py
sh download_scripts/download_ms_coco_images.sh

Evaluation data

Run the command below to download the necessary evaluation data:

python download_scripts/download_evaluation_data.py

Evaluation Toolbox

Generating images from test captions

The test captions for each set of metrics can be found in the captions folder inside each evaluation aspect's subfolder. Please use your text-to-image model to generate images from the test captions in these files. The following sections describe the structure of the evaluation data and how to use it.

Image Realism, Text Relevance and Counting Alignment

The test data has the following format:

[
    ...
    {
        "caption_id": "", 
        "caption": "",    // raw format 
        ...               // other fields, which are not required for image generation
    },
    ...
]

Please use your text-to-image model to generate an image for each item in the test data. For each item, the input caption is item['caption'] and the generated image should be saved as item['caption_id'].png.

The sample pseudo code for generating images for these metrics is:

import os
import pickle

# XXXX: name of the test caption file; YOUR_METHOD: your method's name
with open(f'captions/{XXXX}.pkl', 'rb') as f:
    test_data = pickle.load(f)

GENERATED_IMAGE_DIR = f'images/{YOUR_METHOD}'
os.makedirs(GENERATED_IMAGE_DIR, exist_ok=True)

for item in test_data:
    caption_id = str(item['caption_id'])
    caption = item['caption']
    generated_image = your_text_to_image_model(caption)
    generated_image.save(f'{GENERATED_IMAGE_DIR}/{caption_id}.png')

Please replace XXXX with the name of the appropriate test caption file. These files can be found in the captions folder of each evaluation aspect's subfolder.

Semantic Object Accuracy

We follow the structure of the original version of SOA for the SOA test caption data. There are 80 pickle files containing the test captions for each MS-COCO object class. We generate 3 images for each caption in each file.

The sample pseudo code for generating images for SOA is:

import os
import pickle

label_file = 'label_XX_XX'  # one of the 80 per-class caption files
with open(f'{label_file}.pkl', 'rb') as f:
    test_data = pickle.load(f)

GENERATED_IMAGE_DIR = f'images/{YOUR_METHOD}'
os.makedirs(f'{GENERATED_IMAGE_DIR}/{label_file}', exist_ok=True)

for item in test_data:
    caption_id = str(item['caption_id'])
    caption = item['caption']
    for idx in range(3):  # 3 images per caption
        generated_image = your_text_to_image_model(caption)
        generated_image.save(f'{GENERATED_IMAGE_DIR}/{label_file}/{caption_id}_{idx}.png')

Positional Alignment

The test data has the following format:

{
    "behind" : [
        {
            "caption" : "",     // raw caption
            "caption_id": "",
            ...                 // other fields, which are not required for image generation
        }
        ...
    ],
    "bottom": [ ... ],
    "under" : [ ... ],
    ...
}

The sample pseudo code for generating images for PA is:

import os
import pickle

with open('captions/PA_input_captions.pkl', 'rb') as f:
    test_data = pickle.load(f)

GENERATED_IMAGE_DIR = f'images/{YOUR_METHOD}'

for positional_word in test_data:
    os.makedirs(f'{GENERATED_IMAGE_DIR}/{positional_word}', exist_ok=True)
    for item in test_data[positional_word]:
        caption_id = str(item['caption_id'])
        caption = item['caption']
        generated_image = your_text_to_image_model(caption)
        generated_image.save(f'{GENERATED_IMAGE_DIR}/{positional_word}/{caption_id}.png')

For reference, see gen_evaluation_images_coco.sh and gen_evaluation_images_cub.sh, which generate the evaluation images for our AttnGAN++ model.

Single-object Text-To-Image Synthesis (CUB)

1. Image Realism

Move to image_realism metric folder:

cd image_realism

Please set the METHOD variable to the name of your method, then run the command below to compute the IS* metric.

METHOD=attngan++
GENERATED_IMAGE_DIR=images/cub/"$METHOD"
SAVED_RESULT_PATH=results/IS/cub/"$METHOD".txt
GPU_ID=0

python IS/bird/inception_score_star_bird.py \
--gpu "$GPU_ID" \
--image_folder "$GENERATED_IMAGE_DIR" \
--saved_file "$SAVED_RESULT_PATH"

Please set the METHOD variable to the name of your method, then run the command below to compute the FID metric.

METHOD=attngan++
GENERATED_IMAGE_DIR=images/cub/"$METHOD"
SAVED_RESULT_PATH=results/FID/cub/"$METHOD".txt
GPU_ID=0

python FID/fid_score.py \
--gpu "$GPU_ID" \
--batch-size 50 \
--path1 "FID/data/bird_val.npz" \
--path2 "$GENERATED_IMAGE_DIR" \
--saved_file "$SAVED_RESULT_PATH"
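The FID script above compares Gaussian statistics (mean and covariance) of Inception features between real and generated images. For intuition, here is the underlying Fréchet distance specialized to diagonal covariances, where the trace term reduces to per-dimension sums (an illustration only, not FID/fid_score.py):

```python
import math

def fid_diagonal(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with diagonal covariances.
    General form: ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2))."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    trace_term = sum(v1 + v2 - 2 * math.sqrt(v1 * v2)
                     for v1, v2 in zip(var1, var2))
    return mean_term + trace_term

# identical distributions -> distance 0
print(fid_diagonal([0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]))
```

The real metric uses full covariance matrices and a matrix square root, but the behavior is the same: the score is zero only when the two feature distributions match.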

2. Text Relevance

Move to text_relevance metric folder:

cd text_relevance

Please set the METHOD variable to the name of your method, then run the command below to compute the RP metric.

METHOD=attngan++
GENERATED_IMAGE_DIR=images/cub/"$METHOD"
SAVED_RESULT_PATH=results/cub/"$METHOD".txt
GPU_ID=0

CUDA_VISIBLE_DEVICES="$GPU_ID" \
python RP_cub.py \
--image_dir "$GENERATED_IMAGE_DIR" \
--saved_file_path "$SAVED_RESULT_PATH"

Multi-object Text-To-Image Synthesis (MS-COCO)

1. Image Realism

Move to image_realism metric folder:

cd image_realism

Please set the METHOD variable to the name of your method, then run the command below to compute the IS* metric.

METHOD=attngan++
GENERATED_IMAGE_DIR=images/coco/"$METHOD"
SAVED_RESULT_PATH=results/IS/coco/"$METHOD".txt
GPU_ID=0

python IS/coco/inception_score_star_coco.py \
--gpu "$GPU_ID" \
--image_folder "$GENERATED_IMAGE_DIR" \
--saved_file "$SAVED_RESULT_PATH"

Please set the METHOD variable to the name of your method, then run the command below to compute the FID metric.

METHOD=attngan++
GENERATED_IMAGE_DIR=images/coco/"$METHOD"
SAVED_RESULT_PATH=results/FID/coco/"$METHOD".txt
GPU_ID=0

python FID/fid_score.py \
--gpu "$GPU_ID" \
--batch-size 50 \
--path1 "FID/data/coco_val.npz" \
--path2 "$GENERATED_IMAGE_DIR" \
--saved_file "$SAVED_RESULT_PATH" 

2. Object Fidelity

Move to object_fidelity metric folder:

cd object_fidelity

We reuse the generated images from the Image Realism evaluation to assess Object Fidelity. Hence, please evaluate Image Realism first, or follow the Image Realism instructions to generate the test images. Then run the command below to crop objects from the generated images.

METHOD=attngan++
GENERATED_IMAGE_DIR=../image_realism/images/coco/"$METHOD"
SAVED_CROPPED_OBJECTS_DIR=cropped_objects/"$METHOD"
GPU_ID=0

CUDA_VISIBLE_DEVICES="$GPU_ID" \
python crop_object.py \
--source_image_dir "$GENERATED_IMAGE_DIR" \
--saved_cropped_object_dir "$SAVED_CROPPED_OBJECTS_DIR"

Please set the METHOD variable to the name of your method, then run the command below to compute the O-IS metric.

METHOD=attngan++
CROPPED_OBJECTS_DIR=cropped_objects/"$METHOD"
SAVED_RESULT_PATH=results/O-IS/"$METHOD".txt
GPU_ID=0

python O-IS/object_centric_inception_score.py \
--gpu_id "$GPU_ID" \
--image_dir "$CROPPED_OBJECTS_DIR" \
--saved_file "$SAVED_RESULT_PATH" 

Please set the METHOD variable to the name of your method, then run the command below to compute the O-FID metric.

METHOD=attngan++
CROPPED_OBJECTS_DIR=cropped_objects/"$METHOD"
SAVED_RESULT_PATH=results/O-FID/"$METHOD".txt
GPU_ID=0

python O-FID/fid_score.py \
--gpu "$GPU_ID" \
--batch-size 50 \
--path1 "O-FID/data/cropped_object_coco.npz" \
--path2 "$CROPPED_OBJECTS_DIR" \
--saved_file "$SAVED_RESULT_PATH" 

3. Text Relevance

Move to text_relevance metric folder:

cd text_relevance

Please set the METHOD variable to the name of your method, then run the command below to compute the RP metric.

METHOD=attngan++
GENERATED_IMAGE_DIR=images/coco/"$METHOD"
SAVED_RESULT_PATH=results/coco/"$METHOD".txt
GPU_ID=0

python RP_coco.py \
--image_dir="$GENERATED_IMAGE_DIR" \
--saved_file_path="$SAVED_RESULT_PATH" 

4. Positional Alignment

Move to positional_alignment metric folder:

cd positional_alignment

Please set the METHOD variable to the name of your method, then run the command below to compute the PA metric.

METHOD=attngan++
GENERATED_IMAGE_DIR=images/"$METHOD"
SAVED_RESULT_PATH=results/"$METHOD".txt
GPU_ID=0

CUDA_VISIBLE_DEVICES="$GPU_ID" \
python PA.py \
--image_dir="$GENERATED_IMAGE_DIR" \
--saved_file_path="$SAVED_RESULT_PATH" 

5. Counting Alignment

Move to counting_alignment metric folder:

cd counting_alignment

Please set the METHOD variable to the name of your method, then run the command below to compute the CA metric.

METHOD=attngan++
GENERATED_IMAGE_DIR=images/"$METHOD"
SAVED_RESULT_PATH=results/"$METHOD".txt
GPU_ID=0

python CA.py \
--gpu_id="$GPU_ID" \
--image_dir="$GENERATED_IMAGE_DIR" \
--result_file="$SAVED_RESULT_PATH" 

6. Semantic Object Accuracy

Move to semantic_object_accuracy metric folder:

cd semantic_object_accuracy

Please set the METHOD variable to the name of your method, then run the command below to compute the SOA metric.

METHOD=attngan++
GENERATED_IMAGE_DIR=images/"$METHOD"
DETECTED_RESULTS_DIR=detected_results/"$METHOD"
SAVED_RESULT_PATH=results/"$METHOD".txt
GPU_ID=0

CUDA_VISIBLE_DEVICES="$GPU_ID" \
python SOA.py \
--images="$GENERATED_IMAGE_DIR" \
--detected_results="$DETECTED_RESULTS_DIR" \
--saved_file="$SAVED_RESULT_PATH"

7. Ranking Score

After computing all the metrics above, collect their values into a JSON file with the format below:

{ "FID": "", "IS*": "", "O-IS": "", "O-FID": "", "CA": "", "PA": "", "SOA-I": "", "SOA-C": "", "RP": ""}

Then run the command below to compute the final ranking score (RS):

python ranking_score.py
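The exact aggregation is implemented in ranking_score.py. As a rough illustration of one way such a score can be built (a mean-rank scheme; hypothetical code, not the toolbox's implementation), each method is ranked per metric, with the ranking direction flipped for metrics where lower is better, and the ranks are averaged:

```python
def ranking_scores(scores, lower_is_better=frozenset({'FID', 'O-FID', 'CA'})):
    """scores: {method: {metric: value}} -> {method: mean rank points}.
    Higher output means a better overall ranking."""
    methods = list(scores)
    metrics = list(next(iter(scores.values())))
    n = len(methods)
    total = {m: 0.0 for m in methods}
    for metric in metrics:
        reverse = metric not in lower_is_better     # rank best method first
        ordered = sorted(methods, key=lambda m: scores[m][metric], reverse=reverse)
        for rank, method in enumerate(ordered):
            total[method] += n - rank               # best method gets n points
    return {m: total[m] / len(metrics) for m in methods}

example = {
    'A': {'FID': 20.0, 'IS*': 30.0},  # A wins FID (lower is better)
    'B': {'FID': 25.0, 'IS*': 40.0},  # B wins IS* (higher is better)
}
print(ranking_scores(example))
```

In this two-metric example, each method wins one metric, so both receive the same mean rank.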

Text-to-Image Models

Below is a list of the text-to-image generation models used in our benchmark, with links to their code.

Benchmark Results

Single-object Text-To-Image Synthesis (CUB)

<details><summary>CLICK TO VIEW</summary>

| Method       | IS*   | FID    | RP    |
|--------------|-------|--------|-------|
| GAN-INT-CLS  | 7.51  | 194.4  | 13.83 |
| StackGAN++   | 12.69 | 27.40  | 13.57 |
| AttnGAN      | 13.63 | 24.27  | 65.30 |
| AttnGAN + CL | 14.42 | 17.96  | 60.82 |
| DM-GAN       | 15.00 | 15.52  | 76.25 |
| DM-GAN + CL  | 15.08 | 14.57  | 69.80 |
| DF-GAN       | 14.70 | 16.46  | 42.95 |
| AttnGAN++    | 15.13 | 15.01  | 77.31 |

</details>

Multi-object Text-To-Image Synthesis (MS-COCO)

<details><summary>CLICK TO VIEW</summary>

| Method                  | IS*   | FID   | RP    | SOA-C | SOA-I | O-IS | O-FID | CA   | PA    | RS   |
|-------------------------|-------|-------|-------|-------|-------|------|-------|------|-------|------|
| GAN-CLS                 | 8.11  | 92.09 | 10    | 5.31  | 5.71  | 2.46 | 51.13 | 2.51 | 32.79 | 7    |
| StackGAN                | 15.5  | 53.44 | 9.1   | 9.24  | 9.93  | 3.36 | 29.09 | 2.41 | 34.33 | 11.5 |
| AttnGAN                 | 33.79 | 36.9  | 50.56 | 47.13 | 49.78 | 5.04 | 20.92 | 1.82 | 40.08 | 29   |
| DM-GAN                  | 45.63 | 28.96 | 66.98 | 55.77 | 58.11 | 5.22 | 17.48 | 1.71 | 42.83 | 41   |
| CPGAN                   | 59.64 | 50.68 | 69.08 | 81.86 | 83.83 | 6.38 | 20.07 | 2.07 | 43.28 | 43   |
| DF-GAN                  | 30.45 | 21.05 | 42.44 | 37.85 | 40.19 | 5.12 | 14.39 | 1.96 | 40.39 | 31.5 |
| AttnGAN + CL            | 36.85 | 26.93 | 57.52 | 47.45 | 49.33 | 4.92 | 19.92 | 1.72 | 43.92 | 37   |
| DM-GAN + CL             | 46.61 | 22.6  | 70.36 | 58.68 | 61.05 | 5.09 | 15.5  | 1.66 | 49.06 | 51.5 |
| DALLE-Mini (zero-shot)  | 19.82 | 62.9  | 48.72 | 26.64 | 27.9  | 4.1  | 23.83 | 2.31 | 47.39 | 23.5 |
| AttnGAN++               | 54.63 | 26.58 | 72.48 | 67.83 | 69.97 | 6.01 | 15.43 | 1.57 | 47.75 | 56   |
| Real Images             | 51.25 | 2.62  | 83.54 | 90.02 | 91.19 | 8.63 | 0     | 1.05 | 100   | 65   |

</details>

Acknowledgments

Our code borrows parts from the official repositories of the text-to-image models used in our benchmark. Many thanks to the authors for releasing their source code and pre-trained weights.

Contacts

If you have any questions, please drop an email to tan.m.dinh.vn@gmail.com or open an issue in this repository.