
HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

This is the official implementation of our ICCV 2023 paper

"HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models"

Eslam Abdelrahman, Pengzhan Sun, Xiaoqian Shen, Faizan Farooq Khan, Li Erran Li, and Mohamed Elhoseiny


:loudspeaker: News

(July 13, 2023): The paper has been accepted at ICCV 2023.
(April 11, 2023): The paper is published on arXiv.

:books: Synopsis

Holistic skills evaluation. Rather than focusing on isolated metrics such as accuracy, we measure 13 skills, which can be grouped into five major categories: accuracy, robustness, generalization, fairness, and bias.

<p align="center"> <img src="Figures/skills_metric.png" width="75%"/> </p>

Broad scenario coverage. HRS-Bench covers 50 applications, e.g., fashion, animals, transportation, food, and clothes.

<p align="center"> <img src="Figures/pie_chart.png" width="75%"/> </p>

Standardization. We propose a unified benchmark under which existing models are evaluated fairly across a wide range of metrics.

<p align="center"> <img src="Figures/AC_metric.png" width="75%"/> </p>

Holistic prompt generation.

<p align="center"> <img src="Figures/prompt_gen_horizontal_3.png" width="75%"/> </p>

:pushpin: Covered Models

:fire: Qualitative results

<p align="center"> <img src="Figures/qualitative_results.png" width="75%"/> </p>

:pushpin: Prerequisites

Python >= 3.7
PyTorch >= 1.7.0
Install other common packages (numpy, pytorch_transformers, etc.)
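To confirm an environment meets the minimum versions listed above before running anything, a small check can help. The following is a minimal sketch, not part of the HRS-Bench codebase: the helper names `parse_version`, `at_least`, and `check_prerequisites` are hypothetical, and the version parser is deliberately simple (it keeps only the leading numeric components, so `1.13.1+cu117` becomes `(1, 13, 1)`).

```python
import sys


def parse_version(v: str) -> tuple:
    """Hypothetical helper: turn '1.13.1+cu117' or '1.7.0a0' into (1, 13, 1) / (1, 7, 0)."""
    core = v.split("+")[0]  # drop local build tags such as +cu117
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at pre-release markers like 'a0' or 'rc1'
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(found: str, required: str) -> bool:
    """Compare two version strings after padding the shorter one with zeros."""
    f, r = parse_version(found), parse_version(required)
    width = max(len(f), len(r))
    return f + (0,) * (width - len(f)) >= r + (0,) * (width - len(r))


def check_prerequisites() -> dict:
    """Report whether Python >= 3.7 and PyTorch >= 1.7.0 are available."""
    report = {"python": sys.version_info >= (3, 7)}
    try:
        import torch  # only needed for the version check
        report["torch"] = at_least(str(torch.__version__), "1.7.0")
    except ImportError:
        report["torch"] = False  # PyTorch is not installed
    return report


if __name__ == "__main__":
    print(check_prerequisites())
```

A dedicated parser such as `packaging.version` would handle more edge cases; the hand-rolled version above simply avoids an extra dependency.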

:pushpin: Data

:point_right: HRS-Bench:

First, download our prompts, which cover the 13 skills, from here.
Each skill has its own CSV file containing the prompts and the ground truth (GT) used during the evaluation phase.
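Because each skill ships as its own CSV file, a convenient first step is to load all of them into one dictionary keyed by skill. The sketch below assumes the CSVs sit in a single directory, that each has a header row, and that the file name (without `.csv`) identifies the skill. The function name `load_skill_prompts` and the sample column names `prompt` and `gt` in the demo are hypothetical; the loader itself keeps whatever columns the real files contain.

```python
import csv
import tempfile
from pathlib import Path


def load_skill_prompts(data_dir):
    """Hypothetical loader: map each skill (CSV file stem) to its list of rows.

    Each row is a dict keyed by the CSV header, so the actual HRS-Bench
    column names are preserved as-is.
    """
    skills = {}
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        with open(csv_path, newline="", encoding="utf-8") as f:
            skills[csv_path.stem] = list(csv.DictReader(f))
    return skills


if __name__ == "__main__":
    # Self-contained demo with a made-up file; real column names may differ.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "counting.csv"
        sample.write_text("prompt,gt\na photo of two dogs,2\n", encoding="utf-8")
        for skill, rows in load_skill_prompts(tmp).items():
            print(skill, len(rows), rows[0])
```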

:point_right: Prompt Generation:

You don't need to run the prompt-generation code, since the generated prompts are already provided and can be downloaded from this link.

However, we also provide all of the generation code.
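To illustrate how a prompt and its ground truth can be produced together, here is a toy template-based sketch for a counting-style skill. It is not the HRS-Bench generation pipeline; the function `make_counting_prompts`, the template text, and the `(singular, plural)` object format are all hypothetical choices made for this example. A fixed seed keeps the output reproducible.

```python
import itertools
import random


def make_counting_prompts(objects, max_count=5, n_prompts=4, seed=0):
    """Hypothetical template generator returning (prompt, GT) rows.

    `objects` is a list of (singular, plural) noun pairs; each prompt
    mentions two distinct objects with random counts, and the GT records
    the intended count per object.
    """
    if len(objects) < 2:
        raise ValueError("need at least two objects to build a pair")
    rng = random.Random(seed)  # seeded for reproducible prompts
    pairs = list(itertools.combinations(objects, 2))
    rows = []
    for _ in range(n_prompts):
        (a, a_pl), (b, b_pl) = rng.choice(pairs)
        ca, cb = rng.randint(1, max_count), rng.randint(1, max_count)
        text = (f"a photo of {ca} {a if ca == 1 else a_pl} "
                f"and {cb} {b if cb == 1 else b_pl}")
        rows.append({"prompt": text, "gt": {a: ca, b: cb}})
    return rows


if __name__ == "__main__":
    nouns = [("dog", "dogs"), ("cat", "cats"), ("car", "cars")]
    for row in make_counting_prompts(nouns):
        print(row)
```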

:pushpin: Evaluation

Follow the detailed instructions in the README file to run our evaluation scripts for all skills.
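As an example of the kind of per-prompt scoring an evaluation step might perform, the sketch below compares object counts detected in a generated image against the GT counts and reports precision, recall, and F1. This is an illustrative assumption rather than the exact HRS-Bench metric: the function name `counting_scores` is hypothetical, and it assumes the detector output has already been reduced to a `{class: count}` dictionary.

```python
def counting_scores(pred, gt):
    """Hypothetical count-matching metric.

    `pred` and `gt` map object class -> count. A predicted instance counts
    as correct up to the GT count for its class; extra classes or extra
    instances lower precision, missing ones lower recall.
    """
    tp = sum(min(pred.get(cls, 0), n) for cls, n in gt.items())
    n_pred = sum(pred.values())
    n_gt = sum(gt.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # GT asks for 2 dogs and 3 cats; the detector found 2 dogs, 1 cat, 1 car.
    print(counting_scores({"dog": 2, "cat": 1, "car": 1}, {"dog": 2, "cat": 3}))
```

In the demo, 3 of the 4 detected objects match the GT and 3 of the 5 requested objects are present, so precision is 0.75, recall is 0.6, and F1 is about 0.667.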

:bouquet: Credits

This project is inspired by HELM, the excellent holistic benchmark for language models.

:telephone: Contact us

eslam.abdelrahman@kaust.edu.sa

:mailbox_with_mail: Citation

Please consider citing our paper if you find it useful.

@misc{2304.05390,
Author = {Eslam Mohamed Bakr and Pengzhan Sun and Xiaoqian Shen and Faizan Farooq Khan and Li Erran Li and Mohamed Elhoseiny},
Title = {HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models},
Year = {2023},
Eprint = {arXiv:2304.05390},
}