A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A Study with Unified Text-to-Image Fidelity Metrics

Xiangru Zhu<sup>1</sup>, Penglei Sun<sup>2</sup>, Chengyu Wang<sup>3</sup>, Jingping Liu<sup>4</sup>, Zhixu Li<sup>1</sup>, Yanghua Xiao<sup>1</sup>, Jun Huang<sup>3</sup>

<sup>1</sup>Fudan University, <sup>2</sup>The Hong Kong University of Science and Technology (Guangzhou), <sup>3</sup>Alibaba Group, <sup>4</sup>East China University of Science and Technology

Paper

<p align="center"> <img src="https://github.com/zhuxiangru/Winoground-T2I/blob/main/figures/figure1.png" alt="Failed cases on Stable Diffusion XL 1.0" width="600" title="Failed cases on Stable Diffusion XL 1.0" /> </p> <!-- ![Failed cases on Stable Diffusion XL 1.0](https://github.com/zhuxiangru/Winoground-T2I/blob/main/figures/figure1.png) --> <!-- ![The pipeline of data collection, quality control and labeling](https://github.com/zhuxiangru/Winoground-T2I/blob/main/figures/figure2.png) --> <!-- ![Statistics of categories](https://github.com/zhuxiangru/Winoground-T2I/blob/main/figures/figure3.png) -->

Evaluation results from SDXL and IF

Updates

Dataset

Winoground-T2I Dataset: data/dataset/

Templates: data/template/
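
The layout and file format inside these two directories are not described here, so a reasonable first step after cloning is simply to see what each one contains. Below is a minimal sketch, assuming the repository has been cloned and the script is run from its root; `list_files` is a hypothetical helper written for this illustration and is not part of the repository.

```python
from pathlib import Path


def list_files(directory):
    """Return the sorted, POSIX-style relative paths of all files under `directory`.

    Hypothetical helper for illustration only; returns an empty list when the
    directory does not exist, so it is safe to call before the data is present.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    # rglob("*") walks the tree recursively; keep regular files only.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # The two paths below are the ones named in this README.
    for directory in ("data/dataset", "data/template"):
        files = list_files(directory)
        print(f"{directory}: {len(files)} file(s)")
        for name in files:
            print(f"  {name}")
```

Listing the files first avoids guessing at a schema: once the actual file extensions are known, the appropriate reader can be chosen.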

Acknowledgments

We make use of several text-to-image (T2I) fidelity metrics to evaluate T2I synthesis models.