# Awesome Evaluation of Visual Generation
This repository collects methods for evaluating visual generation.
## Overview
### What You'll Find Here
Within this repository, we collect works that aim to answer some critical questions in the field of evaluating visual generation, such as:
- Model Evaluation: How does one determine the quality of a specific image or video generation model?
- Sample/Content Evaluation: What methods can be used to evaluate the quality of a particular generated image or video?
- User Control Consistency Evaluation: How well do generated images and videos align with the user's controls or inputs?
### Updates
This repository is updated periodically. If you have suggestions for additional resources, updates on methodologies, or fixes for broken links, please feel free to:
- raise an Issue,
- nominate awesome related works via Pull Requests, or
- contact us by email (ZIQI002 at e dot ntu dot edu dot sg).
## Table of Contents
- 1. Evaluation Metrics of Generative Models
- 2. Evaluation Metrics of Condition Consistency
- 3. Evaluation Systems of Generative Models
  - 3.1. Evaluation of Unconditional Image Generation
  - 3.2. Evaluation of Text-to-Image Generation
  - 3.3. Evaluation of Text-Based Image Editing
  - 3.4. Evaluation of Neural Style Transfer
  - 3.5. Evaluation of Video Generation
  - 3.6. Evaluation of Text-to-Motion Generation
  - 3.7. Evaluation of Model Trustworthiness
  - 3.8. Evaluation of Entity Relation
- 4. Improving Visual Generation with Evaluation / Feedback / Reward
- 5. Quality Assessment for AIGC
- 6. Study and Rethinking
- 7. Other Useful Resources
<a name="1."></a>
## 1. Evaluation Metrics of Generative Models
<a name="1.1."></a>
### 1.1. Evaluation Metrics of Image Generation
Metric | Paper | Code |
---|---|---|
Inception Score (IS) | Improved Techniques for Training GANs (NeurIPS 2016) | |
Fréchet Inception Distance (FID) | GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (NeurIPS 2017) | |
Kernel Inception Distance (KID) | Demystifying MMD GANs (ICLR 2018) | |
CLIP-FID | The Role of ImageNet Classes in Fréchet Inception Distance (ICLR 2023) | |
Precision-and-Recall | Assessing Generative Models via Precision and Recall (2018-05-31, NeurIPS 2018) <br> Improved Precision and Recall Metric for Assessing Generative Models (NeurIPS 2019) | |
Renyi Kernel Entropy (RKE) | An Information-Theoretic Evaluation of Generative Models in Learning Multi-modal Distributions (NeurIPS 2023) | |
CLIP Maximum Mean Discrepancy (CMMD) | Rethinking FID: Towards a Better Evaluation Metric for Image Generation (CVPR 2024) | |
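As a concrete reference, the Fréchet distance underlying FID reduces to a short computation once features are available. Below is a minimal NumPy sketch; it assumes Inception-v3 (or other) features have already been extracted, and is not a drop-in replacement for tuned implementations such as the ones linked above.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two feature sets.

    FID applies this to Inception-v3 pool features; the feature
    extractor is assumed to have been applied already.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^{1/2}) via eigenvalues: they are real and >= 0
    # for a product of PSD matrices (clip guards against round-off).
    eigs = np.linalg.eigvals(cov_r @ cov_g).real
    tr_sqrt = np.sqrt(np.clip(eigs, 0.0, None)).sum()
    return float(((mu_r - mu_g) ** 2).sum()
                 + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 8))
shifted = real + 2.0  # constant mean shift of 2 in every dimension
print(frechet_distance(real, real) < 1e-6)  # True: identical sets give ~0
print(frechet_distance(real, shifted))      # ≈ 32 (squared mean shift: 8 dims × 2²)
```

In practice the estimate is biased in the number of samples, which is exactly what works like "Effectively Unbiased FID" (below) and CMMD (above) address.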
- Towards a Scalable Reference-Free Evaluation of Generative Models (2024-07-03)
- Fine-tuning Diffusion Models for Enhancing Face Quality in Text-to-image Generation (2024-06-24)
  <i>Note: Face Score introduced</i>
- Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images (2024-05-15)
- Unifying and extending Precision Recall metrics for assessing generative models (2024-05-02)
- Enhancing Plausibility Evaluation for Generated Designs with Denoising Autoencoder (2024-03-08)
  <i>Note: Fréchet Denoised Distance introduced</i>
- Virtual Classifier Error (VCE) from Virtual Classifier: A Reversed Approach for Robust Image Evaluation (2024-03-04)
- An Interpretable Evaluation of Entropy-based Novelty of Generative Models (2024-02-27)
- Semantic Shift Rate from Discovering Universal Semantic Triggers for Text-to-Image Synthesis (2024-02-12)
- Optimizing Prompts Using In-Context Few-Shot Learning for Text-to-Image Generative Models (2024-01-01)
  <i>Note: Quality Loss introduced</i>
- Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation (2023-12-23)
- Attribute Based Interpretable Evaluation Metrics for Generative Models (2023-10-26)
- On quantifying and improving realism of images generated with diffusion (2023-09-26)
  <i>Note: Image Realism Score introduced</i>
- Probabilistic Precision and Recall Towards Reliable Evaluation of Generative Models (2023-09-04)
  <i>Note: P-precision and P-recall introduced</i>
- Learning to Evaluate the Artness of AI-generated Images (2023-05-08)
  <i>Note: ArtScore, a metric for how closely images resemble authentic artworks</i>
- Training-Free Location-Aware Text-to-Image Synthesis (2023-04-26)
  <i>Note: new evaluation metric for the control capability of location-aware generation</i>
- Feature Likelihood Divergence: Evaluating the Generalization of Generative Models Using Samples (2023-02-09)
- LGSQE: Lightweight Generated Sample Quality Evaluation (2022-11-08)
- SSD: Towards Better Text-Image Consistency Metric in Text-to-Image Generation (2022-10-27)
  <i>Note: Semantic Similarity Distance introduced</i>
- Layout-Bridging Text-to-Image Synthesis (2022-08-12)
  <i>Note: Layout Quality Score (LQS), a new metric for evaluating generated layouts</i>
- Rarity Score: A New Metric to Evaluate the Uncommonness of Synthesized Images (2022-06-17)
- Mutual Information Divergence: A Unified Metric for Multimodal Generative Models (2022-05-25)
  <i>Note: evaluates text-to-image generation using vision-language models (VLMs)</i>
- TREND: Truncated Generalized Normal Density Estimation of Inception Embeddings for GAN Evaluation (2021-04-30, ECCV 2022)
- CFID from Conditional Frechet Inception Distance (2021-03-21)
- On Self-Supervised Image Representations for GAN Evaluation (2021-01-12)
  <i>Note: uses SwAV, a self-supervised image representation model</i>
- Random Network Distillation as a Diversity Metric for Both Image and Text Generation (2020-10-13)
  <i>Note: RND metric introduced</i>
- The Vendi Score: A Diversity Evaluation Metric for Machine Learning (2022-10-05)
- CIS from Evaluation Metrics for Conditional Image Generation (2020-04-26)
- Text-To-Image Synthesis Method Evaluation Based On Visual Patterns (2020-04-09)
- Cscore: A Novel No-Reference Evaluation Metric for Generated Images (2020-03-25)
- SceneFID from Object-Centric Image Generation from Layouts (2020-03-16)
- Reliable Fidelity and Diversity Metrics for Generative Models (2020-02-23, ICML 2020)
- Effectively Unbiased FID and Inception Score and where to find them (2019-11-16, CVPR 2020)
- On the Evaluation of Conditional GANs (2019-07-11)
  <i>Note: Fréchet Joint Distance (FJD) assesses image quality, conditional consistency, and intra-conditioning diversity within a single metric</i>
- Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality (2019-05-02)
  <i>Note: CrossLID assesses local intrinsic dimensionality</i>
- A domain agnostic measure for monitoring and evaluating GANs (2018-11-13)
- Learning to Generate Images with Perceptual Similarity Metrics (2015-11-19)
  <i>Note: multiscale structural-similarity score introduced</i>
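Like FID, the Inception Score listed in the table above reduces to a short computation once classifier probabilities are available. A minimal NumPy sketch, assuming the Inception-v3 classifier's softmax outputs have already been computed externally:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp(E_x[ KL(p(y|x) || p(y)) ]) over classifier outputs.

    `probs` is an (N, C) array of per-image class probabilities; the
    Inception-v3 classifier used in the original paper is assumed
    external here.
    """
    p_y = probs.mean(axis=0, keepdims=True)  # marginal label distribution
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Confident, diverse predictions -> high score (upper-bounded by C).
print(round(inception_score(np.eye(10)), 3))              # 10.0
# Uninformative uniform predictions -> lowest possible score.
print(round(inception_score(np.full((64, 10), 0.1)), 3))  # 1.0
```

The two extremes illustrate what IS rewards: per-sample confidence (sharp p(y|x)) and sample diversity (broad p(y)).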
<a name="1.2."></a>
### 1.2. Evaluation Metrics of Video Generation
Metric | Paper | Code |
---|---|---|
FID-vid | GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (NeurIPS 2017) | |
Fréchet Video Distance (FVD) | Towards Accurate Generative Models of Video: A New Metric & Challenges (arXiv 2018) <br> FVD: A new Metric for Video Generation (2019-05-04, ICLR 2019 Workshop DeepGenStruct) | |
<a name="1.3."></a>
### 1.3. Evaluation Metrics for Latent Representation
- Linear Separability & Perceptual Path Length (PPL) from A Style-Based Generator Architecture for Generative Adversarial Networks (2020-01-09)
<a name="2."></a>
## 2. Evaluation Metrics of Condition Consistency
<a name="2.1."></a>
### 2.1. Evaluation Metrics of Multi-Modal Condition Consistency
Metric | Condition | Pipeline | Code | References |
---|---|---|---|---|
CLIP Score (a.k.a. CLIPSIM) | Text | cosine similarity between the CLIP image and text embeddings | PyTorch Lightning | CLIP Paper (ICML 2021). Metric first used in CLIPScore Paper (arXiv 2021); GODIVA Paper (arXiv 2021) applies it to video evaluation. |
Mask Accuracy | Segmentation Mask | predict the segmentation mask, and compute pixel-wise accuracy against the ground-truth segmentation mask | any segmentation method suited to your setting | |
DINO Similarity | Image of a Subject (human / object, etc.) | cosine similarity between the DINO embeddings of the generated image and the condition image | | DINO Paper. Metric proposed in DreamBooth. |
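As a minimal illustration of the CLIP Score pipeline in the table above (the DINO similarity is computed the same way, just with DINO embeddings), here is a NumPy sketch. It assumes the image and text embeddings have already been obtained from a CLIP model; the rescaling factor `w = 100` and the clamp at zero follow the CLIPScore formulation.

```python
import numpy as np

def clip_score(image_emb, text_emb, w=100.0):
    """CLIPScore-style metric: w * max(cos(image_emb, text_emb), 0).

    The embeddings are assumed to come from a CLIP image encoder and
    text encoder respectively; computing them is not shown here.
    """
    image_emb = np.asarray(image_emb, dtype=np.float64)
    text_emb = np.asarray(text_emb, dtype=np.float64)
    cos = (image_emb @ text_emb) / (
        np.linalg.norm(image_emb) * np.linalg.norm(text_emb))
    return float(w * max(cos, 0.0))

v = np.array([0.2, 0.5, 0.1])
print(round(clip_score(v, v), 6))          # 100.0 (identical directions)
print(clip_score([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal embeddings)
```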
- Manipulation Direction (MD) from Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities (2023-11-20)
- Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation (2022-12-02)
- On the Evaluation of Conditional GANs (2019-07-11)
  <i>Note: Fréchet Joint Distance (FJD) assesses image quality, conditional consistency, and intra-conditioning diversity within a single metric</i>
- Classification Accuracy Score for Conditional Generative Models (2019-05-26)
  <i>Note: new metric, Classification Accuracy Score (CAS)</i>
- Visual-Semantic (VS) Similarity from Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network (2018-12-26)
- Semantically Invariant Text-to-Image Generation (2018-09-06)
  <i>Note: evaluates image-text similarity via image captioning</i>
- Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis (2018-01-16)
  <i>Note: an object-detector-based metric is proposed</i>
<a name="2.2."></a>
### 2.2. Evaluation Metrics of Image Similarity
Metrics | Paper | Code |
---|---|---|
Learned Perceptual Image Patch Similarity (LPIPS) | The Unreasonable Effectiveness of Deep Features as a Perceptual Metric (2018-01-11) (CVPR 2018) | |
Structural Similarity Index (SSIM) | Image quality assessment: from error visibility to structural similarity (TIP 2004) | |
Peak Signal-to-Noise Ratio (PSNR) | - | |
Multi-Scale Structural Similarity Index (MS-SSIM) | Multiscale structural similarity for image quality assessment (SSC 2004) | PyTorch-Metrics |
Feature Similarity Index (FSIM) | FSIM: A Feature Similarity Index for Image Quality Assessment (TIP 2011) | |
The community has also been using DINO or CLIP features to measure the semantic similarity of two images or frames.
There are also recent works proposing new methods to measure visual similarity (more will be added).
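Of the reference-based metrics above, PSNR is simple enough to sketch directly. A minimal NumPy version for 8-bit images (LPIPS, SSIM, and FSIM need the implementations linked in the table):

```python
import numpy as np

def psnr(a, b, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB (higher = more similar)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

img = np.zeros((4, 4))
noisy = img + 16.0  # constant error of 16 per pixel
print(psnr(img, img))                # inf
print(round(psnr(img, noisy), 2))    # 24.05 dB
```

Note that PSNR (like SSIM) is a strict pixel-level reference metric, which is why the DINO/CLIP feature similarities mentioned above are preferred when only semantic correspondence is expected.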
<a name="3."></a>
## 3. Evaluation Systems of Generative Models
<a name="3.1."></a>
### 3.1. Evaluation of Unconditional Image Generation
- A Lightweight Generalizable Evaluation and Enhancement Framework for Generative Models and Generated Samples (2024-04-16)
- Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability (2023-12-17, CVPR 2024)
- Using Skew to Assess the Quality of GAN-generated Image Features (2023-10-31)
  <i>Note: Skew Inception Distance introduced</i>
- StudioGAN: A Taxonomy and Benchmark of GANs for Image Synthesis (2022-06-19)
- HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models (2019-04-01)
- An Improved Evaluation Framework for Generative Adversarial Networks (2018-03-20)
  <i>Note: Class-Aware Frechet Distance introduced</i>
<a name="3.2."></a>
### 3.2. Evaluation of Text-to-Image Generation
- ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Images (2024-09-18)
- Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation (2024-09-18)
- Beyond Aesthetics: Cultural Competence in Text-to-Image Models (2024-07-09)
  <i>Note: CUBE benchmark introduced</i>
- MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? (2024-07-05)
  <i>Note: MJ-Bench introduced</i>
- MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis (2024-07-02)
  <i>Note: COCO-MIG and Multimodal-MIG benchmarks introduced</i>
- Analyzing Quality, Bias, and Performance in Text-to-Image Generative Models (2024-06-28)
- EvalAlign: Evaluating Text-to-Image Models through Precision Alignment of Multimodal Large Models with Supervised Fine-Tuning to Human Annotations (2024-06-24)
- DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation (2024-06-24)
- Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models (2024-06-21)
- Evaluating Numerical Reasoning in Text-to-Image Models (2024-06-20)
  <i>Note: GeckoNum introduced</i>
- Holistic Evaluation for Interleaved Text-and-Image Generation (2024-06-20)
  <i>Note: InterleavedBench and the InterleavedEval metric introduced</i>
- GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation (2024-06-19)
- Decomposed evaluations of geographic disparities in text-to-image models (2024-06-17)
  <i>Note: new metric, Decomposed Indicators of Disparities, introduced</i>
- PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models (2024-06-17)
  <i>Note: PhyBench introduced</i>
- Make It Count: Text-to-Image Generation with an Accurate Number of Objects (2024-06-14)
- Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? (2024-06-11)
  <i>Note: Commonsense-T2I, a benchmark for real-life commonsense reasoning capabilities of T2I models</i>
- Unified Text-to-Image Generation and Retrieval (2024-06-09)
  <i>Note: TIGeR-Bench, a benchmark for evaluating unified text-to-image generation and retrieval</i>
- PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction (2024-06-07)
- GenAI Arena: An Open Evaluation Platform for Generative Models (2024-06-06)
- A-Bench: Are LMMs Masters at Evaluating AI-generated Images? (2024-06-05)
- Multidimensional Preference Score from Learning Multi-dimensional Human Preference for Text-to-Image Generation (2024-05-23)
- Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models (2024-05-20)
  <i>Note: NewEpisode benchmark introduced</i>
- Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation (2024-05-11)
  <i>Note: GroundingScore metric introduced</i>
- TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation (2024-04-29)
  <i>Note: consistent score introduced</i>
- Exposing Text-Image Inconsistency Using Diffusion Models (2024-04-28)
- Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings (2024-04-25)
- Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation (2024-04-23)
- Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting (2024-04-22)
  <i>Note: Latent Fisher divergence and Wasserstein metric introduced</i>
- TAVGBench: Benchmarking Text to Audible-Video Generation (2024-04-22)
- Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control (2024-04-21)
- Magic Clothing: Controllable Garment-Driven Image Synthesis (2024-04-15)
  <i>Note: new metric Matched-Points-LPIPS introduced</i>
- GenAI-Bench: A Holistic Benchmark for Compositional Text-to-Visual Generation (2024-04-09)
  <i>Note: GenAI-Bench was introduced in the earlier paper "Evaluating Text-to-Visual Generation with Image-to-Text Generation"</i>
- Detect-and-Compare from Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models (2024-04-05)
- Enhancing Text-to-Image Model Evaluation: SVCS and UCICM (2024-04-02)
  <i>Note: evaluation metrics Semantic Visual Consistency Score and User-Centric Image Coherence Metric</i>
- Evaluating Text-to-Visual Generation with Image-to-Text Generation (2024-04-01)
- Measuring Style Similarity in Diffusion Models (2024-04-01)
- AAPMT: AGI Assessment Through Prompt and Metric Transformer (2024-03-28)
- FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models (2024-03-25)
- Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation (2024-03-25)
  <i>Note: LenCom-Eval introduced</i>
- Exploring GPT-4 Vision for Text-to-Image Synthesis Evaluation (2024-03-20)
- DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation (2024-03-13)
  <i>Note: DialogBen introduced</i>
- Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis (2024-03-08)
- An Information-Theoretic Evaluation of Generative Models in Learning Multi-modal Distributions (2024-02-13)
- MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis (2024-02-08)
  <i>Note: COCO-MIG benchmark introduced</i>
- CAS: A Probability-Based Approach for Universal Condition Alignment Score (2024-01-16)
  <i>Note: condition alignment for text-to-image, {instruction, image}-to-image, edge-/scribble-to-image, and text-to-audio</i>
- EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models (2024-01-09)
  <i>Note: emotion accuracy, semantic clarity, and semantic diversity are not core contributions of this paper</i>
- VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation (2023-12-22)
- PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models (2023-12-21)
  <i>Note: AnimateBench, a benchmark for comparing personalized image animation methods</i>
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods (2023-12-11)
- A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A Study with Unified Text-to-Image Fidelity Metrics (2023-12-04)
- The Challenges of Image Generation Models in Generating Multi-Component Images (2023-11-22)
- SelfEval: Leveraging the discriminative nature of generative models for evaluation (2023-11-17)
- GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks (2023-11-02)
- Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation (2023-10-27, ICLR 2024)
- DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design (2023-10-23)
- GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment (2023-10-17)
- Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy (2023-10-13)
- SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing (2023-10-12)
  <i>Note: new metric, Editing Success Rate</i>
- ImagenHub: Standardizing the evaluation of conditional image generation models (2023-10-02)
- Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation (2023-09-26, ICLR 2024)
- Concept Score from Text-to-Image Generation for Abstract Concepts (2023-09-26)
- OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation (2023-09-23)
  <i>Note: evaluates the task of interleaved image-text generation</i>
- Progressive Text-to-Image Diffusion with Soft Latent Direction (2023-09-18)
  <i>Note: benchmark for text-to-image generation tasks</i>
- AltDiffusion: A Multilingual Text-to-Image Diffusion Model (2023-08-19, AAAI 2024)
  <i>Note: benchmark focused on multilingual generation</i>
- LEICA from Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment (2023-08-16)
- Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation (2023-07-18)
- T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation (2023-07-12)
- TIAM -- A Metric for Evaluating Alignment in Text-to-Image Generation (2023-07-11, WACV 2024)
- Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback (2023-07-10, NeurIPS 2023)
- Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis (2023-06-15)
- ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models (2023-06-07, AAAI 2024)
- Visual Programming for Text-to-Image Generation and Evaluation (2023-05-24, NeurIPS 2023)
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation (2023-05-18, NeurIPS 2023)
- X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models (2023-05-18)
- What You See is What You Read? Improving Text-Image Alignment Evaluation (2023-05-17, NeurIPS 2023)
- Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation (2023-05-02)
- Analysis of Appeal for Realistic AI-Generated Photos (2023-04-17)
- Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation (2023-04-13)
- HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models (2023-04-11, ICCV 2023)
- Human Preference Score: Better Aligning Text-to-Image Models with Human Preference (2023-03-25, ICCV 2023)
- A study of the evaluation metrics for generative images containing combinational creativity (2023-03-23)
  <i>Note: Consensual Assessment Technique and Turing Test used in T2I evaluation</i>
- TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering (2023-03-21, ICCV 2023)
- Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics (2023-02-09)
  <i>Note: an evaluation approach for early-stopping criteria in T2I customization</i>
- Benchmarking Spatial Relationships in Text-to-Image Generation (2022-12-20)
- MMI and MOR from Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift (2022-12-15)
- TeTIm-Eval: a novel curated evaluation data set for comparing text-to-image models (2022-12-15)
- Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark (2022-11-22)
- UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance (2022-10-28)
  <i>Note: UniBench, a benchmark with prompts for simple-scene and complex-scene images in Chinese and English</i>
- Re-Imagen: Retrieval-Augmented Text-to-Image Generator (2022-09-29)
  <i>Note: EntityDrawBench, a benchmark evaluating image generation for diverse entities</i>
- Vision-Language Matching for Text-to-Image Synthesis via Generative Adversarial Networks (2022-08-20)
  <i>Note: new metric, Vision-Language Matching Score (VLMS)</i>
- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation (2022-06-22)
- GR-GAN: Gradual Refinement Text-to-image Generation (2022-05-23)
  <i>Note: new metric, Cross-Model Distance, introduced</i>
- DrawBench from Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (2022-05-23)
- StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis (2022-03-29, CVPR 2022)
  <i>Note: evaluation metric for the compositionality of T2I models</i>
- Benchmark for Compositional Text-to-Image Synthesis (2021-07-29)
- TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation (2021-12-02, ECCV 2022)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency (2021-05-20)
- Leveraging Visual Question Answering to Improve Text-to-Image Synthesis (2020-10-28)
- Image Synthesis from Locally Related Texts (2020-06-08)
  <i>Note: VQA accuracy as a new evaluation metric</i>
- Semantic Object Accuracy for Generative Text-to-Image Synthesis (2019-10-29)
  <i>Note: new evaluation metric, Semantic Object Accuracy (SOA)</i>
<a name="3.3."></a>
### 3.3. Evaluation of Text-Based Image Editing
- Learning Action and Reasoning-Centric Image Editing from Videos and Simulations (2024-07-03)
  <i>Note: AURORA-Bench introduced</i>
- GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization (2024-06-24)
- MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models (2024-06-03)
  <i>Note: PIE-Bench++, evaluating image-editing tasks involving multiple objects and attributes</i>
- DiffUHaul: A Training-Free Method for Object Dragging in Images (2024-06-03)
  <i>Note: foreground similarity, object traces, and realism metrics introduced</i>
- HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing (2024-04-15)
- FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing (2024-03-27)
  <i>Note: automatic mask-based evaluation metric tailored to object-centric editing scenarios</i>
- Transformation-Oriented Paired Benchmark from InstructBrush: Learning Attention-based Instruction Optimization for Image Editing (2024-03-27)
- Editing Massive Concepts in Text-to-Image Diffusion Models (2024-03-20)
  <i>Note: ImageNet Concept Editing Benchmark (ICEB), for evaluating massive concept editing in T2I models</i>
- Make Me Happier: Evoking Emotions Through Image Diffusion Models (2024-03-13)
  <i>Note: EMR, ESR, ENRD, and ESS metrics introduced</i>
- Diffusion Model-Based Image Editing: A Survey (2024-02-27)
  <i>Note: EditEval, a benchmark for text-guided image editing, and LLM Score</i>
- Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks (2024-01-15, AAAI 2024)
  <i>Note: Editing-Mask, a benchmark examining mask accuracy and local editing ability</i>
- RotationDrag: Point-based Image Editing with Rotated Diffusion Features (2024-01-12)
  <i>Note: RotationBench introduced</i>
- LEDITS++: Limitless Image Editing using Text-to-Image Models (2023-11-28)
  <i>Note: TEdBench++, a revised version of TEdBench</i>
- Emu Edit: Precise Image Editing via Recognition and Generation Tasks (2023-11-16)
- EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods (2023-10-03)
- PIE-Bench from Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code (2023-10-02)
- Iterative Multi-granular Image Editing using Diffusion Models (2023-09-01)
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing (2023-06-26)
  <i>Note: DragBench benchmark introduced</i>
- DreamEdit: Subject-driven Image Editing (2023-06-22)
- MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing (2023-06-16)
  <i>Note: dataset only</i>
- Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting (2022-12-13, CVPR 2023)
- Imagic: Text-Based Real Image Editing with Diffusion Models (2022-10-17)
  <i>Note: TEdBench, an image-editing benchmark</i>
- Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model (2021-11-26)
- Knowledge-Driven Generative Adversarial Network for Text-to-Image Synthesis (2021-09-29)
  <i>Note: new evaluation system, Pseudo Turing Test (PTT)</i>
- ManiGAN: Text-Guided Image Manipulation (2019-12-12)
  <i>Note: manipulative precision metric introduced</i>
- Text Guided Person Image Synthesis (2019-04-10)
  <i>Note: VQA perceptual score introduced</i>
<a name="3.4."></a>
### 3.4. Evaluation of Neural Style Transfer
<a name="3.5."></a>
### 3.5. Evaluation of Video Generation
#### 3.5.1. Evaluation of Text-to-Video Generation
- Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model (2024-07-31)
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation (2024-07-19)
- T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models (2024-07-08)
  <i>Note: T2VSafetyBench introduced</i>
- Evaluation of Text-to-Video Generation Models: A Dynamics Perspective (2024-07-01)
- T2VBench: Benchmarking Temporal Dynamics for Text-to-Video Generation (2024-06)
- Evaluating and Improving Compositional Text-to-Visual Generation (2024-06)
- TlTScore: Towards Long-Tail Effects in Text-to-Visual Evaluation with Generative Foundation Models (2024-06)
- ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation (2024-06-26)
- VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation (2024-06-21)
- TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation (2024-06-12)
  <i>Note: TC-Bench, TCR, and TC-Score introduced</i>
- VideoPhy: Evaluating Physical Commonsense for Video Generation (2024-06-05)
- Illumination Histogram Consistency Metric for Quantitative Assessment of Video Sequences (2024-05-15)
- The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective (2024-05-13)
  <i>Note: T2Vid2T, a new evaluation framework for the storytelling aspects of videos</i>
- Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method (2024-05-07)
- Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models (2024-05-07)
  <i>Note: hallucination detection</i>
- Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap (2024-04-21)
- Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment (2024-03-18)
- A dataset of text prompts, videos and video quality metrics from generative text-to-video AI models (2024-02-22)
- Sora Generates Videos with Stunning Geometrical Consistency (2024-02-27)
- STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models (2024-01-30)
- Towards A Better Metric for Text-to-Video Generation (2024-01-15)
- PEEKABOO: Interactive Video Generation via Masked-Diffusion (2023-12-12)
  <i>Note: benchmark for interactive video generation</i>
- VBench: Comprehensive Benchmark Suite for Video Generative Models (2023-11-29)
- SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning (2023-11-29)
- FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation (2023-11-03)
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models (2023-10-17)
- Measuring the Quality of Text-to-Video Model Outputs: Metrics and Dataset (2023-09-14)
- StoryBench: A Multifaceted Benchmark for Continuous Story Visualization (2023-08-22, NeurIPS 2023)
- CelebV-Text: A Large-Scale Facial Text-Video Dataset (2023-03-26, CVPR 2023)
  <i>Note: benchmark on facial text-to-video generation</i>
- Make It Move: Controllable Image-to-Video Generation with Text Descriptions (2021-12-06, CVPR 2022)
#### 3.5.2. Evaluation of Image-to-Video Generation
- I2V-Bench from ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation (2024-02-06)
- AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI (2024-01-03)
- VBench-I2V (2024-03) from VBench: Comprehensive Benchmark Suite for Video Generative Models (2023-11-29)
- A Benchmark for Controllable Text-Image-to-Video Generation (2023-06-12)
- Temporal Shift GAN for Large Scale Video Generation (2020-04-04)
  <i>Note: Symmetric-Similarity-Score introduced</i>
- Video Imagination from a Single Image with Transformation Generation (2017-06-13)
  <i>Note: RIQA metric introduced</i>
#### 3.5.3. Evaluation of Talking Face Generation
- OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance (2024-05-23)
  <i>Note: VTCS measures lip-readability in synthesized videos</i>
- Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation (2024-05-07)
- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time (2024-04-16)
  <i>Note: Contrastive Audio and Pose Pretraining (CAPP) score introduced</i>
- THQA: A Perceptual Quality Assessment Database for Talking Heads (2024-04-13)
- A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos (2024-03-11)
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert (2023-03-29, CVPR 2023)
  <i>Note: measures intelligibility of the generated videos</i>
- Responsive Listening Head Generation: A Benchmark Dataset and Baseline (2021-12-27, ECCV 2022)
- A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild (2020-08-23)
  <i>Note: new metrics LSE-D and LSE-C introduced</i>
- What comprises a good talking-head video generation?: A Survey and Benchmark (2020-05-07)
<a name="3.6."></a>
### 3.6. Evaluation of Text-to-Motion Generation
- MoDiPO: text-to-motion alignment via AI-feedback-driven Direct Preference Optimization (2024-05-06)
- What is the Best Automated Metric for Text to Motion Generation? (2023-09-19)
- Text-to-Motion Retrieval: Towards Joint Understanding of Human Motion Data and Natural Language (2023-05-25)
  <i>Note: evaluation protocol for assessing the quality of retrieved motions</i>
- Establishing a Unified Evaluation Framework for Human Motion Generation: A Comparative Analysis of Metrics (2024-05-13)
- Evaluation of text-to-gesture generation model using convolutional neural network (2021-10-11)
<a name="3.7."></a>
3.7. Evaluation of Model Trustworthiness
3.7.1. Evaluation of Visual-Generation-Model Trustworthiness
-
BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM (2024-07-21)
-
Towards Understanding Unsafe Video Generation (2024-07-17)
<i>Note: Proposes Latent Variable Defense (LVD) which works within the model's internal sampling process</i>
-
The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention (2024-06-29)
-
EquiPrompt: Debiasing Diffusion Models via Iterative Bootstrapping in Chain of Thoughts (2024-06-13)
<i>Note: Normalized Entropy metric introduced</i>
-
Latent Directions: A Simple Pathway to Bias Mitigation in Generative AI (2024-06-10)
-
Evaluating and Mitigating IP Infringement in Visual Generative AI (2024-06-07)
-
Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance (2024-06-06)
-
AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark (2024-06-02)
-
FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models (2024-05-28)
-
ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users (2024-05-24)
-
Conditional Likelihood Discrepancy from Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy (2024-05-23)
-
Could It Be Generated? Towards Practical Analysis of Memorization in Text-To-Image Diffusion Models (2024-05-09)
-
Towards Geographic Inclusion in the Evaluation of Text-to-Image Models (2024-05-07)
-
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images (2024-05-06)
-
Espresso: Robust Concept Filtering in Text-to-Image Models (2024-04-30)
<i> Note: Paper is about filtering unacceptable concepts, not evaluation.</i>
-
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models (2024-04-18)
-
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models (2024-04-11)
-
Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation (2024-04-01)
-
VBench-Trustworthiness (2024-03) from VBench: Comprehensive Benchmark Suite for Video Generative Models (2023-11-29)
-
Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts (2024-03-17, NAACL 2024)
-
Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis (2024-03-08)
-
Position: Towards Implicit Prompt For Text-To-Image Models (2024-03-04)
<i>Note: ImplicitBench, new benchmark</i>
-
The Male CEO and the Female Assistant: Probing Gender Biases in Text-To-Image Models Through Paired Stereotype Test (2024-02-16)
-
Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You (2024-01-29)
-
Benchmarking the Fairness of Image Upsampling Methods (2024-01-24)
-
ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation (2024-01-02)
-
New Job, New Gender? Measuring the Social Bias in Image Generation Models (2024-01-01)
-
Distribution Bias, Jaccard Hallucination, Generative Miss Rate from Quantifying Bias in Text-to-Image Generative Models (2023-12-20)
-
TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models (2023-12-03)
<i>Note: Novel metrics CAS and BAV introduced</i>
-
Holistic Evaluation of Text-To-Image Models (2023-11-07)
-
Sociotechnical Safety Evaluation of Generative AI Systems (2023-10-18)
-
Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models (2023-10-03)
<i>Note: Evaluate the cultural content of TTI-generated images</i>
-
ITI-GEN: Inclusive Text-to-Image Generation (2023-09-11, ICCV 2023)
-
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity (2023-08-11)
-
On the Cultural Gap in Text-to-Image Generation (2023-07-06)
-
Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks (2023-06-16)
-
Disparities in Text-to-Image Model Concept Possession Across Languages (2023-06-12)
<i>Note: Benchmark of multilingual parity in conceptual possession</i>
-
Evaluating the Social Impact of Generative AI Systems in Systems and Society (2023-06-09)
-
Word-Level Explanations for Analyzing Bias in Text-to-Image Models (2023-06-03)
-
Multilingual Conceptual Coverage in Text-to-Image Models (2023-06-02, ACL 2023)
<i>Note: CoCo-CroLa, benchmark for multilingual parity of text-to-image models</i>
-
T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation (2023-06-01)
-
SneakyPrompt: Jailbreaking Text-to-image Generative Models (2023-05-20)
-
Inspecting the Geographical Representativeness of Images from Text-to-Image Models (2023-05-18)
-
Multimodal Composite Association Score: Measuring Gender Bias in Generative Multimodal Models (2023-04-26)
-
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias (2023-04-06, CVPR 2023)
-
Social Biases through the Text-to-Image Generation Lens (2023-03-30)
-
Stable Bias: Analyzing Societal Representations in Diffusion Models (2023-03-20)
-
Auditing Gender Presentation Differences in Text-to-Image Models (2023-02-07)
-
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models (2022-11-09, CVPR 2023)
<i>Note: SLD removes and suppresses inappropriate image parts during the diffusion process</i>
-
How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions? (2022-10-27)
-
Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis (2022-09-19)
-
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (2022-02-08, ICCV 2023)
<i>Note: PaintSkills, evaluation for visual reasoning capabilities and social biases</i>
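Several of the bias benchmarks above score how evenly a model's generations cover a demographic attribute; EquiPrompt's Normalized Entropy is one such metric. Below is a generic normalized-entropy sketch over predicted attribute labels (whether this matches EquiPrompt's exact definition is an assumption): 1.0 means perfectly balanced groups, 0.0 means a single group.

```python
import math
from collections import Counter

def normalized_entropy(labels):
    """Shannon entropy of the label distribution, normalized by log(K)
    so the score is 1.0 for perfectly balanced groups and 0.0 when all
    samples fall into a single group."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    if k <= 1:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)
```

In practice the labels would come from an attribute classifier run over a batch of generations for a neutral prompt such as "a photo of a CEO".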
3.7.2. Evaluation of Non-Visual-Generation-Model Trustworthiness
Not about visual generation, but related trustworthiness evaluations of other models, such as LLMs
-
The African Woman is Rhythmic and Soulful: Evaluation of Open-ended Generation for Implicit Biases (2024-07-01)
-
Extrinsic Evaluation of Cultural Competence in Large Language Models (2024-06-17)
-
Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study (2024-06-11)
-
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal (2024-02-06)
-
FACET: Fairness in Computer Vision Evaluation Benchmark (2023-08-31)
-
Gender Biases in Automatic Evaluation Metrics for Image Captioning (2023-05-24)
-
Fairness Indicators for Systematic Assessments of Visual Feature Extractors (2022-02-15)
<a name="3.8."></a>
3.8. Evaluation of Entity Relation
-
Scene Graph (SG)-IoU, Relation-IoU, and Entity-IoU (using GPT-4v) from SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance (2024-05-24)
-
Relation Accuracy & Entity Accuracy from ReVersion: Diffusion-Based Relation Inversion from Images (2023-03-23)
-
Testing Relational Understanding in Text-Guided Image Generation (2022-07-29)
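SG-Adapter's SG-IoU, Relation-IoU, and Entity-IoU rely on GPT-4V to extract scene graphs from the generated images; once extraction is done, the score itself reduces to a plain intersection-over-union between element sets. A minimal sketch of that final step, assuming relations have already been extracted as (subject, predicate, object) triplets (a deliberate simplification of the paper's pipeline):

```python
def relation_iou(predicted, reference):
    """Intersection-over-union between two sets of relation triplets,
    e.g. (subject, predicate, object) tuples extracted from an image
    and from the conditioning text respectively."""
    p, r = set(predicted), set(reference)
    union = p | r
    return len(p & r) / len(union) if union else 1.0
```

The same function applied to entity sets instead of triplets gives an Entity-IoU analogue.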
<a name="4."></a>
4. Improving Visual Generation with Evaluation / Feedback / Reward
-
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models (2024-07-17)
-
Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion (2024-07-17, ECCV 2024)
-
Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning (2024-07-16)
-
Video Diffusion Alignment via Reward Gradients (2024-07-11)
-
Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning (2024-07-09)
-
Aligning Human Motion Generation with Human Perceptions (2024-07-02)
-
PopAlign: Population-Level Alignment for Fair Text-to-Image Generation (2024-06-28)
-
Prompt Refinement with Image Pivot for Text-to-Image Generation (2024-06-28, ACL 2024)
-
Diminishing Stereotype Bias in Image Generation Model using Reinforcement Learning Feedback (2024-06-27)
-
Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation (2024-06-24)
-
InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning (2024-06-14)
-
Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis (2024-06-13)
-
Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization (2024-06-10)
<i>Note: new evaluation metric: style alignment</i>
-
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference (2024-06-10)
-
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization (2024-06-06)
-
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step (2024-06-06)
-
Improving GFlowNets for Text-to-Image Diffusion Alignment (2024-06-02)
<i>Note: Improves text-to-image alignment with reward function</i>
-
Enhancing Reinforcement Learning Finetuned Text-to-Image Generative Model Using Reward Ensemble (2024-06-01)
-
Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback (2024-05-30)
-
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback (2024-05-29)
-
Curriculum Direct Preference Optimization for Diffusion and Consistency Models (2024-05-22)
-
Class-Conditional self-reward mechanism for improved Text-to-Image models (2024-05-22)
-
Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning (2024-05-12)
-
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models (2024-05-01)
-
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning (2024-04-23)
-
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis (2024-04-21)
<i>Note: Human feedback learning to enhance model performance in low-steps regime</i>
-
Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding (2024-04-17)
-
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback (2024-04-11)
-
UniFL: Improve Stable Diffusion via Unified Feedback Learning (2024-04-08)
-
YaART: Yet Another ART Rendering Technology (2024-04-08)
-
ByteEdit: Boost, Comply and Accelerate Generative Image Editing (2024-04-07)
<i>Note: ByteEdit, feedback learning framework for Generative Image Editing tasks</i>
-
Aligning Diffusion Models by Optimizing Human Utility (2024-04-06)
-
Dynamic Prompt Optimizing for Text-to-Image Generation (2024-04-05)
-
Pixel-wise RL on Diffusion Models: Reinforcement Learning from Rich Feedback (2024-04-05)
-
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching (2024-04-04)
-
VersaT2I: Improving Text-to-Image Models with Versatile Reward (2024-03-27)
-
Improving Text-to-Image Consistency via Automatic Prompt Optimization (2024-03-26)
-
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation (2024-03-25)
-
AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation (2024-03-20)
-
Reward Guided Latent Consistency Distillation (2024-03-16)
-
Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation (2024-03-12)
-
Debiasing Text-to-Image Diffusion Models (2024-02-22)
-
Universal Prompt Optimizer for Safe Text-to-Image Generation (2024-02-16, NAACL 2024)
-
Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community (2024-02-15, ICLR 2024)
-
A Dense Reward View on Aligning Text-to-Image Diffusion with Preference (2024-02-13, ICML 2024)
-
Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases (2024-02-13, ICML 2024)
-
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models (2024-02-13)
-
Human Aesthetic Preference-Based Large Text-to-Image Model Personalization: Kandinsky Generation as an Example (2024-02-09)
-
Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation (2024-01-28)
-
Large-scale Reinforcement Learning for Diffusion Models (2024-01-20)
-
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation (2024-01-11)
-
InstructVideo: Instructing Video Diffusion Models with Human Feedback (2023-12-19)
-
Rich Human Feedback for Text-to-Image Generation (2023-12-15, CVPR 2024)
-
iDesigner: A High-Resolution and Complex-Prompt Following Text-to-Image Diffusion Model for Interior Design (2023-12-07)
-
InstructBooth: Instruction-following Personalized Text-to-Image Generation (2023-12-04)
-
DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback (2023-11-29)
-
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning (2023-11-27)
-
AdaDiff: Adaptive Step Selection for Fast Diffusion (2023-11-24)
-
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model (2023-11-22)
-
Diffusion Model Alignment Using Direct Preference Optimization (2023-11-21)
-
BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis (2023-11-12)
-
Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization (2023-10-18, ICML 2024)
-
Aligning Text-to-Image Diffusion Models with Reward Backpropagation (2023-10-05)
-
Directly Fine-Tuning Diffusion Models on Differentiable Rewards (2023-09-29)
-
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation (2023-08-09, ACM MM 2023)
-
FABRIC: Personalizing Diffusion Models with Iterative Feedback (2023-07-19)
-
Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback (2023-07-10, NeurIPS 2023)
-
Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback (2023-07-06, NeurIPS 2023)
<i>Note: Censored generation using a reward model</i>
-
StyleDrop: Text-to-Image Generation in Any Style (2023-06-01)
<i>Note: Iterative Training with Feedback</i>
-
RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment (2023-05-31)
-
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models (2023-05-25, NeurIPS 2023)
-
Training Diffusion Models with Reinforcement Learning (2023-05-22) [Website](https://rl-diffusion.github.io/)
-
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation (2023-04-12)
-
Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models (2023-04-02, ICLR 2024)
-
Human Preference Score: Better Aligning Text-to-Image Models with Human Preference (2023-03-25)
-
HIVE: Harnessing Human Feedback for Instructional Visual Editing (2023-03-16) [Code](https://github.com/salesforce/HIVE)
-
Aligning Text-to-Image Models using Human Feedback (2023-02-23)
-
Optimizing Prompts for Text-to-Image Generation (2022-12-19, NeurIPS 2023)
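Many of the preference-based methods above (Diffusion-DPO, Diffusion-RPO, MaPO, and others) share the same Bradley-Terry core: maximize the log-sigmoid of a scaled reward margin between a preferred and a rejected sample. A minimal numeric sketch of that shared loss, with the diffusion-specific implicit-reward (log-ratio) terms omitted:

```python
import math

def preference_loss(reward_preferred, reward_rejected, beta=1.0):
    """Bradley-Terry pairwise loss: -log sigmoid(beta * margin).
    DPO-style diffusion methods substitute implicit log-ratio rewards
    into exactly this form; beta controls how sharply the preference
    margin is enforced."""
    margin = beta * (reward_preferred - reward_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss is log(2) at zero margin and falls toward zero as the preferred sample's reward pulls ahead, which is what drives the model toward human-preferred generations.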
<a name="5."></a>
5. Quality Assessment for AIGC
5.1. Image Quality Assessment for AIGC
-
Descriptive Image Quality Assessment in the Wild (2024-05-29)
-
PKU-AIGIQA-4K: A Perceptual Quality Assessment Database for Both Text-to-Image and Image-to-Image AI-Generated Images (2024-04-29)
-
Large Multi-modality Model Assisted AI-Generated Image Quality Assessment (2024-04-27)
-
Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment (2024-04-23)
-
PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition (2024-04-20)
-
AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment (2024-04-04)
-
AIGCOIQA2024: Perceptual Quality Assessment of AI Generated Omnidirectional Images (2024-04-01)
-
Bringing Textual Prompt to AI-Generated Image Quality Assessment (2024-03-27, ICME 2024)
-
TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment (2024-01-08)
-
PSCR: Patches Sampling-based Contrastive Regression for AIGC Image Quality Assessment (2023-12-10)
-
Exploring the Naturalness of AI-Generated Images (2023-12-09)
-
PKU-I2IQA: An Image-to-Image Quality Assessment Database for AI Generated Images (2023-11-27)
-
Appeal and quality assessment for AI-generated images (2023-07-18)
-
AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment (2023-06-07)
-
A Perceptual Quality Assessment Exploration for AIGC Images (2023-03-22)
-
SPS: A Subjective Perception Score for Text-to-Image Synthesis (2021-04-27)
-
GIQA: Generated Image Quality Assessment (2020-03-19)
5.2. Aesthetic Predictors for Generated Images
-
Multi-modal Learnable Queries for Image Aesthetics Assessment (2024-05-02, ICME 2024)
-
Aesthetic Scorer extension for SD Automatic WebUI (2023-01-15)
-
Rethinking Image Aesthetics Assessment: Models, Datasets and Benchmarks (2022-07-01)
-
LAION-Aesthetics_Predictor V2: CLIP+MLP Aesthetic Score Predictor (2022-06-26)
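The LAION aesthetic predictor listed above is structurally simple: a frozen CLIP image embedding fed through a small MLP regressor trained on human aesthetic ratings. A dependency-free sketch of that forward pass with one ReLU hidden layer, using toy hand-set weights (the real predictor uses a 768-d CLIP ViT-L/14 embedding and trained weights):

```python
def aesthetic_score(embedding, w1, b1, w2, b2):
    """Forward pass of a one-hidden-layer ReLU MLP regressor mapping a
    precomputed CLIP image embedding to a scalar aesthetic score."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, embedding)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

# Toy 2-d example purely for illustration.
score = aesthetic_score([1.0, 2.0],
                        [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                        [0.5, 0.5], 0.0)
```

Because CLIP stays frozen, such predictors are cheap to train and are commonly reused as reward signals in the feedback-learning methods of Section 4.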
<a name="6."></a>
6. Study and Rethinking
6.1. Evaluation of Evaluations
-
GAIA: Rethinking Action Quality Assessment for AI-Generated Videos (2024-06-10)
-
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) (2024-04-05)
6.2. Survey
-
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models (2024-06-20)
-
From Sora What We Can See: A Survey of Text-to-Video Generation (2024-05-17)
<i> Note: Refer to Section 3.4 for Evaluation Datasets and Metrics</i>
-
A Survey on Personalized Content Synthesis with Diffusion Models (2024-05-09)
<i>Note: Refer to Section 6 for Evaluation Datasets and Metrics</i>
-
A Survey on Long Video Generation: Challenges, Methods, and Prospects (2024-03-25)
<i>Note: Refer to Table 2 for evaluation metrics for long video generation</i>
-
Evaluating Text-to-Image Synthesis: Survey and Taxonomy of Image Quality Metrics (2024-03-18)
-
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation (2024-03-08)
-
State of the Art on Diffusion Models for Visual Computing (2023-10-11)
<i> Note: Refer to Section 9 for Metrics</i>
-
A Survey on Video Diffusion Models (2023-10-06)
<i>Note: Refer to Section 2.3 for Evaluation Datasets and Metrics</i>
-
AI-Generated Images as Data Source: The Dawn of Synthetic Era (2023-10-03)
<i>Note: Refer to Section 4.2 for Evaluation Metrics</i>
-
Text-to-image Diffusion Models in Generative AI: A Survey (2023-03-14)
<i>Note: Refer to Section 5 for Evaluation from Technical and Ethical Perspectives</i>
-
Image synthesis: a review of methods, datasets, evaluation metrics, and future outlook (2023-02-28)
<i> Note: Refer to section 4 for evaluation metrics</i>
-
Adversarial Text-to-Image Synthesis: A Review (2021-01-25)
<i> Note: Refer to Section 5 for Evaluation of T2I Models</i>
-
Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments (2020-05-27)
<i>Note: Refer to section 2.2 for Evaluation Metrics</i>
-
What comprises a good talking-head video generation?: A Survey and Benchmark (2020-05-07)
-
A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis (2019-10-21)
<i> Note: Refer to Section 5 for Benchmark and Evaluation</i>
-
Recent Progress on Generative Adversarial Networks (GANs): A Survey (2019-03-14)
<i>Note: Refer to section 5 for Evaluation Metrics</i>
-
Video Description: A Survey of Methods, Datasets and Evaluation Metrics (2018-06-01)
<i>Note: Refer to section 5 for Evaluation Metrics</i>
6.3. Study
-
A-Bench: Are LMMs Masters at Evaluating AI-generated Images? (2024-06-05)
-
On the Content Bias in Fréchet Video Distance (2024-04-18, CVPR 2024)
-
On the Evaluation of Generative Models in Distributed Learning Tasks (2023-10-18)
-
Recent Advances in Text-to-Image Synthesis: Approaches, Datasets and Future Research Prospects (2023-08-18)
-
Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models (2023-06-07, NeurIPS 2023)
-
Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation (2023-04-04, CVPR 2023)
-
Revisiting the Evaluation of Image Synthesis with GANs (2023-04-04)
-
A Study on the Evaluation of Generative Models (2022-06-22)
-
REALY: Rethinking the Evaluation of 3D Face Reconstruction (2022-03-18)
-
On the Robustness of Quality Measures for GANs (2022-01-31, ECCV 2022)
-
Multimodal Image Synthesis and Editing: The Generative AI Era (2021-12-27)
-
An Analysis of Text-to-Image Synthesis (2021-05-25)
-
On Aliased Resizing and Surprising Subtleties in GAN Evaluation (2021-04-22)
-
Pros and Cons of GAN Evaluation Measures: New Developments (2021-03-17)
-
An empirical study on evaluation metrics of generative adversarial networks (2018-06-19)
-
Pros and Cons of GAN Evaluation Measures (2018-02-09)
-
A Note on the Inception Score (2018-01-06)
-
Are GANs Created Equal? A Large-Scale Study (2017-11-28, NeurIPS 2018)
-
A note on the evaluation of generative models (2015-11-05)
6.4. Competition
-
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge (2024-04-25)
-
CVPR 2023 Text Guided Video Editing Competition (2023-10-24)
<a name="7."></a>
7. Other Useful Resources
-
Stanford Course: CS236 "Deep Generative Models" - Lecture 15 "Evaluation of Generative Models" [slides]