
This repository contains the implementation of the following work:

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models<br> Fan Zhang, Shulin Tian, Ziqi Huang, Yu Qiao<sup>+</sup>, Ziwei Liu<sup>+</sup><br>

<a name="overview"></a>

:mega: Overview

Recent advancements in visual generative models have enabled high-quality image and video generation, opening diverse applications. However, evaluating these models often demands sampling hundreds or thousands of images or videos, making the process computationally expensive, especially for diffusion-based models with inherently slow sampling. Moreover, existing evaluation methods rely on rigid pipelines that overlook specific user needs and provide numerical results without clear explanations. In contrast, humans can quickly form impressions of a model's capabilities by observing only a few samples. To mimic this, we propose the Evaluation Agent framework, which employs human-like strategies for efficient, dynamic, multi-round evaluations using only a few samples per round, while offering detailed, user-tailored analyses. It offers four key advantages: 1) efficiency, 2) promptable evaluation tailored to diverse user needs, 3) explainability beyond single numerical scores, and 4) scalability across various models and tools. Experiments show that Evaluation Agent reduces evaluation time to 10% of traditional methods while delivering comparable results. The Evaluation Agent framework is fully open-sourced to advance research in visual generative models and their efficient evaluation.

Framework

Overview of Evaluation Agent Framework. This framework leverages LLM-powered agents for efficient and flexible visual model assessments. As shown, it consists of two stages: (a) the Proposal Stage, where user queries are decomposed into sub-aspects, and prompts are generated, and (b) the Execution Stage, where visual content is generated and evaluated using an Evaluation Toolkit. The two stages interact iteratively to dynamically assess models based on user queries.

<a name="installation"></a>

:hammer: Installation

  1. Clone the repository.
git clone https://github.com/Vchitect/Evaluation-Agent.git
cd Evaluation-Agent
  2. Install the environment.
conda create -n eval_agent python=3.10
conda activate eval_agent
pip install -r requirements.txt

<a name="usage"></a>

Usage

First, you need to configure your OpenAI API key. You can do this as follows:

export OPENAI_API_KEY="your_api_key_here"
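If you want the key to remain available in future shell sessions, one common approach is to append the export line to your shell profile (shown here for bash; adapt the file name for your shell):

```bash
# Persist the key across sessions (example for bash; use ~/.zshrc for zsh)
echo 'export OPENAI_API_KEY="your_api_key_here"' >> ~/.bashrc
source ~/.bashrc
```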

Evaluation of Open-ended Questions on T2I Models

python open_ended_eval.py --user_query $USER_QUERY --model $MODEL
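For example, an invocation might look like the following; the query text and the model name below are illustrative placeholders only (substitute a model supported by the repository):

```bash
# Illustrative values only; replace the query and model name with your own
python open_ended_eval.py \
    --user_query "How well does the model handle different artistic styles?" \
    --model "your_model_name"
```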

Evaluation of T2V Models Based on the VBench Tools

Preparation

  1. Configure the VBench Environment
  2. Prepare the Model to be Evaluated

Command

python eval_agent_for_vbench.py --user_query $USER_QUERY --model $MODEL
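As with the open-ended script, the placeholders can be filled in via shell variables. The values below are illustrative only and assume the VBench environment and the target model have already been prepared:

```bash
# Illustrative values only; requires the preparation steps above to be completed
USER_QUERY="How consistent is the subject's appearance across video frames?"
MODEL="your_model_name"
python eval_agent_for_vbench.py --user_query "$USER_QUERY" --model "$MODEL"
```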

Evaluation of T2I Models Based on the T2I-CompBench Tools

Preparation

  1. Configure the T2I-CompBench Environment
  2. Prepare the Model to be Evaluated

Command

python eval_agent_for_t2i_compbench.py --user_query $USER_QUERY --model $MODEL
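A sample invocation is sketched below; the query and model name are placeholders, and the T2I-CompBench environment from the preparation steps is assumed to be configured:

```bash
# Illustrative values only; requires the T2I-CompBench environment and the target model to be set up first
python eval_agent_for_t2i_compbench.py \
    --user_query "Can the model bind colors and shapes to the correct objects?" \
    --model "your_model_name"
```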

Open-Ended User Query Dataset

We propose the Open-Ended User Query Dataset, developed through a user study. As part of this process, we gathered questions from various sources, focusing on aspects users consider most important when evaluating new models. After cleaning, filtering, and expanding the initial set, we compiled a refined dataset of 100 open-ended user queries.

Details of the open-ended user query dataset:

The three graphs give an overview of the distributions and types of our curated open-ended query set. Left: the distribution of question types, categorized as General or Specific. Middle: the distribution of ability types, categorized as Prompt Following, Visual Quality, Creativity, Knowledge, and Others. Right: the distribution of content categories: History and Culture, Film and Entertainment, Science and Education, Fashion, Medical, Game Design, Architecture and Interior Design, and Law.

Citation

If you find our repo useful for your research, please consider citing our paper:

@article{zhang2024evaluationagent,
    title = {Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models},
    author = {Zhang, Fan and Tian, Shulin and Huang, Ziqi and Qiao, Yu and Liu, Ziwei},
    journal = {arXiv preprint arXiv:2412.09645},
    year = {2024}
}