# AIGCBench
:dart::dart: AIGCBench is a novel and comprehensive benchmark designed for evaluating the capabilities of state-of-the-art video generation algorithms. Official code for the paper:
**AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI**, BenchCouncil Transactions on Benchmarks, Standards and Evaluations (TBench).

Fanda Fan, Chunjie Luo, Wanling Gao, Jianfeng Zhan

<a href='https://arxiv.org/abs/2401.01651'><img src='https://img.shields.io/badge/arXiv-2401.01651-red'></a> <a href='https://www.benchcouncil.org/AIGCBench/'><img src='https://img.shields.io/badge/Project-Website-orange'></a> <a href='https://github.com/BenchCouncil/AIGCBench'><img src='https://img.shields.io/badge/Github-Code-green'></a> <a href='https://huggingface.co/datasets/stevenfan/AIGCBench_v1.0'><img src='https://img.shields.io/badge/Huggingface-Dataset-yellow'></a>

<p align="center"> <img src="./source/I2VFramework.jpg" width="1080px"/> </p>

<em>Illustration of our AIGCBench, which is divided into three modules: the evaluation dataset, the evaluation metrics, and the video generation models to be assessed.</em>
## Key Features of AIGCBench
- Diverse Datasets: AIGCBench incorporates a variety of datasets, including real-world video-text pairs and image-text pairs, to ensure a broad and realistic evaluation spectrum. Additionally, it includes a newly generated dataset created through an innovative text-to-image generation pipeline, enhancing the diversity and representativeness of the benchmark.
- Extensive Evaluation Metrics: AIGCBench introduces a set of evaluation metrics that cover four crucial dimensions of video generation—control-video alignment, motion effects, temporal consistency, and video quality. Our evaluation metrics encompass both reference video-based metrics and video-free metrics.
- Validated by Human Judgment: The benchmark's evaluation criteria are thoroughly verified against human preferences to confirm their reliability and alignment with human judgments.
- In-Depth Analysis: Through extensive evaluations, AIGCBench reveals insightful findings about the current strengths and limitations of existing I2V models, offering valuable guidance for future advancements in the field.
- Future Expansion: AIGCBench is not only comprehensive and scalable in its current form but also designed with the vision to encompass a wider range of video generation tasks in the future. This will allow for a unified and in-depth benchmarking of various aspects of AI-generated content (AIGC), setting a new standard for the evaluation of video generation technologies.
## :fire: News
- [01/24/2024] Our paper has been accepted by BenchCouncil Transactions on Benchmarks, Standards and Evaluations (TBench)!
- [01/10/2024] The evaluation dataset and evaluation code have been released.
## Dataset
:smile: The dataset is available on [Hugging Face](https://huggingface.co/datasets/stevenfan/AIGCBench_v1.0).
This dataset is intended for evaluating video generation tasks. It includes both image-text pairs and video-text pairs and comprises three parts:

- **Ours**: a custom set of image-text samples generated with our text-to-image pipeline.
- **WebVid val**: a subset of 1,000 video samples from the WebVid validation set.
- **LAION-Aesthetics**: a subset of the LAION-Aesthetics dataset containing 925 curated image-text samples.
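For convenience, the full dataset can also be fetched programmatically. Below is a minimal sketch using the `huggingface_hub` library; the `local_dir` target is a hypothetical choice, not a repository convention.

```python
# Minimal sketch: download the AIGCBench v1.0 evaluation dataset from Hugging Face.
# Assumes `huggingface_hub` is installed (pip install huggingface_hub).
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="stevenfan/AIGCBench_v1.0",
    repo_type="dataset",
    local_dir="./AIGCBench_v1.0",  # hypothetical target directory
)
print(f"Dataset downloaded to {local_path}")
```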
Below are some images we generated, with the corresponding text:
Image | Description |
---|---|
<img src="source/265_Amidst the lush canopy of a deep jungle, a playful panda is brewing a potion, captured with the stark realism of a photo.png" width="200px" /> | Amidst the lush canopy of a deep jungle, a playful panda is brewing a potion, captured with the stark realism of a photo. |
<img src="source/426_Behold a noble king in the throes of skillfully strumming the guitar surrounded by the tranquil waters of a serene lake, envisioned in the style of an oil painting.png" width="200px" /> | Behold a noble king in the throes of skillfully strumming the guitar surrounded by the tranquil waters of a serene lake, envisioned in the style of an oil painting. |
<img src="source/619_Amidst a sun-dappled forest, a mischievous fairy is carefully repairing a broken robot, captured in the style of an oil painting.png" width="200px" /> | Amidst a sun-dappled forest, a mischievous fairy is carefully repairing a broken robot, captured in the style of an oil painting. |
<img src="source/824_Within the realm of the backdrop of an alien planet's red skies, a treasure-seeking pirate cleverly solving a puzzle, each moment immortalized in the style of an oil painting.png" width="200px" /> | Within the realm of the backdrop of an alien planet's red skies, a treasure-seeking pirate cleverly solving a puzzle, each moment immortalized in the style of an oil painting. |
## Metrics
We have encapsulated the evaluation metrics used in our paper in `eval.py`; for more details, please refer to the paper. To use the code, first download the CLIP model weights and replace `path_to_dir` with the actual path.
Below is a simple example:
```python
import glob
import os

from eval import compute_video_video_similarity  # metric helpers from this repository

ref_video_path = 'path_to_reference_video.mp4'  # reference video to compare against
batch_video_path = os.path.join('path_to_videos', '*.mp4')
video_path_list = sorted(glob.glob(batch_video_path))

sum_res = 0
cnt = 0
for video_path in video_path_list:
    # CLIP-based similarity between the reference video and a generated video.
    res = compute_video_video_similarity(ref_video_path, video_path)
    sum_res += res['clip']
    cnt += res['state']  # 'state' flags whether the pair was scored successfully
print(sum_res / cnt)  # average similarity over successfully scored videos
```
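For intuition about one of the dimensions reported below, here is an illustrative sketch of the adjacent-frame CLIP consistency idea behind the GenVideo Clip (Adjacent frames) metric. It assumes the `openai/clip-vit-base-patch32` checkpoint from `transformers` as a stand-in; the exact model, frame sampling, and preprocessing in `eval.py` may differ.

```python
# Illustrative sketch (not the exact eval.py implementation): score temporal
# consistency as the mean CLIP similarity between adjacent frames of one video.
import cv2
import torch
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch32"  # assumed checkpoint, not from the paper
model = CLIPModel.from_pretrained(MODEL_ID).eval()
processor = CLIPProcessor.from_pretrained(MODEL_ID)

def temporal_consistency(video_path: str) -> float:
    cap = cv2.VideoCapture(video_path)
    frames = []
    ok, frame = cap.read()
    while ok:
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # BGR -> RGB for CLIP
        ok, frame = cap.read()
    cap.release()

    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize embeddings
    sims = (feats[:-1] * feats[1:]).sum(dim=-1)  # cosine sim of adjacent frame pairs
    return sims.mean().item()
```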
## Evaluation Results
Quantitative analysis of different image-to-video algorithms. An upward arrow (↑) indicates that higher values are better; a downward arrow (↓) means lower values are preferable.
| Dimensions | Metrics | VideoCrafter | I2VGen-XL | SVD | Pika | Gen2 |
|---|---|---|---|---|---|---|
| Control-video Alignment | MSE (First) ↓ | 3929.65 | 4491.90 | 640.75 | 155.30 | 235.53 |
| | SSIM (First) ↑ | 0.300 | 0.354 | 0.612 | 0.800 | 0.803 |
| | Image-GenVideo Clip ↑ | 0.830 | 0.832 | 0.919 | 0.930 | 0.939 |
| | GenVideo-Text Clip ↑ | 0.23 | 0.24 | - | 0.271 | 0.270 |
| | GenVideo-RefVideo Clip (Keyframes) ↑ | 0.763 | 0.764 | - | 0.824 | 0.820 |
| Motion Effects | Flow-Square-Mean | 1.24 | 1.80 | 2.52 | 0.281 | 1.18 |
| | GenVideo-RefVideo Clip (Corresponding frames) ↑ | 0.764 | 0.764 | 0.796 | 0.823 | 0.818 |
| Temporal Consistency | GenVideo Clip (Adjacent frames) ↑ | 0.980 | 0.971 | 0.974 | 0.996 | 0.995 |
| | GenVideo-RefVideo Clip (Corresponding frames) ↑ | 0.764 | 0.764 | 0.796 | 0.823 | 0.818 |
| Video Quality | Frame Count ↑ | 16 | 32 | 25 | 72 | 96 |
| | DOVER ↑ | 0.518 | 0.510 | 0.623 | 0.715 | 0.775 |
| | GenVideo-RefVideo SSIM ↑ | 0.367 | 0.304 | 0.507 | 0.560 | 0.504 |
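To make the first two rows concrete, here is a minimal sketch of a first-frame alignment check in the spirit of MSE (First) and SSIM (First): compare the conditioning image against the first frame of the generated video. It uses OpenCV and scikit-image; the resizing and color handling here are assumptions and may differ from `eval.py`.

```python
# Minimal sketch of first-frame alignment: pixel MSE and SSIM between the
# conditioning image and the first frame of a generated video.
# Assumes opencv-python and scikit-image are installed; resizing policy is a guess.
import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim

def first_frame_alignment(image_path: str, video_path: str):
    ref = cv2.imread(image_path)  # both stay in BGR, so the comparison is consistent
    cap = cv2.VideoCapture(video_path)
    ok, first = cap.read()
    cap.release()
    if not ok:
        raise ValueError(f"could not read a frame from {video_path}")

    # Match resolutions before comparing (the exact policy in eval.py may differ).
    first = cv2.resize(first, (ref.shape[1], ref.shape[0]))

    mse = float(np.mean((ref.astype(np.float64) - first.astype(np.float64)) ** 2))
    ssim_score = ssim(ref, first, channel_axis=-1)  # lower MSE / higher SSIM is better
    return mse, ssim_score
```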
To validate the alignment of our proposed evaluation standards with human preferences, we conducted a study. We randomly selected 30 generated results from each of the five methods. Then, we asked participants to vote on the best algorithm outcomes across four dimensions: Image Fidelity, Motion Effects, Temporal Consistency, and Video Quality. A total of 42 individuals participated in the voting process. The specific results of the study are presented below:
<img src="source/radar_chart_high_res.jpg" alt="Alt text" width="600">Contact Us
:email: If you have any questions, please feel free to contact us via email at fanfanda@ict.ac.cn and jianfengzhan.benchcouncil@gmail.com.
## Citation
If you find our work useful in your research, please consider citing our paper:
```bibtex
@article{fan2024aigcbench,
  title={AIGCBench: Comprehensive evaluation of image-to-video content generated by AI},
  author={Fan, Fanda and Luo, Chunjie and Gao, Wanling and Zhan, Jianfeng},
  journal={BenchCouncil Transactions on Benchmarks, Standards and Evaluations},
  pages={100152},
  year={2024},
  publisher={Elsevier}
}
```