LVBench: An Extreme Long Video Understanding Benchmark

<font size=7><div align='center' > [🍎 Project Page] [📖 arXiv Paper] [📊 Dataset][🏆 Leaderboard][🏆 Huggingface Leaderboard] </div></font>

<p align="center"> <img src="./docs/images/cover.png" width="96%" height="50%"> </p>

LVBench is a benchmark designed to evaluate and enhance the capabilities of multimodal models in understanding and extracting information from long videos up to two hours in duration.


🔥 News

👀 Introduction to LVBench

LVBench evaluates how well models understand long videos. We collected a large set of long videos from public sources and annotated them through a mix of manual effort and model assistance. The benchmark provides a robust foundation for testing models on extended temporal contexts, with quality ensured through meticulous human annotation and multi-stage quality control.

Features

  1. Core Capabilities: Six core capabilities for long video understanding, enabling the creation of complex and challenging questions for comprehensive model evaluation.
  2. Diverse Data: A diverse range of long video data, averaging five times longer than the longest existing datasets, covering various categories.
  3. High-Quality Annotations: Reliable benchmark with meticulous human annotation and multi-stage quality control processes.
<img src="./docs/images/example.jpg" width="100%" height="50%">

Dataset

License

Our dataset is released under the CC-BY-NC-SA-4.0 license.

LVBench may be used for academic research only. Commercial use in any form is prohibited. We do not own the copyright of any raw video files.

If any content in LVBench infringes your rights, please contact shiyu.huang@aminer.cn or open an issue, and we will remove it immediately.

Download

Install video2dataset first:

pip install video2dataset
pip uninstall transformer-engine

Then download video_info.meta.jsonl from Hugging Face and place it in the data directory.

Each entry in video_info.meta.jsonl has a key field that holds a YouTube video ID, which you can use to download the corresponding video yourself. Alternatively, use the provided download script, download.sh:

cd scripts
bash download.sh

After the script finishes, the video files are stored in the scripts/videos directory.
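If you would rather fetch videos by ID than run download.sh, the following minimal sketch shows one way to read the metadata file and turn each key into a YouTube watch URL. The function name `youtube_urls` is hypothetical and not part of this repository; the only field the sketch relies on is key, and the path data/video_info.meta.jsonl is assumed from the instructions above.

```python
import json
from pathlib import Path


def youtube_urls(meta_path):
    """Yield (video_id, url) pairs from a JSONL metadata file.

    Hypothetical helper, not part of the LVBench code base.
    """
    with Path(meta_path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            video_id = entry["key"]  # `key` holds the YouTube video ID
            yield video_id, f"https://www.youtube.com/watch?v={video_id}"


if __name__ == "__main__":
    # Path assumed from the instructions above; adjust if yours differs.
    meta = Path("data/video_info.meta.jsonl")
    if meta.exists():
        for video_id, url in youtube_urls(meta):
            print(video_id, url)
```

The resulting URLs can be passed to whichever downloader you prefer.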

Install LVBench

pip install -e .

Get Evaluation Results and Submit to Leaderboard

(Note: to try the evaluation quickly, you can use scripts/construct_random_answers.py to generate a random answer file.)

cd scripts
python test_acc.py

After the script finishes, the evaluation results are written to result.json in the scripts directory. You can submit this file to the leaderboard.
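For intuition about what an accuracy evaluation computes, here is a minimal sketch. It is not the repository's test_acc.py, and it does not reproduce the format of the answer file or of result.json. It assumes, purely for illustration, that each question has a single option-letter answer and that predictions and references are dictionaries keyed by a hypothetical question ID.

```python
def accuracy(predictions, ground_truth):
    """Return the fraction of questions answered correctly.

    Both arguments map a hypothetical question ID to an option letter.
    Questions missing from `predictions` count as wrong.
    """
    if not ground_truth:
        raise ValueError("ground_truth is empty")
    correct = sum(
        1
        for qid, answer in ground_truth.items()
        if predictions.get(qid, "").strip().upper() == answer.strip().upper()
    )
    return correct / len(ground_truth)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from LVBench.
    truth = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
    preds = {"q1": "a", "q2": "B", "q3": "B"}  # q4 left unanswered
    print(accuracy(preds, truth))  # 0.5
```

Counting unanswered questions as wrong is a design choice of this sketch; the official script may handle them differently.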

📈 Results

<p align="center"> <img src="./docs/images/leaderboard.png" width="96%" height="50%"> </p>
<p align="center"> <img src="./docs/images/compare.png" width="96%" height="50%"> </p>
<p align="center"> <img src="./docs/images/human.png" width="96%" height="50%"> </p>
<p align="center"> <img src="./docs/images/distribution.png" width="96%" height="50%"> </p>

✒️ Citation

If you find our work helpful for your research, please consider citing it.

@misc{wang2024lvbench,
      title={LVBench: An Extreme Long Video Understanding Benchmark}, 
      author={Weihan Wang and Zehai He and Wenyi Hong and Yean Cheng and Xiaohan Zhang and Ji Qi and Shiyu Huang and Bin Xu and Yuxiao Dong and Ming Ding and Jie Tang},
      year={2024},
      eprint={2406.08035},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}