<div align="center"> <h2 align="center"> <a href="https://arxiv.org/abs/2406.12235">Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM</a></h2> <h5 align="center"> If you like our project, please give us a star ⭐ on GitHub for the latest updates. </h5>

🎨 Project Page

</div>

😮 Highlights

Existing methods for open-ended Video Anomaly Detection (VAD) often produce biased detections when faced with challenging or unseen events, and they lack interpretability. To address these drawbacks, we propose Holmes-VAD, a novel framework that leverages precise temporal supervision and rich multimodal instructions to enable accurate anomaly localization and comprehensive explanations.

<!-- Model Image--> <section class="hero teaser"> <div class="container is-max-desktop"> <div class="hero-body"> <img src="assets/data_engine.png" alt="Holmes-VAD data engine"/> </div> </div> </section> <!-- End Model Image --> <!-- Model Image--> <section class="hero teaser"> <div class="container is-max-desktop"> <div class="hero-body"> <img src="assets/framework.png" alt="Holmes-VAD framework"/> </div> </div> </section> <!-- End Model Image -->

🛠️ Requirements and Installation

# inference only
git clone https://github.com/pipixin321/HolmesVAD.git
cd HolmesVAD
conda create -n holmesvad python=3.10 -y
conda activate holmesvad
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install decord opencv-python pytorchvideo
# additional packages for training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
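
After installation, it can be useful to confirm that the inference dependencies are importable before loading a model. The snippet below is a minimal sketch, not part of the repository: `check_modules` is a hypothetical helper, and it assumes that `python3` resolves to the interpreter of the active `holmesvad` environment (override it with the `PYTHON` variable if not). Note that `opencv-python` is imported as `cv2`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given Python modules cannot be
# imported. Prints one "missing: <name>" line per failed import and returns
# the number of failures (0 means every module was importable).
check_modules() {
    local py="${PYTHON:-python3}"   # assumption: interpreter of the active env
    local missing=0
    local mod
    for mod in "$@"; do
        if ! "$py" -c "import ${mod}" >/dev/null 2>&1; then
            echo "missing: ${mod}"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Import names for the packages installed above (opencv-python -> cv2).
check_modules decord cv2 pytorchvideo && echo "inference dependencies look importable"
```

If any line starting with `missing:` is printed, re-running the corresponding `pip install` command inside the activated environment is the first thing to try.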

🤗 Demo

CLI Inference

CUDA_VISIBLE_DEVICES=0 python demo/cli.py --model-path ./checkpoints/HolmesVAD-7B --file ./demo/examples/vad/RoadAccidents133_x264_270_451.mp4
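
The command above expects the checkpoint and the example video to exist at the given paths. As a hedged sketch, the wrapper below checks both paths first and only then launches the demo with the same flags. `preflight` is a hypothetical helper and not part of the repository; it only tests that the paths exist and makes no assumption about the checkpoint's contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the checkpoint path and the video file
# exist. Prints a diagnostic to stderr and returns 1 on the first missing path.
preflight() {
    local model_path="$1" video="$2"
    if [ ! -e "$model_path" ]; then
        echo "checkpoint not found: $model_path" >&2
        return 1
    fi
    if [ ! -f "$video" ]; then
        echo "video file not found: $video" >&2
        return 1
    fi
    return 0
}

MODEL=./checkpoints/HolmesVAD-7B
VIDEO=./demo/examples/vad/RoadAccidents133_x264_270_451.mp4

# Launch the CLI demo only when both inputs are present.
if preflight "$MODEL" "$VIDEO"; then
    CUDA_VISIBLE_DEVICES=0 python demo/cli.py --model-path "$MODEL" --file "$VIDEO"
fi
```

Failing early with a clear message avoids waiting for the Python process to start before discovering a mistyped path.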

Gradio Web UI

CUDA_VISIBLE_DEVICES=0 python demo/gradio_demo.py

<img src="assets/demo.gif" alt="Holmes-VAD Gradio demo" />
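
Both demo commands pin the process to GPU 0 through `CUDA_VISIBLE_DEVICES`. On a multi-GPU machine, one option is to choose the device with the most free memory instead. The sketch below is an illustration under stated assumptions, not part of the repository: `pick_gpu` is a hypothetical helper that reads `index, free_MiB` lines, the format produced by `nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits`, and prints the index with the largest free-memory value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "index, free_MiB" lines on stdin and print the
# index of the GPU with the most free memory. On ties the first line wins;
# on empty input nothing is printed.
pick_gpu() {
    awk -F',' '
        {
            free = $2 + 0                      # numeric value, ignores padding
            if (NR == 1 || free > best) { best = free; idx = $1 }
        }
        END {
            if (NR > 0) { gsub(/ /, "", idx); print idx }
        }
    '
}

# Example use (requires nvidia-smi; left commented so the snippet is safe to source):
# gpu="$(nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits | pick_gpu)"
# CUDA_VISIBLE_DEVICES="${gpu:-0}" python demo/gradio_demo.py
```

The `${gpu:-0}` fallback keeps the original behaviour (GPU 0) when the query returns nothing.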

Stargazers over time

Citation

If you find this repo useful for your research, please consider citing our paper:

@article{zhang2024holmes,
  title={Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM},
  author={Zhang, Huaxin and Xu, Xiaohao and Wang, Xiang and Zuo, Jialong and Han, Chuchu and Huang, Xiaonan and Gao, Changxin and Wang, Yuehuan and Sang, Nong},
  journal={arXiv preprint arXiv:2406.12235},
  year={2024}
}