Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models (ECCV'24)

This repository contains the implementation of the paper: Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models.

Description

The AnomalyRuler pipeline consists of two main stages: induction and deduction. The induction stage has three steps: i) visual perception converts normal reference frames into text descriptions; ii) rule generation derives rules from these descriptions for deciding what is normal and what is anomalous; iii) rule aggregation uses a voting mechanism to mitigate errors in the rules. The deduction stage also has three steps: i) visual perception converts consecutive frames into text descriptions; ii) perception smoothing adjusts these descriptions for temporal consistency, so that neighboring frames share similar characteristics; iii) robust reasoning rechecks the earlier dummy answers and outputs the reasoning.
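To make the voting idea in rule aggregation concrete, here is a minimal sketch. It assumes that rules come as plain strings from several independent batches and that a rule is kept when enough batches propose it. The function name `aggregate_rules`, the `min_votes` parameter, and the exact-match comparison on normalized text are illustrative assumptions, not the repository's actual aggregation code.

```python
from collections import Counter


def aggregate_rules(rule_batches, min_votes=2):
    """Keep rules proposed by at least `min_votes` batches (hypothetical helper)."""
    votes = Counter()
    for batch in rule_batches:
        # Count each rule once per batch so a repeated rule cannot outvote others.
        votes.update({rule.strip().lower() for rule in batch})
    return sorted(rule for rule, count in votes.items() if count >= min_votes)


# Toy input: three batches of candidate rules, one of which contains a spurious rule.
batches = [
    ["Walking is normal", "Standing is normal"],
    ["walking is normal", "Skateboarding is normal"],
    ["Walking is normal", "standing is normal "],
]
print(aggregate_rules(batches))  # ['standing is normal', 'walking is normal']
```

The rule that appears in only one batch is dropped, which is the error-mitigation effect the voting step is meant to provide.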

Dependencies

pip install torch==2.1.0 torchvision==0.16.0 transformers==4.35.0 accelerate==0.24.1 sentencepiece==0.1.99 einops==0.7.0 xformers==0.0.22.post7 triton==2.1.0

pip install pandas pillow openai scikit-learn protobuf

Dataset

Download the datasets and put the {train} and {test} folders under the {dataset_name} folder, for example:

+-- SHTech
|   +-- train
|   +-- test
|       +-- 01_0014
|           +-- 000.jpg
|           +-- ...
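Before running the pipeline, it can help to confirm that the folders are where the scripts expect them. The sketch below checks only what the example tree shows: a `train` and a `test` folder under the dataset folder, and `.jpg` frames inside each clip folder under `test`. The helper `check_dataset_layout` is hypothetical and not part of the repository.

```python
from pathlib import Path


def check_dataset_layout(root):
    """Return a list of layout problems found under `root` (hypothetical helper)."""
    root = Path(root)
    problems = []
    for split in ("train", "test"):
        if not (root / split).is_dir():
            problems.append(f"missing folder: {root / split}")
    test_dir = root / "test"
    if test_dir.is_dir():
        # Each clip (e.g. 01_0014) is assumed to be a folder of .jpg frames.
        for clip in sorted(p for p in test_dir.iterdir() if p.is_dir()):
            if not any(clip.glob("*.jpg")):
                problems.append(f"no .jpg frames in: {clip}")
    return problems


for problem in check_dataset_layout("SHTech"):
    print(problem)
```

An empty result means the layout matches the example above; it does not verify that the frames themselves are complete.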

Download links:

Run

Step 1: Visual Perception

python image2text.py --data='SHTech'

Step 2: Rule Generation + Rule Aggregation

python main.py --data='SHTech' --induct --b=1 --bs=10

Step 3: Perception Smoothing

python majority_smooth.py --data='SHTech'

Note: to reuse previously generated rules and simply reproduce the results, you can start from Step 3.
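As an illustration of what smoothing for temporal consistency can look like, the sketch below applies a plain sliding-window majority vote to per-frame labels. This is an assumption made for illustration: `majority_smooth.py` may use a different smoothing scheme, and the function name, the `window` parameter, and the 0/1 label encoding are hypothetical.

```python
from collections import Counter


def majority_smooth(labels, window=5):
    """Replace each label with the most common label in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        # The window is truncated at the start and end of the sequence.
        neighborhood = labels[max(0, i - half): i + half + 1]
        # On a tie, Counter.most_common keeps the label seen first in the window.
        smoothed.append(Counter(neighborhood).most_common(1)[0][0])
    return smoothed


# 0 = normal, 1 = anomaly; isolated flips are treated as perception noise.
frame_labels = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
print(majority_smooth(frame_labels, window=3))
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

The isolated `1` at index 2 and the isolated `0` at index 8 are overruled by their neighbors, so adjacent frames end up with consistent labels.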

Step 4: Robust Reasoning

python main.py --data='SHTech' --deduct
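The four steps can also be chained from a small driver script. The sketch below only assembles the commands listed above and runs them in order; `build_commands`, `run_pipeline`, and the `reuse_rules` switch are hypothetical names, and the script is assumed to be launched from the repository root.

```python
import subprocess
import sys


def build_commands(data="SHTech", reuse_rules=False):
    """Return the command line for each step, in order (hypothetical helper)."""
    steps = [
        ["image2text.py", f"--data={data}"],                             # Step 1
        ["main.py", f"--data={data}", "--induct", "--b=1", "--bs=10"],  # Step 2
        ["majority_smooth.py", f"--data={data}"],                       # Step 3
        ["main.py", f"--data={data}", "--deduct"],                      # Step 4
    ]
    if reuse_rules:
        # Skip Steps 1 and 2 and start from Step 3.
        steps = steps[2:]
    return [[sys.executable, *step] for step in steps]


def run_pipeline(data="SHTech", reuse_rules=False):
    """Run the steps one after another, stopping at the first failure."""
    for cmd in build_commands(data, reuse_rules):
        subprocess.run(cmd, check=True)


# Preview the commands without executing them.
for cmd in build_commands(reuse_rules=True):
    print(" ".join(cmd[1:]))
```

Calling `run_pipeline()` executes all four steps; `run_pipeline(reuse_rules=True)` starts from Step 3.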

Citation

@inproceedings{yang2024anomalyruler,
    title={Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models},
    author={Yuchen Yang and Kwonjoon Lee and Behzad Dariush and Yinzhi Cao and Shao-Yuan Lo},
    year={2024},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)}
}