SSRL

This repository is the official implementation of the paper "Scale-aware Spatio-temporal Relation Learning for Video Anomaly Detection".

Introduction

Abstract. Recent progress in video anomaly detection (VAD) has shown that feature discrimination is the key to effectively distinguishing anomalies from normal events. We observe that many anomalous events occur in limited local regions, and the severe background noise increases the difficulty of feature learning. In this paper, we propose a scale-aware weakly supervised learning approach to capture local and salient anomalous patterns from the background, using only coarse video-level labels as supervision. We achieve this by segmenting frames into non-overlapping patches and then capturing inconsistencies among different regions through our patch spatial relation (PSR) module, which consists of self-attention mechanisms and dilated convolutions. To address the scale variation of anomalies and enhance the robustness of our method, a multi-scale patch aggregation method is further introduced to enable local-to-global spatial perception by merging features of patches with different scales. Considering the importance of temporal cues, we extend the relation modeling from the spatial domain to the spatio-temporal domain with the help of the existing video temporal relation network to effectively encode the spatio-temporal dynamics in the video. Experimental results show that our proposed method achieves new state-of-the-art performance on UCF-Crime and ShanghaiTech benchmarks.
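To make the patch segmentation step in the abstract concrete, here is a minimal sketch of splitting a frame into non-overlapping patches at several scales. It is not taken from this repository: the function names, the frame size, and the grid sizes `(1, 1)`, `(2, 2)`, `(4, 4)` are all hypothetical, and it only covers the partitioning, not the PSR module or the feature aggregation.

```python
from typing import Dict, List, Sequence, Tuple

# (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]
Grid = Tuple[int, int]  # (rows, cols)


def split_into_patches(height: int, width: int, rows: int, cols: int) -> List[Box]:
    """Partition a height x width frame into rows x cols non-overlapping boxes."""
    if rows <= 0 or cols <= 0 or rows > height or cols > width:
        raise ValueError("grid must be positive and no finer than the frame")
    # Integer boundaries, so the boxes tile the frame exactly even when the
    # frame size is not divisible by the grid size.
    ys = [r * height // rows for r in range(rows + 1)]
    xs = [c * width // cols for c in range(cols + 1)]
    return [
        (ys[r], xs[c], ys[r + 1], xs[c + 1])
        for r in range(rows)
        for c in range(cols)
    ]


def multi_scale_patches(
    height: int, width: int, grids: Sequence[Grid]
) -> Dict[Grid, List[Box]]:
    """Build one patch partition per grid, from coarse (global) to fine (local)."""
    return {grid: split_into_patches(height, width, *grid) for grid in grids}


if __name__ == "__main__":
    # Hypothetical frame size and grids, chosen only for illustration.
    scales = multi_scale_patches(240, 320, [(1, 1), (2, 2), (4, 4)])
    for grid, boxes in scales.items():
        print(grid, len(boxes), boxes[0])
```

In a setup like the one the abstract describes, features would be extracted per box and then merged across scales; how that merging is done is not specified here.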

License

This project is released under the MIT license.

Installation

Requirements

Usage

Dataset preparation

Please download the extracted I3D features and checkpoints for the ShanghaiTech and UCF-Crime datasets from Baidu Wangpan (extraction code: wxxy) and put them under the code root directory.

Training

Training on single node

Step-by-step training:

sh scripts/train_ssrl_stage1.sh

sh scripts/train_ssrl_stage2.sh

sh scripts/train_ssrl_stage3.sh

sh scripts/train_ssrl_stage4.sh
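The four stage scripts above can also be chained from a small driver. The sketch below is a convenience wrapper, not part of this repository: `build_commands` and `run_in_order` are hypothetical names, and it assumes that each stage should only start once the previous one has exited successfully, which the README does not state explicitly.

```python
import subprocess
import sys
from typing import List, Sequence

# Script paths as listed in this README, relative to the code root.
STAGE_SCRIPTS = [f"scripts/train_ssrl_stage{i}.sh" for i in range(1, 5)]


def build_commands(scripts: Sequence[str] = STAGE_SCRIPTS) -> List[List[str]]:
    """Turn each script path into the `sh <script>` command used above."""
    return [["sh", script] for script in scripts]


def run_in_order(commands: Sequence[Sequence[str]]) -> int:
    """Run commands one at a time and stop at the first non-zero exit code.

    Returns the number of commands that completed successfully.
    """
    completed = 0
    for cmd in commands:
        print("running:", " ".join(cmd), flush=True)
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            print(f"stopped: exit code {result.returncode}", file=sys.stderr)
            break
        completed += 1
    return completed
```

Calling `run_in_order(build_commands())` from the code root would run stages 1 to 4 in sequence and return how many finished, so a failed stage does not silently feed into the next one.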