
Networking Systems for Video Anomaly Detection: A Tutorial and Survey

This is the official repository for the paper "Networking Systems for Video Anomaly Detection: A Tutorial and Survey", submitted to 📰 ACM Computing Surveys ([arXiv:2405.10347](https://arxiv.org/abs/2405.10347)). As the inaugural tutorial paper focusing on video anomaly detection (VAD), this article provides a comprehensive review of the latest advancements along the unsupervised, weakly supervised, and fully unsupervised VAD routes, with detailed explanations of typical methodologies. We analyze the lineage of VAD across the AI, IoT, and communication communities from the perspective of Networking Systems for Artificial Intelligence (NSAI), expanding its research boundaries to deployable Networking Systems of VAD (NSVAD). Moreover, we introduce our explorations of NSVAD applications in industrial IoT, smart cities, and complex systems, and conclude with insights into the development trends and potential opportunities of NSVAD in line with the advancements in AI. This repository curates existing literature, available code, public datasets, libraries, and relevant tutorials to help beginners get started.

📖 Table of Contents

Datasets

| Dataset | #Videos | #Normal | #Abnormal | #Scenes | #Anomalies | #Classes |
| --- | --- | --- | --- | --- | --- | --- |
| UMN | - | 6,165 | 1,576 | 3 | 11 | 3 |
| Subway Entrance | - | 132,138 | 12,112 | 1 | 51 | 5 |
| Subway Exit | - | 60,410 | 4,491 | 1 | 14 | 3 |
| Street Scene$^{*}$ | 81 | 159,341 | 43,916 | 205 | 17 | 17 |
| CUHK Avenue | 37 | 26,832 | 3,820 | 1 | 77 | 5 |
| ShanghaiTech | 437 | 300,308 | 17,090 | 13 | 158 | 11 |
| UCSD Ped1 | 70 | 9,995 | 4,005 | 1 | 61 | 5 |
| UCSD Ped2 | 29 | 2,924 | 1,636 | 1 | 21 | 5 |
| UCF-Crime | 1,900 | - | - | - | 950 | 13 |
| ShanghaiTech Weakly$^{**}$ | 437 | - | - | - | - | 11 |
| XD-Violence | 4,754 | - | - | - | - | 6 |
| UBnormal$^{***}$ | 543 | 147,887 | 89,015 | 29 | 660 | - |
| ADOC | - | - | 97,030 | 1 | 721 | - |
| NWPU Campus | 547 | - | 305,242 | 43 | - | 28 |

$^{*}$ Following previous works, we set the frame rate to 15 fps.

$^{**}$ This dataset is reorganized from ShanghaiTech, so we provide the reorganized file list here.

$^{***}$ UBnormal contains a validation set with 64 videos totaling 14,237 normal and 13,938 abnormal frames.

Taxonomy

1. Unsupervised Video Anomaly Detection

1.1 Global Normality Learning

πŸ—“οΈ 2016 & Before

πŸ—“οΈ 2017

πŸ—“οΈ 2019

πŸ—“οΈ 2020

πŸ—“οΈ 2021

πŸ—“οΈ 2022

πŸ—“οΈ 2023

1.2 Local Prototype Modeling

1.2.1 Spatial-Temporal Patch-based Methods

πŸ—“οΈ 2016 & Before

πŸ—“οΈ 2017

πŸ—“οΈ 2018

πŸ—“οΈ 2019

πŸ—“οΈ 2020

πŸ—“οΈ 2021

πŸ—“οΈ 2022

πŸ—“οΈ 2023

1.2.2 Foreground Object-driven Methods

πŸ—“οΈ 2017

πŸ—“οΈ 2018

πŸ—“οΈ 2019

πŸ—“οΈ 2020

πŸ—“οΈ 2021

πŸ—“οΈ 2022

πŸ—“οΈ 2023

2. Weakly Supervised Video Anomaly Detection

2.1 Uni-modal Methods

πŸ—“οΈ 2018

πŸ—“οΈ 2019

πŸ—“οΈ 2020

πŸ—“οΈ 2021

πŸ—“οΈ 2022

πŸ—“οΈ 2023

2.2 Multi-modal Methods

πŸ—“οΈ 2020

πŸ—“οΈ 2021

πŸ—“οΈ 2022

3. Fully Unsupervised Video Anomaly Detection

Performance Comparison

Frame-level AUC Comparison

VAD aims at the temporal localization of anomalous events in surveillance videos, i.e., determining where anomalous frames begin, so existing methods usually adopt frame-level metrics as the main performance indicators. A VAD model typically outputs an anomaly score in the range [0, 1] for each test frame, while the ground-truth label is a discrete 0 or 1, where 0 denotes normal and 1 indicates that the frame contains an abnormal event. By sweeping over multiple thresholds and computing the frame-level AUC, one can more comprehensively assess the model's ability to reason about regular events and its level of response to diverse anomalies. We collate the performance of existing work reported on publicly available datasets (e.g., UCSD Ped1, UCSD Ped2, CUHK Avenue, and ShanghaiTech) as follows:

UVAD Methods

| Method | Ped1 | Ped2 | Avenue | ShanghaiTech |
| --- | --- | --- | --- | --- |
| DRAM | 92.1 | 90.8 | - | - |
| STVP | 93.9 | 94.6 | - | - |
| CMAC | 85.0 | 90.0 | - | - |
| FF-AE | 81.0 | 90.0 | 70.2 | 60.9 |
| DEM | 92.5 | - | - | - |
| CFS | 82.0 | 84.0 | - | - |
| WTA-AE | 91.9 | 92.8 | 82.1 | - |
| EBM | 70.3 | 86.4 | 78.8 | - |
| CPE | 78.2 | 80.7 | - | - |
| LDGK | - | 92.2 | - | - |
| sRNN | - | 92.2 | 81.7 | 68.0 |
| GANS | 97.4 | 93.5 | - | - |
| OGNG | 93.8 | 94.0 | - | - |
| FFP | 83.1 | 95.4 | 85.1 | 72.8 |
| PP-CNN | 95.7 | 88.4 | - | - |
| FAED | 93.8 | 95.0 | - | - |
| NNC | - | - | 88.9 | - |
| OC-AE | - | 97.8 | 90.4 | 84.9 |
| AMC | - | 96.2 | 86.9 | - |
| MLR | 82.3 | 99.2 | 71.5 | - |
| memAE | - | 94.1 | 83.3 | 71.2 |
| MLEP | - | - | 92.8 | 76.8 |
| BMAN | - | 96.6 | 90.0 | 76.2 |
| Street Scene | 77.3 | 88.3 | 72.0 | - |
| IPR | 82.6 | 96.2 | 83.7 | 73.0 |
| DFSN | 86.0 | 94.0 | 87.2 | - |
| MNAD(Recon) | - | 90.2 | 82.8 | 69.8 |
| MNAD(Pred) | - | 97.0 | 88.5 | 70.5 |
| DD-GAN | - | 95.6 | 84.9 | 73.7 |
| FSSA | - | 96.2 | 85.8 | 77.9 |
| VEC | - | 97.3 | 89.6 | 74.8 |
| Multispace | - | 95.4 | 86.8 | 73.6 |
| CDD-AE | - | 96.5 | 86.0 | 73.3 |
| CDD-AE+ | - | 96.7 | 87.1 | 73.7 |
| Multi-task (object level) | - | 99.8 | 91.9 | 89.3 |
| Multi-task (frame level) | - | 92.4 | 86.9 | 83.5 |
| Multi-task (late fusion) | - | 99.8 | 92.8 | 90.2 |
| HF$^2$AVD | - | 99.3 | 91.1 | 76.2 |
| AST-AE | - | 96.6 | 85.2 | 68.8 |
| ROADMAP | - | 96.3 | 88.3 | 76.6 |
| CT-D2GAN | - | 97.2 | 85.9 | 77.7 |
| AMAE | - | 97.4 | 88.2 | 73.6 |
| STM-AE | - | 98.1 | 89.8 | 73.8 |
| BiP | - | 97.4 | 86.7 | 73.6 |
| AR-AE | - | 98.3 | 90.3 | 78.1 |
| TAC-Net | - | 98.1 | 88.8 | 77.2 |
| STC-Net | - | 96.7 | 87.8 | 73.1 |
| HSNBM | - | 95.2 | 91.6 | 76.5 |

WsVAD Methods

| Method | Feature | UCF-Crime AUC | UCF-Crime FAR | ShanghaiTech AUC | ShanghaiTech FAR |
| --- | --- | --- | --- | --- | --- |
| MIR | C3D (RGB) | 75.40 | 1.90 | 86.30 | 0.15 |
| TCN | C3D (RGB) | 78.70 | - | 82.50 | 0.10 |
| GCLNC | C3D (RGB) | 80.67 | 3.30 | 76.44 | - |
| ARNet | C3D (RGB) | - | - | 85.01 | 0.57 |
|  | I3D (RGB) | - | - | 85.38 | 0.27 |
|  | I3D (RGB+Optical Flow) | - | - | 91.24 | 0.10 |
| MIST | C3D (RGB) | 81.40 | 2.19 | 93.13 | 1.71 |
|  | I3D (RGB) | 82.30 | 0.13 | 94.83 | 0.05 |
| RTFM | C3D (RGB) | 83.28 | - | 91.51 | - |
|  | I3D (RGB) | 84.30 | - | 97.21 | - |
| SMR | I3D (RGB+Optical Flow) | 81.70 | - | - | - |
| DTED | C3D (RGB) | 79.49 | 0.50 | 87.42 | - |

FuVAD Methods

| Method | Ped1 | Ped2 | Avenue | ShanghaiTech |
| --- | --- | --- | --- | --- |
| SDOR | - | 83.2 | - | - |
| CIL(ResNet50)+DCFD | - | 97.9 | 85.9 | - |
| CIL(ResNet50)+DCFD+CTCE | - | 99.4 | 87.3 | - |
| CIL(I3D-RGB)+DCFD+CTCE | - | 98.7 | 90.3 | - |
| GCL$_{PT}$(RESNEXT) | - | - | - | 78.93 |
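For reference, the frame-level AUC reported above is the area under the ROC curve obtained by sweeping a threshold over the per-frame anomaly scores. A minimal, dependency-light sketch using the pairwise (Wilcoxon-Mann-Whitney) form of AUC, which is mathematically equivalent to threshold sweeping; `frame_level_auc` is a hypothetical helper, not code from any surveyed method:

```python
import numpy as np

def frame_level_auc(scores, labels):
    """Frame-level ROC AUC for video anomaly detection.

    scores: per-frame anomaly scores in [0, 1].
    labels: per-frame ground truth, 0 (normal) or 1 (abnormal).
    AUC equals the probability that a random abnormal frame scores
    higher than a random normal frame (ties count as 0.5).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]   # scores of abnormal frames
    neg = scores[labels == 0]   # scores of normal frames
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

scores = [0.1, 0.2, 0.9, 0.8, 0.3]
labels = [0, 0, 1, 1, 0]
print(frame_level_auc(scores, labels))  # 1.0: every abnormal frame outscores every normal one
```

Published numbers are usually computed with `sklearn.metrics.roc_auc_score` over the concatenated test frames, which yields the same value.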

Inference Speed Comparison

Due to differences in implementation platforms and experimental setups, directly comparing the inference speeds reported in papers may lead to unfair comparisons. To address this issue, we avoid such discussions in the main paper and only provide the reported numbers in this repository for readers' reference. The inference speeds of available methods are as follows:

| Year | Method | AIS (FPS) | Dataset |
| --- | --- | --- | --- |
| 2010 | ADCS | 0.4 | UCSD Ped2 |
| 2011 | VParsing | 0.13 | UCSD Ped1 |
| 2013 | Avenue | 150 | CUHK Avenue |
| 2013 | SR | 0.26 | UCSD Ped1 |
| 2014 | ADL | 1.25 | UCSD Ped2 |
| 2015 | RTAD | 200 | UCSD Ped1 & Ped2, UMN |
| 2015 | STVP | 1 | UCSD Ped1 & Ped2 |
| 2015 | HFR | 2 | UCSD Ped1 |
| 2017 | DAF | 20 | CUHK Avenue |
| 2017 | Deep-cascade | 130 | UCSD Ped1 & Ped2, UMN |
| 2017 | ST-AE | 143 | CUHK Avenue, Subway, UCSD Ped1 & Ped2 |
| 2017 | stacked-RNN | 50 | UCSD Ped2 |
| 2018 | Deep-anomaly | 370 | UCSD Ped2 |
| 2018 | FFP | 25 | CUHK Avenue |
| 2019 | NNC | 24 | CUHK Avenue, Subway, UMN |
| 2019 | OC-AE | 11 | CUHK Avenue, UCSD Ped2, ShanghaiTech, UMN |
| 2019 | mem-AE | 38 | UCSD Ped2 |
| 2019 | AnoPCN | 10 | UCSD Ped2, CUHK Avenue, ShanghaiTech |
| 2020 | Clustering | 32 | UCSD Ped2 |
| 2020 | MNAD | 67 | UCSD Ped2 |
| 2023 | HN-MUM | 34 | UCSD Ped2 |
| 2023 | CRC | 46 | CUHK Avenue |
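For context, average inference speed (AIS) figures like those above are generally obtained by timing the model over the test frames and dividing the frame count by the elapsed time. A minimal, hypothetical sketch (the `infer` callable stands in for any per-frame VAD model; warm-up iterations are excluded so one-off setup cost does not skew the estimate):

```python
import time

def average_inference_speed(frames, infer, warmup=2):
    """Estimate average inference speed in FPS for a per-frame model.

    frames: iterable of test frames; infer: callable taking one frame.
    The first `warmup` frames are run but not timed.
    """
    for frame in frames[:warmup]:        # warm-up, excluded from timing
        infer(frame)
    timed = frames[warmup:]
    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed

# Toy stand-in for a VAD model: pretend each frame takes ~1 ms.
fps = average_inference_speed(list(range(100)), lambda f: time.sleep(0.001))
print(round(fps))
```

Note that for GPU models, asynchronous kernel execution means the timing loop should include explicit synchronization (e.g., `torch.cuda.synchronize()` in PyTorch); differences in such details are one reason reported AIS numbers are hard to compare across platforms.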

Relevant Workshops & Tutorials

Related Topics & Tasks

Tools

Citation

If you find our work useful, please cite our paper:

@misc{liu2024networking,
  title = {Networking {{Systems}} for {{Video Anomaly Detection}}: {{A Tutorial}} and {{Survey}}},
  shorttitle = {Networking {{Systems}} for {{Video Anomaly Detection}}},
  author = {Liu, Jing and Liu, Yang and Lin, Jieyu and Li, Jielin and Sun, Peng and Hu, Bo and Song, Liang and Boukerche, Azzedine and Leung, Victor C. M.},
  year = {2024},
  month = may,
  number = {arXiv:2405.10347},
  primaryclass = {cs},
}