MACIL_SD

[ACM MM 2022] Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection

Jiashuo Yu*, Jinyu Liu*, Ying Cheng, Rui Feng, Yuejie Zhang (* equal contribution)

Paper

Overview

<p align="center"> <img src="overview.png" width="800" height="300"/> </p>

Results

Our model achieves state-of-the-art results on the XD-Violence dataset while maintaining a low parameter count.

| Method | Modality | AP (%) | Params |
| --- | --- | --- | --- |
| Ours (light) | Audio & Visual | 82.17 | 0.347M |
| Ours (full) | Audio & Visual | 83.40 | 0.678M |

XD-Violence Dataset & Features

The audio and visual features of the XD-Violence dataset can be downloaded at this link. Note that in this paper, only the RGB and VGGish features are required. You can download the RGB.zip, RGBTest.zip, and vggish-features.zip and unzip them into the data/ folder.
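Before training, it can help to sanity-check that the unzipped features landed where the scripts expect them. The sketch below is a minimal check, assuming the three archives extract to `data/RGB/`, `data/RGBTest/`, and `data/vggish-features/`; the exact folder names inside the zips are an assumption, so adjust them if your extraction differs:

```python
from pathlib import Path

# Expected feature folders after unzipping into data/.
# NOTE: these names are assumptions based on the archive names
# (RGB.zip, RGBTest.zip, vggish-features.zip), not verified.
EXPECTED = ["RGB", "RGBTest", "vggish-features"]

def missing_feature_dirs(root="data"):
    """Return the expected feature folders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).is_dir()]

if __name__ == "__main__":
    missing = missing_feature_dirs()
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("All expected feature folders found.")
```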

Requirements

python==3.7.11  
torch==1.6.0  
cuda==10.1  
numpy==1.17.4

Note that the reported results were obtained by training on a single Tesla V100 GPU. We observe that different GPU types and torch/CUDA versions can lead to slightly different results.

Training

python main.py --model_name=macil_sd

Testing

python infer.py --model_dir=macil_sd.pkl

Citation

If you find our work interesting and useful, please consider citing it.

@article{yu2022macil,
  title={Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection},
  author={Yu, Jiashuo and Liu, Jinyu and Cheng, Ying and Feng, Rui and Zhang, Yuejie},
  journal={arXiv preprint arXiv:2207.05500},
  year={2022}
}

License

This project is released under the MIT License.

Acknowledgements

Our code builds on XDVioDet and RTFM; we sincerely thank the authors for their efforts. If you have further questions, please contact us at jsyu19@fudan.edu.cn and jinyuliu20@fudan.edu.cn.