# Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization
This repo holds the code for the work presented at ACM Multimedia 2020 [Paper].
## Usage Guide
### Prerequisites
We provide the implementation in PyTorch for ease of use.
Install the requirements by running the following command:
```bash
pip install -r requirements.txt
```
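If the installation succeeds, a quick sanity check like the one below (a minimal sketch; the exact PyTorch/CUDA versions depend on `requirements.txt`) should run without errors:

```python
# Quick sanity check that PyTorch is importable and whether a GPU is visible.
import torch

print(torch.__version__)          # version installed from requirements.txt
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is detected
```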
### Code and Data Preparation
We are grateful to @YapengTian for sharing the features and code.
#### Download Features
Two kinds of features (i.e., Visual features and Audio features) are required for experiments.
- **Visual Features**: You can download the VGG visual features from here.
- **Audio Features**: You can download the VGG-like audio features from here.
- **Additional Features**: You can download the features of background videos here, which are required for the experiments in the weakly-supervised setting.
After downloading the features, please place them into the `data` folder. The structure of the `data` folder is shown as follows:
```
data
├── audio_feature.h5
├── audio_feature_noisy.h5
├── labels.h5
├── labels_noisy.h5
├── mil_labels.h5
├── test_order.h5
├── train_order.h5
├── val_order.h5
├── visual_feature.h5
└── visual_feature_noisy.h5
```
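To verify the downloads, you can inspect the HDF5 files with `h5py`. The sketch below only lists the dataset keys and shapes, since the internal layout of each file is not documented here:

```python
# Minimal sketch: list the datasets stored in each downloaded feature file.
import h5py

for fname in ["data/visual_feature.h5", "data/audio_feature.h5", "data/labels.h5"]:
    with h5py.File(fname, "r") as f:
        print(fname)
        for key in f.keys():
            print(f"  {key}: shape={f[key].shape}, dtype={f[key].dtype}")
```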
#### Download Datasets (Optional)
You can download the AVE dataset from the repo here.
### Training and testing CMRAN in a fully-supervised setting
You can run the following command to train and test the model. During training, we evaluate the model on the test set every epoch (controlled by the `eval_freq` argument in the `configs/default_config.yaml` file).
```bash
bash supv_train.sh
# The argument "--snapshot_pref" denotes the path for saving checkpoints and code.
```
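To change how often evaluation runs, edit `eval_freq` in `configs/default_config.yaml`. The sketch below is one way to read the value programmatically; it assumes PyYAML is installed and that `eval_freq` is a top-level key, which may need adjusting if the config is nested:

```python
# Minimal sketch: read the evaluation frequency from the default config file.
import yaml

with open("configs/default_config.yaml", "r") as f:
    cfg = yaml.safe_load(f)

# Assumes `eval_freq` sits at the top level of the YAML file.
print("eval_freq:", cfg.get("eval_freq"))
```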
#### Evaluating
```bash
bash supv_test.sh
```
After training, there will be a checkpoint file whose name contains the accuracy on the test set and the epoch number.
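If you want to load a saved checkpoint afterwards, the sketch below shows one way to do it. The `checkpoints/` directory is a placeholder for whatever path you passed via `--snapshot_pref`, and the file extension may differ from what the scripts actually produce:

```python
# Minimal sketch: locate saved checkpoints and load the most recent one.
# The directory and extensions below are assumptions; adapt them to the
# path given by --snapshot_pref and the actual filenames.
import glob
import torch

ckpts = sorted(glob.glob("checkpoints/*.pt") + glob.glob("checkpoints/*.pth"))
print("found checkpoints:", ckpts)

if ckpts:
    state = torch.load(ckpts[-1], map_location="cpu")
    print(type(state))  # typically a state_dict or a dict wrapping one
```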
### Training and testing CMRAN in a weakly-supervised setting
Similar to training the model in a fully-supervised setting, you can run training and testing using the following commands:
#### Training
```bash
bash weak_train.sh
```
#### Evaluating
```bash
bash weak_test.sh
```
## Citation
Please cite the following paper if you find this repo useful for your research:
```
@inproceedings{CMRAN2020Xu,
  author    = {Haoming Xu and
               Runhao Zeng and
               Qingyao Wu and
               Mingkui Tan and
               Chuang Gan},
  title     = {Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization},
  booktitle = {{ACM} International Conference on Multimedia},
  year      = {2020},
}
```