

Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization

This repo holds the code for the work presented at ACM Multimedia 2020 [Paper]

Usage Guide


We provide the implementation in PyTorch for ease of use.

Install the requirements by running the following command:

pip install -r requirements.txt

Code and Data Preparation

We thank @YapengTian for sharing the features and code.

Download Features

Two kinds of features (visual and audio) are required for the experiments.

After downloading the features, place them in the data folder. The data folder should have the following structure:

├── audio_feature.h5
├── audio_feature_noisy.h5
├── labels.h5
├── labels_noisy.h5
├── mil_labels.h5
├── test_order.h5
├── train_order.h5
├── val_order.h5
├── visual_feature.h5
└── visual_feature_noisy.h5
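
As a quick sanity check before training, the listing above can be turned into a small script that reports which files are still missing. This is only a sketch and not part of the repo: the helper name `missing_feature_files` is hypothetical, and it assumes the folder is named `data` and sits in the directory the script is run from.

```python
from pathlib import Path

# File names copied from the data folder listing above.
EXPECTED_FILES = [
    "audio_feature.h5",
    "audio_feature_noisy.h5",
    "labels.h5",
    "labels_noisy.h5",
    "mil_labels.h5",
    "test_order.h5",
    "train_order.h5",
    "val_order.h5",
    "visual_feature.h5",
    "visual_feature_noisy.h5",
]


def missing_feature_files(data_dir="data"):
    """Return the expected file names that are not present in data_dir.

    Hypothetical helper for illustration; "data" is an assumed default path.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_feature_files("data")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected feature files are present.")
```

The check only looks at file names; it does not open the HDF5 files or validate their contents.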

Download Datasets (Optional)

You can download the AVE dataset from the repo here.

Training and testing CMRAN in a fully-supervised setting

Run the following commands to train and test the model. During training, the model is evaluated on the test set every epoch (the frequency is set by the "eval_freq" argument in the configs/default_config.yaml file).

bash supv_train.sh
# The argument "--snapshot_pref" denotes the path for saving checkpoints and code.


bash supv_test.sh

After training, there will be a checkpoint file whose name contains the test-set accuracy and the epoch number.

Training and testing CMRAN in a weakly-supervised setting

As in the fully-supervised setting, you can run training and testing using the following commands:


bash weak_train.sh


bash weak_test.sh


Please cite the following paper if you find this repo useful for your research:

  author    = {Haoming Xu and
               Runhao Zeng and
               Qingyao Wu and
               Mingkui Tan and
               Chuang Gan},
  title     = {Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization},
  booktitle = {{ACM} International Conference on Multimedia},
  year      = {2020},