# Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

Junyu Gao, Mengyuan Chen, Changsheng Xu

Code for the CVPR 2023 paper "Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception".
## Paper Overview

### Weakly-supervised Audio-Visual Video Parsing

<img src="./graph/task.png" style="width:70%; display: block; margin: auto">

### Overview of CMPAE

<img src="./graph/framework_corrected.png" style="width:80%; display: block; margin: auto">

**Typo**: In the framework figure of the paper, the name of the "Absence/Presence Evidence Collector" module is labeled incorrectly. The version shown above is the corrected one. We are sorry for the typo.

## Get Started
### Dependencies

The requirements and dependencies we used are listed below.
- GPU: GeForce RTX 3090
- Python: 3.8.6
- PyTorch: 1.12.1
- Other: Pandas, Openpyxl, Wandb (optional)
### Prepare data

- Download the preprocessed audio and visual features from https://github.com/YapengTian/AVVP-ECCV20.
- Put the downloaded features into `data/feats/`, and put the annotation files into `data/annotations/`.
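The expected layout under `data/` can be set up ahead of time; a minimal sketch (the feature and annotation files themselves must still be downloaded from the AVVP-ECCV20 repository linked above):

```shell
# Create the directory layout the training scripts expect;
# then place the downloaded files inside these folders.
mkdir -p data/feats data/annotations
```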
### Train your own models

Run `./train.sh`.
### Test the pre-trained model

Download the checkpoint file from Google Drive, and put it into `save/pretrained/`. Then run `./test.sh`.
## Citation

If you find the code useful in your research, please consider citing it:

```
@inproceedings{junyu2023CVPR_CMPAE,
  author    = {Gao, Junyu and Chen, Mengyuan and Xu, Changsheng},
  title     = {Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023}
}
```
## License

See the MIT License.
## Acknowledgement

This repo contains modified code from:

- JoMoLD: for the implementation of the backbone JoMoLD (ECCV 2022).

We sincerely thank the owners of these great repos!