# Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

Junyu Gao, Mengyuan Chen, Changsheng Xu

Code for the CVPR 2023 paper "Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception".
## Paper Overview

### Weakly-supervised Audio-Visual Video Parsing

<img src="./graph/task.png" style="width:70%; display: block; margin: auto">

### Overview of CMPAE

<img src="./graph/framework_corrected.png" style="width:80%; display: block; margin: auto">

**Typo**: In the framework figure of the paper, the name of the "Absence/Presence Evidence Collector" module is labeled incorrectly. The version shown above is the corrected one. We are sorry for the typo.

## Get Started
### Dependencies

The requirements and dependencies we used are listed below.
- GPU: GeForce RTX 3090
- Python: 3.8.6
- PyTorch: 1.12.1
- Other: Pandas, Openpyxl, Wandb (optional)
### Prepare data

- Download the preprocessed audio and visual features from https://github.com/YapengTian/AVVP-ECCV20.
- Put the downloaded features into `data/feats/`, and put the annotation files into `data/annotations/`.
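The expected layout under `data/` can be set up ahead of time; a minimal sketch (the feature and annotation files themselves must still be downloaded from the AVVP-ECCV20 repository linked above):

```shell
# Create the directory layout the training scripts expect;
# then place the downloaded files inside these folders.
mkdir -p data/feats data/annotations
```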
### Train your own models

Run `./train.sh`.
### Test the pre-trained model

Download the checkpoint file from Google Drive, and put it into `save/pretrained/`. Then run `./test.sh`.
## Citation

If you find the code useful in your research, please consider citing it:

```
@inproceedings{junyu2023CVPR_CMPAE,
  author    = {Gao, Junyu and Chen, Mengyuan and Xu, Changsheng},
  title     = {Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023}
}
```
## License

See the MIT License.
## Acknowledgement

This repo contains modified code from:

- JoMoLD: for the implementation of the backbone JoMoLD (ECCV 2022).

We sincerely thank the owners of these great repos!