A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Official codebase for SLAVC.
SLAVC is a new approach to weakly-supervised visual sound source localization that identifies negatives and addresses significant overfitting problems.
A Closer Look at Weakly-Supervised Audio-Visual Source Localization <br>Shentong Mo, Pedro Morgado<br> NeurIPS 2022.
<div align="center"> <img width="100%" alt="SLAVC Illustration" src="images/framework.png"> </div>

Environment
To set up the environment, run:

```bash
pip install -r requirements.txt
```
Datasets
Flickr-SoundNet
Data can be downloaded from Learning to localize sound sources
VGG-Sound Source
Data can be downloaded from Localizing Visual Sounds the Hard Way
Extended Flickr-SoundNet
Data can be downloaded from Extended-Flickr-SoundNet
Extended VGG-Sound Source
Data can be downloaded from Extended-VGG-Sound Source
Model Zoo
We release the SLAVC model pre-trained on VGG-Sound 144k, along with scripts for reproducing the results on the Extended Flickr-SoundNet and Extended VGG-Sound Source benchmarks.
| Method | Train Set | Test Set | AP | max-F1 | Precision | Checkpoint | Train script | Test script |
|---|---|---|---|---|---|---|---|---|
| SLAVC | VGG-Sound 144k | Extended Flickr-SoundNet | 51.63 | 59.10 | 83.60 | model | script | script |
| SLAVC | VGG-Sound 144k | Extended VGG-SS | 32.95 | 40.00 | 37.79 | model | script | script |
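The AP and max-F1 columns summarize a precision-recall curve over confidence-ranked predictions. Below is a minimal sketch of the standard computation, assuming each prediction has already been marked correct or incorrect. The benchmark's own matching rule (how localization quality and silent or off-screen clips decide correctness) is defined in the paper and is not reproduced here, and the function name `ap_and_max_f1` is hypothetical.

```python
from typing import List, Tuple


def ap_and_max_f1(scores: List[float], correct: List[bool],
                  num_positives: int) -> Tuple[float, float]:
    """Hypothetical helper: AP and max-F1 from ranked predictions.

    scores: confidence of each prediction (higher = more confident).
    correct: whether each prediction counts as a true positive
             (the matching rule itself is assumed, not implemented here).
    num_positives: total number of ground-truth positives in the test set.
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    if num_positives <= 0:
        raise ValueError("num_positives must be positive")

    # Rank predictions by descending confidence; ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = 0
    ap_sum = 0.0
    best_f1 = 0.0
    for rank, idx in enumerate(order, start=1):
        if correct[idx]:
            tp += 1
            # Accumulate precision at each recall step (non-interpolated AP).
            ap_sum += tp / rank
        precision = tp / rank
        recall = tp / num_positives
        if precision + recall > 0:
            f1 = 2 * precision * recall / (precision + recall)
            best_f1 = max(best_f1, f1)

    return ap_sum / num_positives, best_f1


if __name__ == "__main__":
    ap, f1 = ap_and_max_f1([0.9, 0.8, 0.7, 0.6], [True, False, True, False], 2)
    print(f"AP={ap:.4f} max-F1={f1:.4f}")
```

Ties are broken by input order in this sketch, and the official evaluation code may treat ties or interpolation differently. The table appears to report these metrics as percentages, i.e. multiplied by 100.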
Train
To train an SLAVC model, run:

```bash
python train.py --multiprocessing_distributed \
--train_data_path /path/to/VGGSound-all/ \
--test_data_path /path/to/Flickr-SoundNet/ \
--test_gt_path /path/to/Flickr-SoundNet/Annotations/ \
--experiment_name vggss144k_slavc \
--model 'slavc' \
--trainset 'vggss_144k' \
--testset 'flickr' \
--epochs 20 \
--batch_size 128 \
--init_lr 0.0001 \
--use_momentum --use_mom_eval \
--m_img 0.999 --m_aud 0.999 \
--dropout_img 0.9 --dropout_aud 0
```
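The `--use_momentum`, `--m_img` and `--m_aud` flags suggest that target encoders for the image and audio branches are updated as an exponential moving average (EMA) of the online encoders. The sketch below assumes that interpretation; the repository's actual update code is not shown here, and the function name `momentum_update` and the dict-of-lists parameter layout are hypothetical stand-ins for real model tensors.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def momentum_update(online: Params, target: Params, m: float) -> None:
    """Hypothetical EMA step, in place: target <- m * target + (1 - m) * online.

    Assumes `target` has the same parameter names and shapes as `online`.
    The online parameters are left unchanged.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("momentum must be in [0, 1]")
    for name, online_vals in online.items():
        target_vals = target[name]
        for i, w in enumerate(online_vals):
            target_vals[i] = m * target_vals[i] + (1.0 - m) * w


if __name__ == "__main__":
    # Separate momentum values per modality, mirroring --m_img / --m_aud.
    online_img, target_img = {"w": [1.0, 2.0]}, {"w": [0.0, 0.0]}
    momentum_update(online_img, target_img, m=0.999)
    print(target_img["w"])
```

With `m = 0.999`, each step moves the target weights only 0.1% of the way toward the online weights, so the target encoder changes slowly; `m = 0` would simply copy the online weights.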
Test
For testing and visualization, run:

```bash
python test.py --test_data_path /path/to/Extended-VGGSound-test/ \
--model_dir checkpoints \
--experiment_name vggss144k_slavc \
--testset 'vggss_plus_silent' \
--alpha 0.9 \
--relative_prediction
```
Citation
If you find this repository useful, please cite our paper:
```bibtex
@inproceedings{mo2022SLAVC,
title={A Closer Look at Weakly-Supervised Audio-Visual Source Localization},
author={Mo, Shentong and Morgado, Pedro},
booktitle={Advances in Neural Information Processing Systems},
year={2022}
}
```