Boosting Audio-visual Zero-shot Learning with Large Language Models
This is the official code for the paper: Boosting Audio-visual Zero-shot Learning with Large Language Models. Datasets and environments can be prepared by referring to AVCA.
Boosting Audio-visual Zero-shot Learning with Large Language Models
Haoxing Chen, Yaohui Li, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang. arXiv preprint arXiv:2311.12268
Training and Evaluating KDA
For example, to run an experiment on the UCF-GZSL-main dataset:
python main.py --root_dir avgzsl_benchmark_datasets/UCF/ --feature_extraction_method main_features --input_size_audio 512 --input_size_video 512 --lr_scheduler --dataset_name UCF --zero_shot_split main_split --epochs 50 --lr 0.001 --n_batches 50 --bs 2048 --kda --retrain_all --exp_name KDA_UCF_all_main
Note that you need to set the three parameters (drop_rate_enc / drop_rate_proj / momentum) at lines 207-209 of model.py according to our paper. In addition, when training on a different dataset, modify lines 177-194 in utils.py so that the action IDs are aligned accordingly, and modify line 230 in model.py to point to the matching description file.
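As a rough illustration, the sketch below shows what these dataset-specific edits might look like. The attribute names (drop_rate_enc, drop_rate_proj, momentum) and the idea of a description file come from the notes above; the concrete values and the file path are placeholders only, not the settings reported in the paper.

```python
# Hypothetical sketch of the dataset-specific settings mentioned above.
# Replace every placeholder value with the one reported in the paper
# for the dataset you are training on.

# model.py, lines 207-209: dropout and momentum hyperparameters
drop_rate_enc = 0.2    # placeholder: dropout rate used in the encoder
drop_rate_proj = 0.1   # placeholder: dropout rate used in the projection layers
momentum = 0.99        # placeholder: momentum coefficient

# model.py, line 230: path to the LLM-generated class-description file
description_file = "descriptions/ucf_descriptions.json"  # placeholder path
```

The action-ID mapping in utils.py (lines 177-194) is likewise dataset-specific and should be edited in the same way for UCF, VGGSound, or ActivityNet.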
Citing KDA
If you use KDA in your research, please use the following BibTeX entry.
@article{KDA_2023,
  title={Boosting Audio-visual Zero-shot Learning with Large Language Models},
  author={Chen, Haoxing and Li, Yaohui and Hong, Yan and Xu, Zhuoer and Gu, Zhangxuan and Lan, Jun and Zhu, Huijia and Wang, Weiqiang},
  journal={arXiv preprint arXiv:2311.12268},
  year={2023}
}
Acknowledgement
This repo is built on top of AVCA; many thanks to its authors!
Contacts
Please feel free to contact us if you have any questions or problems.