Boosting Audio-visual Zero-shot Learning with Large Language Models
This is the official code for the paper: Boosting Audio-visual Zero-shot Learning with Large Language Models. Datasets and environments can be prepared by referring to AVCA.
Boosting Audio-visual Zero-shot Learning with Large Language Models
Haoxing Chen, Yaohui Li, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang. arXiv preprint arXiv:2311.12268
Training and Evaluating KDA
For example, to run an experiment on the UCF-GZSL-main dataset:
python main.py --root_dir avgzsl_benchmark_datasets/UCF/ --feature_extraction_method main_features --input_size_audio 512 --input_size_video 512 --lr_scheduler --dataset_name UCF --zero_shot_split main_split --epochs 50 --lr 0.001 --n_batches 50 --bs 2048 --kda --retrain_all --exp_name KDA_UCF_all_main
Note that you need to set the three parameters (drop_rate_enc / drop_rate_proj / momentum) at lines 207-209 of model.py according to our paper. In addition, when training on a different dataset, modify lines 177-194 in utils.py so that the action IDs are aligned accordingly, and modify line 230 in model.py to point to the matching description file.
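As a rough illustration, the sketch below shows what these dataset-specific edits might look like. The attribute names (drop_rate_enc, drop_rate_proj, momentum) and the idea of a description file come from the notes above; the concrete values and the file path are placeholders only, not the settings reported in the paper.

```python
# Hypothetical sketch of the dataset-specific settings mentioned above.
# Replace every placeholder value with the one reported in the paper
# for the dataset you are training on.

# model.py, lines 207-209: dropout and momentum hyperparameters
drop_rate_enc = 0.2    # placeholder: dropout rate used in the encoder
drop_rate_proj = 0.1   # placeholder: dropout rate used in the projection layers
momentum = 0.99        # placeholder: momentum coefficient

# model.py, line 230: path to the LLM-generated class-description file
description_file = "descriptions/ucf_descriptions.json"  # placeholder path
```

The action-ID mapping in utils.py (lines 177-194) is likewise dataset-specific and should be edited in the same way for UCF, VGGSound, or ActivityNet.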
Citing KDA
If you use KDA in your research, please use the following BibTeX entry.
@article{KDA_2023,
  title={Boosting Audio-visual Zero-shot Learning with Large Language Models},
  author={Chen, Haoxing and Li, Yaohui and Hong, Yan and Xu, Zhuoer and Gu, Zhangxuan and Lan, Jun and Zhu, Huijia and Wang, Weiqiang},
  journal={arXiv preprint arXiv:2311.12268},
  year={2023}
}
Acknowledgement
This repo is built on top of AVCA; many thanks to its authors!
Contacts
Please feel free to contact us if you have any questions or problems.