# VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

<div align="center"><img src="img/radar_compare_alldata_vast.png" width="75%" height="75%"></div>

This is the official repository of VAST, which will provide the code, model checkpoints, and dataset. They will be released after the paper is accepted.

<div align="center"><img src="img/VAST-model.jpg"></div>

## Citation

If you find this code useful for your research, please consider citing:

@article{chen2023vast,
  title={VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset},
  author={Chen, Sihan and Li, Handong and Wang, Qunbo and Zhao, Zijia and Sun, Mingzhen and Zhu, Xinxin and Liu, Jing},
  journal={arXiv preprint arXiv:2305.18500},
  year={2023}
}