Turning to Video for Transcript Sorting

This repo contains the official implementations of the following two papers:

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale

News

[2023.02] 🎉 TVTS is accepted to CVPR 2023.
[2023.03] The official code of TVTS has been released.
[2023.05] 🚀 TVTSv2 is coming out! Please refer to this link for details.
[2023.08] The official code of TVTSv2 and the pre-trained models have been released. All zero-shot evaluations can be run on a single GPU. We also provide scripts for extracting your own video features. Try it now 😎!

Introduction

Quickstart

Folder v1 contains the official code of TVTS. See v1-README for details.

Folder v2 contains the official code of TVTSv2, an upgraded version of TVTS that produces powerful video representations for out-of-the-box usage. See v2-README for details.

Citation

If you find our work helpful, please cite our papers.

@InProceedings{Zeng_2023_CVPR,
    author    = {Zeng, Ziyun and Ge, Yuying and Liu, Xihui and Chen, Bin and Luo, Ping and Xia, Shu-Tao and Ge, Yixiao},
    title     = {Learning Transferable Spatiotemporal Representations From Natural Script Knowledge},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {23079-23089}
}

@misc{zeng2023tvtsv2,
      title={TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale}, 
      author={Ziyun Zeng and Yixiao Ge and Zhan Tong and Xihui Liu and Shu-Tao Xia and Ying Shan},
      year={2023},
      eprint={2305.14173},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}