

HD-VG-130M Dataset

This is the dataset from the paper "VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation", Wenjing Wang, Huan Yang, Zixi Tuo, Huiguo He, Junchen Zhu, Jianlong Fu, Jiaying Liu.

We curate a large-scale video dataset called HD-VG-130M. It comprises 130 million open-domain text-video pairs, all of which are high-definition, widescreen, and free of watermarks.

UPDATE: We have created a new, higher-quality 40M subset, curated by taking text, motion, and aesthetics into account. This subset will be released soon.


Download link: Google Drive.

🎉 As of January 2024, our dataset has been downloaded by more than 50 universities and research institutes!


LICENSE AGREEMENT

By downloading or using the data, you understand, acknowledge, and agree to all the terms in the following agreement.


If you have further questions, you may contact: Wenjing Wang (daooshee@pku.edu.cn)

@article{videofactory,
  title={VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation},
  author={Wang, Wenjing and Yang, Huan and Tuo, Zixi and He, Huiguo and Zhu, Junchen and Fu, Jianlong and Liu, Jiaying},
  journal={arXiv preprint arXiv:2305.10874},
  year={2023}
}