Awesome-Text-to-Video-Generation

A curated and continually updated list of Text-to-Video studies, based on our survey paper: From Sora What We Can See: A Survey of Text-to-Video Generation. Using OpenAI's Sora as a point of reference, the survey comprehensively reviews existing work in the text-to-video field and summarizes 24 datasets and 9 evaluation metrics. It also discusses open problems in this research area and in Sora itself, and combines Sora's strengths with the characteristics of related fields to suggest future research directions. If our work inspires you, feel free to cite our paper and star this repo.

This project is curated and maintained by Rui Sun and Yumin Zhang.

Topics covered by this repo: Text-to-Seq-Image and Text-to-Video.

Table of Contents

- [Text-to-Seq-Image](#text_to_seq_image)
- [Text-to-Video](#text_to_video)
- [Datasets & Metrics](#dataset_and_metrics)

Datasets are grouped by the domain they were collected from: Face, Open, Movie, Action, Instruct, and Cooking. Metrics are divided into image-level and video-level (an illustrative sketch of a video-level score follows the table below).

| Dataset | Domain | Annotated | #Clips | #Sent | Len_C (s) | Len_S | #Videos | Resolution | FPS | Dur (h) | Year | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CV-Text | Face | Generated | 70K | 1400K | - | 67.2 | - | 480P | - | - | 2023 | Online |
| MSR-VTT | Open | Manual | 10K | 200K | 15.0 | 9.3 | 7.2K | 240P | 30 | 40 | 2016 | YouTube |
| DiDeMo | Open | Manual | 27K | 41K | 6.9 | 8.0 | 10.5K | - | - | 87 | 2017 | Flickr |
| Y-T-180M | Open | ASR | 180M | - | - | - | 6M | - | - | - | 2021 | YouTube |
| WVid2M | Open | Alt-text | 2.5M | 2.5M | 18.0 | 12.0 | 2.5M | 360P | - | 13K | 2021 | Web |
| H-100M | Open | ASR | 103M | - | 13.4 | 32.5 | 3.3M | 720P | - | 371.5K | 2022 | YouTube |
| InternVid | Open | Generated | 234M | - | 11.7 | 17.6 | 7.1M* | 720P | - | 760.3K | 2023 | YouTube |
| H-130M | Open | Generated | 130M | 130M | - | 10.0 | - | 720P | - | - | 2023 | YouTube |
| Y-mP | Open | Manual | 10M | 10M | 54.2 | - | - | - | - | 150K | 2023 | Youku |
| V-27M | Open | Generated | 27M | 135M | 12.5 | - | - | - | - | - | 2024 | YouTube |
| P-70M | Open | Generated | - | 70.8M | 8.5 | 13.2 | 70.8M | 720P | - | 166.8K | 2024 | YouTube |
| ChronoMagic-Pro | Open | Generated | - | - | 234.1 | - | 460K | 720P | - | 30.0K | 2024 | YouTube |
| LSMDC | Movie | Manual | 118K | 118K | 4.8 | 7.0 | 200 | 1080P | - | 158 | 2017 | Movie |
| MAD | Movie | Manual | - | 384K | - | 12.7 | 650 | - | - | 1.2K | 2022 | Movie |
| UCF-101 | Action | Manual | 13K | - | 7.2 | - | - | 240P | 25 | 27 | 2012 | YouTube |
| ANet-200 | Action | Manual | 100K | - | - | 13.5 | 2K* | 720P | 30 | 849 | 2015 | YouTube |
| Charades | Action | Manual | 10K | 16K | - | - | 10K | - | - | 82 | 2016 | Home |
| Kinetics | Action | Manual | 306K | - | 10.0 | - | 306K | - | - | - | 2017 | YouTube |
| ActNet | Action | Manual | 100K | 100K | 36.0 | 13.5 | 20K | - | - | 849 | 2017 | YouTube |
| C-Ego | Action | Manual | - | - | - | - | 8K | 240P | - | 69 | 2018 | Home |
| SS-V2 | Action | Manual | - | - | - | - | 220.1K | - | 12 | - | 2018 | Daily |
| How2 | Instruct | Manual | 80K | 80K | 90.0 | 20.0 | 13.1K | - | - | 2000 | 2018 | YouTube |
| HT100M | Instruct | ASR | 136M | 136M | 3.6 | 4.0 | 1.2M | 240P | - | 134.5K | 2019 | YouTube |
| YCook2 | Cooking | Manual | 14K | 14K | 19.6 | 8.8 | 2K | - | - | 176 | 2018 | YouTube |
| E-Kit | Cooking | Manual | 40K | 40K | - | - | 432* | 1080P | 60 | 55 | 2018 | Home |

Len_C (s): average clip length in seconds; Len_S: average sentence length in words; Dur (h): total duration in hours; values marked * are carried over as footnoted in the survey's original table.
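As a concrete illustration of the image-level vs. video-level split, the sketch below builds a simple video-level text-alignment score by averaging an image-level CLIP similarity over sampled frames. This is a minimal sketch, not an implementation from the survey; it assumes PyTorch and OpenAI's `clip` package are installed, and the prompt and frame filenames in the usage comment are hypothetical.

```python
# Minimal sketch: frame-averaged CLIP similarity between a text prompt and a
# generated video. Image-level scores (one per frame) are aggregated into a
# video-level score by averaging.
# Assumes: pip install torch git+https://github.com/openai/CLIP.git
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_video_score(prompt: str, frames: list[Image.Image]) -> float:
    """Average cosine similarity between the prompt and each video frame."""
    with torch.no_grad():
        # Encode and L2-normalize the text prompt.
        text = clip.tokenize([prompt]).to(device)
        text_feat = model.encode_text(text)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

        # Encode and L2-normalize each frame (image-level features).
        images = torch.stack([preprocess(f) for f in frames]).to(device)
        img_feat = model.encode_image(images)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)

        # Per-frame similarities, averaged into one video-level score.
        sims = (img_feat @ text_feat.T).squeeze(-1)
        return sims.mean().item()

# Hypothetical usage: score 16 frames sampled from a generated clip.
# frames = [Image.open(f"frame_{i:02d}.png") for i in range(16)]
# print(clip_video_score("a corgi surfing a wave at sunset", frames))
```

Note that frame averaging only measures per-frame text alignment; it ignores temporal coherence, which dedicated video-level metrics such as FVD are designed to capture.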

Acknowledgement and References

Citation

If you find this repository useful, please consider citing our paper and this list:

```bibtex
@article{sun2024sora,
  title={From Sora What We Can See: A Survey of Text-to-Video Generation},
  author={Sun, Rui and Zhang, Yumin and Shah, Tejal and Sun, Jiahao and Zhang, Shuoying and Li, Wenqi and Duan, Haoran and Wei, Bo and Ranjan, Rajiv},
  journal={arXiv preprint arXiv:2405.10674},
  year={2024}
}

@misc{sun2024t2vgenerationlist,
  title={Awesome-Text-to-Video-Generation},
  author={Sun, Rui and Zhang, Yumin},
  year={2024},
  publisher={GitHub},
  howpublished={\url{https://github.com/soraw-ai/Awesome-Text-to-Video-Generation}}
}
```