Home

Awesome

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

🔍 See our paper: "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models" Paper

🔍 See our newest Video Generation paper: "Mora: Enabling Generalist Video Generation via A Multi-Agent Framework" Paper GitHub)

📧 Please let us know if you find a mistake or have any suggestions by e-mail: lis221@lehigh.edu

Table of Contents

About

Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and shows potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model's background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. We first trace Sora's development and investigate the underlying technologies used to build this "world simulator". Then, we describe in detail the applications and potential impact of Sora in multiple industries ranging from filmmaking and education to marketing. We discuss the main challenges and limitations that need to be addressed to widely deploy Sora, such as ensuring safe and unbiased video generation. Lastly, we discuss the future development of Sora and video generation models in general, and how advancements in the field could enable new ways of human-AI interaction, boosting productivity and creativity of video generation.

<div align="center"> <img src="https://raw.githubusercontent.com/lichao-sun/SoraReview/main/image/sora_framework.png" width="85%"></div>

Updates

History of Generative AI in the Vision Domain

<div align="center"> <img src="https://raw.githubusercontent.com/lichao-sun/SoraReview/main/image/history.png" width="85%"></div>

Paper List

<div align="center"> <img src="https://raw.githubusercontent.com/lichao-sun/SoraReview/main/image/paper_list_structure.png" width="70%"></div>

Technology

Data Pre-processing

Modeling

Language Instruction Following

Prompt Engineering

Trustworthiness

Application

Movie

Education

Gaming

Healthcare

Robotics

Citation

@misc{2024SoraReview,
      title={Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models}, 
      author={Yixin Liu and Kai Zhang and Yuan Li and Zhiling Yan and Chujie Gao and Ruoxi Chen and Zhengqing Yuan and Yue Huang and Hanchi Sun and Jianfeng Gao and Lifang He and Lichao Sun},
      year={2024},
      eprint={2402.17177},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}