<h1 align="left"> <a href="">Open-Sora Plan</a></h1>

This project aims to create a simple and scalable repo to reproduce Sora (from OpenAI, though we prefer to call it "ClosedAI"). It was jointly initiated by the PKU-Tuzhan AIGC Joint Lab. The current version is still far from that goal and needs continued improvement and rapid iteration, so we hope the open-source community will contribute. Pull requests are welcome! The current code also supports complete training and inference on the Huawei Ascend AI computing system, and models trained on Huawei Ascend can produce video quality comparable to industry standards.

<h5 align="left"> If you like our project, please give us a star ⭐ on GitHub to get the latest updates. </h5>

📣 News

😍 Gallery

Text & Image to Video Generation.

Demo Video of Open-Sora Plan V1.3

😮 Highlights

Open-Sora Plan shows excellent performance in video generation.

🔥 High-performance CausalVideoVAE at a lower training cost.

🚀 Video diffusion model based on 3D attention, with joint learning of spatiotemporal features.

<p align="center"> <img src="https://s21.ax1x.com/2024/07/22/pk7cob8.png" width="650" style="margin-bottom: 0.2;"/> </p>

🤗 Demo

Gradio Web UI

We highly recommend trying our web demo with the following command.

python -m opensora.serve.gradio_web_server --model_path "path/to/model" \
    --ae WFVAEModel_D8_4x8x8 --ae_path "path/to/vae" \
    --caption_refiner "path/to/refiner" \
    --text_encoder_name_1 "path/to/text_enc" --rescale_betas_zero_snr

ComfyUI

Coming soon...

🐳 Resource

| Version | Architecture | Diffusion Model | CausalVideoVAE | Data | Prompt Refiner |
|:---|:---|:---|:---|:---|:---|
| v1.3.0 [4] | Skiparse 3D | Anysize in 93x640x640[3], Anysize in 93x640x640_i2v[3] | Anysize | prompt_refiner | checkpoint |
| v1.2.0 | Dense 3D | 93x720p, 29x720p[1], 93x480p[1,2], 29x480p, 1x480p, 93x480p_i2v | Anysize | Annotations | - |
| v1.1.0 | 2+1D | 221x512x512, 65x512x512 | Anysize | Data and Annotations | - |
| v1.0.0 | 2+1D | 65x512x512, 65x256x256, 17x256x256 | Anysize | Data and Annotations | - |

[1] Please note that the weights for v1.2.0 29×720p and 93×480p were trained on Panda70M and have not undergone final high-quality data fine-tuning, so they may produce watermarks.

[2] We fine-tuned the 93×720p model for 3.5k steps to obtain 93×480p for community research use.

[3] The model is trained at arbitrary resolutions with stride=32, so keep the inference resolution a multiple of 32. The number of frames must be 4n+1, e.g. 93, 77, 61, 45, 29, 1 (image).

[4] Model weights are also available at OpenMind and WiseModel.
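
The constraints in footnote [3] can be checked before launching inference. The following is a minimal sketch, not code from this repo: the helper names `is_valid_shape` and `snap_shape` are hypothetical, and rounding down to the nearest valid value is an illustrative choice.

```python
def is_valid_shape(num_frames: int, height: int, width: int, stride: int = 32) -> bool:
    """Return True if the shape satisfies footnote [3].

    Height and width must be positive multiples of `stride`, and the
    frame count must have the form 4n+1 (1 frame means a single image).
    """
    frames_ok = num_frames >= 1 and (num_frames - 1) % 4 == 0
    size_ok = (
        height > 0 and width > 0
        and height % stride == 0 and width % stride == 0
    )
    return frames_ok and size_ok


def snap_shape(num_frames: int, height: int, width: int, stride: int = 32):
    """Round a requested shape DOWN to the nearest valid one (hypothetical helper)."""
    frames = max(1, (num_frames - 1) // 4 * 4 + 1)
    snapped_h = max(stride, height // stride * stride)
    snapped_w = max(stride, width // stride * stride)
    return frames, snapped_h, snapped_w


if __name__ == "__main__":
    print(is_valid_shape(93, 640, 640))
    print(snap_shape(96, 720, 1280))
```

For example, `snap_shape(96, 720, 1280)` returns `(93, 704, 1280)`: 96 frames is not of the form 4n+1, and 720 is not a multiple of 32.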

[!Warning]

<div align="left"> <b> 🚨 For version 1.2.0, we no longer support 2+1D models. </b> </div>

⚙️ Requirements and Installation

  1. Clone this repository and navigate to the Open-Sora-Plan folder:
git clone https://github.com/PKU-YuanGroup/Open-Sora-Plan
cd Open-Sora-Plan
  2. Install the required packages. We recommend the following setup.

GPU

conda create -n opensora python=3.8 -y
conda activate opensora
pip install -e .

NPU

pip install torch_npu==2.1.0.post6
# Build decord from source without CUDA; ref https://github.com/dmlc/decord
git clone --recursive https://github.com/dmlc/decord
cd decord
mkdir build && cd build
cmake .. -DUSE_CUDA=0 -DCMAKE_BUILD_TYPE=Release -DFFMPEG_DIR=/usr/local/ffmpeg
make
cd ../python
# Add the decord Python package to PYTHONPATH
pwd=$PWD
echo "PYTHONPATH=$PYTHONPATH:$pwd" >> ~/.bashrc
source ~/.bashrc
python3 setup.py install --user
  3. Install optional requirements, such as static type checking:
pip install -e '.[dev]'

🗝️ Training & Inferencing

🗜️ CausalVideoVAE

The data preparation, training, inferencing and evaluation can be found here

📖 Prompt Refiner

The data preparation, training, inferencing can be found here

📜 Text-to-Video

The data preparation, training and inferencing can be found here

🖼️ Image-to-Video

The data preparation, training and inferencing can be found here

⚡️ Extra Memory Savings

🔆 Training

During training, the entire EMA model remains in VRAM. To reduce this, you can enable --offload_ema or disable --use_ema. Additionally, VAE tiling is disabled by default; you can pass --enable_tiling, or disable --vae_fp32. Finally, a temporary but extreme memory-saving option is to enable --extra_save_mem, which offloads the text encoder and VAE to the CPU when they are not in use, though this significantly slows down training.

We currently have two plans. One is to continue with the DeepSpeed/FSDP approach, sharding the EMA and text encoder across ranks with ZeRO-3, which is sufficient for training 10-15B models. The other is to adopt MindSpeed for various parallel strategies, enabling us to scale the model up to 30B.
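
As a rough back-of-the-envelope check on the first plan, the sketch below estimates per-rank memory for model states when everything is sharded ZeRO-3 style. It assumes mixed-precision Adam at 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights, momentum, and variance) and an fp32 EMA copy at 4 bytes per parameter. The function name, the EMA figure, and the rank count are illustrative assumptions, not values from this repo, and activations and the text encoder are excluded.

```python
def zero3_model_state_gib_per_rank(
    params_billion: float,
    num_ranks: int,
    bytes_per_param: int = 16,      # assumed: mixed-precision Adam model states
    ema_bytes_per_param: int = 4,   # assumed: one fp32 EMA copy, also sharded
) -> float:
    """Estimate model-state memory per rank in GiB under full sharding.

    Activations, buffers, and the text encoder are NOT included.
    """
    if num_ranks <= 0:
        raise ValueError("num_ranks must be positive")
    total_bytes = params_billion * 1e9 * (bytes_per_param + ema_bytes_per_param)
    return total_bytes / num_ranks / 2**30


if __name__ == "__main__":
    # Hypothetical cluster size of 64 ranks.
    for size in (10, 15):
        gib = zero3_model_state_gib_per_rank(size, num_ranks=64)
        print(f"{size}B params on 64 ranks: ~{gib:.1f} GiB of model states per rank")
```

Because the estimate scales inversely with the number of ranks, doubling the rank count halves the per-rank model-state footprint under these assumptions.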

⚡️ 24G VRAM Inferencing

Please first make sure you understand how to run inference; refer to the inference instructions in Text-to-Video. Then simply specify --save_memory: during inference, enable_model_cpu_offload(), enable_sequential_cpu_offload(), and vae.vae.enable_tiling() will be activated automatically.

💡 How to Contribute

We greatly appreciate your contributions to the Open-Sora Plan open-source community and your help in making it even better!

For more details, please refer to the Contribution Guidelines.

👍 Acknowledgement and Related Work

🔒 License

✨ Star History

Star History

✏️ Citing

BibTeX

@software{pku_yuan_lab_and_tuzhan_ai_etc_2024_10948109,
  author       = {PKU-Yuan Lab and Tuzhan AI etc.},
  title        = {Open-Sora-Plan},
  month        = apr,
  year         = 2024,
  publisher    = {GitHub},
  doi          = {10.5281/zenodo.10948109},
  url          = {https://doi.org/10.5281/zenodo.10948109}
}

Latest DOI

DOI

🤝 Community contributors

<a href="https://github.com/PKU-YuanGroup/Open-Sora-Plan/graphs/contributors"> <img src="https://contrib.rocks/image?repo=PKU-YuanGroup/Open-Sora-Plan" /> </a>