<p align="center"> <img width="55%" alt="VideoSys" src="./assets/figures/logo.png?raw=true"> </p> <h3 align="center"> An easy and efficient system for video generation </h3> <p align="center">| <a href="https://github.com/NUS-HPC-AI-Lab/VideoSys?tab=readme-ov-file#installation">Quick Start</a> | <a href="https://github.com/NUS-HPC-AI-Lab/VideoSys?tab=readme-ov-file#usage">Supported Models</a> | <a href="https://github.com/NUS-HPC-AI-Lab/VideoSys?tab=readme-ov-file#acceleration-techniques">Accelerations</a> | <a href="https://discord.gg/WhPmYm9FeG">Discord</a> | <a href="https://oahzxl.notion.site/VideoSys-News-42391db7e0a44f96a1f0c341450ae472?pvs=4">Media</a> | <a href="https://huggingface.co/VideoSys">HuggingFace Space</a> | </p>

Allegro

We release the Allegro multi-card inference demo here. VideoSys supports context-parallel inference and Pyramid Attention Broadcast (PAB), both of which remarkably reduce inference time. With 8xH800 GPUs, Allegro can generate a 100-step, 88-frame 720p (720x1280) video in 3 minutes. Adding PAB reduces the generation time further, to 2 minutes.

Note that Allegro only supports context parallelism with 8, 4, or 2 cards: the context-parallel size must be a factor of both the attention head dimension (24) and the context length (79,200). Pass the context-parallel size via --num-gpus; for more details, see examples/allegro/sample.py.
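For example, to launch 8-way context-parallel inference with the flag described above:

python examples/allegro/sample.py --num-gpus 8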

See the model weights and the original GitHub repository here.

Quick start

  1. Install the requirements following VideoSys and the original repo's guidelines.
  2. Run inference with:
python examples/allegro/sample.py

For consistency with the original repo, all parameters are hard-coded. You can customize the prompt, number of steps, and other parameters in examples/allegro/sample.py, as sketched below.
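For illustration only, the hard-coded values look roughly like the following (the variable names below are hypothetical; check examples/allegro/sample.py for the actual ones):

prompt = "A seaside carnival at dusk, lanterns swaying in the wind."  # hypothetical example prompt
num_inference_steps = 100  # 100 sampling steps, as in the demo above
num_frames = 88            # 88 frames at 720x1280, as in the demo above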

Thanks again to the original authors of VideoSys!

Latest News 🔥

About

VideoSys is an open-source project that provides a user-friendly and high-performance infrastructure for video generation. This comprehensive toolkit will support the entire pipeline from training and inference to serving and compression.

We are committed to continually integrating cutting-edge open-source video models and techniques. Stay tuned for exciting enhancements and new features on the horizon!

Installation

Prerequisites:

We strongly recommend using Anaconda to create a new environment (Python >= 3.10) to run our examples:

conda create -n videosys python=3.10 -y
conda activate videosys

Install VideoSys:

git clone https://github.com/NUS-HPC-AI-Lab/VideoSys
cd VideoSys
pip install -e .
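After installation, you can run a quick sanity check (assuming the package installs under the videosys import name):

python -c "import videosys"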

Usage

VideoSys supports many diffusion models with our various acceleration techniques, enabling these models to run faster and consume less memory.

<b>You can find all available models and their supported acceleration techniques in the following table. Click Code to see how to use them.</b>

<table> <tr> <th rowspan="2">Model</th> <th rowspan="2">Train</th> <th rowspan="2">Infer</th> <th colspan="2">Acceleration Techniques</th> <th rowspan="2">Usage</th> </tr> <tr> <th><a href="https://github.com/NUS-HPC-AI-Lab/VideoSys?tab=readme-ov-file#dyanmic-sequence-parallelism-dsp-paperdoc">DSP</a></th> <th><a href="https://github.com/NUS-HPC-AI-Lab/VideoSys?tab=readme-ov-file#pyramid-attention-broadcast-pab-blogdoc">PAB</a></th> </tr> <tr> <td>Vchitect [<a href="https://github.com/Vchitect/Vchitect-2.0">source</a>]</td> <td align="center">/</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center"><a href="./examples/vchitect/sample.py">Code</a></td> </tr> <tr> <td>CogVideoX [<a href="https://github.com/THUDM/CogVideo">source</a>]</td> <td align="center">/</td> <td align="center">✅</td> <td align="center">/</td> <td align="center">✅</td> <td align="center"><a href="./examples/cogvideox/sample.py">Code</a></td> </tr> <tr> <td>Latte [<a href="https://github.com/Vchitect/Latte">source</a>]</td> <td align="center">/</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center"><a href="./examples/latte/sample.py">Code</a></td> </tr> <tr> <td>Open-Sora-Plan [<a href="https://github.com/PKU-YuanGroup/Open-Sora-Plan">source</a>]</td> <td align="center">/</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center"><a href="./examples/open_sora_plan/sample.py">Code</a></td> </tr> <tr> <td>Open-Sora [<a href="https://github.com/hpcaitech/Open-Sora">source</a>]</td> <td align="center">🟡</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center">✅</td> <td align="center"><a href="./examples/open_sora/sample.py">Code</a></td> </tr> </table>

You can also try easy demos on HuggingFace Space <a href="https://huggingface.co/VideoSys">[link]</a> and Gradio <a href="./gradio">[link]</a>.
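As a minimal sketch of the Python API, based on the bundled example scripts (config class names and parameters vary per model, so treat this as illustrative and consult the linked Code files):

from videosys import OpenSoraConfig, VideoSysEngine

# Configure the model; num_gpus > 1 enables multi-GPU (parallel) inference.
config = OpenSoraConfig(num_sampling_steps=30, cfg_scale=7.0, num_gpus=1)
engine = VideoSysEngine(config)

# Generate a clip from a text prompt and save it.
video = engine.generate("Sunset over the sea.").video[0]
engine.save_video(video, "./outputs/sunset.mp4")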

Acceleration Techniques

Pyramid Attention Broadcast (PAB) [paper][blog][doc]

Real-Time Video Generation with Pyramid Attention Broadcast

Authors: Xuanlei Zhao<sup>1*</sup>, Xiaolong Jin<sup>2*</sup>, Kai Wang<sup>1*</sup>, and Yang You<sup>1</sup> (* indicates equal contribution)

<sup>1</sup>National University of Singapore, <sup>2</sup>Purdue University

[Figure: PAB method overview]

PAB is the first approach to achieve <b>real-time</b> DiT-based video generation, delivering <b>lossless quality</b> without <b>requiring any training</b>. By mitigating redundant attention computation, PAB achieves up to 21.6 FPS with 10.6x acceleration, without sacrificing quality across popular DiT-based video generation models including Open-Sora, Latte and Open-Sora-Plan.

See its details here.
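To make the broadcast idea concrete, here is a minimal sketch (illustrative pseudocode, not VideoSys's actual implementation; all names are hypothetical). Attention outputs change little between adjacent diffusion steps, so each attention type is recomputed only after its broadcast range expires and its cached output is reused in between:

# Cache one output per attention type; spatial/temporal/cross attention get
# different broadcast ranges, forming the "pyramid".
cached = {}

def attention_with_pab(attn_type, step, broadcast_range, compute_fn):
    entry = cached.get(attn_type)
    if entry is not None and step - entry["step"] < broadcast_range:
        return entry["out"]  # broadcast: reuse the cached attention output
    out = compute_fn()  # recompute once the cached result is too stale
    cached[attn_type] = {"step": step, "out": out}
    return out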


Dynamic Sequence Parallelism (DSP) [paper][doc]

[Figure: DSP overview]

DSP is a novel, elegant, and highly efficient sequence parallelism method for Open-Sora, Latte, and other multi-dimensional transformer architectures.

Compared with the state-of-the-art sequence parallelism (DeepSpeed Ulysses), it achieves 3x faster training and 2x faster inference on Open-Sora. For a 10s (80-frame) 512x512 video, the inference latency of Open-Sora is:

| Method | 1xH800 | 8xH800 (DS Ulysses) | 8xH800 (DSP) |
| --- | --- | --- | --- |
| Latency (s) | 106 | 45 | 22 |

See its details here.
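As a rough sketch of the core idea (illustrative, not the library's code; switch_shard_dim is a hypothetical helper): the video tensor stays sharded along whichever dimension the current attention does not mix, and a single all-to-all switches the sharded dimension between spatial and temporal attention. Assuming x has shape [T_local, S, D] when sharded over time, and that dimensions divide evenly across ranks:

import torch
import torch.distributed as dist

def switch_shard_dim(x, scatter_dim, gather_dim, group=None):
    # Re-shard x: split the currently full scatter_dim across ranks and
    # gather the currently sharded gather_dim, using one all-to-all.
    world = dist.get_world_size(group)
    chunks = [c.contiguous() for c in x.chunk(world, dim=scatter_dim)]
    out = [torch.empty_like(chunks[0]) for _ in chunks]
    dist.all_to_all(out, chunks, group=group)
    return torch.cat(out, dim=gather_dim)

# Spatial attention runs time-sharded; temporal attention runs space-sharded:
# x = switch_shard_dim(x, scatter_dim=1, gather_dim=0)  # -> [T, S_local, D]
# ... temporal attention ...
# x = switch_shard_dim(x, scatter_dim=0, gather_dim=1)  # -> [T_local, S, D]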

Contributing

We welcome and value any contributions and collaborations. Please check out CONTRIBUTING.md for how to get involved.

Contributors

<a href="https://github.com/NUS-HPC-AI-Lab/VideoSys/graphs/contributors"> <img src="https://contrib.rocks/image?repo=NUS-HPC-AI-Lab/VideoSys"/> </a>

Star History

Star History Chart

Citation

@misc{videosys2024,
  author={VideoSys Team},
  title={VideoSys: An Easy and Efficient System for Video Generation},
  year={2024},
  publisher={GitHub},
  url={https://github.com/NUS-HPC-AI-Lab/VideoSys},
}

@misc{zhao2024pab,
  title={Real-Time Video Generation with Pyramid Attention Broadcast},
  author={Xuanlei Zhao and Xiaolong Jin and Kai Wang and Yang You},
  year={2024},
  eprint={2408.12588},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2408.12588},
}

@misc{zhao2024dsp,
  title={DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers},
  author={Xuanlei Zhao and Shenggan Cheng and Chang Chen and Zangwei Zheng and Ziming Liu and Zheming Yang and Yang You},
  year={2024},
  eprint={2403.10266},
  archivePrefix={arXiv},
  primaryClass={cs.DC},
  url={https://arxiv.org/abs/2403.10266},
}

@misc{zhao2024opendit,
  author={Xuanlei Zhao and Zhongkai Zhao and Ziming Liu and Haotian Zhou and Qianli Ma and Yang You},
  title={OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference},
  year={2024},
  publisher={GitHub},
  url={https://github.com/NUS-HPC-AI-Lab/VideoSys/tree/v1.0.0},
}