Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models
<p align="center"> <img src="./assets/attn-mask.png" width=100%> </p>

Authors: Zhening Xing, Gereon Fox, Yanhong Zeng, Xingang Pan, Mohamed Elgharib, Christian Theobalt, Kai Chen † (†: corresponding author)
<a target="_blank" href="https://huggingface.co/spaces/Leoxing/Live2Diff"> <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HuggingFace"/> </a>
Introduction Video
Release
- [2024/07/18] We release HuggingFace space, code, and checkpoints.
- [2024/07/22] We release the Colab demo.
TODO List
- Support Colab
Key Features
<p align="center"> <img src="./assets/framework.jpg" width=100%> </p>

- Uni-directional Temporal Attention with Warmup Mechanism
- Multitimestep KV-Cache for Temporal Attention during Inference
- Depth Prior for Better Structure Consistency
- Compatible with DreamBooth and LoRA for Various Styles
- TensorRT Supported
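As a rough illustration of the first feature, the sketch below builds a boolean temporal attention mask in plain Python. The masking rule is an assumption made for illustration only: warmup frames attend to each other bidirectionally, and every later frame attends to all warmup frames plus a sliding window of frames ending at itself. The function names and parameters are hypothetical and are not part of this repository's API.

```python
def build_temporal_mask(num_frames, warmup, window):
    """Return mask[i][j] == True when frame i may attend to frame j.

    Assumed rule (illustrative, not taken from the repository):
      * the first `warmup` frames attend to each other bidirectionally;
      * every later frame attends to all warmup frames and to the last
        `window` frames up to and including itself, never to future frames.
    """
    if not 0 <= warmup <= num_frames:
        raise ValueError("warmup must be between 0 and num_frames")
    if window < 1:
        raise ValueError("window must be at least 1")

    mask = [[False] * num_frames for _ in range(num_frames)]
    for i in range(num_frames):
        for j in range(num_frames):
            if i < warmup:
                # Warmup frames only see the other warmup frames.
                mask[i][j] = j < warmup
            else:
                # Later frames see the warmup frames and a causal window.
                in_window = i - window < j <= i
                mask[i][j] = j < warmup or in_window
    return mask


def render(mask):
    """Draw the mask as text: 'x' = attention allowed, '.' = masked out."""
    return "\n".join("".join("x" if m else "." for m in row) for row in mask)


mask = build_temporal_mask(num_frames=6, warmup=2, window=2)
print(render(mask))
```

Each row is a query frame and each column a key frame; no row after the warmup frames has an entry to the right of the diagonal, which is what makes the attention uni-directional.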
The speed evaluation was conducted on Ubuntu 20.04.6 LTS with PyTorch 2.2.2, an RTX 4090 GPU, and an Intel(R) Xeon(R) Platinum 8352V CPU. The number of denoising steps is set to 2.
Resolution | TensorRT | FPS |
---|---|---|
512 x 512 | On | 16.43 |
512 x 512 | Off | 6.91 |
768 x 512 | On | 12.15 |
768 x 512 | Off | 6.29 |
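To make the table easier to read, the snippet below computes the speedup that TensorRT gives at each resolution, using only the FPS values listed above. It works out to roughly 2.4x at 512 x 512 and roughly 1.9x at 768 x 512.

```python
# FPS values copied from the table above: (resolution, TensorRT on, TensorRT off).
fps_table = [
    ("512 x 512", 16.43, 6.91),
    ("768 x 512", 12.15, 6.29),
]


def speedup(fps_on, fps_off):
    """Ratio of frames per second with TensorRT to frames per second without it."""
    return fps_on / fps_off


for resolution, fps_on, fps_off in fps_table:
    print(f"{resolution}: {speedup(fps_on, fps_off):.2f}x faster with TensorRT")
```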
Installation
Step0: Clone this repository and its submodules
git clone https://github.com/open-mmlab/Live2Diff.git
# or via SSH
git clone git@github.com:open-mmlab/Live2Diff.git
cd Live2Diff
git submodule update --init --recursive
Step1: Create Environment
Create a virtual environment via conda:
conda create -n live2diff python=3.10
conda activate live2diff
Step2: Install PyTorch and xformers
Select the appropriate version for your system.
# CUDA 11.8
pip install torch torchvision xformers --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch torchvision xformers --index-url https://download.pytorch.org/whl/cu121
Please refer to https://pytorch.org/ for more details.
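After this step, a quick check can confirm that the packages are visible to the active environment. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it only uses the standard library plus `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util

# Packages installed in Step2.
REQUIRED = ("torch", "torchvision", "xformers")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report():
    """Summarize the environment as a single line of text."""
    missing = missing_packages()
    if missing:
        return "Missing packages: " + ", ".join(missing)
    import torch  # imported lazily so the check itself runs without PyTorch
    return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"


print(report())
```

If CUDA is reported as unavailable, the installed wheel most likely does not match the CUDA version on your system.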
Step3: Install Project
If you want to use TensorRT acceleration (recommended), install the project with one of the following commands.
# for cuda 11.x
pip install ."[tensorrt_cu11]"
# for cuda 12.x
pip install ."[tensorrt_cu12]"
Otherwise, you can install it via
pip install .
If you want to install it in development mode (a.k.a. "editable install"), add the -e option.
# for cuda 11.x
pip install -e ."[tensorrt_cu11]"
# for cuda 12.x
pip install -e ."[tensorrt_cu12]"
# or
pip install -e .
Step4: Download Checkpoints and Demo Data
- Download StableDiffusion-v1-5
huggingface-cli download runwayml/stable-diffusion-v1-5 --local-dir ./models/Model/stable-diffusion-v1-5
- Download the checkpoint from HuggingFace and put it under the models folder.
- Download the depth detector from MiDaS's official release and put it under the models folder.
- Apply for a download token from civitAI, then download the DreamBooth and LoRA weights via the script:
# download all DreamBooth/LoRA
bash scripts/download.sh all YOUR_TOKEN
# or download the one you want to use
bash scripts/download.sh disney YOUR_TOKEN
- Download demo data from OneDrive.
The directory structure of the models and data folders should then look like this:
./
|-- models
| |-- LoRA
| | |-- MoXinV1.safetensors
| | `-- ...
| |-- Model
| | |-- 3Guofeng3_v34.safetensors
| | |-- ...
| | `-- stable-diffusion-v1-5
| |-- live2diff.ckpt
| `-- dpt_hybrid_384.pt
`-- data
    |-- 1.mp4
    |-- 2.mp4
    |-- 3.mp4
    `-- 4.mp4
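Before running anything, it can help to confirm that the files ended up where the tree above expects them. The sketch below is a hypothetical helper, not a script shipped with the repository; the paths are copied from the tree, and the DreamBooth and LoRA files are left out because they depend on which styles you downloaded.

```python
from pathlib import Path

# Paths taken from the tree above; adjust if your layout differs.
EXPECTED = (
    "models/Model/stable-diffusion-v1-5",
    "models/live2diff.ckpt",
    "models/dpt_hybrid_384.pt",
    "data/1.mp4",
)


def missing_paths(root=".", expected=EXPECTED):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


missing = missing_paths()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All expected files are in place.")
```

Run it from the repository root so that the relative paths resolve correctly.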
Notification
The above installation steps (e.g. download script) are for Linux users and not well tested on Windows. If you face any difficulties, please feel free to open an issue 🤗.
Quick Start
You can try the examples under the data directory. For example:
# with TensorRT acceleration; please be patient on the first run, which may take more than 20 minutes
python test.py ./data/1.mp4 ./configs/disneyPixar.yaml --max-frames -1 --prompt "1man is talking" --output work_dirs/1-disneyPixar.mp4 --height 512 --width 512 --acceleration tensorrt
# without TensorRT acceleration
python test.py ./data/2.mp4 ./configs/disneyPixar.yaml --max-frames -1 --prompt "1man is talking" --output work_dirs/2-disneyPixar.mp4 --height 512 --width 512 --acceleration none
You can adjust the denoising strength via --num-inference-steps, --strength, and --t-index-list. Please refer to test.py for more details.
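If you want to launch several runs from a script, the command line shown above can be assembled in Python. This is a minimal sketch: `build_command` is a hypothetical helper, and it only uses the flags that appear in the examples above, so check test.py for the full list of options and their accepted values.

```python
import shlex
import sys


def build_command(video, config, output, prompt, height=512, width=512,
                  acceleration="none", max_frames=-1):
    """Build the argument list for test.py from the flags shown in Quick Start."""
    return [
        sys.executable, "test.py", video, config,
        "--max-frames", str(max_frames),
        "--prompt", prompt,
        "--output", output,
        "--height", str(height),
        "--width", str(width),
        "--acceleration", acceleration,
    ]


cmd = build_command(
    video="./data/1.mp4",
    config="./configs/disneyPixar.yaml",
    output="work_dirs/1-disneyPixar.mp4",
    prompt="1man is talking",
    acceleration="tensorrt",
)
# Print the command with shell quoting so it can be inspected before running.
print(shlex.join(cmd))
```

To execute it, pass the list to `subprocess.run(cmd, check=True)` from the repository root. Passing a list rather than a single string avoids quoting problems with prompts that contain spaces.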
Troubleshooting
- If you hit a CUDA out-of-memory error with TensorRT, try reducing t-index-list or strength. During inference with TensorRT, we maintain a group of buffers for the KV-cache, which consumes additional memory. Reducing t-index-list or strength shrinks the KV-cache and frees GPU memory.
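To see why a shorter t-index-list helps, the back-of-the-envelope sketch below assumes that one set of key and value buffers is kept per denoising timestep, so the cache grows linearly with the number of entries in t-index-list. Every dimension in it is a hypothetical placeholder, not a value measured from Live2Diff, and the effect of strength is not modeled.

```python
def kv_cache_bytes(num_timesteps, num_layers, cached_frames, tokens_per_frame,
                   channels, bytes_per_element=2):
    """Rough size in bytes of a key/value cache kept per denoising timestep.

    Illustrative model only: one key tensor and one value tensor per layer
    and per timestep, each holding `cached_frames` frames of features.
    """
    per_tensor = (num_layers * num_timesteps * cached_frames
                  * tokens_per_frame * channels)
    return 2 * per_tensor * bytes_per_element  # keys + values


# Placeholder dimensions, NOT measured from Live2Diff.
dims = dict(num_layers=10, cached_frames=16, tokens_per_frame=64 * 64, channels=320)

# Hypothetical t-index-list values; only their length matters here.
for t_index_list in ([0, 16, 32, 45], [0, 32]):
    size = kv_cache_bytes(num_timesteps=len(t_index_list), **dims)
    print(f"{len(t_index_list)} timesteps -> {size / 2**20:.0f} MiB")
```

Under this assumption, halving the number of timesteps halves the cache, which is why it is the first thing to try when memory runs out.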
Real-Time Video2Video Demo
There is an interactive video2video demo in the demo directory! Please refer to demo/README.md for more details.
Acknowledgements
The video and image demos in this GitHub repository were generated using LCM-LoRA. Stream batch from StreamDiffusion is used for model acceleration. The design of the video diffusion model is adopted from AnimateDiff. We use a third-party implementation of MiDaS that supports ONNX export. Our online demo is modified from Real-Time-Latent-Consistency-Model.
BibTeX
If you find it helpful, please consider citing our work:
@article{xing2024live2diff,
  title={Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models},
  author={Zhening Xing and Gereon Fox and Yanhong Zeng and Xingang Pan and Mohamed Elgharib and Christian Theobalt and Kai Chen},
  journal={arXiv preprint arXiv:2407.08701},
  year={2024}
}