<div align="center"> <!-- TITLE -->

Video Diffusion Alignment via Reward Gradients

VADER


</div>

This is the official implementation of our paper Video Diffusion Alignment via Reward Gradients by

Mihir Prabhudesai*, Russell Mendonca*, Zheyang Qin*, Katerina Fragkiadaki, Deepak Pathak.

<!-- DESCRIPTION -->

Abstract

We have made significant progress towards building foundational video diffusion models. As these models are trained using large-scale unsupervised data, it has become crucial to adapt them to specific downstream tasks, such as video-text alignment or ethical video generation. Adapting these models via supervised fine-tuning requires collecting target datasets of videos, which is challenging and tedious. In this work, we instead utilize pre-trained reward models that are learned via preferences on top of powerful discriminative models. These models contain dense gradient information with respect to generated RGB pixels, which is critical for learning efficiently in complex search spaces, such as videos. We show that our approach enables the alignment of video diffusion for aesthetic generation, similarity between text context and video, as well as long-horizon video generations that are 3X longer than the training sequence length. We show that our approach learns much more efficiently, in terms of reward queries and compute, than prior gradient-free approaches for video generation.

Features

Demo

<img src="assets/videos/8.gif" width=""><img src="assets/videos/5.gif" width=""><img src="assets/videos/7.gif" width="">
<img src="assets/videos/10.gif" width=""><img src="assets/videos/3.gif" width=""><img src="assets/videos/4.gif" width="">
<img src="assets/videos/9.gif" width=""><img src="assets/videos/1.gif" width=""><img src="assets/videos/11.gif" width="">

🌟 VADER-VideoCrafter

We highly recommend starting with the VADER-VideoCrafter model, which performs the best.

⚙️ Installation

Assuming you are in the VADER/ directory, you can create a Conda environment for VADER-VideoCrafter with the following commands:

```bash
cd VADER-VideoCrafter
conda create -n vader_videocrafter python=3.10
conda activate vader_videocrafter
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install xformers -c xformers
pip install -r requirements.txt
git clone https://github.com/tgxs002/HPSv2.git
cd HPSv2/
pip install -e .
cd ..
```
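
As a quick sanity check (our suggestion, not part of the official setup), you can confirm that the CUDA-enabled PyTorch build and the HPSv2 package import correctly before moving on:

```bash
# Quick sanity check: CUDA-enabled PyTorch and the HPSv2 package should import.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import hpsv2; print('hpsv2 imported OK')"
```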

📺 Inference

Please run `accelerate config` first to configure the accelerator settings. If you are unfamiliar with the accelerator configuration, refer to the VADER-VideoCrafter documentation, or see the sketch below.
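
For a simple single-machine setup, the following is a minimal sketch (assuming a recent accelerate release; `accelerate config default` writes a default configuration file and `accelerate env` prints the active settings so you can verify them):

```bash
# Interactive setup (answer the prompts for your machine):
accelerate config

# Or write a default single-machine config non-interactively, then verify it:
accelerate config default
accelerate env
```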

Assuming you are in the VADER/ directory, you can run inference with the following commands:

```bash
cd VADER-VideoCrafter
sh scripts/run_text2video_inference.sh
```

🔧 Training

Please run `accelerate config` first to configure the accelerator settings. If you are unfamiliar with the accelerator configuration, refer to the VADER-VideoCrafter documentation.

Assuming you are in the VADER/ directory, you can train the model with the following commands:

```bash
cd VADER-VideoCrafter
sh scripts/run_text2video_train.sh
```

🎬 VADER-Open-Sora

⚙️ Installation

Assuming you are in the VADER/ directory, you can create a Conda environment for VADER-Open-Sora with the following commands:

```bash
cd VADER-Open-Sora
conda create -n vader_opensora python=3.10
conda activate vader_opensora
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install xformers -c xformers
pip install -v -e .
git clone https://github.com/tgxs002/HPSv2.git
cd HPSv2/
pip install -e .
cd ..
```

📺 Inference

Please run `accelerate config` first to configure the accelerator settings. If you are unfamiliar with the accelerator configuration, refer to the VADER-Open-Sora documentation.

Assuming you are in the VADER/ directory, you can run inference with the following commands:

```bash
cd VADER-Open-Sora
sh scripts/run_text2video_inference.sh
```

🔧 Training

Please run `accelerate config` first to configure the accelerator settings. If you are unfamiliar with the accelerator configuration, refer to the VADER-Open-Sora documentation.

Assuming you are in the VADER/ directory, you can train the model with the following commands:

```bash
cd VADER-Open-Sora
sh scripts/run_text2video_train.sh
```

🎥 VADER-ModelScope

⚙️ Installation

Assuming you are in the VADER/ directory, you can create a Conda environment for VADER-ModelScope with the following commands:

```bash
cd VADER-ModelScope
conda create -n vader_modelscope python=3.10
conda activate vader_modelscope
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install xformers -c xformers
pip install -r requirements.txt
git clone https://github.com/tgxs002/HPSv2.git
cd HPSv2/
pip install -e .
cd ..
```

📺 Inference

Please run `accelerate config` first to configure the accelerator settings. If you are unfamiliar with the accelerator configuration, refer to the VADER-ModelScope documentation.

Assuming you are in the VADER/ directory, you can run inference with the following commands:

```bash
cd VADER-ModelScope
sh run_text2video_inference.sh
```

🔧 Training

Please run `accelerate config` first to configure the accelerator settings. If you are unfamiliar with the accelerator configuration, refer to the VADER-ModelScope documentation.

Assuming you are in the VADER/ directory, you can train the model with the following commands:

```bash
cd VADER-ModelScope
sh run_text2video_train.sh
```

💡 Tutorial

This section provides a tutorial on implementing the VADER method on VideoCrafter and Open-Sora yourself. It walks through the modifications step by step, so you can easily adapt the VADER method to later versions of VideoCrafter.
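
At a high level, VADER fine-tunes the diffusion model by backpropagating a reward model's gradient through the final denoising steps into trainable (e.g. LoRA) parameters. The sketch below illustrates that idea only; it is not the repository's actual code, and `unet`, `scheduler`, `decode_latents`, `reward_fn`, and `optimizer` are hypothetical placeholders you would wire up to your own model:

```python
# Illustrative sketch of one reward-gradient update (NOT the repo's actual code).
# Placeholders: `unet` (LoRA-wrapped denoiser), `scheduler` (a DDIM-style sampler),
# `decode_latents` (differentiable VAE decode to RGB frames), and `reward_fn`
# (e.g. an aesthetic or HPS-style scorer on decoded frames).
import torch

def reward_gradient_step(unet, scheduler, decode_latents, reward_fn,
                         prompt_emb, optimizer, backprop_steps=5):
    latents = torch.randn(1, 4, 16, 32, 40, device=prompt_emb.device)  # B,C,T,H,W
    timesteps = scheduler.timesteps
    for i, t in enumerate(timesteps):
        # Truncated backprop: keep the autograd graph only for the last few
        # denoising steps, which bounds memory while still passing the reward
        # gradient into the trainable LoRA parameters.
        keep_graph = i >= len(timesteps) - backprop_steps
        with torch.set_grad_enabled(keep_graph):
            noise_pred = unet(latents, t, encoder_hidden_states=prompt_emb).sample
            latents = scheduler.step(noise_pred, t, latents).prev_sample
    frames = decode_latents(latents)    # differentiable decode to RGB frames
    loss = -reward_fn(frames).mean()    # maximize reward = minimize -reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return -loss.item()                 # current reward, for logging
```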

Acknowledgement

Our codebase is directly built on top of VideoCrafter, Open-Sora, and Animate Anything. We would like to thank the authors for open-sourcing their code.

Citation

If you find this work useful in your research, please cite:

```bibtex
@misc{prabhudesai2024videodiffusionalignmentreward,
      title={Video Diffusion Alignment via Reward Gradients},
      author={Mihir Prabhudesai and Russell Mendonca and Zheyang Qin and Katerina Fragkiadaki and Deepak Pathak},
      year={2024},
      eprint={2407.08737},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.08737},
}
```