🎬 FVDM
Official code for the paper "Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach".
Authors: Yaofang Liu, Yumeng Ren, Xiaodong Cun, Aitor Artola, Yang Liu, Tieyong Zeng, Raymond H. Chan, Jean-michel Morel
FVDM (Frame-aware Video Diffusion Model) introduces a novel vectorized timestep variable (VTV) to revolutionize video generation, addressing limitations in current video diffusion models (VDMs). Unlike previous VDMs, our approach allows each frame to follow an independent noise schedule, enhancing the model's capacity to capture fine-grained temporal dependencies. FVDM's flexibility is demonstrated across multiple tasks, including standard video generation, image-to-video generation, video interpolation, and long video synthesis. Through a diverse set of VTV configurations, we achieve superior quality in generated videos, overcoming challenges such as catastrophic forgetting during fine-tuning and limited generalizability in zero-shot methods.
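The contrast between a single shared timestep and a per-frame vectorized timestep can be illustrated with a minimal sketch. It assumes a standard DDPM-style forward process with a linear beta schedule. The function names (`make_alpha_bar`, `add_noise_per_frame`) and the schedule constants are illustrative assumptions, not taken from the FVDM code.

```python
import math
import random


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for a linear beta schedule (DDPM-style assumption)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise_per_frame(frames, timesteps, alpha_bar, rng):
    """Noise each frame at its own timestep.

    frames:    list of frames, each a flat list of floats (toy stand-in for pixels)
    timesteps: one integer per frame, i.e. the vectorized timestep
    """
    if len(frames) != len(timesteps):
        raise ValueError("need exactly one timestep per frame")
    noisy = []
    for frame, t in zip(frames, timesteps):
        signal = math.sqrt(alpha_bar[t])        # how much of the clean frame survives
        sigma = math.sqrt(1.0 - alpha_bar[t])   # how much Gaussian noise is mixed in
        noisy.append([signal * x + sigma * rng.gauss(0.0, 1.0) for x in frame])
    return noisy


rng = random.Random(0)
alpha_bar = make_alpha_bar()
video = [[1.0] * 4 for _ in range(3)]  # 3 toy frames, 4 "pixels" each

scalar_t = [500, 500, 500]  # conventional VDM: every frame shares one timestep
vector_t = [0, 500, 999]    # vectorized: each frame has an independent noise level
noisy = add_noise_per_frame(video, vector_t, alpha_bar, rng)
```

With `vector_t`, the first frame stays almost clean while the last is close to pure noise. A scalar timestep is the special case where every entry is equal.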
<div align="center"><img src="https://github.com/Yaofang-Liu/FVDM/blob/7053489819c7dae13f4a3def6e97f5a0c65b5e03/Teaser.png" width="75%"/></div>
💡 Highlights
- 🎞️ Vectorized Timestep Variable (VTV) for fine-grained temporal modeling
- 🔄 Zero-shot flexibility across a wide range of video generation tasks
- 🚀 Superior quality in generated videos
- 🙌 No additional computation cost during training and inference
🎥 Demos
With different VTV configurations, FVDM can be extended to numerous tasks (in a zero-shot way).
<div align="center"><img src="https://github.com/Yaofang-Liu/FVDM/blob/6eca425bf0bbef8f2ae6e42310105ec98c115fdf/Pipeline.png" width="75%"/></div>
Below are videos generated by FVDM on the FaceForensics, SkyTimelapse, Taichi-HD, and UCF101 datasets. The same models/checkpoints are used across all tasks, which reflects strong zero-shot capability. So far they have been trained on only two A6000 GPUs.
https://github.com/user-attachments/assets/1a2c988b-d231-4e7b-9a2d-be1f96e98502
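One way to picture how a single checkpoint can serve several tasks is to vary only the per-frame timestep vector. The sketch below is a hypothetical illustration: the helper `vtv_for_task`, the task names, and the choice of which frames are held clean are assumptions made for this example, not the repository's API. Here a timestep of `0` marks a frame that is given as a clean condition, and `t` marks a frame that is still being denoised.

```python
def vtv_for_task(task, num_frames, t, num_context=1):
    """Build a per-frame timestep vector for a task (hypothetical helper).

    0 marks a conditioning frame that is kept clean; t marks a frame to generate.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames")
    if task == "standard":
        clean = set()                     # every frame is generated
    elif task == "image_to_video":
        clean = {0}                       # first frame is the given image
    elif task == "interpolation":
        clean = {0, num_frames - 1}       # first and last frames are given
    elif task == "long_video":
        clean = set(range(num_context))   # leading frames come from the previous clip
    else:
        raise ValueError(f"unknown task: {task}")
    return [0 if i in clean else t for i in range(num_frames)]


standard = vtv_for_task("standard", 4, 700)
i2v = vtv_for_task("image_to_video", 4, 700)
interp = vtv_for_task("interpolation", 4, 700)
long_clip = vtv_for_task("long_video", 4, 700, num_context=2)
```

Under these assumptions the model weights stay fixed across tasks. Only the timestep vector passed alongside the frames changes.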
🚀 Quick Start (Coming Soon)
git clone https://github.com/Yaofang-Liu/FVDM.git
cd FVDM
📜 Citation
If you find our work useful, please consider citing:
@misc{liu2024redefiningtemporalmodelingvideo,
title={Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach},
author={Yaofang Liu and Yumeng Ren and Xiaodong Cun and Aitor Artola and Yang Liu and Tieyong Zeng and Raymond H. Chan and Jean-michel Morel},
year={2024},
eprint={2410.03160},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.03160},
}
📞 Contact
For any questions or feedback, please contact yaofanliu2-c@my.cityu.edu.hk.