Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
Model preparation
- VidRD LDM model: Google Drive
- VidRD fine-tuned VAE: Google Drive
- Stable Diffusion 2.1 (base): Hugging Face
Below is an example structure of these model files.
assets/
├── ModelT2V.pth
├── vae_finetuned/
│   ├── diffusion_pytorch_model.bin
│   └── config.json
└── stable-diffusion-2-1-base/
    ├── scheduler/...
    ├── text_encoder/...
    ├── tokenizer/...
    ├── unet/...
    ├── vae/...
    ├── ...
    └── README.md
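Before running inference, it can help to confirm that the downloaded files match the layout above. The following is a minimal sketch, not part of the repository: `EXPECTED_PATHS` and `find_missing_assets` are hypothetical names, and the list covers only the weight and config entries named explicitly in the example tree.

```python
from pathlib import Path

# Entries taken from the example layout above, relative to the assets directory.
# (Hypothetical helper; the repository itself does not ship this check.)
EXPECTED_PATHS = [
    "ModelT2V.pth",
    "vae_finetuned/diffusion_pytorch_model.bin",
    "vae_finetuned/config.json",
    "stable-diffusion-2-1-base/scheduler",
    "stable-diffusion-2-1-base/text_encoder",
    "stable-diffusion-2-1-base/tokenizer",
    "stable-diffusion-2-1-base/unet",
    "stable-diffusion-2-1-base/vae",
]


def find_missing_assets(assets_dir="assets"):
    """Return the expected paths that do not exist under assets_dir."""
    root = Path(assets_dir)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


missing = find_missing_assets("assets")
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All expected model files found.")
```

Run it from the repository root so that the relative `assets` path resolves correctly.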
Environment setup
Python 3.10 or later is required.
pip install -r requirements.txt
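To check the interpreter version before installing the dependencies, a short sketch such as the one below can be used. `MIN_PYTHON` and `python_is_supported` are hypothetical names introduced here for illustration.

```python
import sys

# Minimum version stated in this README.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if not python_is_supported():
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
          f"found {sys.version_info.major}.{sys.version_info.minor}.")
```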
Model inference
Inference settings, including the text prompts for video generation, are defined in configs/examples.yaml.
python main.py --config-name="example" \
++model.ckpt_path="assets/ModelT2V.pth" \
++model.temporal_vae_path="assets/vae_finetuned/" \
++model.pretrained_model_path="assets/stable-diffusion-2-1-base/"
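The `--config-name` flag and the `++key=value` arguments look like Hydra command-line syntax; assuming that is the case, `++` sets a config key whether or not it already exists in the YAML file. To launch the same command from a Python script, for example to loop over several checkpoints, a minimal sketch could look like this. `build_inference_command` and `run_inference` are hypothetical helpers, not functions provided by the repository.

```python
import subprocess
import sys


def build_inference_command(
    config_name="example",
    ckpt_path="assets/ModelT2V.pth",
    temporal_vae_path="assets/vae_finetuned/",
    pretrained_model_path="assets/stable-diffusion-2-1-base/",
):
    """Assemble the argument list for the inference command shown above."""
    return [
        sys.executable,  # use the same interpreter that runs this script
        "main.py",
        f"--config-name={config_name}",
        f"++model.ckpt_path={ckpt_path}",
        f"++model.temporal_vae_path={temporal_vae_path}",
        f"++model.pretrained_model_path={pretrained_model_path}",
    ]


def run_inference(**kwargs):
    """Run main.py from the current directory; raises CalledProcessError on failure."""
    subprocess.run(build_inference_command(**kwargs), check=True)
```

Calling `run_inference()` from the repository root is intended to be equivalent to the shell command above; pass `ckpt_path=...` or the other keyword arguments to point at different model files.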
BibTeX
@article{reuse2023,
title = {Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation},
journal = {arXiv preprint arXiv:2309.03549},
year = {2023}
}