Stable Video Diffusion Temporal Controlnet

Overview

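Introducing the Stable Video Diffusion Temporal Controlnet! This tool pairs a ControlNet-style encoder with the Stable Video Diffusion (SVD) base model. It is designed to give your video diffusion projects precise temporal control by conditioning generation on a control input (the example training setup below uses depth maps).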

Setup

Demo

[Demo GIF: combined_with_square_image_new_gif]

Notes

Training

My example training configuration looks like this:

accelerate launch train_svd.py \
 --pretrained_model_name_or_path="stabilityai/stable-video-diffusion-img2vid" \
 --output_dir="model_out" \
 --csv_path="path-to-your-csv" \
 --video_folder="path-to-your-videos" \
 --depth_folder="path-to-your-depth" \
 --motion_folder="path-to-your-motion" \
 --validation_image_folder="./validation_demo/rgb" \
 --validation_control_folder="./validation_demo/depth" \
 --width=512 \
 --height=512 \
 --learning_rate=2e-5 \
 --per_gpu_batch_size=8 \
 --num_train_epochs=5 \
 --mixed_precision="fp16" \
 --gradient_accumulation_steps=2 \
 --checkpointing_steps=2000 \
 --validation_steps=400 \
 --gradient_checkpointing
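
When adapting the command above to different hardware, it can help to know how many samples feed each optimizer step. The sketch below is only an illustration: it assumes that `--per_gpu_batch_size` is the batch size on each GPU and that gradients are accumulated across `--gradient_accumulation_steps` micro-batches, which is the usual convention in `accelerate`-based training scripts. The function name `effective_batch_size` and the GPU counts are hypothetical and are not part of `train_svd.py`.

```python
def effective_batch_size(per_gpu_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int = 1) -> int:
    """Return the number of samples that contribute to one optimizer step.

    Assumes the common convention: each GPU processes `per_gpu_batch_size`
    samples per micro-batch, and gradients are accumulated over
    `gradient_accumulation_steps` micro-batches before the weights update.
    """
    return per_gpu_batch_size * gradient_accumulation_steps * num_gpus


if __name__ == "__main__":
    # 8 and 2 come from the example command; the GPU counts are assumptions.
    for num_gpus in (1, 2, 4):
        size = effective_batch_size(8, 2, num_gpus)
        print(f"{num_gpus} GPU(s): effective batch size {size}")
```

If memory forces a smaller `--per_gpu_batch_size`, raising `--gradient_accumulation_steps` by the same factor keeps this product unchanged, so the learning rate in the example may need less retuning.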

Acknowledgements