🍰🎞️ Tiny AutoEncoder for Stable Diffusion Videos

What is TAESDV?

TAESDV is a Tiny AutoEncoder for Stable Diffusion Videos. TAESDV can decode sequences of Stable Diffusion latents into continuous videos with much smoother results than single-frame TAESD (but within the same tiny runtime budget).

Since TAESDV efficiently supports both parallel and sequential frame decoding (sketched below the comparison table), it should be useful for:

  1. Fast batched previewing for video-generation systems like SVD or AnimateLCM.
  2. Fast realtime decoding for interactive v2v systems like StreamDiffusion.
| Original Video | TAESD Encode, TAESD Decode | TAESD Encode, TAESDV Decode |
| --- | --- | --- |
| *(test_video.mp4)* | *(test_video reconstructed with TAESD)* | *(test_video reconstructed with TAESDV)* |
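
Below is a minimal sketch of how the two decoding modes might be driven. The `TAESDV` class, the `decode_video` method, and the tensor layouts are assumptions for illustration, not the confirmed API; check this repo's `taesdv.py` for the real interface.

```python
import torch
from taesdv import TAESDV  # this repo's taesdv.py; constructor details may differ

# All class / method names and tensor layouts below are assumptions
# for illustration, not the confirmed TAESDV API.
dec = TAESDV().eval().cuda()

# Stable Diffusion latents for a 16-frame clip: [batch, frames, 4, H/8, W/8]
latents = torch.randn(1, 16, 4, 64, 64, device="cuda")

with torch.no_grad():
    # Parallel mode: decode every frame of the clip in one call
    # (useful for batched previews during generation).
    video = dec.decode_video(latents)  # assumed output: [batch, frames, 3, H, W]

    # Sequential mode: decode one frame at a time, with the decoder assumed to
    # carry its cross-frame memory between calls (useful for realtime v2v).
    frames = [dec.decode_video(latents[:, t : t + 1]) for t in range(latents.shape[1])]
```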

> [!NOTE]
> Lots of TODOs still:
>
> 1. Add a StreamDiffusion or other v2v example
> 2. Add performance metrics (roughly the same as TAESD)
> 3. Better / more example videos
> 4. Add to Diffusers somehow?
> 5. Even better checkpoint?

How can I use TAESDV for previewing generated videos?

See the AnimateLCM previewing example, which visualizes a TAESDV preview after each generation step.

<video src="https://github.com/user-attachments/assets/2fa29dd7-1f85-493b-9c9f-dd5337ab3337"/>
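
For orientation, here is a hedged sketch of how a per-step TAESDV preview could be wired into a diffusers AnimateLCM pipeline using the standard `callback_on_step_end` hook. The pipeline setup follows the diffusers AnimateLCM documentation; the `TAESDV` class and its `decode_video` call are assumptions, so see the actual example script and `taesdv.py` for the real interface.

```python
import torch
from diffusers import AnimateDiffPipeline, LCMScheduler, MotionAdapter
from taesdv import TAESDV  # this repo's taesdv.py (constructor details may differ)

# AnimateLCM setup, following the diffusers documentation.
adapter = MotionAdapter.from_pretrained("wangfuyun/AnimateLCM", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism", motion_adapter=adapter, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config, beta_schedule="linear")
pipe.load_lora_weights("wangfuyun/AnimateLCM", weight_name="AnimateLCM_sd15_t2v_lora.safetensors")

taesdv = TAESDV().half().cuda()  # assumed constructor / dtype handling

def preview_callback(pipe, step, timestep, callback_kwargs):
    # Video pipelines keep latents as [batch, channels, frames, H/8, W/8].
    latents = callback_kwargs["latents"]
    with torch.no_grad():
        # Assumed TAESDV call expecting frame-major [batch, frames, 4, H/8, W/8].
        preview = taesdv.decode_video(latents.permute(0, 2, 1, 3, 4))
    # ...hand `preview` to your UI / save it to disk for a live preview...
    return callback_kwargs

result = pipe(
    prompt="a corgi running on the beach",
    num_frames=16,
    num_inference_steps=6,
    guidance_scale=2.0,
    callback_on_step_end=preview_callback,
    callback_on_step_end_tensor_inputs=["latents"],
)
```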

How does TAESDV work?

TAESDV was created by giving TAESD's decoder additional cross-frame memory and finetuning it on video data.
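
As a toy illustration of the cross-frame-memory idea (not the actual TAESDV architecture), a decoder block can hold onto the previous frame's features and blend them into the current frame, so consecutive frames are decoded consistently instead of independently:

```python
import torch
import torch.nn as nn

class ToyMemBlock(nn.Module):
    """Toy block: keeps the previous frame's features and mixes them into the current frame."""

    def __init__(self, channels: int):
        super().__init__()
        # Fuse [current features, previous-frame features] back down to `channels`.
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)
        self.prev = None  # cross-frame memory; reset between clips

    def reset(self):
        self.prev = None

    def forward(self, x):
        prev = self.prev if self.prev is not None else torch.zeros_like(x)
        out = self.fuse(torch.cat([x, prev], dim=1))
        self.prev = out.detach()  # remember this frame's features for the next frame
        return out

# Decoding a clip frame-by-frame through such blocks keeps consecutive frames
# consistent, which is what removes single-frame flicker.
block = ToyMemBlock(64)
block.reset()
frames_out = [block(torch.randn(1, 64, 32, 32)) for _ in range(8)]
```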

What are the limitations of TAESDV?

TAESDV is tiny and built to run very quickly, so it tends to fudge fine details. If you want maximal quality, you should use the SVD VAE instead.