
Our code is built upon Open-Sora (https://github.com/hpcaitech/Open-Sora), with the following features:

- Autoregressive video generation, i.e., generating subsequent clips conditioned on the last frames of the previous clip.
- Causal generation (via causal temporal attention).
- Cache sharing: the kv-cache is shared across all denoising steps. This differs from the kv-cache implementation in Live2Diff.
- Kv-cache queue, i.e., autoregressive generation without redundant computation for the overlapping conditional frames. The oldest kv-cache entries are dequeued.
- Cyclic temporal positional embeddings (TPEs), i.e., we use a cyclic shift to support the kv-cache queue.
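
As a rough illustration of the causal temporal attention mentioned above, the sketch below builds a frame-level mask in which each frame may attend only to itself and to earlier frames. The function name `causal_temporal_mask` is hypothetical and is not taken from this repository; the actual attention layers may construct their mask differently.

```python
def causal_temporal_mask(num_frames):
    """Return a num_frames x num_frames boolean mask (hypothetical helper).

    mask[i][j] is True when frame i may attend to frame j, i.e. when j <= i,
    so no frame can see frames that come after it in time.
    """
    return [[j <= i for j in range(num_frames)] for i in range(num_frames)]


mask = causal_temporal_mask(4)
for row in mask:
    # Print each row as a string of 1s (allowed) and 0s (blocked).
    print("".join("1" if allowed else "0" for allowed in row))
# prints:
# 1000
# 1100
# 1110
# 1111
```

Because frame i never depends on later frames, the keys and values of already generated frames stay valid when new frames are appended, which is what makes a kv-cache usable in the first place.
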
Key differences between our implementation and Live2Diff:
- Our kv-cache is shared across all denoising steps, whereas Live2Diff stores a separate kv-cache for each denoising step.
- We use a cache queue structure to support autoregressive generation, facilitated by the cyclic TPEs.
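
The interplay between the cache queue and cyclic positions can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the class `KVCacheQueue`, its parameters `max_frames` and `num_positions`, and the modulo-based position assignment are all hypothetical, and the string keys and values stand in for real tensors.

```python
from collections import deque


class KVCacheQueue:
    """Fixed-length FIFO of per-frame (position, key, value) entries.

    Hypothetical helper for illustration only.
    """

    def __init__(self, max_frames, num_positions):
        # deque(maxlen=...) drops the oldest entry automatically on append.
        self.entries = deque(maxlen=max_frames)
        self.num_positions = num_positions
        self.next_frame = 0  # global index of the next generated frame

    def push(self, key, value):
        # Assumption: the temporal position wraps around cyclically, so a
        # cached entry keeps the position it was computed with and never
        # needs to be recomputed as generation continues.
        pos = self.next_frame % self.num_positions
        self.entries.append((pos, key, value))
        self.next_frame += 1
        return pos

    def positions(self):
        return [pos for pos, _, _ in self.entries]


cache = KVCacheQueue(max_frames=3, num_positions=4)
for frame in range(6):
    cache.push(key=f"k{frame}", value=f"v{frame}")

print(cache.positions())                 # [3, 0, 1]
print([k for _, k, _ in cache.entries])  # ['k3', 'k4', 'k5']
```

In this sketch `num_positions` is chosen larger than `max_frames` so that the positions inside the cached window never collide.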

Training scripts

An overfitting demo

bash scripts/train.sh \
    configs/causal_stdit/train_overfit_beach_demo.py \
    overfit_demo \
    9686 0

SkyTimelapse demo

bash scripts/train.sh \
    configs/causal_stdit/train_SkyTimelapse_demo.py \
    skytimelapse_demo \
    9686 0

Refer to scripts/train.sh to configure ROOT_DATA_DIR.

The code is being prepared.