<p align="center"> <img src="https://github.com/user-attachments/assets/fba781e5-497d-44fa-abb5-07b3b3e8a471" width="256" style="margin-bottom: 0.2;"/> </p> <h2 align="center"> <a href="https://github.com/PKU-YuanGroup/WF-VAE/">WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model</a></h2> <h5 align="center"> If you like our project, please give us a star ⭐ on GitHub for the latest updates. </h5> <h5 align="center">


</h5> <details open><summary>💡 I also have other projects that may interest you ✨.</summary><p>

Open-Sora Plan: Open-Source Large Video Generation Model <br> Bin Lin and Yunyang Ge and Xinhua Cheng and Zongjian Li and Bin Zhu and Shaodong Wang and Xianyi He and Yang Ye and Shenghai Yuan and Liuhan Chen and Tanghui Jia and Junwu Zhang and Zhenyu Tang and Yatian Pang and Bin She and Cen Yan and Zhiheng Hu and Xiaoyi Dong and Lin Chen and Zhang Pan and Xing Zhou and Shaoling Dong and Yonghong Tian and Li Yuan <br>

</p></details>

📰 News

😮 Highlights

WF-VAE utilizes a multi-level wavelet transform to construct an efficient energy pathway, enabling low-frequency information from video data to flow into the latent representation. This method achieves competitive reconstruction performance while markedly reducing computational costs.
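The underlying observation, that the low-frequency (LLL) subband of a 3D Haar wavelet transform carries most of a natural video's energy, can be illustrated with a toy NumPy sketch. This is a sketch of the wavelet energy argument only, not WF-VAE's actual model code:

```python
import numpy as np

def haar_low_pass_3d(video):
    """One level of a 3D Haar wavelet transform, keeping only the
    low-frequency (LLL) subband. `video` has shape (T, H, W) with even
    dimensions; the output is downsampled by 2 along every axis."""
    t, h, w = video.shape
    assert t % 2 == 0 and h % 2 == 0 and w % 2 == 0
    # Sum adjacent pairs along each axis (Haar low-pass filter),
    # with the orthonormal 1/sqrt(2) scaling per axis.
    x = (video[0::2] + video[1::2]) / np.sqrt(2)        # time
    x = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)          # height
    x = (x[:, :, 0::2] + x[:, :, 1::2]) / np.sqrt(2)    # width
    return x

# A smooth (low-frequency) video keeps nearly all its energy in LLL.
rng = np.random.default_rng(0)
video = np.ones((8, 16, 16)) + 0.01 * rng.standard_normal((8, 16, 16))
lll = haar_low_pass_3d(video)                # shape (4, 8, 8)
energy_ratio = (lll ** 2).sum() / (video ** 2).sum()
print(f"energy in LLL subband: {energy_ratio:.4f}")
```

Because the transform is orthonormal, the energy ratio of the LLL subband directly measures how much of the signal survives the 8x spatiotemporal downsampling; for smooth content it is close to 1, which is what makes the low-frequency pathway an efficient route into the latent space.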

💡 Simpler Architecture, Faster Encoding

🔥 Competitive Reconstruction Performance with SOTA VAEs

<div align="center"> <img src="https://github.com/user-attachments/assets/e14cfd31-c5c1-4b34-af60-5a5fc2071483" style="max-width: 80%;"> </div>

🚀 Main Results

Reconstruction

<div align="center"> <img src="https://github.com/user-attachments/assets/0b9d6203-ea31-47b0-86b6-fbfaf96ddb37" style="max-width: 80%;"> </div> <table> <thead> <tr> <th>WF-VAE</th> <th>CogVideoX</th> </tr> </thead> <tbody> <tr> <td> <img src="https://github.com/user-attachments/assets/da74cce6-7878-4aff-ba4a-ed2b3c23f530" alt="WF-VAE"> </td> <td> <img src="https://github.com/user-attachments/assets/a7c8c5f4-8487-485b-80d0-81caf2b01d9f" alt="CogVideoX"> </td> </tr> </tbody> </table>
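Reconstruction quality in comparisons like those above is commonly summarized with PSNR (often alongside LPIPS). As a minimal reference, the standard PSNR definition (not tied to this repository's evaluation code) is:

```python
import numpy as np

def psnr(ref, rec, peak=1.0):
    """Peak signal-to-noise ratio in dB for arrays scaled to [0, peak]."""
    mse = np.mean((ref.astype(float) - rec.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

ref = np.zeros((4, 4))
rec = ref + 0.1           # uniform error of 0.1 -> MSE 0.01
print(round(psnr(ref, rec), 1))  # → 20.0
```

Higher is better; a uniform per-pixel error of 0.1 on a [0, 1] scale corresponds to 20 dB.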

Efficiency

We conducted efficiency tests on 33-frame videos using float32 precision on an H100 GPU. All models were run without block-wise inference strategies. Our model demonstrates reconstruction performance comparable to state-of-the-art VAEs while significantly reducing encoding costs.

<div align="center"> <img src="https://github.com/user-attachments/assets/53f74160-81f0-486e-b294-10dbb5bed8e5" style="max-width: 80%;"> </div>

🛠️ Requirements and Installation

git clone https://github.com/PKU-YuanGroup/WF-VAE
cd WF-VAE
conda create -n wfvae python=3.10 -y
conda activate wfvae
pip install -r requirements.txt

🤖 Reconstructing Video or Image

To reconstruct a video or an image, execute the following commands:

Video Reconstruction

CUDA_VISIBLE_DEVICES=1 python scripts/recon_single_video.py \
    --model_name WFVAE \
    --from_pretrained "Your VAE" \
    --video_path "Video Path" \
    --rec_path rec.mp4 \
    --device cuda \
    --sample_rate 1 \
    --num_frames 65 \
    --height 512 \
    --width 512 \
    --fps 30 \
    --enable_tiling
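The `--enable_tiling` flag enables block-wise inference, which is typically used to bound peak memory on long or high-resolution inputs. The general idea, processing overlapping tiles independently and blending the overlaps to hide seams, can be sketched in 2D with NumPy. The tile size, overlap, and blending window below are illustrative assumptions, not WF-VAE's actual tiling scheme:

```python
import numpy as np

def process_tiled(frame, tile=64, overlap=8, fn=lambda x: x):
    """Toy tiled inference: split `frame` into overlapping tiles, run
    `fn` on each tile independently, and blend overlapping regions
    with a pyramid-shaped weight window so seams average out."""
    h, w = frame.shape
    out = np.zeros_like(frame, dtype=float)
    weight = np.zeros_like(frame, dtype=float)
    step = tile - overlap
    # 1D ramp rising to the tile center, then a 2D pyramid window.
    ramp = np.minimum(np.arange(1, tile + 1), np.arange(tile, 0, -1))
    win = np.minimum.outer(ramp, ramp).astype(float)
    for y in range(0, h - overlap, step):
        for x in range(0, w - overlap, step):
            ys = slice(y, min(y + tile, h))
            xs = slice(x, min(x + tile, w))
            patch = fn(frame[ys, xs])          # per-tile "model" call
            wpatch = win[: patch.shape[0], : patch.shape[1]]
            out[ys, xs] += patch * wpatch      # weighted accumulation
            weight[ys, xs] += wpatch
    return out / weight                        # normalize the blend

frame = np.random.default_rng(0).random((128, 128))
rec = process_tiled(frame)  # identity fn -> tiling must be lossless
print(np.allclose(rec, frame))  # → True
```

With an identity `fn` the tiled pipeline reproduces the input exactly, confirming the blending weights sum correctly; in real use, `fn` would be the VAE applied to one tile at a time, trading a little redundant computation in the overlaps for a much smaller memory footprint.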

Image Reconstruction

CUDA_VISIBLE_DEVICES=1 python scripts/recon_single_image.py \
    --model_name WFVAE \
    --from_pretrained "Your VAE" \
    --image_path assets/gt_5544.jpg \
    --rec_path rec.jpg \
    --device cuda \
    --short_size 512 

For further guidance, refer to the example scripts: examples/rec_single_video.sh and examples/rec_single_image.sh.

🗝️ Training & Validating

Instructions for training and validation can be found in TRAIN_AND_VALIDATE.md.

👍 Acknowledgement

✏️ Citation

@misc{li2024wfvaeenhancingvideovae,
      title={WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model}, 
      author={Zongjian Li and Bin Lin and Yang Ye and Liuhan Chen and Xinhua Cheng and Shenghai Yuan and Li Yuan},
      year={2024},
      eprint={2411.17459},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17459}, 
}

🔒 License

This project is released under the Apache 2.0 license as found in the LICENSE file.