
Spectral-Motion-Alignment (AAAI 2025)

This repository is the official implementation of SMA.

[AAAI 2025] SMA: Spectral Motion Alignment for Video Motion Transfer using Diffusion Models
Geon Yeong Park*, Hyeonho Jeong*, Sang Wan Lee, Jong Chul Ye

<img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/images/SMA_model.png" width="100%"/>

The SMA framework distills motion information in the frequency domain. Our regularization includes (1) global motion alignment based on a 1D wavelet transform, and (2) local motion refinement based on a 2D Fourier transform.
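To make the local-refinement idea more concrete, the sketch below compares the 2D Fourier amplitude spectra of two frame residuals. It is an illustration only, not the loss implemented in this repository. It assumes that motion is represented by the difference between consecutive frames, and it compares amplitudes alone as a simplification. The helper names `dft2`, `frame_residual`, and `amplitude_distance` are hypothetical, and the naive DFT is written in pure Python for readability rather than speed.

```python
import cmath
import math


def dft2(frame):
    """Naive 2D discrete Fourier transform of a small H x W grid (list of lists)."""
    h, w = len(frame), len(frame[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2 * math.pi * (u * y / h + v * x / w)
                    total += frame[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def frame_residual(prev_frame, next_frame):
    """Pixel-wise difference between two consecutive frames (assumed motion proxy)."""
    return [
        [b - a for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]


def amplitude_distance(residual_a, residual_b):
    """Mean absolute difference between the amplitude spectra of two residuals."""
    spec_a, spec_b = dft2(residual_a), dft2(residual_b)
    h, w = len(residual_a), len(residual_a[0])
    total = 0.0
    for u in range(h):
        for v in range(w):
            total += abs(abs(spec_a[u][v]) - abs(spec_b[u][v]))
    return total / (h * w)


if __name__ == "__main__":
    # Toy 2x2 "videos": two consecutive frames each for the source and the output.
    source = frame_residual([[0, 0], [0, 0]], [[1, 0], [0, 0]])
    output = frame_residual([[0, 0], [0, 0]], [[2, 0], [0, 0]])
    print("amplitude distance:", amplitude_distance(source, output))
```

In practice a tensor library's FFT (for example `torch.fft.fft2`) would replace the hand-written transform; the pure-Python version is used here only to keep the example dependency-free.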

News

[2024.03.29] Initial Code Release

Setup

Requirements

As a preliminary proof of concept, this repository is built upon VMC (with the Show-1 backbone).

(1) Install VMC requirements

pip install -r requirements.txt

(2) Install wavelet libraries

pytorch_wavelets

git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .

PyWavelets

pip install PyWavelets
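After installation, a quick check like the following can confirm that both wavelet libraries are importable before launching a run. This is a convenience sketch, not part of the repository. The function name `check_wavelet_deps` is hypothetical; the module names are the import names of the two packages above (`pytorch_wavelets` and `pywt`, which is how PyWavelets is imported).

```python
import importlib.util


def check_wavelet_deps(modules=("pytorch_wavelets", "pywt")):
    """Return a {module_name: is_importable} mapping without importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_wavelet_deps().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```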

Usage

The following command runs training and inference in a single run:

accelerate launch train_inference.py --config configs/man_skate.yml

Additional Data

We use the video dataset released by VMC.

PNG files: Google Drive Folder
GIF files: Google Drive Folder

Results

<table class="center"> <tr> <td style="text-align:center;">Input Videos</td> <td style="text-align:center;" colspan="1">Output Videos</td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/long/penguins_swimming2/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/long/penguins_swimming2/shark.gif"></td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/demo/man_skate/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/demo/man_skate/astronaut_snow.gif"></td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/cars_bridge1/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/cars_bridge1/with/turtle.gif"></td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/rabbit_strawberry/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/rabbit_strawberry/with/raccoon_nuts.gif"></td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/penguins_swimming1/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/penguins_swimming1/with/spaceships_space.gif"></td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/butterfly/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/butterfly/with/snow.gif"></td> </tr> </table>

More results (w/ MotionDirector)

Hyperparameters

Most configurations follow VMC.

ld_global: Weight for global motion alignment ($\lambda_{g}$ in the paper). Default: 0.4
ld_local: Weight for local motion refinement ($\lambda_{l}$ in the paper). Default: 0.2
num_levels: Number of levels in the discrete wavelet transform. Default: 2 for 8-frame input videos, 3 for 16-frame input videos
ld_levels: Weights for aligning the wavelet coefficients at each level. Default: [1]*(num_levels+1)
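
The relationship between num_levels and ld_levels can be illustrated with a small, self-contained sketch. It uses a pure-Python Haar transform as a stand-in (the repository relies on pytorch_wavelets and PyWavelets, and the wavelet it uses is not stated here), and all function names are hypothetical. The point is only that a num_levels-level decomposition yields num_levels + 1 coefficient bands (one approximation band plus one detail band per level), which is why ld_levels defaults to a list of that length.

```python
import math


def haar_step(signal):
    """One Haar DWT level: returns (approximation, detail), each half as long."""
    scale = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, num_levels):
    """Multi-level Haar DWT returning [cA_n, cD_n, ..., cD_1] (num_levels + 1 bands)."""
    if len(signal) % (2 ** num_levels) != 0:
        raise ValueError("signal length must be divisible by 2**num_levels")
    approx, details = list(signal), []
    for _ in range(num_levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def weighted_band_distance(source, output, num_levels, ld_levels):
    """Weighted mean absolute difference between matching wavelet bands."""
    if len(ld_levels) != num_levels + 1:
        raise ValueError("ld_levels needs num_levels + 1 entries")
    source_bands = haar_wavedec(source, num_levels)
    output_bands = haar_wavedec(output, num_levels)
    total = 0.0
    for weight, band_a, band_b in zip(ld_levels, source_bands, output_bands):
        total += weight * sum(abs(a - b) for a, b in zip(band_a, band_b)) / len(band_a)
    return total


if __name__ == "__main__":
    num_levels = 2                      # documented default for 8-frame inputs
    ld_levels = [1] * (num_levels + 1)  # documented default weighting
    source_motion = [0, 1, 2, 3, 4, 5, 6, 7]  # toy per-frame scalar signal
    output_motion = [0, 1, 2, 3, 4, 5, 7, 6]
    bands = haar_wavedec(source_motion, num_levels)
    print("band lengths:", [len(b) for b in bands])
    print("distance:", weighted_band_distance(source_motion, output_motion, num_levels, ld_levels))
```

With this Haar stand-in, an 8-frame signal at 2 levels and a 16-frame signal at 3 levels both end with a coarsest band of 2 coefficients. Raising an entry of ld_levels puts more weight on the corresponding band in the list order shown above; how the repository orders its bands should be checked against the code.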

Citation

If you make use of our work, please cite our paper.

@article{park2024spectral,
  title={Spectral Motion Alignment for Video Motion Transfer using Diffusion Models},
  author={Park, Geon Yeong and Jeong, Hyeonho and Lee, Sang Wan and Ye, Jong Chul},
  journal={arXiv preprint arXiv:2403.15249},
  year={2024}
}

Shoutouts

SMA has been validated on various open-source video/image diffusion models: Show-1, Zeroscope-V2, Stable Diffusion, and ControlNet.
SMA has demonstrated compatibility with four leading video-to-video frameworks: VMC, MotionDirector, Tune-A-Video, and ControlVideo.