# Spectral-Motion-Alignment (AAAI 2025)
This repository is the official implementation of SMA.<br> [AAAI 2025] SMA: Spectral Motion Alignment for Video Motion Transfer using Diffusion Models. <br> Geon Yeong Park*, Hyeonho Jeong*, Sang Wan Lee, Jong Chul Ye
<p align="center"> <img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/images/SMA_model.png" width="100%"/> <br> <em>SMA distills motion information in the frequency domain. Our regularization includes (1) global motion alignment based on the 1D wavelet transform, and (2) local motion refinement based on the 2D Fourier transform.</em> </p>

## News
- [2024.03.29] Initial Code Release
## Setup

### Requirements
As a preliminary proof of concept, this repository is built upon VMC (with the Show-1 backbone).
(1) Install VMC requirements:

```shell
pip install -r requirements.txt
```
(2) Install wavelet libraries:

```shell
git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .
pip install PyWavelets
```
## Usage
The following command runs training and inference together:

```shell
accelerate launch train_inference.py --config configs/man_skate.yml
```
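For orientation, here is a hypothetical fragment of such a config showing only the SMA-specific keys; the actual schema in `configs/man_skate.yml` may differ, so treat the key placement and values below as an illustration:

```yaml
# Hypothetical fragment -- consult configs/man_skate.yml for the real schema.
ld_global: 0.4            # weight for global motion alignment
ld_local: 0.2             # weight for local motion refinement
num_levels: 3             # DWT levels (3 suits a 16-frame input video)
ld_levels: [1, 1, 1, 1]   # one weight per wavelet band: num_levels + 1 entries
```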
## Additional Data
We use the video dataset released by VMC:
- PNG files: Google Drive Folder
- GIF files: Google Drive Folder
## Results
<table class="center"> <tr> <td style="text-align:center;"><b>Input Videos</b></td> <td style="text-align:center;" colspan="1"><b>Output Videos</b></td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/long/penguins_swimming2/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/long/penguins_swimming2/shark.gif"></td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/demo/man_skate/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/demo/man_skate/astronaut_snow.gif"></td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/cars_bridge1/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/cars_bridge1/with/turtle.gif"></td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/rabbit_strawberry/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/rabbit_strawberry/with/raccoon_nuts.gif"></td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/penguins_swimming1/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/penguins_swimming1/with/spaceships_space.gif"></td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/butterfly/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/vmc_show1/butterfly/with/snow.gif"></td> </tr> </table>

### More results (w/ MotionDirector)
<table class="center"> <tr> <td style="text-align:center;"><b>Input Videos</b></td> <td style="text-align:center;" colspan="1"><b>Output Videos</b></td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/motiondirector/seagull_walking/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/motiondirector/seagull_walking/with/chicken.gif"></td> </tr> <tr> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/motiondirector/seagull_skyline/input.gif"></td> <td><img src="https://geonyeong-park.github.io/spectral-motion-alignment/static/gifs/motiondirector/seagull_skyline/with/eagle.gif"></td> </tr> </table>

## Hyperparameters
Most configurations follow VMC.

- `ld_global`: Weight for global motion alignment ($\lambda_{g}$ in the paper). <i>Default: 0.4</i>
- `ld_local`: Weight for local motion refinement ($\lambda_{l}$ in the paper). <i>Default: 0.2</i>
- `num_levels`: Number of levels in the discrete wavelet transform. <i>Default: 2 for an 8-frame input video, 3 for a 16-frame input video</i>
- `ld_levels`: Weights for the alignment of each wavelet coefficient band. <i>Default: `[1]*(num_levels+1)`</i>
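For intuition on how `num_levels` relates to `ld_levels`, here is a minimal NumPy sketch (an illustration under simplified assumptions, not the repository's implementation): a multi-level 1D Haar decomposition of a temporal motion signal yields one detail band per level plus a final approximation, i.e. `num_levels + 1` coefficient groups to weight, while the local view takes a per-frame 2D Fourier magnitude of frame differences. All function names here are hypothetical.

```python
import numpy as np

def haar_dwt_1d(signal):
    """One level of the 1D Haar wavelet transform: (approximation, detail)."""
    even, odd = signal[0::2], signal[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def global_motion_coeffs(frames, num_levels=2):
    """Multi-level Haar DWT of a 1D motion signal built from frame differences.

    frames: (T, H, W) array with T - 1 a multiple of 2**num_levels.
    Returns num_levels detail bands plus the final approximation,
    i.e. num_levels + 1 coefficient groups (cf. ld_levels).
    """
    diffs = np.diff(frames, axis=0)                       # (T-1, H, W)
    approx = diffs.reshape(diffs.shape[0], -1).mean(axis=1)
    coeffs = []
    for _ in range(num_levels):
        approx, detail = haar_dwt_1d(approx)
        coeffs.append(detail)
    coeffs.append(approx)
    return coeffs

def local_motion_spectrum(frames):
    """Per-frame 2D Fourier magnitude of frame differences (local view)."""
    return np.abs(np.fft.fft2(np.diff(frames, axis=0), axes=(-2, -1)))

frames = np.random.rand(9, 16, 16)          # 9 frames -> 8 differences
coeffs = global_motion_coeffs(frames, num_levels=2)
print(len(coeffs))                           # 3 = num_levels + 1
print(local_motion_spectrum(frames).shape)   # (8, 16, 16)
```

Weighting each entry of `coeffs` separately is what a per-band list like `[1]*(num_levels+1)` makes possible.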
## Citation
If you make use of our work, please cite our paper:
```bibtex
@article{park2024spectral,
  title={Spectral Motion Alignment for Video Motion Transfer using Diffusion Models},
  author={Park, Geon Yeong and Jeong, Hyeonho and Lee, Sang Wan and Ye, Jong Chul},
  journal={arXiv preprint arXiv:2403.15249},
  year={2024}
}
```
## Shoutouts
- SMA is validated on various open-source video/image diffusion models: Show-1, Zeroscope-V2, Stable Diffusion, and ControlNet.
- SMA demonstrated its compatibility with four leading video-to-video frameworks: VMC, MotionDirector, Tune-A-Video, and ControlVideo.