# SAVE: Protagonist Diversification with <u>S</u>tructure <u>A</u>gnostic <u>V</u>ideo <u>E</u>diting (ECCV 2024)

This repository contains the official implementation of <U>SAVE: Protagonist Diversification with Structure Agnostic Video Editing</U>.

Project Website · [arXiv:2312.02503](https://arxiv.org/abs/2312.02503)

## Teaser

<h4 align="center"> 🐱 A cat is roaring ➜ 🐶 A dog is < S<sub>mot</sub> > / 🐯 A tiger is < S<sub>mot</sub> > </h4>
<p align="center">
  <img src="assets/cat_flower/cat.gif" width="200" height="200"><img src="assets/cat_flower/Ours_dog.gif" width="200" height="200"><img src="assets/cat_flower/Ours_tiger.gif" width="200" height="200">
</p>
<h4 align="center"> 😎 A man is skiing ➜ 🐻 A bear is < S<sub>mot</sub> > / 🐭 Mickey-Mouse is < S<sub>mot</sub> > </h4>
<p align="center">
  <img src="assets/man-skiing/man-skiing.gif" width="200" height="200"><img src="assets/man-skiing/Ours_bear.gif" width="200" height="200"><img src="assets/man-skiing/Ours_Mickey-Mouse.gif" width="200" height="200">
</p>
<p align="center">
  <em>SAVE reframes video editing as a motion inversion problem: it finds a motion word < S<sub>mot</sub> > in the textual embedding space that faithfully represents the motion in a source video. Editing is then achieved by isolating the motion from the single source video via < S<sub>mot</sub> > and modifying the protagonist accordingly.</em>
</p>

## Setup

### Requirements

```bash
pip install -r requirements.txt
```

### Weights

We use Stable Diffusion v1-4 as our base text-to-image model and fine-tune it on a reference video for text-to-video generation. Example weights for the demo videos are available on Google Drive.

## Training

To fine-tune the text-to-image diffusion model on a custom video, run:

```bash
python run_train.py --config configs/<video-name>-train.yaml
```

The configuration file `configs/<video-name>-train.yaml` specifies the training arguments, such as the source video, its prompt, and the fine-tuning hyperparameters; a sketch is shown below.

## Video Editing

Once the fine-tuned weights are ready, run:

```bash
python run_inference.py --config configs/<video-name>-inference.yaml
```

The configuration file `configs/<video-name>-inference.yaml` specifies the inference arguments, such as the fine-tuned weights to load and the edited prompts; a sketch follows.

## Citation

```bibtex
@inproceedings{song2025save,
  title={{SAVE}: Protagonist Diversification with Structure Agnostic Video Editing},
  author={Song, Yeji and Shin, Wonsik and Lee, Junsoo and Kim, Jeesoo and Kwak, Nojun},
  booktitle={European Conference on Computer Vision},
  pages={41--57},
  year={2025},
  organization={Springer}
}
```

## Acknowledgements

This code builds upon diffusers, Tune-A-Video, and Video-P2P. Thanks to the authors for open-sourcing their work!