
Disentangled Motion Modeling for Video Frame Interpolation, AAAI 2025

Environmental settings

conda create -n momo python=3.10.9
conda activate momo
pip install -r requirements.txt

To select which GPUs are used for training or inference, run the accelerate config command before running the commands below.

Datasets

We use Vimeo90k for training, and SNU-FILM, Xiph, Middlebury-others, and Vimeo90k for validation. Download the datasets and place them under a common dataset root directory (e.g., /dataset).

The datasets should have a directory structure as follows:

└─ <dataroot>
    ├─ vimeo_triplet
    |   ├─ sequences/
    |   ├─ tri_trainlist.txt
    |   ├─ tri_testlist.txt
    |   └─ readme.txt
    |
    ├─ SNU_FILM
    |   ├─ eval_modes
    |   |   ├─ test-easy.txt
    |   |   ├─ test-medium.txt
    |   |   ├─ test-hard.txt
    |   |   └─ test-extreme.txt
    |   └─ test
    |       ├─ GOPRO_test/
    |       └─ YouTube_test/
    |
    ├─ Middlebury
    |   └─ other-gt-interp
    |       ├─ Beanbags/
    |       ├─ DogDance/
    |       ...
    |       ├─ Urban3/
    |       └─ Walking/
    |
    └─ Xiph (can be downloaded / created with dataset.py)
        └─ frames
            ├─ BoxingPractice-001.png
            ├─ BoxingPractice-002.png
            ...
            ├─ Tango-099.png
            └─ Tango-100.png
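Before training or evaluation, it can help to confirm that the dataset root matches the layout above. The following is a minimal sketch, not part of the repository: the names EXPECTED_ENTRIES and missing_entries are hypothetical, and the list only covers the entries spelled out in the tree (individual Middlebury scenes and Xiph frames are not checked).

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
# This list is an illustrative assumption, not something the repository defines.
EXPECTED_ENTRIES = [
    "vimeo_triplet/sequences",
    "vimeo_triplet/tri_trainlist.txt",
    "vimeo_triplet/tri_testlist.txt",
    "SNU_FILM/eval_modes/test-easy.txt",
    "SNU_FILM/eval_modes/test-medium.txt",
    "SNU_FILM/eval_modes/test-hard.txt",
    "SNU_FILM/eval_modes/test-extreme.txt",
    "SNU_FILM/test/GOPRO_test",
    "SNU_FILM/test/YouTube_test",
    "Middlebury/other-gt-interp",
    "Xiph/frames",
]


def missing_entries(dataroot):
    """Return the expected entries that do not exist under dataroot."""
    root = Path(dataroot)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]
```

Calling missing_entries("/dataset") returns an empty list when every listed file and directory is present, and otherwise the relative paths that still need to be downloaded or moved.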

Quick start

Download the pretrained model

The pretrained weights of our model can be downloaded from here.

Download experiments/diffusion, which contains the full MoMo model. (This is enough if you only want the full model.)

UPDATE (Dec 04, 2024): We additionally provide the weights of a lighter version of our model, MoMo-10M, at the link provided above.

Extract and place the experiments directory right under this project.

It should then have a hierarchy like: MoMo/experiments/diffusion/momo_full/weights/model.pth.

For the 10M model, the path should look like: MoMo/experiments/diffusion/momo_10m/weights/model.pth.
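To double-check that the extracted weights ended up in the right place, the two paths above can be reconstructed programmatically. This is a small sketch under the assumption that only the model name changes between the two paths; the helper weight_path is hypothetical and not part of the repository.

```python
from pathlib import Path


def weight_path(project_root, name):
    """Build the expected checkpoint path for a diffusion model name.

    Follows the hierarchy quoted above:
    <project_root>/experiments/diffusion/<name>/weights/model.pth
    """
    return Path(project_root) / "experiments" / "diffusion" / name / "weights" / "model.pth"


def weights_present(project_root, name):
    """Return True if the checkpoint file exists at the expected location."""
    return weight_path(project_root, name).is_file()
```

For example, weights_present("MoMo", "momo_full") should be True once the experiments directory has been extracted right under the project.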

If you want the weights of the flow teacher network or the synthesis network individually, download experiments/flow_teacher or experiments/synthesis.

Running x2 interpolation of a video

For x2 interpolation of a video, run the command below:

accelerate launch demo.py --video <path_to_video.mp4> --output_path <path_to_x2_video.mp4>

Note: for single-GPU inference, you can also use the python command instead of accelerate:

python demo.py --video <path_to_video.mp4> --output_path <path_to_x2_video.mp4>
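If you have a folder of videos, the two commands above can be wrapped in a short script. The sketch below is not part of the repository: build_demo_command and interpolate_folder are hypothetical helpers, the "_x2" output suffix is an arbitrary choice, and only the --video and --output_path flags shown above are used. It assumes the script is run from the project directory so that demo.py can be found.

```python
import subprocess
from pathlib import Path


def build_demo_command(video, output_path, use_accelerate=True):
    """Assemble the demo.py command line shown above as an argument list."""
    launcher = ["accelerate", "launch"] if use_accelerate else ["python"]
    return launcher + ["demo.py", "--video", str(video), "--output_path", str(output_path)]


def interpolate_folder(input_dir, output_dir, use_accelerate=True):
    """Run x2 interpolation on every .mp4 file in input_dir, one video at a time."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for video in sorted(Path(input_dir).glob("*.mp4")):
        target = out / f"{video.stem}_x2.mp4"  # output naming is an assumption
        # check=True stops the loop as soon as one run fails
        subprocess.run(build_demo_command(video, target, use_accelerate), check=True)
```

Passing use_accelerate=False switches to the plain python launcher for single-GPU inference.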

Interpolation result

Our interpolation result on Xiph can be found here. Below is an example x2 interpolation result with our model:

Input Video

Output Video

Training

The training process of our framework consists of several steps.

You can run the following script for training; details can be found in train_all.sh.

Pass the path to the root directory where the datasets are placed as the argument.

sh train_all.sh <path_to_dataroot>

Testing

Test full model (MoMo)

For evaluation on public benchmarks, run the commands below.

You can optionally visualize the estimated flows with --visualize_flows.

You can save the results either as TensorBoard logs with --logging, or as PNG files with --save_as_png.

When saving the results as PNG files, make sure to set the output directory with --png_save_dir.

# basic usage
accelerate launch eval_momo.py --name momo_full --dataroot <path_to_dataroot> --valid_dataset <dataset_name>

# example usage of testing on SNU-FILM-hard, log results on tensorboard
accelerate launch eval_momo.py --name momo_full --dataroot <path_to_dataroot> --visualize_flows --valid_dataset SNU_FILM_hard --logging

# example usage of testing on Xiph-2K data, visualize the estimated flow maps and save all results as png
accelerate launch eval_momo.py --name momo_full --dataroot <path_to_dataroot> --valid_dataset Xiph_2K --visualize_flows --save_as_png --png_save_dir ./momo_png_results/Xiph_2K
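When scripting several evaluation runs, it is easy to pass --save_as_png and forget --png_save_dir. The following sketch assembles the eval_momo.py command from the flags listed above and refuses to build a command with that combination. The function build_eval_command and its ValueError guard are illustrative assumptions; they do not describe how eval_momo.py itself handles a missing directory.

```python
def build_eval_command(name, dataroot, valid_dataset,
                       visualize_flows=False, logging=False,
                       save_as_png=False, png_save_dir=None):
    """Assemble an eval_momo.py command line as an argument list."""
    # Guard added for this sketch only: PNG output needs a target directory.
    if save_as_png and not png_save_dir:
        raise ValueError("--save_as_png requires --png_save_dir")

    cmd = ["accelerate", "launch", "eval_momo.py",
           "--name", name,
           "--dataroot", str(dataroot),
           "--valid_dataset", valid_dataset]
    if visualize_flows:
        cmd.append("--visualize_flows")
    if logging:
        cmd.append("--logging")
    if save_as_png:
        cmd += ["--save_as_png", "--png_save_dir", str(png_save_dir)]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the project directory, for example with valid_dataset set to SNU_FILM_hard or Xiph_2K as in the commands above.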

Test a lighter version

We also provide a lighter version of MoMo, with 10M parameters.

The basic usage is similar to that of the original full model, and can be used as below, with a slight change in the number of channel dimensions.

# basic usage of MoMo-10M
accelerate launch eval_momo.py --name momo_10m --dims 96 160 --dataroot <path_to_dataroot> --valid_dataset <dataset_name>

Test individual components (i.e., synthesis model, teacher flow model)

The individual components can be tested with the commands below.

The weights of the synthesis model and the teacher flow model are also available here.

# synthesis model testing
accelerate launch eval_components.py --name momo_synth --dataroot <path_to_dataroot> --component synthesis

# teacher flow model testing
accelerate launch eval_components.py --name momo_teacher_Ls --dataroot <path_to_dataroot> --component flow_teacher

Citation

@article{lew2024disentangled,
  title={Disentangled Motion Modeling for Video Frame Interpolation},
  author={Lew, Jaihyun and Choi, Jooyoung and Shin, Chaehun and Jung, Dahuin and Yoon, Sungroh},
  journal={arXiv preprint arXiv:2406.17256},
  year={2024}
}