FILM: Frame Interpolation for Large Motion

Website | Paper | Google AI Blog | Tensorflow Hub Colab | YouTube <br>

The official Tensorflow 2 implementation of our high-quality frame interpolation neural network. We present a unified, single-network approach that does not rely on additional pre-trained networks, such as optical flow or depth, and yet achieves state-of-the-art results. We use a multi-scale feature extractor that shares the same convolution weights across the scales. Our model is trainable from frame triplets alone. <br>
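The shared-weight multi-scale idea can be illustrated with a toy NumPy sketch (this is not the actual model code, just an illustration): the same convolution kernel is applied to every level of an image pyramid, so all scales reuse one set of weights.

```python
import numpy as np

def avg_pool_2x(img):
    """Downsample an (H, W) image by 2x average pooling (H, W assumed even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def conv2d_valid(img, kernel):
    """Plain 'valid' 2D correlation of an (H, W) image with a (k, k) kernel."""
    k = kernel.shape[0]
    out = np.empty((img.shape[0] - k + 1, img.shape[1] - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
    return out

def shared_weight_pyramid_features(img, kernel, num_levels=3):
    """Extract features at every pyramid level with the SAME kernel weights."""
    features, level = [], img
    for _ in range(num_levels):
        features.append(conv2d_valid(level, kernel))
        level = avg_pool_2x(level)
    return features

img = np.random.rand(32, 32)
kernel = np.random.rand(3, 3)
feats = shared_weight_pyramid_features(img, kernel)
print([f.shape for f in feats])  # [(30, 30), (14, 14), (6, 6)]
```

Because the coarser levels see downscaled copies of the image, one set of weights effectively covers a range of motion magnitudes.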

FILM: Frame Interpolation for Large Motion <br /> Fitsum Reda<sup>1</sup>, Janne Kontkanen<sup>1</sup>, Eric Tabellion<sup>1</sup>, Deqing Sun<sup>1</sup>, Caroline Pantofaru<sup>1</sup>, Brian Curless<sup>1,2</sup><br /> <sup>1</sup>Google Research, <sup>2</sup>University of Washington<br /> In ECCV 2022.

A sample 2-second moment. FILM transforms near-duplicate photos into slow-motion footage that looks as if it were shot with a video camera.

Web Demo

Integrated into Hugging Face Spaces šŸ¤— using Gradio. Try out the Web Demo: Hugging Face Spaces

Try the interpolation model with the replicate web demo at Replicate

Try FILM to interpolate between two or more images with the PyTTI-Tools at PyTTI-Tools:FILM

An alternative Colab for running FILM on an arbitrary number of input images, not just two: FILM-Gdrive

Change Log

Installation

git clone https://github.com/google-research/frame-interpolation
cd frame-interpolation
docker pull gcr.io/deeplearning-platform-release/tf2-gpu.2-6:latest
pip3 install -r requirements.txt
sudo apt-get install -y ffmpeg

See WINDOWS_INSTALLATION for Windows Support

Pre-trained Models

Create a directory for the pre-trained models and download them into it:

mkdir -p <pretrained_models>

The downloaded folder should have the following structure:

<pretrained_models>/
ā”œā”€ā”€ film_net/
ā”‚   ā”œā”€ā”€ L1/
ā”‚   ā”œā”€ā”€ Style/
ā”‚   ā”œā”€ā”€ VGG/
ā”œā”€ā”€ vgg/
ā”‚   ā”œā”€ā”€ imagenet-vgg-verydeep-19.mat

Running the Code

The following instructions run the interpolator on the photos provided in 'frame-interpolation/photos'.

One mid-frame interpolation

To generate an intermediate photo from the input near-duplicate photos, simply run:

python3 -m eval.interpolator_test \
   --frame1 photos/one.png \
   --frame2 photos/two.png \
   --model_path <pretrained_models>/film_net/Style/saved_model \
   --output_frame photos/output_middle.png

This will produce the sub-frame at t=0.5 and save it as 'photos/output_middle.png'.

Many in-between frames interpolation

It takes in a set of directories identified by a glob (--pattern). Each directory is expected to contain at least two input frames; each contiguous frame pair is treated as an input to generate in-between frames. Frames should be named so that sorting them naturally with natsort yields the desired temporal order.
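The natural-ordering requirement can be illustrated with a minimal pure-Python approximation of a natsort-style key (the natsort library itself handles many more cases): digit runs are compared numerically rather than lexicographically.

```python
import re

def natural_key(name):
    """Approximate natural-sort key: split out digit runs, compare them as ints."""
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", name)]

frames = ["frame10.png", "frame2.png", "frame1.png"]
print(sorted(frames))                   # ['frame1.png', 'frame10.png', 'frame2.png']
print(sorted(frames, key=natural_key))  # ['frame1.png', 'frame2.png', 'frame10.png']
```

Plain lexicographic sorting would place frame10.png before frame2.png, scrambling the temporal order; zero-padded names (frame001.png, frame002.png, ...) avoid the issue entirely.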

python3 -m eval.interpolator_cli \
   --pattern "photos" \
   --model_path <pretrained_models>/film_net/Style/saved_model \
   --times_to_interpolate 6 \
   --output_video

You will find the interpolated frames (including the input frames) in 'photos/interpolated_frames/', and the interpolated video at 'photos/interpolated.mp4'.

The number of output frames is determined by --times_to_interpolate, which controls the number of times the frame interpolator is invoked. When the number of frames in a directory is num_frames, the number of output frames will be (2^times_to_interpolate+1)*(num_frames-1).
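The formula above, together with the recursion it implies, can be checked with a small sketch. The midpoint_times helper is an assumption about the recursion (each invocation inserts the midpoint of every adjacent pair of timesteps), consistent with the formula but not taken from the code.

```python
def num_output_frames(times_to_interpolate, num_frames):
    """Output frame count per the formula above, input frames included."""
    return (2 ** times_to_interpolate + 1) * (num_frames - 1)

def midpoint_times(times_to_interpolate):
    """Timesteps produced by recursive midpoint interpolation of one frame pair."""
    times = {0.0, 1.0}
    for _ in range(times_to_interpolate):
        ordered = sorted(times)
        times |= {(a + b) / 2 for a, b in zip(ordered, ordered[1:])}
    return sorted(times)

print(num_output_frames(1, 2))  # 3: frame one, t=0.5, frame two
print(num_output_frames(6, 2))  # 65 frames from a single pair
print(midpoint_times(2))        # [0.0, 0.25, 0.5, 0.75, 1.0]
```

So --times_to_interpolate 6 on a single pair yields 65 frames, enough for a few seconds of smooth video at 30 fps.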

Datasets

We use Vimeo-90K as our main training dataset. For quantitative evaluations, we rely on commonly used benchmark datasets, specifically: Vimeo-90K, Middlebury, UCF101, and Xiph.

Creating a TFRecord

The training and benchmark evaluation scripts expect the frame triplets in the TFRecord storage format. <br />

We have included scripts that encode the relevant frame triplets into a tf.train.Example data format, and export to a TFRecord file. <br />

You can run python3 -m datasets.create_<dataset_name>_tfrecord --help for more information.

For example, run the command below to create a TFRecord for the Middlebury-other dataset. Download the images and point --input_dir to the unzipped folder path.

python3 -m datasets.create_middlebury_tfrecord \
  --input_dir=<root folder of middlebury-other> \
  --output_tfrecord_filepath=<output tfrecord filepath> \
  --num_shards=3

The above command will output a TFRecord file with 3 shards as <output tfrecord filepath>@3.
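The @3 suffix is sharded-file shorthand; each shard is typically materialized with a numeric -NNNNN-of-NNNNN suffix. A small sketch of that naming convention (the exact zero padding is an assumption, matching the common TensorFlow convention):

```python
def shard_filenames(base_path, num_shards):
    """Expand 'path@N' sharding shorthand into per-shard filenames."""
    return [f"{base_path}-{i:05d}-of-{num_shards:05d}" for i in range(num_shards)]

print(shard_filenames("middlebury.tfrecord", 3))
# ['middlebury.tfrecord-00000-of-00003',
#  'middlebury.tfrecord-00001-of-00003',
#  'middlebury.tfrecord-00002-of-00003']
```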

Training

Below are our training gin configuration files for the different loss functions:

training/
ā”œā”€ā”€ config/
ā”‚   ā”œā”€ā”€ film_net-L1.gin
ā”‚   ā”œā”€ā”€ film_net-VGG.gin
ā”‚   ā”œā”€ā”€ film_net-Style.gin

To launch training, simply pass the configuration filepath of the desired experiment. <br /> By default, all visible GPUs are used for training. To debug or train on a CPU, append --mode cpu.

python3 -m training.train \
   --gin_config training/config/<config filename>.gin \
   --base_folder <base folder for all training runs> \
   --label <descriptive label for the run>

Each training run produces the following directory structure:

<base_folder>/
ā”œā”€ā”€ <label>/
ā”‚   ā”œā”€ā”€ config.gin
ā”‚   ā”œā”€ā”€ eval/
ā”‚   ā”œā”€ā”€ train/
ā”‚   ā”œā”€ā”€ saved_model/

Build a SavedModel

Optionally, to build a SavedModel from a trained checkpoint folder, you can use this command:

python3 -m training.build_saved_model_cli \
   --base_folder <base folder of training sessions> \
   --label <the name of the run>

Evaluation on Benchmarks

Below, we provide the evaluation gin configuration files for the benchmarks we have considered:

eval/
ā”œā”€ā”€ config/
ā”‚   ā”œā”€ā”€ middlebury.gin
ā”‚   ā”œā”€ā”€ ucf101.gin
ā”‚   ā”œā”€ā”€ vimeo_90K.gin
ā”‚   ā”œā”€ā”€ xiph_2K.gin
ā”‚   ā”œā”€ā”€ xiph_4K.gin

To run an evaluation, simply pass the configuration file of the desired evaluation dataset. <br /> The evaluation runs on a GPU if one is visible.

python3 -m eval.eval_cli \
   --gin_config eval/config/<eval_dataset>.gin \
   --model_path <pretrained_models>/film_net/L1/saved_model

The above command will produce the PSNR and SSIM scores presented in the paper.
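PSNR itself is straightforward to compute from per-pixel error; here is a minimal NumPy sketch of the metric (SSIM is more involved and omitted). This illustrates the definition only, not the exact evaluation pipeline in eval.eval_cli.

```python
import numpy as np

def psnr(reference, prediction, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two float images in [0, max_val]."""
    mse = np.mean((reference.astype(np.float64) - prediction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.zeros((4, 4))
pred = np.full((4, 4), 0.1)       # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(ref, pred), 1))  # 20.0
```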

Citation

If you find this implementation useful in your work, please acknowledge it appropriately by citing:

@inproceedings{reda2022film,
 title = {FILM: Frame Interpolation for Large Motion},
 author = {Fitsum Reda and Janne Kontkanen and Eric Tabellion and Deqing Sun and Caroline Pantofaru and Brian Curless},
 booktitle = {European Conference on Computer Vision (ECCV)},
 year = {2022}
}
@misc{film-tf,
  title = {Tensorflow 2 Implementation of "FILM: Frame Interpolation for Large Motion"},
  author = {Fitsum Reda and Janne Kontkanen and Eric Tabellion and Deqing Sun and Caroline Pantofaru and Brian Curless},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/google-research/frame-interpolation}}
}

Acknowledgments

We would like to thank Richard Tucker, Jason Lai and David Minnen. We would also like to thank Jamie Aspinall for the imagery included in this repository.

Coding style

Disclaimer

This is not an officially supported Google product.