
VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs (ICCV 2023)

<a href='https://cyberiada.github.io/VidStyleODE/'>Project Page</a> | <a href='https://arxiv.org/abs/2304.06020'>arXiv</a>

<img src="assets/teaser.jpg" width="1000"/>


<div align="justify"> <b>Abstract</b>: We propose VidStyleODE, a spatiotemporally continuous disentangled Video representation based upon StyleGAN and Neural-ODEs. Effective traversal of the latent space learned by Generative Adversarial Networks (GANs) has been the basis for recent breakthroughs in image editing. However, the applicability of such advancements to the video domain has been hindered by the difficulty of representing and controlling videos in the latent space of GANs. In particular, videos are composed of content (i.e., appearance) and complex motion components that require a special mechanism to disentangle and control. To achieve this, VidStyleODE encodes the video content in a pre-trained StyleGAN W+ space and benefits from a latent ODE component to summarize the spatiotemporal dynamics of the input video. Our novel continuous video generation process then combines the two to generate high-quality and temporally consistent videos with varying frame rates. We show that our proposed method enables a variety of applications on real videos: text-guided appearance manipulation, motion manipulation, image animation, and video interpolation and extrapolation. For more details, please visit our <a href='https://cyberiada.github.io/VidStyleODE/'>project webpage</a> or read our <a href='https://arxiv.org/abs/2304.06020'>paper</a>. </div> <br>

Contents

  1. Environment Setup
  2. Dataset Preparation
  3. Training
  4. Applications
  5. Citation

Environment Setup

conda create -n vidstyleode python=3.10
conda activate vidstyleode
pip install -r requirements.txt
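
After installing the requirements, you can quickly verify that PyTorch is installed and can see your GPU. This check is purely illustrative and assumes the requirements install a CUDA-enabled PyTorch build:

# Illustrative sanity check: confirm PyTorch import and GPU visibility.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))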

Dataset Preparation

Downloading and Arranging Training Datasets

Please refer to the official websites of the RAVDESS and Fashion datasets for instructions on downloading the datasets used in the paper. You may also experiment with your own dataset. The datasets should be arranged in the following structure:

Folder1
    Video_1.mp4
    Video_2.mp4
    ..
Folder2
    Video_1.mp4
    Video_2.mp4
    ..

We recommend extracting the video frames beforehand for easier training. To extract the frames, run the following command:

python scripts/extract_video_frames.py \
     --source_directory <path-to-video-directory> \
     --target_directory <path-to-output-target-directory>

The output folder will have the following structure:

Folder1_1
    000.png
    001.png
    ..
Folder1_2
    000.png
    001.png
    ..
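
For reference, the extraction step conceptually amounts to the following minimal sketch using OpenCV. The repository's own scripts/extract_video_frames.py should be preferred; the exact file-naming logic shown here (Folder1/Video_1.mp4 becoming Folder1_1/000.png, 001.png, ...) is an assumption based on the structures above:

# Illustrative frame extraction (assumes OpenCV is installed).
# The actual scripts/extract_video_frames.py may differ in details.
import os
import cv2

def extract_frames(source_directory, target_directory):
    for root, _, files in os.walk(source_directory):
        for name in sorted(files):
            if not name.endswith(".mp4"):
                continue
            # e.g. Folder1/Video_1.mp4 -> Folder1_1/000.png, 001.png, ...
            folder = os.path.basename(root)
            video_id = os.path.splitext(name)[0].split("_")[-1]
            out_dir = os.path.join(target_directory, f"{folder}_{video_id}")
            os.makedirs(out_dir, exist_ok=True)

            cap = cv2.VideoCapture(os.path.join(root, name))
            idx = 0
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                cv2.imwrite(os.path.join(out_dir, f"{idx:03d}.png"), frame)
                idx += 1
            cap.release()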

Setup StyleGAN2 Generator

VidStyleODE decodes frames with a pre-trained StyleGAN2 generator, so a generator checkpoint trained on the target domain is required.

Setup StyleGAN2 Inversion

The training frames must also be inverted into the latent space of the pre-trained StyleGAN2 generator, with one .pt latent file per frame. Arrange the inversions so that they mirror the extracted-frames structure:

Folder1_1
 000.pt
 001.pt
 ..
Folder1_2
 000.pt
 001.pt
 ..
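
As an illustration, assuming each .pt file stores a StyleGAN2 W+ latent code for the corresponding frame (produced by an off-the-shelf inversion method; the exact tensor shape used by this repository is an assumption), the files can be inspected like this:

# Illustrative check of per-frame inversion files (assumed W+ latents).
import os
import torch

frames_dir = "inversions/Folder1_1"  # hypothetical path
for name in sorted(os.listdir(frames_dir)):
    if name.endswith(".pt"):
        latent = torch.load(os.path.join(frames_dir, name), map_location="cpu")
        # For a 1024x1024 StyleGAN2 generator a W+ code is typically [18, 512];
        # the shape expected by this repository may differ.
        print(name, tuple(latent.shape) if hasattr(latent, "shape") else type(latent))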

(Optional) Setup Textual Descriptions

To enable text-guided appearance editing, provide a textual description for each training video. Store each description in a file named text_descriptions.txt inside the corresponding video's frames folder. For example:

Folder1_1
 000.pt
 001.pt
 ..
 text_descriptions.txt
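
As an illustration only (the exact content and format expected in text_descriptions.txt is an assumption here, namely a short appearance description of the video):

# Illustrative only: write a short appearance description for one video.
description = "a woman wearing a short-sleeved floral dress"  # hypothetical text
with open("Folder1_1/text_descriptions.txt", "w") as f:
    f.write(description + "\n")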

Training / Validation Split

Training

To train a new model, run:

python main.py --name <tag-for-your-experiment> \
               --base <path-to-config-file>

To resume training from a previous run, additionally provide the log directory or a specific checkpoint:

python main.py --name <tag-for-your-experiment> \
               --base <path-to-config-file> \
               --resume <path-to-log-directory> or <path-to-checkpoint>
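
The --base argument points to an experiment configuration file whose keys are defined by the repository. As a minimal sketch, assuming the configs are standard YAML files (and PyYAML is installed), you can confirm a config parses before launching a long run:

# Illustrative only: check that an experiment config parses as YAML.
import sys
import yaml  # PyYAML

with open(sys.argv[1]) as f:
    config = yaml.safe_load(f)
print("top-level keys:", list(config.keys()))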

By default, training checkpoints and figures are logged under the logs folder as well as to wandb. Therefore, please log in to wandb first by running

wandb login

Applications

Image Animation

To generate image animation results using the motion of a driving video, run the following script:

# Driving videos are chosen randomly (see --spv and --video_list).
python scripts/image_animation.py \
    --model_dir <log-dir-to-pretrained-model> \
    --n_samples <number-of-samples-to-generate> \
    --output_dir <path-to-save-dir> \
    --n_frames <num-of-frames-to-generate-per-video> \
    --spv <num-of-driving-videos-per-sample> \
    --video_list <txt-file-of-possible-target-videos> \
    --img_root <path-to-videos-root-dir> \
    --inversion_root <path-to-frames-inversion-root-dir>
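
The --video_list argument expects a plain-text file listing candidate driving videos. Its exact format is an assumption here (one frame-folder name per line, relative to --img_root); under that assumption, such a list can be generated with a short helper:

# Illustrative only: write one frame-folder name per line for --video_list.
# The exact format expected by scripts/image_animation.py may differ.
import os

img_root = "path/to/extracted-frames"  # hypothetical path
with open("video_list.txt", "w") as f:
    for name in sorted(os.listdir(img_root)):
        if os.path.isdir(os.path.join(img_root, name)):
            f.write(name + "\n")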

Appearance Manipulation

Instructions will be added later.

Frame Interpolation

Instructions will be added later.

Frame Extrapolation

Instructions will be added later.

Citation

If you find this paper useful in your research, please consider citing:

@misc{ali2023vidstyleodedisentangledvideoediting,
  title={VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs},
  author={Moayed Haji Ali and Andrew Bond and Tolga Birdal and Duygu Ceylan and Levent Karacan and Erkut Erdem and Aykut Erdem},
  year={2023},
  eprint={2304.06020},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2304.06020},
}