Home

Awesome

MoCoGAN-HD

Project | OpenReview | arXiv | Talk | Slides

(AFHQ, VoxCeleb)

Pytorch implementation of our method for high-resolution (e.g. 1024x1024) and cross-domain video synthesis. <br> A Good Image Generator Is What You Need for High-Resolution Video Synthesis
Yu Tian<sup>1</sup>, Jian Ren<sup>2</sup>, Menglei Chai<sup>2</sup>, Kyle Olszewski<sup>2</sup>, Xi Peng<sup>3</sup>, Dimitris N. Metaxas<sup>1</sup>, Sergey Tulyakov<sup>2</sup>
<sup>1</sup>Rutgers Univeristy, <sup>2</sup>Snap Inc., <sup>3</sup>University of Delaware <br> In ICLR 2021, Spotlight.

Pre-trained Image Generator & Video Datasets

In-domain Video Synthesis

UCF-101: image generator, video data, motion generator <br> FaceForensics: image generator, video data, motion generator <br> Sky-Timelapse: image generator, video data, motion generator <br>

Cross-domain Video Synthesis

(FFHQ, VoxCeleb): FFHQ image generator, VoxCeleb, motion generator<br> (AFHQ, VoxCeleb): AFHQ image generator, VoxCeleb, motion generator <br> (Anime, VoxCeleb): Anime image generator, VoxCeleb, motion generator <br> (FFHQ-1024, VoxCeleb): FFHQ-1024 image generator, VoxCeleb, motion generator <br> (LSUN-Church, TLVDB): LSUN-Church image generator, TLVDB

Calculated pca stats are saved here.

Training

Organise the video dataset as follows:

Video dataset
|-- video1
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- video2
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- video3
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- ...

In-domain Video Synthesis

UCF-101

Collect the PCA components from a pre-trained image generator.

python get_stats_pca.py --batchSize 4000 \
  --save_pca_path pca_stats/ucf_101 \
  --pca_iterations 250 \
  --latent_dimension 512 \
  --img_g_weights /path/to/ucf_101_image_generator \
  --style_gan_size 256 \
  --gpu 0

Train the model

python -W ignore train.py --name ucf_101 \
  --time_step 2 \
  --lr 0.0001 \
  --save_pca_path pca_stats/ucf_101 \
  --latent_dimension 512 \
  --dataroot /path/to/ucf_101 \
  --checkpoints_dir checkpoints/ucf_101 \
  --img_g_weights /path/to/ucf_101_image_generator \
  --multiprocessing_distributed --world_size 1 --rank 0 \
  --batchSize 16 \
  --workers 8 \
  --style_gan_size 256 \
  --total_epoch 100 \

Inference

python -W ignore evaluate.py  \
  --save_pca_path pca_stats/ucf_101 \
  --latent_dimension 512 \
  --style_gan_size 256 \
  --img_g_weights /path/to/ucf_101_image_generator \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch the_epoch_for_testing (should >= 0) \
  --results results/ucf_101 \
  --num_test_videos 10 \

FaceForensics

Collect the PCA components from a pre-trained image generator.

sh script/faceforensics/run_get_stats_pca.sh

Train the model

sh script/faceforensics/run_train.sh

Inference

sh script/faceforensics/run_evaluate.sh

Sky-Timelapse

Collect the PCA components from a pre-trained image generator.

sh script/sky_timelapse/run_get_stats_pca.sh

Train the model

sh script/sky_timelapse/run_train.sh

Inference

sh script/sky_timelapse/run_evaluate.sh

Cross-domain Video Synthesis

(FFHQ, VoxCeleb)

Collect the PCA components from a pre-trained image generator.

python get_stats_pca.py --batchSize 4000 \
  --save_pca_path pca_stats/ffhq_256 \
  --pca_iterations 250 \
  --latent_dimension 512 \
  --img_g_weights /path/to/ffhq_image_generator \
  --style_gan_size 256 \
  --gpu 0

Train the model

python -W ignore train.py --name ffhq_256-voxel \
  --time_step 2 \
  --lr 0.0001 \
  --save_pca_path pca_stats/ffhq_256 \
  --latent_dimension 512 \
  --dataroot /path/to/voxel_dataset \
  --checkpoints_dir checkpoints \
  --img_g_weights /path/to/ffhq_image_generator \
  --multiprocessing_distributed --world_size 1 --rank 0 \
  --batchSize 16 \
  --workers 8 \
  --style_gan_size 256 \
  --total_epoch 25 \
  --cross_domain \

Inference

python -W ignore evaluate.py  \
  --save_pca_path pca_stats/ffhq_256 \
  --latent_dimension 512 \
  --style_gan_size 256 \
  --img_g_weights /path/to/ffhq_image_generator \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch the_epoch_for_testing (should >= 0) \
  --results results/ffhq_256 \
  --num_test_videos 10 \

(FFHQ-1024, VoxCeleb)

Collect the PCA components from a pre-trained image generator.

sh script/ffhq-vox/run_get_stats_pca_1024.sh

Train the model

sh script/ffhq-vox/run_train_1024.sh

Inference

sh script/ffhq-vox/run_evaluate_1024.sh

(AFHQ, VoxCeleb)

Collect the PCA components from a pre-trained image generator.

sh script/afhq-vox/run_get_stats_pca.sh

Train the model

sh script/afhq-vox/run_train.sh

Inference

sh script/afhq-vox/run_evaluate.sh

(Anime, VoxCeleb)

Collect the PCA components from a pre-trained image generator.

sh script/anime-vox/run_get_stats_pca.sh

Train the model

sh script/anime-vox/run_train.sh

Inference

sh script/anime-vox/run_evaluate.sh

(LSUN-Church, TLVDB)

Collect the PCA components from a pre-trained image generator.

sh script/lsun_church-tlvdb/run_get_stats_pca.sh

Train the model

sh script/lsun_church-tlvdb/run_train.sh

Inference

sh script/lsun_church-tlvdb/run_evaluate.sh

Fine-tuning

If you wish to resume interupted training or fine-tune a pre-trained model, run (use UCF-101 as an example):

python -W ignore train.py --name ucf_101 \
  --time_step 2 \
  --lr 0.0001 \
  --save_pca_path pca_stats/ucf_101 \
  --latent_dimension 512 \
  --dataroot /path/to/ucf_101 \
  --checkpoints_dir checkpoints \
  --img_g_weights /path/to/ucf_101_image_generator \
  --multiprocessing_distributed --world_size 1 --rank 0 \
  --batchSize 16 \
  --workers 8 \
  --style_gan_size 256 \
  --total_epoch 100 \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch 0

Training Control With Options

--w_residual controls the step of motion residual, default value is 0.2, we recommand <= 0.5 <br> --n_pca # of PCA basis, used in the motion residual calculation, default value is 384 (out of 512 dim of StyleGAN2 w space), we recommand >= 256 <br> --q_len size of queue to save logits used in constrastive loss, default value is 4,096 <br> --video_frame_size spatial size of video frames for training, all synthesized video clips will be down-sampled to this size before feeding to the video discriminator, default value is 128, larger size may lead to better motion modeling <br> --cross_domain activate for cross-domain video synthesis, default value is False <br> --w_match weight for feature matching loss, default value is 1.0, large value improves content matching <br>

Long Sequence Generation

LSTM Unrolling

In inference, you can generate long sequence by LSTM unrolling with --n_frames_G

python -W ignore evaluate.py  \
  --save_pca_path pca_stats/ffhq_256 \
  --latent_dimension 512 \
  --style_gan_size 256 \
  --img_g_weights /path/to/ffhq_image_generator \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch 0 \
  --n_frames_G 32

Interpolation

In inference, you can generate long sequence by interpolation with --interpolation

python -W ignore evaluate.py  \
  --save_pca_path pca_stats/ffhq_256 \
  --latent_dimension 512 \
  --style_gan_size 256 \
  --img_g_weights /path/to/ffhq_image_generator \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch 0 \
  --interpolation

Examples of Generated Videos

UCF-101

FaceForensics

Sky Timelapse

(FFHQ, VoxCeleb)

(FFHQ-1024, VoxCeleb)

(Anime, VoxCeleb)

(LSUN-Church, TLVDB)

Citation

If you use the code for your work, please cite our paper.

@inproceedings{
tian2021a,
title={A Good Image Generator Is What You Need for High-Resolution Video Synthesis},
author={Yu Tian and Jian Ren and Menglei Chai and Kyle Olszewski and Xi Peng and Dimitris N. Metaxas and Sergey Tulyakov},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=6puCSjH3hwA}
}

Acknowledgments

This code borrows StyleGAN2 Image Generator, BigGAN Discriminator, PatchGAN Discriminator.