MoCoGAN-HD
Project | OpenReview | arXiv | Talk | Slides
(AFHQ, VoxCeleb)
PyTorch implementation of our method for high-resolution (e.g. 1024x1024) and cross-domain video synthesis. <br>
A Good Image Generator Is What You Need for High-Resolution Video Synthesis
Yu Tian<sup>1</sup>, Jian Ren<sup>2</sup>, Menglei Chai<sup>2</sup>, Kyle Olszewski<sup>2</sup>, Xi Peng<sup>3</sup>, Dimitris N. Metaxas<sup>1</sup>, Sergey Tulyakov<sup>2</sup>
<sup>1</sup>Rutgers University, <sup>2</sup>Snap Inc., <sup>3</sup>University of Delaware <br>
In ICLR 2021, Spotlight.
Pre-trained Image Generator & Video Datasets
In-domain Video Synthesis
UCF-101: image generator, video data, motion generator <br> FaceForensics: image generator, video data, motion generator <br> Sky-Timelapse: image generator, video data, motion generator <br>
Cross-domain Video Synthesis
(FFHQ, VoxCeleb): FFHQ image generator, VoxCeleb, motion generator<br> (AFHQ, VoxCeleb): AFHQ image generator, VoxCeleb, motion generator <br> (Anime, VoxCeleb): Anime image generator, VoxCeleb, motion generator <br> (FFHQ-1024, VoxCeleb): FFHQ-1024 image generator, VoxCeleb, motion generator <br> (LSUN-Church, TLVDB): LSUN-Church image generator, TLVDB
The calculated PCA statistics are saved here.
Training
Organise the video dataset as follows:
Video dataset
|-- video1
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- video2
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- video3
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- ...
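Before training, it can help to confirm that a dataset folder follows the layout above. The following is a minimal sketch and not part of the repository: the function name `list_video_frames` is hypothetical, and the `img_*.png` pattern is an assumption taken from the example tree.

```python
from pathlib import Path


def list_video_frames(dataroot):
    """Map each video sub-folder to its sorted list of frame file names.

    Assumes the layout shown above: one folder per video, each holding
    frames named img_XXXX.png (pattern assumed from the example tree).
    """
    videos = {}
    for video_dir in sorted(Path(dataroot).iterdir()):
        if not video_dir.is_dir():
            continue  # ignore stray files next to the video folders
        frames = sorted(p.name for p in video_dir.glob("img_*.png"))
        if not frames:
            raise ValueError(f"no img_*.png frames found in {video_dir}")
        videos[video_dir.name] = frames
    return videos
```

Calling `list_video_frames("/path/to/ucf_101")` returns a dictionary keyed by video folder name, and raises an error for any folder without frames, which is usually easier to debug than a failure inside the data loader.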
In-domain Video Synthesis
UCF-101
Collect the PCA components from a pre-trained image generator.
python get_stats_pca.py --batchSize 4000 \
--save_pca_path pca_stats/ucf_101 \
--pca_iterations 250 \
--latent_dimension 512 \
--img_g_weights /path/to/ucf_101_image_generator \
--style_gan_size 256 \
--gpu 0
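To illustrate what a PCA direction of a set of latent codes is, here is a minimal sketch that estimates the first principal direction by power iteration. It is an illustration only: `get_stats_pca.py` may use a different estimator, the function name `top_pca_direction` is hypothetical, and the default of 250 iterations merely echoes the `--pca_iterations` value in the command above without claiming the flag means the same thing. Plain Python lists are used so the example runs without dependencies.

```python
import math
import random


def top_pca_direction(samples, iterations=250):
    """Estimate the first principal direction of `samples` by power iteration.

    `samples` is a list of equal-length lists of floats (for example latent
    codes sampled from an image generator). Returns (mean, unit_direction).
    """
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(dim)]
    centered = [[s[j] - mean[j] for j in range(dim)] for s in samples]

    rng = random.Random(0)  # fixed seed so the start vector is reproducible
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        # Apply the unnormalised covariance: C v = sum_i x_i * (x_i . v).
        proj = [sum(x[j] * v[j] for j in range(dim)) for x in centered]
        v = [sum(p * x[j] for p, x in zip(proj, centered)) for j in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0.0:
            raise ValueError("samples have no variance along the start vector")
        v = [c / norm for c in v]
    return mean, v
```

The sign of the returned direction is arbitrary, as with any PCA routine. Further directions could be obtained by removing the found component from the centered samples and repeating.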
Train the model
python -W ignore train.py --name ucf_101 \
--time_step 2 \
--lr 0.0001 \
--save_pca_path pca_stats/ucf_101 \
--latent_dimension 512 \
--dataroot /path/to/ucf_101 \
--checkpoints_dir checkpoints/ucf_101 \
--img_g_weights /path/to/ucf_101_image_generator \
--multiprocessing_distributed --world_size 1 --rank 0 \
--batchSize 16 \
--workers 8 \
--style_gan_size 256 \
--total_epoch 100
Inference
Set --load_pretrain_epoch to the epoch you want to test (it must be >= 0).
python -W ignore evaluate.py \
--save_pca_path pca_stats/ucf_101 \
--latent_dimension 512 \
--style_gan_size 256 \
--img_g_weights /path/to/ucf_101_image_generator \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch the_epoch_for_testing \
--results results/ucf_101 \
--num_test_videos 10
FaceForensics
Collect the PCA components from a pre-trained image generator.
sh script/faceforensics/run_get_stats_pca.sh
Train the model
sh script/faceforensics/run_train.sh
Inference
sh script/faceforensics/run_evaluate.sh
Sky-Timelapse
Collect the PCA components from a pre-trained image generator.
sh script/sky_timelapse/run_get_stats_pca.sh
Train the model
sh script/sky_timelapse/run_train.sh
Inference
sh script/sky_timelapse/run_evaluate.sh
Cross-domain Video Synthesis
(FFHQ, VoxCeleb)
Collect the PCA components from a pre-trained image generator.
python get_stats_pca.py --batchSize 4000 \
--save_pca_path pca_stats/ffhq_256 \
--pca_iterations 250 \
--latent_dimension 512 \
--img_g_weights /path/to/ffhq_image_generator \
--style_gan_size 256 \
--gpu 0
Train the model
python -W ignore train.py --name ffhq_256-voxel \
--time_step 2 \
--lr 0.0001 \
--save_pca_path pca_stats/ffhq_256 \
--latent_dimension 512 \
--dataroot /path/to/voxel_dataset \
--checkpoints_dir checkpoints \
--img_g_weights /path/to/ffhq_image_generator \
--multiprocessing_distributed --world_size 1 --rank 0 \
--batchSize 16 \
--workers 8 \
--style_gan_size 256 \
--total_epoch 25 \
--cross_domain
Inference
Set --load_pretrain_epoch to the epoch you want to test (it must be >= 0).
python -W ignore evaluate.py \
--save_pca_path pca_stats/ffhq_256 \
--latent_dimension 512 \
--style_gan_size 256 \
--img_g_weights /path/to/ffhq_image_generator \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch the_epoch_for_testing \
--results results/ffhq_256 \
--num_test_videos 10
(FFHQ-1024, VoxCeleb)
Collect the PCA components from a pre-trained image generator.
sh script/ffhq-vox/run_get_stats_pca_1024.sh
Train the model
sh script/ffhq-vox/run_train_1024.sh
Inference
sh script/ffhq-vox/run_evaluate_1024.sh
(AFHQ, VoxCeleb)
Collect the PCA components from a pre-trained image generator.
sh script/afhq-vox/run_get_stats_pca.sh
Train the model
sh script/afhq-vox/run_train.sh
Inference
sh script/afhq-vox/run_evaluate.sh
(Anime, VoxCeleb)
Collect the PCA components from a pre-trained image generator.
sh script/anime-vox/run_get_stats_pca.sh
Train the model
sh script/anime-vox/run_train.sh
Inference
sh script/anime-vox/run_evaluate.sh
(LSUN-Church, TLVDB)
Collect the PCA components from a pre-trained image generator.
sh script/lsun_church-tlvdb/run_get_stats_pca.sh
Train the model
sh script/lsun_church-tlvdb/run_train.sh
Inference
sh script/lsun_church-tlvdb/run_evaluate.sh
Fine-tuning
If you wish to resume interrupted training or fine-tune a pre-trained model, run the following (using UCF-101 as an example):
python -W ignore train.py --name ucf_101 \
--time_step 2 \
--lr 0.0001 \
--save_pca_path pca_stats/ucf_101 \
--latent_dimension 512 \
--dataroot /path/to/ucf_101 \
--checkpoints_dir checkpoints \
--img_g_weights /path/to/ucf_101_image_generator \
--multiprocessing_distributed --world_size 1 --rank 0 \
--batchSize 16 \
--workers 8 \
--style_gan_size 256 \
--total_epoch 100 \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch 0
Training Control With Options
--w_residual
controls the step size of the motion residual; the default is 0.2, and we recommend <= 0.5 <br>
--n_pca
number of PCA basis vectors used in the motion residual calculation; the default is 384 (out of the 512 dimensions of the StyleGAN2 w space), and we recommend >= 256 <br>
--q_len
size of the queue that stores the logits used in the contrastive loss; the default is 4,096 <br>
--video_frame_size
spatial size of the video frames used for training; all synthesized video clips are down-sampled to this size before being fed to the video discriminator. The default is 128; a larger size may lead to better motion modeling <br>
--cross_domain
enables cross-domain video synthesis; the default is False <br>
--w_match
weight of the feature matching loss; the default is 1.0, and a larger value improves content matching <br>
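To make the roles of --w_residual and --n_pca concrete, the sketch below shows one plausible reading of the descriptions above: the next frame's latent code is the previous one plus a scaled combination of PCA directions. This is an assumption for illustration, not the repository's training code; the function name `apply_motion_residual` and its arguments are hypothetical.

```python
def apply_motion_residual(w_prev, coeffs, pca_basis, w_residual=0.2):
    """Return w_prev + w_residual * sum_k coeffs[k] * pca_basis[k].

    w_prev:     latent code of the previous frame (list of floats).
    coeffs:     one coefficient per PCA direction used (length n_pca).
    pca_basis:  list of n_pca direction vectors, each as long as w_prev.
    w_residual: step size; 0.2 mirrors the documented default.
    """
    if len(coeffs) != len(pca_basis):
        raise ValueError("need exactly one coefficient per PCA direction")
    w_next = list(w_prev)
    for c, direction in zip(coeffs, pca_basis):
        for j, d in enumerate(direction):
            w_next[j] += w_residual * c * d
    return w_next
```

Under this reading, a smaller step keeps consecutive latent codes close together, and using fewer basis vectors restricts motion to fewer directions of the latent space.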
Long Sequence Generation
LSTM Unrolling
At inference time, you can generate long sequences by LSTM unrolling with --n_frames_G:
python -W ignore evaluate.py \
--save_pca_path pca_stats/ffhq_256 \
--latent_dimension 512 \
--style_gan_size 256 \
--img_g_weights /path/to/ffhq_image_generator \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch 0 \
--n_frames_G 32
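The idea of unrolling can be sketched independently of the model: a recurrent step is applied repeatedly, with each new state fed back in, for as many frames as requested. The code below is a toy illustration under that assumption. `unroll` and `toy_step` are hypothetical names, and `toy_step` is a stand-in for a trained LSTM cell, not the repository's motion generator.

```python
def unroll(step, state, n_frames):
    """Call `step` n_frames times, feeding each new state back in.

    `step` maps a state to (output, next_state). It stands in for one step
    of a recurrent motion generator and can be any callable of that shape.
    """
    outputs = []
    for _ in range(n_frames):
        output, state = step(state)
        outputs.append(output)
    return outputs, state


def toy_step(state):
    """Hypothetical stand-in for an LSTM cell: emit the state, then advance it."""
    return state, 0.5 * state + 1.0


frames, last_state = unroll(toy_step, 0.0, 32)  # 32 mirrors --n_frames_G 32
```

Because the loop length is independent of the sequence length seen during training, the same cell can be asked for more frames at inference time.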
Interpolation
At inference time, you can generate long sequences by interpolation with --interpolation:
python -W ignore evaluate.py \
--save_pca_path pca_stats/ffhq_256 \
--latent_dimension 512 \
--style_gan_size 256 \
--img_g_weights /path/to/ffhq_image_generator \
--load_pretrain_path /path/to/checkpoints \
--load_pretrain_epoch 0 \
--interpolation
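How --interpolation is implemented is not described here, so the following is only a generic sketch of the underlying idea: lengthening a sequence by linearly interpolating between consecutive latent codes. The function name `interpolate_latents` and its parameters are hypothetical.

```python
def interpolate_latents(keyframes, steps_between):
    """Insert `steps_between` linearly interpolated codes between keyframes.

    keyframes: non-empty list of latent codes (equal-length lists of floats).
    Returns the lengthened sequence, ending with the last keyframe.
    """
    if not keyframes:
        raise ValueError("need at least one keyframe")
    if steps_between < 0:
        raise ValueError("steps_between must be >= 0")
    sequence = []
    for start, end in zip(keyframes, keyframes[1:]):
        for s in range(steps_between + 1):
            t = s / (steps_between + 1)  # t = 0 reproduces the start keyframe
            sequence.append([(1 - t) * a + t * b for a, b in zip(start, end)])
    sequence.append(list(keyframes[-1]))
    return sequence
```

With k keyframes and m in-between steps, the result has (k - 1) * (m + 1) + 1 codes, each of which would then be decoded by the image generator.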
Examples of Generated Videos
UCF-101
FaceForensics
Sky-Timelapse
(FFHQ, VoxCeleb)
(FFHQ-1024, VoxCeleb)
(Anime, VoxCeleb)
(LSUN-Church, TLVDB)
Citation
If you use the code for your work, please cite our paper.
@inproceedings{
tian2021a,
title={A Good Image Generator Is What You Need for High-Resolution Video Synthesis},
author={Yu Tian and Jian Ren and Menglei Chai and Kyle Olszewski and Xi Peng and Dimitris N. Metaxas and Sergey Tulyakov},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=6puCSjH3hwA}
}
Acknowledgments
This code borrows from the StyleGAN2 image generator, the BigGAN discriminator, and the PatchGAN discriminator.