
3D-Aware Video Generation

Random Sample

3D-Aware Video Generation<br> Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Hao Tang, Gordon Wetzstein, Leonidas Guibas, Luc Van Gool, Radu Timofte<br>

Project Page | Paper<br>

Abstract: Generative models have emerged as an essential building block for many image synthesis and editing tasks. Recent advances in this field have also enabled high-quality 3D or video content to be generated that exhibits either multi-view or temporal consistency. With our work, we explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos. By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D videos supervised only with monocular videos. We show that our method learns a rich embedding of decomposable 3D structures and motions that enables new visual effects of spatio-temporal renderings while producing imagery with quality comparable to that of existing 3D or video GANs.

Requirements

The codebase is tested on

For additional Python libraries, install them with:

pip install -r requirements.txt

Please refer to https://github.com/NVlabs/stylegan2-ada-pytorch for additional software/hardware requirements.
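As an optional sanity check that your environment meets these requirements, the short sketch below (not part of the repository) verifies that a CUDA-enabled PyTorch build is visible:

```python
# Optional environment check (assumption: PyTorch is already installed).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```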

Dataset

Datasets have to be placed in a subdirectory, as the dataset class is set up for different splits / classes, e.g., position the videos as /path/to/dataset/subdirectory/{video_0...video_x}/{img_0...img_y}.png. Then specify /path/to/dataset as the dataset path. Datasets can be downloaded here:

We resize FaceForensics and MEAD to the 256x256 resolution and TaiChi to the 128x128 resolution.
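If you prepare the data yourself, a minimal sketch like the one below (not part of the repository; paths and resolution are placeholders) can check the expected layout and resize frames to the target resolution:

```python
# Hypothetical preprocessing sketch: verify the layout
# /path/to/dataset/subdirectory/{video_*}/{img_*}.png and resize frames in place.
from pathlib import Path
from PIL import Image

dataset_root = Path("/path/to/dataset/subdirectory")  # placeholder path
target_res = 256  # use 128 for TaiChi

for video_dir in sorted(p for p in dataset_root.iterdir() if p.is_dir()):
    frames = sorted(video_dir.glob("*.png"))
    assert frames, f"no frames found in {video_dir}"
    for frame in frames:
        img = Image.open(frame)
        if img.size != (target_res, target_res):
            img.resize((target_res, target_res), Image.LANCZOS).save(frame)
```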

Pre-trained Checkpoints

You can download the pre-trained checkpoints used in our paper:

| Dataset | Resolution | Download |
| :--- | :---: | :---: |
| FaceForensics | 256 | Google Drive |
| FaceForensics (pre-trained on FFHQ) | 256 | Google Drive |
| MEAD | 256 | Google Drive |
| TaiChi | 128 | Google Drive |

Train a new model

python run_train.py outdir=/path/to/experiment_output data=/path/to/dataset cache_metrics_dir=/path/to/experiment_output/metrics_cache spec=paper model=stylenerf_faceforensics resolution=256

Please check the configuration files in conf/model and conf/spec. You can always add your own model config. For more details on how to use the Hydra configuration, please refer to https://hydra.cc/docs/intro/.
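As a rough illustration of how such command-line overrides are consumed, here is a minimal, generic Hydra entry point (the config name and structure are assumptions; the repository's actual entry point is run_train.py):

```python
# Minimal Hydra sketch: overrides such as `model=stylenerf_faceforensics resolution=256`
# are merged into cfg before main() runs.
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="conf", config_name="config")  # config_name is an assumption
def main(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))  # inspect the composed configuration

if __name__ == "__main__":
    main()
```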

Render videos with a pre-trained model

python generate.py --outdir /path/to/output --truncation_psi 1.0 --seeds 0 --network_pkl /path/to/network.pkl --render_program rotation_camera_yaw --time_steps 16 --n_steps 16

Or use visualize.sh to generate videos from 10 (or more) different seeds for all rendering programs directly.
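If you prefer a Python wrapper over the shell script, a sketch along these lines loops generate.py over several seeds (render program names other than rotation_camera_yaw are assumptions; check generate.py for the supported ones):

```python
# Hypothetical helper mirroring visualize.sh: run generate.py for several seeds.
import subprocess

seeds = range(10)
render_programs = ["rotation_camera_yaw"]  # extend with other supported programs

for program in render_programs:
    for seed in seeds:
        subprocess.run([
            "python", "generate.py",
            "--outdir", f"/path/to/output/{program}",
            "--truncation_psi", "1.0",
            "--seeds", str(seed),
            "--network_pkl", "/path/to/network.pkl",
            "--render_program", program,
            "--time_steps", "16",
            "--n_steps", "16",
        ], check=True)
```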

Evaluate model

Use evaluate.sh to evaluate a trained model for the FVD metric.
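For reference, FVD is the Fréchet distance between Gaussian fits of I3D features extracted from real and generated videos. A generic sketch of that distance (assuming you already have the feature arrays; evaluate.sh wraps the full pipeline) looks like this:

```python
# Generic Frechet distance between two feature sets (as used by FVD/FID).
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_f = np.cov(feats_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_f, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop numerical imaginary residue
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean))
```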

License

Our main code is based on the StyleNeRF and DIGAN repositories, while our evaluation code follows the StyleGAN-V implementation. Hence, the majority of our code is licensed under CC-BY-NC; however, portions of this project are available under separate license terms: all code used or modified from stylegan2-ada-pytorch is covered by the Nvidia Source Code License.