3D Photo Stylization: Learning to Generate Stylized Novel Views from a Single Image

Paper | Project Page | Poster | Video Demo <br>

Fangzhou Mu<sup>1</sup>, Jian Wang<sup>2#</sup>, Yicheng Wu<sup>2#</sup>, Yin Li<sup>1#</sup> <br> <sup>1</sup>University of Wisconsin-Madison, <sup>2</sup>Snap Research<br> (<sup>#</sup>co-corresponding authors)<br> CVPR 2022 (Oral)<br>

<p align="center"> <img src="assets/teaser.png" alt="teaser" width="90%" height="90%"> </p>

Overview

This is the official PyTorch implementation of our method for 3D Photo Stylization. Our method takes a single content image and an arbitrary style image, and outputs high-quality stylized renderings that are consistent across views. Our implementation follows a multi-stage pipeline outlined below.

<p align="center"> <img src="assets/workflow.png" alt="workflow" width="90%" height="90%"> </p>

Note that our repository does not include third-party code for depth estimation and 3D photo inpainting. Please use the sample LDIs and style images in the samples folder to reproduce our results in the paper.

Quick Start

Preliminaries

# clone the repository
git clone https://github.com/fmu2/3d_photo_stylization.git
cd 3d_photo_stylization

# create and activate the conda environment
conda create -n 3d_photo_stylization python=3.7
conda activate 3d_photo_stylization

# install PyTorch with CUDA 11.0
conda install -c pytorch pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=11.0

# install logging, I/O and visualization dependencies
conda install -c conda-forge tensorboard tensorboardx
conda install -c anaconda scipy pyyaml
conda install -c conda-forge ffmpeg imageio imageio-ffmpeg
conda install -c conda-forge plyfile matplotlib
conda install -c open3d-admin open3d

# compile the custom renderer and point cloud operators
python setup_render.py build_ext --inplace
python setup_pointnet2.py build_ext --inplace
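
As a quick sanity check of the environment (this verifies the PyTorch install and CUDA visibility, not the compiled extensions):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"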

Training

Download the pre-processed COCO2014 dataset here and the WikiArt dataset here, then extract the archives.

# extract the pre-processed COCO2014 LDIs
cd coco_pcd
tar -xvzf train.tar.gz
tar -xvzf val.tar.gz
cd ..

# extract the WikiArt style images
cd wikiart
tar -xvzf train.tar.gz
tar -xvzf val.tar.gz
cd ..

Images from the COCO2014 training split are resized to 448 x 448 and converted into LDIs following the steps described above. Each LDI (equivalently, a 3D point cloud) contains approximately 300K points. We use 80K LDIs for training and the rest for validation. The raw WikiArt images are extremely large, so we down-sample them to 512 x 512 and keep the original training and validation splits. Organize the extracted data as follows:

3d_photo_stylization
│   README.md
│   ...    
│
└───data
│   └───coco_pcd/
│   │   └───train/
│   │   │   └───ldi/
│   │   │   │   └───ldi/
│   │   │   │   │   │   COCO_train2014_000000XXXXXX.mat
│   │   │   │   │   │   COCO_train2014_000000XXXXXX.mat
│   │   │   │   │   │   ...
│   │   └───val/
│   │   │   └───ldi/
│   │   │   │   └───ldi/
│   │   │   │   │   │   COCO_train2014_000000XXXXXX.mat
│   │   │   │   │   │   COCO_train2014_000000XXXXXX.mat
│   │   │   │   │   │   ...
│   └───wikiart/
│   │   └───train/
│   │   │   │   XXX.jpg
│   │   │   │   XXX.jpg
│   │   │   │   ...
│   │   └───val/
│   │   │   │   XXX.jpg
│   │   │   │   XXX.jpg
│   │   │   │   ...
└───configs
│   │   inpaint3d.yaml
│   │   stylize3d.yaml
│   │   ...
└───log
│   ...
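
As a quick check on the layout above (assuming the archives were extracted under data/ as shown), the following commands count the extracted files; the COCO training set should contain roughly 80K LDIs:

find data/coco_pcd/train -name "*.mat" | wc -l
find data/wikiart/train -name "*.jpg" | wc -l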

First, train the encoder-decoder network for view synthesis.

python train_inpaint.py -d {data_path} -c configs/{file_name}.yaml -n {job_name} -g {gpu_id}

The latest model checkpoint inpaint-last.pth and the config file inpaint-config.yaml will be saved under log/{job_name}.
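
For example, a run using the provided config might look as follows (the data path assumes the layout above; my_run and GPU 0 are placeholders):

python train_inpaint.py -d data/coco_pcd -c configs/inpaint3d.yaml -n my_run -g 0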

Next, freeze the encoder, then train the style transfer module and fine-tune the decoder.

python train_stylize.py -d {data_path} -s {style_path} -c configs/{file_name}.yaml -n {job_name} -g {gpu_id}

Make sure that the two training stages share the same job name. The latest model checkpoint stylize-last.pth and the config file stylize-config.yaml will be saved under log/{job_name}.
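
Continuing the example above (note that the job name my_run matches the first stage; the style path assumes the layout above):

python train_stylize.py -d data/coco_pcd -s data/wikiart -c configs/stylize3d.yaml -n my_run -g 0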

Pre-trained models

Our pre-trained models can be found in the ckpt folder. Use ckpt/inpaint.pth for view synthesis (without stylization) and ckpt/stylize.pth for stylization.

Evaluation

Our code supports rasterization of the input RGB point cloud without an encoder-decoder pass. This is similar to 3DPhoto except that we do not convert the point cloud into a mesh before rendering.

python test_ldi_render.py -n {job_name} -g {gpu_id} -ldi {ldi_path} -cam {cam_motion} -x {x_bound} -y {y_bound} -z {z_bound} -f {num_frames}

The output video will be saved under test/out/ldi_render/{job_name}_{cam_motion}.
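
For instance, a run on one of the sample LDIs might look as follows (the LDI file name and camera motion are hypothetical, and the bound format depends on the script's argument parser; run the script with -h for the supported options):

python test_ldi_render.py -n demo -g 0 -ldi samples/{ldi_name}.mat -cam zoom -x {x_bound} -y {y_bound} -z {z_bound} -f 90  # 'zoom' is an assumed motion name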

Alternatively, the encoder-decoder network is also capable of novel view synthesis (without stylization) when no style image is given. This is mostly for diagnostic purposes and is not the intended use of our model.

python test_ldi_model.py -n {job_name} -g {gpu_id} -m {model_path} -ldi {ldi_path} -cam {cam_motion} -x {x_bound} -y {y_bound} -z {z_bound} -f {num_frames}

The output video will be saved under test/out/ldi_model/{job_name}_{cam_motion}.
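
For example, with the released checkpoint (the LDI file name and camera parameters are placeholders):

python test_ldi_model.py -n demo -g 0 -m ckpt/inpaint.pth -ldi samples/{ldi_name}.mat -cam {cam_motion} -x {x_bound} -y {y_bound} -z {z_bound} -f 90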

To synthesize a stylized 3D photo, run

python test_ldi_model.py -n {job_name} -g {gpu_id} -ldi {ldi_path} -s {style_path} -ss {style_size} -cam {cam_motion} -x {x_bound} -y {y_bound} -z {z_bound} -f {num_frames} -ndc

The output video will be saved under test/out/ldi_model/{job_name}_{style_name}_{cam_motion}.

Importantly, re-scaling the point cloud often yields better results. This can be done with the -pc flag; for example, -pc 2 works well for content images of size 1080 x 720.
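
Putting this together, a stylization run on a sample LDI and style image might look as follows (file names and camera parameters are placeholders, and -ss 256 is an assumed style size; -pc 2 re-scales the point cloud as suggested above):

python test_ldi_model.py -n demo -g 0 -ldi samples/{ldi_name}.mat -s samples/{style_name}.jpg -ss 256 -cam {cam_motion} -x {x_bound} -y {y_bound} -z {z_bound} -f 90 -ndc -pc 2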

Point cloud visualization

To save an inpainted point cloud (without normalization), run

python test_ldi_pcd.py -n {job_name} -ldi {ldi_path}

To visualize an inpainted point cloud in NDC space, run

python test_ldi_pcd.py -n {job_name} -ldi {ldi_path} -plot -ds 10

The point cloud (in .ply format) and its NDC-space visualization (in .png format) will be saved under test/out/ldi_pcd/{job_name}.
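
Since Open3D is already part of the environment, the saved point cloud can be inspected interactively; a minimal sketch (the .ply file name here is hypothetical):

python -c "import open3d as o3d; pcd = o3d.io.read_point_cloud('test/out/ldi_pcd/demo/pcd.ply'); o3d.visualization.draw_geometries([pcd])"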

Extension for multi-view input

Our method readily supports stylized novel view synthesis given multiple input views. Following StyleScene, we provide selected input views, their associated camera poses and depth maps, and a pre-defined camera trajectory for each scene in the Tanks and Temples dataset.

To synthesize novel views without stylization, run

python mvs_render.py -n {job_name} -g {gpu_id} -d {data_path}

To synthesize stylized novel views, run

python test_mvs.py -n {job_name} -g {gpu_id} -m {model_path} -d {data_path} -s {style_path} -ss {style_size} -ndc
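
For example, with the released stylization checkpoint (the data and style paths are placeholders, and -ss 256 is an assumed style size):

python test_mvs.py -n demo -g 0 -m ckpt/stylize.pth -d {data_path} -s samples/{style_name}.jpg -ss 256 -ndc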

Contact

Fangzhou Mu (fmu2@wisc.edu)

Related Code Repos

Our code relies heavily on the following repos:

Our code is inspired by the following repos:

Reference

@inproceedings{mu20223d,
  title={3D Photo Stylization: Learning to Generate Stylized Novel Views from a Single Image},
  author={Mu, Fangzhou and Wang, Jian and Wu, Yicheng and Li, Yin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={16273--16282},
  year={2022}
}