# Total3DUnderstanding [Project Page][Oral Paper][Talk]
Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image <br> Yinyu Nie, Xiaoguang Han, Shihui Guo, Yujian Zheng, Jian Chang, Jian Jun Zhang <br> In CVPR, 2020.
<img src="demo/inputs/1/img.jpg" alt="img.jpg" width="20%" /> <img src="demo/outputs/1/3dbbox.png" alt="3dbbox.png" width="20%" /> <img src="demo/outputs/1/recon.png" alt="recon.png" width="20%" /> <br> <img src="demo/inputs/2/img.jpg" alt="img.jpg" width="20%" /> <img src="demo/outputs/2/3dbbox.png" alt="3dbbox.png" width="20%" /> <img src="demo/outputs/2/recon.png" alt="recon.png" width="20%" />
## Install
This implementation uses Python 3.6, PyTorch 1.1.0 and CUDA toolkit 9.0. We recommend using conda to deploy the environment.
- Install with conda:
```
conda env create -f environment.yml
conda activate Total3D
```
- Install with pip:
```
pip install -r requirements.txt
```
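After installing, you can optionally confirm that the environment matches the versions above; a minimal check (our addition, not part of the repo):
```python
# Optional sanity check: confirm the environment matches the versions
# listed above (Python 3.6, PyTorch 1.1.0, CUDA toolkit 9.0).
import sys
import torch

print(sys.version)                # expect 3.6.x
print(torch.__version__)          # expect 1.1.0
print(torch.version.cuda)         # expect 9.0
print(torch.cuda.is_available())  # expect True on a GPU machine
```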
## Demo
The pretrained model can be downloaded here. We also provide the pretrained Mesh Generation Net here. Put the pretrained models under
```
out/pretrained_models
```
The demo below illustrates how the method works. vtk is used here to visualize the 3D scenes. The outputs will be saved under 'demo/outputs'. You can also try your own examples with this script.
```
cd Total3DUnderstanding
python main.py configs/total3d.yaml --mode demo --demo_path demo/inputs/1
```
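To run the demo over every sample under 'demo/inputs', a small wrapper sketch (the loop is our addition; main.py and its flags are invoked exactly as above):
```python
# Sketch: run the demo on each sample folder under demo/inputs.
import subprocess
from pathlib import Path

for sample in sorted(Path("demo/inputs").iterdir()):
    if sample.is_dir():
        subprocess.run(
            ["python", "main.py", "configs/total3d.yaml",
             "--mode", "demo", "--demo_path", str(sample)],
            check=True,
        )
```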
## Data preparation
In our paper, we use SUN-RGBD to train our Layout Estimation Net (LEN) and Object Detection Net (ODN), and use Pix3D to train our Mesh Generation Net (MGN).
#### Preprocess SUN-RGBD data
You can either directly download the processed training/testing data [link] (recommended) to
```
data/sunrgbd/sunrgbd_train_test_data
```
or <br> <br>
- Download the raw SUN-RGBD data to
```
data/sunrgbd/Dataset/SUNRGBD
```
- Download the 37 class labels of objects in SUN RGB-D images [link] to
```
data/sunrgbd/Dataset/SUNRGBD/train_test_labels
```
- Download the preprocessed clean data of SUN RGB-D [link] to
```
data/sunrgbd/Dataset/data_clean
```
- Follow this work to download the preprocessed ground-truth of SUN RGB-D [link], and put the '3dlayout' and 'updated_rtilt' folders respectively to
```
data/sunrgbd/Dataset/3dlayout
data/sunrgbd/Dataset/updated_rtilt
```
- Run the script below to generate training and testing data in 'data/sunrgbd/sunrgbd_train_test_data'.
```
python utils/generate_data.py
```
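As a quick sanity check (our addition, not part of the repo), you can count the generated samples before moving on:
```python
# Sketch: count the generated train/test samples. The output directory
# comes from the step above; a flat file layout is an assumption.
from pathlib import Path

out_dir = Path("data/sunrgbd/sunrgbd_train_test_data")
n_samples = sum(1 for p in out_dir.iterdir() if p.is_file())
print(f"{n_samples} samples generated")
```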
If everything goes smoothly, a ground-truth scene will be visualized like this:

<img src="demo/gt_scene.png" alt="gt_scene.png" width="40%" align="center" />

#### Preprocess Pix3D data
You can either directly download the preprocessed ground-truth data [link] (recommended) to
```
data/pix3d/train_test_data
```
Each sample contains the object class, 3D points (sampled on meshes), sample ID and the object image (without mask). Samples in the training set are flipped for augmentation.
or <br> <br>
- Download the Pix3D dataset to
```
data/pix3d/metadata
```
- Run the script below to generate the train/test data into 'data/pix3d/train_test_data'.
```
python utils/preprocess_pix3d.py
```
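To inspect what a preprocessed sample holds (object class, sampled 3D points, sample ID, object image), here is a hedged sketch; the on-disk serialization is an assumption on our part, so check utils/preprocess_pix3d.py for the exact format:
```python
# Sketch: peek at one preprocessed Pix3D sample. We assume samples are
# Python pickles of dicts; verify against utils/preprocess_pix3d.py.
import pickle
from pathlib import Path

sample_path = next(p for p in Path("data/pix3d/train_test_data").rglob("*") if p.is_file())
with open(sample_path, "rb") as f:
    sample = pickle.load(f)
if isinstance(sample, dict):
    for key, value in sample.items():
        print(key, type(value))
```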
## Training and Testing
We use configuration files (see 'configs/****.yaml') to fully control the training/testing process. There are three subtasks in Total3D (layout estimation, object detection and mesh reconstruction). We first pretrain each task individually, followed by joint training.
#### Pretraining
- Switch the keyword in 'configs/total3d.yaml' between 'layout_estimation' and 'object_detection' as below to pretrain the two tasks individually.
```
train:
  phase: 'layout_estimation' # or 'object_detection'
```
```
python main.py configs/total3d.yaml --mode train
```
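If you prefer not to edit the config by hand between the two runs, one can script the switch; a minimal sketch (our addition), assuming PyYAML and the 'train'/'phase' keys shown above:
```python
# Sketch: pretrain layout estimation and object detection back to back
# by rewriting the 'phase' key in configs/total3d.yaml.
# Note: yaml.safe_dump drops any comments in the config file.
import subprocess
import yaml

for phase in ("layout_estimation", "object_detection"):
    with open("configs/total3d.yaml") as f:
        cfg = yaml.safe_load(f)
    cfg["train"]["phase"] = phase
    with open("configs/total3d.yaml", "w") as f:
        yaml.safe_dump(cfg, f)
    subprocess.run(["python", "main.py", "configs/total3d.yaml", "--mode", "train"], check=True)
```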
The two pretrained models can be correspondingly found at
```
out/total3d/a_folder_named_with_script_time/model_best.pth
```
- Train the Mesh Generation Net by:
```
python main.py configs/mgnet.yaml --mode train
```
The pretrained model can be found at
```
out/mesh_gen/a_folder_named_with_script_time/model_best.pth
```
#### Joint training
List the paths of the three pretrained models in 'configs/total3d.yaml', and set the phase name to 'joint', as
```
weight: ['folder_to_layout_estimation/model_best.pth', 'folder_to_object_detection/model_best.pth', 'folder_to_mesh_recon/model_best.pth']
train:
  phase: 'joint'
```
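Before launching, it can help to verify that all three weight paths exist; a minimal sketch (our addition), assuming 'weight' is a top-level list in the config as shown above:
```python
# Sketch: check that every pretrained weight listed in the config exists.
import os
import yaml

with open("configs/total3d.yaml") as f:
    cfg = yaml.safe_load(f)
for w in cfg["weight"]:
    assert os.path.isfile(w), f"missing pretrained weight: {w}"
```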
Then run the command below for joint training.
```
python main.py configs/total3d.yaml --mode train
```
The trained model can be found at
```
out/total3d/a_folder_named_with_script_time/model_best.pth
```
#### Testing
Please make sure the weight path is updated to
```
weight: ['folder_to_fully_trained_model/model_best.pth']
```
and run
```
python main.py configs/total3d.yaml --mode test
```
This script generates all 3D scenes on the test set of SUN-RGBD under
```
out/total3d/a_folder_named_with_script_time/visualization
```
You can also visualize a single 3D scene given its sample ID:
```
python utils/visualize.py --result_path out/total3d/a_folder_named_with_script_time/visualization --sequence_id 274
```
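To render several scenes in one go, a small wrapper around the same script (our addition; the ID list is an arbitrary example):
```python
# Sketch: batch-render multiple test scenes by sequence ID using
# utils/visualize.py exactly as invoked above.
import subprocess

result_path = "out/total3d/a_folder_named_with_script_time/visualization"
for seq_id in [274, 275, 276]:  # arbitrary example IDs
    subprocess.run(["python", "utils/visualize.py",
                    "--result_path", result_path,
                    "--sequence_id", str(seq_id)], check=True)
```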
## Differences from the paper
- We retrained the model with a learning-rate schedule that halves the learning rate if there is no gain within five steps, which is much more efficient.
- We do not provide the Faster R-CNN code. Users can train their own 2D detector with [link].
## Citation
If you find our work helpful, please cite:
```
@InProceedings{Nie_2020_CVPR,
    author = {Nie, Yinyu and Han, Xiaoguang and Guo, Shihui and Zheng, Yujian and Chang, Jian and Zhang, Jian Jun},
    title = {Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2020}
}
```
Our method partially follows the data processing steps in this work. If it is also helpful to you, please cite:
```
@inproceedings{huang2018cooperative,
    title={Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation},
    author={Huang, Siyuan and Qi, Siyuan and Xiao, Yinxue and Zhu, Yixin and Wu, Ying Nian and Zhu, Song-Chun},
    booktitle={Advances in Neural Information Processing Systems},
    pages={206--217},
    year={2018}
}
```