CleanerS

This repository contains the official PyTorch implementation of the following CVPR 2023 paper:

Title: CleanerS: Semantic Scene Completion with Cleaner Self

Author: Fengyun Wang, Dong Zhang, Hanwang Zhang, Jinhui Tang, Qianru Sun

Affiliation: NJUST, HKUST, NTU, SMU

Abstract

Semantic Scene Completion (SSC) transforms an image of single-view depth and/or RGB 2D pixels into 3D voxels, each of whose semantic labels are predicted. SSC is a well-known ill-posed problem, as the prediction model has to "imagine" what is behind the visible surface, which is usually represented by a Truncated Signed Distance Function (TSDF). Due to the sensory imperfection of the depth camera, most existing methods based on the noisy TSDF estimated from depth values suffer from 1) incomplete volumetric predictions and 2) confused semantic labels. To this end, we use the ground-truth 3D voxels to generate a perfect visible surface, called TSDF-CAD, and then train a "cleaner" SSC model. As the model is noise-free, it is expected to focus more on the "imagination" of unseen voxels. Then, we propose to distill the intermediate "cleaner" knowledge into another model with noisy TSDF input. In particular, we use the 3D occupancy feature and the semantic relations of the "cleaner self" to supervise the counterparts of the "noisy self" to respectively address the above two incorrect predictions. Experimental results validate that our method improves the noisy counterparts by 3.1% IoU and 2.2% mIoU for measuring scene completion and SSC, and also achieves new state-of-the-art accuracy on the popular NYU dataset.
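
For reference, the TSDF mentioned above is typically defined per voxel as the distance to the nearest surface point, signed by visibility and truncated to a fixed range. A standard formulation (the paper may use a minor variant) is:

```latex
% d(x): signed distance from voxel x to the nearest surface point
% (positive in front of the visible surface, negative behind it);
% \tau: truncation threshold.
\mathrm{TSDF}(x) = \max\!\left(-1,\ \min\!\left(1,\ \frac{d(x)}{\tau}\right)\right)
```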

Overall architecture

(Figure: overall architecture of CleanerS.)

CleanerS mainly consists of two networks: a teacher network and a student network. The two networks share the same architecture but have different weights. The distillation pipeline includes a feature-based cleaner surface distillation (i.e., KD-T) and logit-based cleaner semantic distillations (i.e., KD-SC and KD-SA).
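
As a rough illustration of these three terms, the following is a minimal PyTorch sketch; the function names, tensor shapes, and loss choices (MSE for features, temperature-scaled KL for soft labels, MSE over pairwise affinities) are assumptions for exposition, not the repo's exact implementation:

```python
import torch
import torch.nn.functional as F

def kd_t(feat_s, feat_t):
    """KD-T (sketch): match the student's 3D occupancy features
    to the frozen teacher's features."""
    return F.mse_loss(feat_s, feat_t.detach())

def kd_sc(logits_s, logits_t, tau=1.0):
    """KD-SC (sketch): distill the teacher's per-voxel class
    distributions as soft labels."""
    p_t = F.softmax(logits_t.detach() / tau, dim=1)
    log_p_s = F.log_softmax(logits_s / tau, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * tau ** 2

def kd_sa(logits_s, logits_t):
    """KD-SA (sketch): match pairwise semantic affinities between
    voxels; real code would use pooled features, since N can be large."""
    def affinity(x):
        x = F.normalize(x.flatten(2), dim=1)     # (B, C, N), unit channel vectors
        return torch.bmm(x.transpose(1, 2), x)   # (B, N, N) cosine affinities
    return F.mse_loss(affinity(logits_s), affinity(logits_t.detach()))
```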

Pre-trained model

| Segformer-B2  | Model Zoo                                   | Visual Results                              |
| ------------- | ------------------------------------------- | ------------------------------------------- |
| Teacher Model | Google Drive / Baidu Netdisk (code: `3gew`) | Google Drive / Baidu Netdisk (code: `p9nl`) |
| Student Model | Google Drive / Baidu Netdisk (code: `6eja`) | Google Drive / Baidu Netdisk (code: `lktg`) |

Comparisons with SOTA

(Figure: comparisons with state-of-the-art methods.)

Usage

Requirements

Suggested installation steps:

```bash
conda create -n CleanerS python=3.7 -y
conda activate CleanerS
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install mmcv-full==1.5.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10/index.html
pip install mmsegmentation==0.27.0
conda install scikit-learn
pip install pyyaml timm tqdm EasyConfig multimethod easydict termcolor shortuuid imageio
```
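
An optional sanity check (ours, not part of the repo) to confirm the pinned versions installed correctly:

```python
import torch
import mmcv
import mmseg

# Versions should match the pinned installs above.
print("torch:", torch.__version__)    # expected 1.10.1
print("mmcv:", mmcv.__version__)      # expected 1.5.0
print("mmseg:", mmseg.__version__)    # expected 0.27.0
print("CUDA available:", torch.cuda.is_available())
```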

Data preparation

We follow the 3D-Sketch project for dataset preparation.

After preparation, your your_SSC_Dataset folder should look like the tree below (a small layout check follows it):

```
-- your_SSC_Dataset
   |-- NYU
   |   |-- TSDF
   |   |-- Mapping
   |   |   |-- trainset
   |   |   |   |-- RGB
   |   |   |   |-- depth
   |   |   |   |-- GT
   |   |   |-- testset
   |   |   |   |-- RGB
   |   |   |   |-- depth
   |   |   |   |-- GT
   |-- NYUCAD
   |   |-- TSDF
   |   |   |-- trainset
   |   |   |   |-- depth
   |   |   |-- testset
   |   |   |   |-- depth
```
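
A small helper (ours, not part of the repo) to confirm the layout before training; adjust ROOT to where your_SSC_Dataset actually lives:

```python
from pathlib import Path

ROOT = Path("your_SSC_Dataset")  # adjust to your dataset root

# Directories expected by the tree above.
EXPECTED = [
    "NYU/TSDF",
    "NYU/Mapping/trainset/RGB", "NYU/Mapping/trainset/depth", "NYU/Mapping/trainset/GT",
    "NYU/Mapping/testset/RGB", "NYU/Mapping/testset/depth", "NYU/Mapping/testset/GT",
    "NYUCAD/TSDF/trainset/depth", "NYUCAD/TSDF/testset/depth",
]

for rel in EXPECTED:
    status = "ok" if (ROOT / rel).is_dir() else "MISSING"
    print(f"{status:8s} {rel}")
```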

Training

  1. Download the pretrained Segformer-B2, mit_b2.pth;
  2. (optional) Download the teacher model and put it at ./teacher/Teacher_ckpt.pth (a sketch of how such a checkpoint is typically loaded follows this list);
  3. Run run.sh to train CleanerS (if you skip step 2, it will train both the teacher and the student models);
  4. Download the pretrained ResNet50.
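
For step 2, loading a teacher checkpoint typically looks like the sketch below; TeacherNet is a stand-in placeholder, not the repo's actual network class:

```python
import torch
import torch.nn as nn

class TeacherNet(nn.Module):
    """Placeholder for the repo's actual teacher network."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv3d(1, 64, kernel_size=3, padding=1)

    def forward(self, x):
        return self.backbone(x)

teacher = TeacherNet()
state = torch.load("./teacher/Teacher_ckpt.pth", map_location="cpu")
# Checkpoints often wrap weights under a "state_dict" key.
teacher.load_state_dict(state.get("state_dict", state), strict=False)
teacher.eval()  # the teacher stays frozen during distillation
for p in teacher.parameters():
    p.requires_grad_(False)
```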

Testing with our weights

  1. Download our weights and put them in the ./checkpoint folder.
  2. Run python test_NYU.py --pretrained_path ./checkpoint/CleanerS_ckpt.pth. The visualized results will be in the ./visual_pred/CleanerS folder.
  3. (optional) Run python test_NYU.py --pretrained_path ./checkpoint/Teacher_ckpt.pth to get the results of the teacher model.

Citation

If this work is helpful for your research, please consider citing:

@inproceedings{wang2023semantic,
  title={Semantic scene completion with cleaner self},
  author={Wang, Fengyun and Zhang, Dong and Zhang, Hanwang and Tang, Jinhui and Sun, Qianru},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={867--877},
  year={2023}
}

Acknowledgement

This code is based on 3D-Sketch.