Home

Awesome

[ECCV 2024] Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion

arXiv paper Code page

Demo:

<div align=center><img width="640" height="360" src="./assets/teaser.gif"/></div>

Framework:

<div align=center><img width="640" height="360" src="./assets/overall.png"/></div>

Abstract:

Camera-based 3D semantic scene completion (SSC) is pivotal for predicting complicated 3D layouts with limited 2D image observations. The existing mainstream solutions generally leverage temporal information by roughly stacking history frames to supplement the current frame, such straightforward temporal modeling inevitably diminishes valid clues and increases learning difficulty. To address this problem, we present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion. The primary innovation of this work involves decomposing temporal context learning into two hierarchical steps: (a) cross-frame affinity measurement and (b) affinity-based dynamic refinement. Firstly, to separate critical relevant context from redundant information, we introduce the pattern affinity with scale-aware isolation and multiple independent learners for fine-grained contextual correspondence modeling. Subsequently, to dynamically compensate for incomplete observations, we adaptively refine the feature sampling locations based on initially identified locations with high affinity and their neighboring relevant regions. Our method ranks $1^{st}$ on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU on the OpenOccupancy benchmark.

Table of Content

News

Quick Installation on A100

You can use our pre-picked environment on NVIDIA A100 with the following steps if using the same hardware:

a. Download the pre-picked package: occA100.

b. Unpack environment into directory occA100.

cd /opt/conda/envs/
mkdir -p occA100
tar -xzf occA100.tar.gz -C occA100 

c. Activate the environment. This adds occA100/bin to your path.

source occA100/bin/activate

You can also use Python executable file without activating or fixing the prefixes.

./occA100/bin/python

Step-by-step Installation Instructions

Following https://mmdetection3d.readthedocs.io/en/latest/getting_started.html#installation

a. Create a conda virtual environment and activate it. python > 3.7 may not be supported, because installing open3d-python with py>3.7 causes errors.

conda create -n occupancy python=3.7 -y
conda activate occupancy

b. Install PyTorch and torchvision following the official instructions.

conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge

c. Install gcc>=5 in conda env (optional). I do not use this step.

conda install -c omgarcia gcc-6 # gcc-6.2

c. Install mmcv-full.

pip install mmcv-full==1.4.0

d. Install mmdet and mmseg.

pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1

e. Install mmdet3d from source code.

cd mmdetection3d
git checkout v0.17.1 # Other versions may not be compatible.
python setup.py install

f. Install other dependencies.

pip install timm
pip install open3d-python
pip install PyMCubes

Known problems

AttributeError: module 'distutils' has no attribute 'version'

The error appears due to the version of "setuptools", try:

pip install setuptools==59.5.0

Prepare Data

bash process_kitti.sh

Pretrained Model

Download Pretrained model on SemanticKITTI and Efficientnet-b7 pretrained model, put them in the folder ./pretrain.

Training & Evaluation

Single GPU

export PYTHONPATH="."  
python tools/train.py   \
            projects/configs/occupancy/semantickitti/temporal_baseline.py
export PYTHONPATH="."  
bash  run_eval_kitti.sh   \
            projects/configs/occupancy/semantickitti/temporal_baseline.py \
            pretrain/pretrain.pth 

Multiple GPUS

bash run.sh  \
        projects/configs/occupancy/semantickitti/temporal_baseline.py  n
 bash tools/dist_test.sh  \
            projects/configs/occupancy/semantickitti/temporal_baseline.py \
            pretrain/pretrain.pth  n

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgements

Many thanks to these excellent open source projects:

Citation

If you find our paper and code useful for your research, please consider citing:

@article{li2024hierarchical,
  title={Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion},
  author={Li, Bohan and Deng, Jiajun and Zhang, Wenyao and Liang, Zhujin and Du, Dalong and Jin, Xin and Zeng, Wenjun},
  journal={arXiv preprint arXiv:2407.02077},
  year={2024}
}