# [ECCV 2024] Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion
## Demo

<div align=center><img width="640" height="360" src="./assets/teaser.gif"/></div>

## Framework

<div align=center><img width="640" height="360" src="./assets/overall.png"/></div>

## Abstract
Camera-based 3D semantic scene completion (SSC) is pivotal for predicting complicated 3D layouts with limited 2D image observations. Existing mainstream solutions generally leverage temporal information by roughly stacking history frames to supplement the current frame; such straightforward temporal modeling inevitably diminishes valid clues and increases learning difficulty. To address this problem, we present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion. The primary innovation of this work is decomposing temporal context learning into two hierarchical steps: (a) cross-frame affinity measurement and (b) affinity-based dynamic refinement. First, to separate critical relevant context from redundant information, we introduce a pattern affinity with scale-aware isolation and multiple independent learners for fine-grained contextual correspondence modeling. Subsequently, to dynamically compensate for incomplete observations, we adaptively refine the feature sampling locations based on the initially identified locations with high affinity and their neighboring relevant regions. Our method ranks $1^{st}$ on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU on the OpenOccupancy benchmark.
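To make the two-step idea above concrete, here is a deliberately simplified, hypothetical PyTorch sketch. It is not the actual HTCL implementation: the flattened feature shapes, the cosine-similarity affinity, and the top-k softmax aggregation are all illustrative assumptions.

```python
# Hypothetical sketch of (a) cross-frame affinity measurement and
# (b) affinity-based refinement. Shapes, cosine similarity, and top-k
# aggregation are illustrative assumptions, not the HTCL modules.
import torch
import torch.nn.functional as F

def cross_frame_affinity(cur, hist):
    """(a) Pairwise affinity between current and history features.
    cur: (B, C, N), hist: (B, C, M) flattened spatial features."""
    cur = F.normalize(cur, dim=1)
    hist = F.normalize(hist, dim=1)
    return torch.einsum("bcn,bcm->bnm", cur, hist)  # (B, N, M)

def affinity_based_refinement(hist, affinity, k=4):
    """(b) For each current-frame location, aggregate history features
    from its k highest-affinity locations, softmax-weighted."""
    weights, idx = affinity.topk(k, dim=-1)              # (B, N, k)
    weights = weights.softmax(dim=-1)
    B, C, _ = hist.shape
    gathered = hist.gather(2, idx.reshape(B, 1, -1).expand(B, C, -1))
    gathered = gathered.reshape(B, C, -1, k)             # (B, C, N, k)
    return (gathered * weights.unsqueeze(1)).sum(dim=-1) # (B, C, N)

cur, hist = torch.randn(2, 64, 100), torch.randn(2, 64, 100)
refined = affinity_based_refinement(hist, cross_frame_affinity(cur, hist))
print(refined.shape)  # torch.Size([2, 64, 100])
```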
## Table of Contents
- News
- Quick Start
- Installation
- Prepare Data
- Pretrained Model
- Training & Evaluation
- License
- Acknowledgements
## News
- [2024/07]: Demo and code released.
- [2024/07]: Paper is available on arXiv.
- [2024/07]: Paper is accepted to ECCV 2024.
## Quick Installation on A100
You can use our pre-packed environment on an NVIDIA A100 with the following steps if you are using the same hardware:

a. Download the pre-packed package: occA100.

b. Unpack the environment into the directory `occA100`:
```shell
cd /opt/conda/envs/
mkdir -p occA100
tar -xzf occA100.tar.gz -C occA100
```
c. Activate the environment. This adds `occA100/bin` to your path:

```shell
source occA100/bin/activate
```
You can also use the environment's Python executable directly, without activating the environment or fixing the prefixes:

```shell
./occA100/bin/python
```
## Step-by-step Installation Instructions
These instructions follow https://mmdetection3d.readthedocs.io/en/latest/getting_started.html#installation
a. Create a conda virtual environment and activate it. Python versions above 3.7 may not be supported, because installing `open3d-python` with Python > 3.7 causes errors.
```shell
conda create -n occupancy python=3.7 -y
conda activate occupancy
```
b. Install PyTorch and torchvision following the official instructions.
```shell
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
```
c. (Optional) Install gcc>=5 in the conda environment. We did not need this step:

```shell
conda install -c omgarcia gcc-6 # gcc-6.2
```
d. Install mmcv-full:

```shell
pip install mmcv-full==1.4.0
```
e. Install mmdet and mmsegmentation:

```shell
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1
```
f. Install mmdet3d from source:

```shell
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1 # Other versions may not be compatible.
python setup.py install
```
g. Install other dependencies:

```shell
pip install timm
pip install open3d-python
pip install PyMCubes
```
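If everything above installed cleanly, a quick import check (a minimal sanity sketch, assuming the pinned versions above) should succeed:

```python
# minimal sanity check: core packages import and report their versions
import torch, mmcv, mmdet, mmseg, mmdet3d

for name, mod in [("torch", torch), ("mmcv", mmcv), ("mmdet", mmdet),
                  ("mmseg", mmseg), ("mmdet3d", mmdet3d)]:
    print(name, mod.__version__)
```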
## Known problems
`AttributeError: module 'distutils' has no attribute 'version'`

This error is caused by the installed version of `setuptools`; try:

```shell
pip install setuptools==59.5.0
```
## Prepare Data
a. You need to download:

- The odometry calibration files ("Download odometry data set (calibration files)") and the RGB images ("Download odometry data set (color)") from the KITTI Odometry website, and extract them to the folder `data/occupancy/semanticKITTI/RGB/`.
- The Velodyne point clouds ("Download data_odometry_velodyne") and the SemanticKITTI label data ("Download data_odometry_labels"), used for sparse LiDAR supervision during training, and extract them to the folders `data/lidar/velodyne/` and `data/lidar/lidarseg/`, respectively.
b. Prepare the KITTI voxel labels (see the sh file for more details):

```shell
bash process_kitti.sh
```
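Before training, you can optionally verify the directory layout with a small helper sketch (the paths are the ones listed above):

```python
# check that the data folders described above exist
import os

expected = [
    "data/occupancy/semanticKITTI/RGB/",
    "data/lidar/velodyne/",
    "data/lidar/lidarseg/",
]
for path in expected:
    print(f"{path}: {'found' if os.path.isdir(path) else 'MISSING'}")
```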
## Pretrained Model

Download the pretrained model on SemanticKITTI and the EfficientNet-B7 pretrained model, and put them in the folder `./pretrain/`.
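As an optional sanity check (a small sketch; `pretrain/pretrain.pth` is the filename the evaluation commands below expect), you can confirm the checkpoint deserializes:

```python
# confirm the downloaded checkpoint loads; inspect its top-level keys
import torch

ckpt = torch.load("pretrain/pretrain.pth", map_location="cpu")
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```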
## Training & Evaluation

### Single GPU
- Train with a single GPU:

```shell
export PYTHONPATH="."
python tools/train.py \
    projects/configs/occupancy/semantickitti/temporal_baseline.py
```
- Evaluate with a single GPU:

```shell
export PYTHONPATH="."
bash run_eval_kitti.sh \
    projects/configs/occupancy/semantickitti/temporal_baseline.py \
    pretrain/pretrain.pth
```
### Multiple GPUs
- Train with n GPUs:

```shell
bash run.sh \
    projects/configs/occupancy/semantickitti/temporal_baseline.py n
```
- Evaluate with n GPUs:

```shell
bash tools/dist_test.sh \
    projects/configs/occupancy/semantickitti/temporal_baseline.py \
    pretrain/pretrain.pth n
```
## License
This repository is released under the Apache 2.0 license as found in the LICENSE file.
## Acknowledgements
Many thanks to these excellent open source projects:
## Citation
If you find our paper and code useful for your research, please consider citing:
```bibtex
@article{li2024hierarchical,
  title={Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion},
  author={Li, Bohan and Deng, Jiajun and Zhang, Wenyao and Liang, Zhujin and Du, Dalong and Jin, Xin and Zeng, Wenjun},
  journal={arXiv preprint arXiv:2407.02077},
  year={2024}
}
```