ViDAR: Visual Point Cloud Forecasting

Visual Point Cloud Forecasting enables Scalable Autonomous Driving [CVPR 2024 Highlight]

Zetong Yang, Li Chen, Yanan Sun, and Hongyang Li

Highlights <a name="highlights"></a>

:fire: Visual point cloud forecasting: a new self-supervised pre-training task for end-to-end autonomous driving that predicts future point clouds from historical visual inputs, jointly modeling 3D geometry and temporal dynamics for simultaneous perception, prediction, and planning.

:star2: ViDAR, the first visual point cloud forecasting architecture.

(Figure: overview of the ViDAR method.)

:trophy: The predictive world model, in the form of visual point cloud forecasting, will be a main track in the CVPR 2024 Autonomous Driving Challenge. Please stay tuned for further details!

News <a name="news"></a>

TODO List <a name="todo-list"></a>

Still in progress:

Table of Contents

  1. Results and Model Zoo
  2. Installation
  3. Prepare Datasets
  4. Train and Evaluate
  5. License and Citation
  6. Related Resources

Results and Model Zoo <a name="models"></a>

Visual point cloud forecasting pre-training

nuScenes Dataset:

| Pre-train Model | Dataset | Config | CD@1s | CD@2s | CD@3s | models & logs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViDAR-RN101-nus-1-8-1future | nuScenes (12.5% Data) | vidar-nusc-pretrain-1future | - | - | - | models / logs |
| ViDAR-RN101-nus-1-8-3future | nuScenes (12.5% Data) | vidar-nusc-pretrain-3future | 1.25 | 1.48 | 1.79 | models / logs |
| ViDAR-RN101-nus-full-1future | nuScenes (100% Data) | vidar-nusc-pretrain-1future | - | - | - | models |

OpenScene Dataset:

| Pre-train Model | Dataset | Config | CD@1s | CD@2s | CD@3s | models & logs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViDAR-RN101-OpenScene-3future | OpenScene-mini (12.5% Data) | vidar-OpenScene-pretrain-3future-1-8 | 1.41 | 1.57 | 1.78 | models / logs |
| ViDAR-RN101-OpenScene-3future | OpenScene-mini-Full (100% Data) | vidar-OpenScene-pretrain-3future-full | 1.03 | 1.15 | 1.35 | models / logs |
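
CD@1s/2s/3s denote the Chamfer Distance between the forecast point cloud and the ground truth at 1, 2, and 3 seconds into the future (lower is better). For intuition, here is a minimal PyTorch sketch of the symmetric Chamfer Distance between two point clouds; the repository itself computes it with the bundled chamferdist package, so treat this as illustrative only:

```python
import torch

def chamfer_distance(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer Distance between an (N, 3) and an (M, 3) point cloud."""
    # Pairwise Euclidean distances between every predicted and ground-truth point.
    dists = torch.cdist(pred, gt)  # shape (N, M)
    # Mean nearest-neighbour distance in each direction, summed.
    return dists.min(dim=1).values.mean() + dists.min(dim=0).values.mean()

# Toy example on random clouds; the actual benchmark compares forecasts to LiDAR sweeps.
print(chamfer_distance(torch.rand(1024, 3), torch.rand(2048, 3)))
```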

Downstream fine-tuning (Perception)

| Downstream Model | Dataset | Pre-train | Config | NDS | mAP | models & logs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| BEVFormer-Base (baseline) | nuScenes (25% Data) | FCOS3D | bevformer-base | 43.40 | 35.47 | models / logs |
| BEVFormer-Base | nuScenes (25% Data) | ViDAR-RN101-nus-1-8-1future | vidar-nusc-finetune-1future | 45.77 | 36.90 | models / logs |
| BEVFormer-Base | nuScenes (25% Data) | ViDAR-RN101-nus-1-8-3future | vidar-nusc-finetune-3future | 45.61 | 36.84 | models / logs |
| BEVFormer-Base (baseline) | nuScenes (100% Data) | FCOS3D | bevformer-base | 51.7 | 41.6 | models |
| BEVFormer-Base | nuScenes (100% Data) | ViDAR-RN101-nus-full-1future | vidar-nusc-finetune-1future | 55.33 | 45.20 | models |

Downstream fine-tuning (End-to-End)

Please refer to the ViDAR-UniAD page.

Installation <a name="installation"></a>

The installation steps are similar to BEVFormer's. For convenience, we list them below:

conda create -n vidar python=3.8 -y
conda activate vidar

pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
conda install -c omgarcia gcc-6 # (optional) gcc-6.2

Install mm-series packages.

pip install mmcv-full==1.4.0
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1

# Install mmdetection3d from source codes.
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1 # Other versions may not be compatible.
python setup.py install

Install Detectron2 and Timm.

pip install einops fvcore seaborn iopath==0.1.9 timm==0.6.13  typing-extensions==4.5.0 pylint ipython==8.12  numpy==1.19.5 matplotlib==3.5.2 numba==0.48.0 pandas==1.4.4 scikit-image==0.19.3 setuptools==59.5.0
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
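
Optionally, run a quick sanity check that the pinned versions resolved correctly (an illustrative snippet, not part of the original setup):

```python
# Optional sanity check for the pinned environment.
import torch, mmcv, mmdet, mmseg, mmdet3d, detectron2, timm

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())  # 1.10.1+cu111, True
print("mmcv:", mmcv.__version__)        # expect 1.4.0
print("mmdet:", mmdet.__version__)      # expect 2.14.0
print("mmseg:", mmseg.__version__)      # expect 0.14.1
print("mmdet3d:", mmdet3d.__version__)  # expect 0.17.1
print("detectron2:", detectron2.__version__)
print("timm:", timm.__version__)        # expect 0.6.13
```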

Set up the ViDAR project.

git clone https://github.com/OpenDriveLab/ViDAR

cd ViDAR
mkdir pretrained
cd pretrained && wget https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth
cd ..

# Install chamferdistance library.
cd third_lib/chamfer_dist/chamferdist/
pip install .
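
A short import test can confirm the extension built correctly. This assumes the bundled package keeps the upstream chamferdist interface (a ChamferDistance module applied to two (batch, points, 3) tensors), so treat it as a sketch:

```python
import torch
from chamferdist import ChamferDistance

# Two random point clouds of shape (batch, num_points, xyz).
pred, gt = torch.rand(1, 1024, 3), torch.rand(1, 1024, 3)

# A finite scalar output means the library compiled and runs.
print(ChamferDistance()(pred, gt))
```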

Prepare Datasets <a name="prepare-datasets"></a>

Train and Evaluate <a name="train-and-evaluate"></a>

Train

We recommend using 8 A100 GPUs for training. GPU memory usage is around 63 GB during pre-training.

CONFIG=path/to/config.py
GPU_NUM=8

./tools/dist_train.sh ${CONFIG} ${GPU_NUM}

Evaluate

CONFIG=path/to/vidar_config.py
CKPT=path/to/checkpoint.pth
GPU_NUM=8

./tools/dist_test.sh ${CONFIG} ${CKPT} ${GPU_NUM}

Visualize

CONFIG=path/to/vidar_config.py
CKPT=path/to/checkpoint.pth
GPU_NUM=1

./tools/dist_test.sh ${CONFIG} ${CKPT} ${GPU_NUM} \
  --cfg-options 'model._viz_pcd_flag=True' 'model._viz_pcd_path=/path/to/output'

License and Citation <a name="license-and-citation"></a>

All assets and code are under the Apache 2.0 license unless specified otherwise.

If this work is helpful for your research, please consider citing it with the following BibTeX entry.

@inproceedings{yang2023vidar,
  title={Visual Point Cloud Forecasting enables Scalable Autonomous Driving},
  author={Yang, Zetong and Chen, Li and Sun, Yanan and Li, Hongyang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

Related Resources <a name="resources"></a>

We acknowledge all the open-source contributors whose projects made this work possible.

<a href="https://twitter.com/OpenDriveLab" target="_blank"> <img alt="Twitter Follow" src="https://img.shields.io/twitter/follow/OpenDriveLab?style=social&color=brightgreen&logo=twitter" /> </a>