

ViDAR: Visual Point Cloud Forecasting

Visual Point Cloud Forecasting enables Scalable Autonomous Driving [CVPR 2024 Highlight]

Zetong Yang, Li Chen, Yanan Sun, and Hongyang Li

Highlights <a name="highlights"></a>

:fire: Visual point cloud forecasting is a new self-supervised pre-training task for end-to-end autonomous driving. It predicts future point clouds from historical visual inputs, jointly modeling 3D geometry and temporal dynamics for simultaneous perception, prediction, and planning.

:star2: ViDAR, the first visual point cloud forecasting architecture.


:trophy: Predictive world model, in the form of visual point cloud forecasting, will be a main track in the CVPR 2024 Autonomous Driving Challenge. Please stay tuned for further details!

Table of Contents

  1. Results and Model Zoo
  2. Installation
  3. Prepare Datasets
  4. Train and Evaluate
  5. License and Citation
  6. Related Resources

Results and Model Zoo <a name="models"></a>

Visual point cloud forecasting pre-training

NuScenes Dataset:

| Pre-train Model | Dataset | Config | CD@1s | CD@2s | CD@3s | models & logs |
| --- | --- | --- | --- | --- | --- | --- |
| ViDAR-RN101-nus-1-8-1future | nuScenes (12.5% Data) | vidar-nusc-pretrain-1future | - | - | - | models / logs |
| ViDAR-RN101-nus-1-8-3future | nuScenes (12.5% Data) | vidar-nusc-pretrain-3future | 1.25 | 1.48 | 1.79 | models / logs |
| ViDAR-RN101-nus-full-1future | nuScenes (100% Data) | vidar-nusc-pretrain-1future | - | - | - | models |

OpenScene Dataset:

| Pre-train Model | Dataset | Config | CD@1s | CD@2s | CD@3s | models & logs |
| --- | --- | --- | --- | --- | --- | --- |
| ViDAR-RN101-OpenScene-3future | OpenScene-mini (12.5% Data) | vidar-OpenScene-pretrain-3future-1-8 | 1.41 | 1.57 | 1.78 | models / logs |
| ViDAR-RN101-OpenScene-3future | OpenScene-mini-Full (100% Data) | vidar-OpenScene-pretrain-3future-full | 1.03 | 1.15 | 1.35 | models / logs |
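
The CD@1s, CD@2s, and CD@3s columns report a Chamfer distance between forecast and ground-truth point clouds at each future horizon. As a rough illustration of the metric, the following pure-Python sketch computes a symmetric Chamfer distance. It assumes squared Euclidean distances averaged per direction and then summed; the exact variant, units, and point filtering used by this repository's evaluation are not stated here and may differ. The function name `chamfer_distance` is illustrative and is not the API of the bundled chamferdist library.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer_distance(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between two point sets.

    Assumption: each direction is the mean squared distance to the nearest
    neighbour in the other set, and the two directions are summed.
    This brute-force version is O(N * M) and meant for illustration only.
    """
    if not pred or not gt:
        raise ValueError("both point sets must be non-empty")
    # Average distance from each predicted point to its nearest ground-truth point.
    pred_to_gt = sum(min(_squared_distance(p, g) for g in gt) for p in pred) / len(pred)
    # Average distance from each ground-truth point to its nearest predicted point.
    gt_to_pred = sum(min(_squared_distance(g, p) for p in pred) for g in gt) / len(gt)
    return pred_to_gt + gt_to_pred


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    ground_truth = [(0.0, 0.0, 0.0)]
    print(chamfer_distance(predicted, ground_truth))
```

Lower values mean the forecast point cloud lies closer to the ground truth, which is why the full-data OpenScene model scores lower than the 12.5% model in the table.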

Down-stream fine-tuning (Perception)

| Downstream Model | Dataset | pre-train | Config | NDS | mAP | models & logs |
| --- | --- | --- | --- | --- | --- | --- |
| BEVFormer-Base (baseline) | nuScenes (25% Data) | FCOS3D | bevformer-base | 43.40 | 35.47 | models / logs |
| BEVFormer-Base | nuScenes (25% Data) | ViDAR-RN101-nus-1-8-1future | vidar-nusc-finetune-1future | 45.77 | 36.90 | models / logs |
| BEVFormer-Base | nuScenes (25% Data) | ViDAR-RN101-nus-1-8-3future | vidar-nusc-finetune-3future | 45.61 | 36.84 | models / logs |
| BEVFormer-Base (baseline) | nuScenes (100% Data) | FCOS3D | bevformer-base | 51.7 | 41.6 | models |
| BEVFormer-Base | nuScenes (100% Data) | ViDAR-RN101-nus-full-1future | vidar-nusc-finetune-1future | 55.33 | 45.20 | models |
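
To read the perception table as gains over the FCOS3D-initialized baseline, the short sketch below subtracts the baseline NDS and mAP from each ViDAR-initialized row. The numbers are copied from the table; the dictionary keys and the `gains_over_baseline` helper are illustrative names, not part of the repository.

```python
from typing import Dict, Tuple

# (NDS, mAP) per model, copied from the perception fine-tuning table.
RESULTS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "nuScenes 25%": {
        "baseline": (43.40, 35.47),
        "ViDAR-RN101-nus-1-8-1future": (45.77, 36.90),
        "ViDAR-RN101-nus-1-8-3future": (45.61, 36.84),
    },
    "nuScenes 100%": {
        "baseline": (51.7, 41.6),
        "ViDAR-RN101-nus-full-1future": (55.33, 45.20),
    },
}


def gains_over_baseline(rows: Dict[str, Tuple[float, float]]) -> Dict[str, Tuple[float, float]]:
    """Return (NDS gain, mAP gain) of every non-baseline row, rounded to 2 decimals."""
    base_nds, base_map = rows["baseline"]
    return {
        name: (round(nds - base_nds, 2), round(m_ap - base_map, 2))
        for name, (nds, m_ap) in rows.items()
        if name != "baseline"
    }


if __name__ == "__main__":
    for split, rows in RESULTS.items():
        for name, (d_nds, d_map) in gains_over_baseline(rows).items():
            print(f"{split} | {name}: NDS {d_nds:+.2f}, mAP {d_map:+.2f}")
```

With the table values, the full-data setting shows the larger improvement: about +3.63 NDS and +3.60 mAP, against roughly +2.2 to +2.4 NDS and +1.4 mAP with 25% of the data.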

Down-stream fine-tuning (End-to-End)

Please refer to the ViDAR-UniAD page.

Installation <a name="installation"></a>

The installation steps are similar to those of BEVFormer. For convenience, we list them below:

conda create -n vidar python=3.8 -y
conda activate vidar

pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
conda install -c omgarcia gcc-6 # (optional) gcc-6.2

Install mm-series packages.

pip install mmcv-full==1.4.0
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1

# Install mmdetection3d from source.
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1 # Other versions may not be compatible.
python setup.py install
cd .. # Return to the parent directory before cloning ViDAR.

Install Detectron2 and Timm.

pip install einops fvcore seaborn iopath==0.1.9 timm==0.6.13  typing-extensions==4.5.0 pylint ipython==8.12  numpy==1.19.5 matplotlib==3.5.2 numba==0.48.0 pandas==1.4.4 scikit-image==0.19.3 setuptools==59.5.0
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Setup ViDAR project.

git clone https://github.com/OpenDriveLab/ViDAR

cd ViDAR
mkdir -p pretrained
# Download the FCOS3D pre-trained backbone into ./pretrained while staying in the ViDAR root.
wget -P pretrained https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth

# Install chamferdistance library.
cd third_lib/chamfer_dist/chamferdist/
pip install .
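
Because the steps above pin exact versions, a quick check of the active environment can catch mismatches before training. The sketch below is one possible way to do that with the standard-library `importlib.metadata` module (available from Python 3.8). The `PINNED` dictionary copies a subset of the versions listed above; the distribution name `mmdet3d` for mmdetection3d v0.17.1 is an assumption, and `find_mismatches` is an illustrative helper, not part of this repository.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Subset of the versions pinned in the installation steps above.
PINNED: Dict[str, str] = {
    "torch": "1.10.1+cu111",
    "torchvision": "0.11.2+cu111",
    "mmcv-full": "1.4.0",
    "mmdet": "2.14.0",
    "mmsegmentation": "0.14.1",
    "mmdet3d": "0.17.1",  # assumed distribution name for mmdetection3d
    "timm": "0.6.13",
    "numpy": "1.19.5",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from the pin to what was found."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    for name, found in find_mismatches(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {found or 'not installed'}")
```

Running the script inside the `vidar` conda environment prints nothing when every listed package matches its pin.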

Prepare Datasets <a name="prepare-datasets"></a>

Train and Evaluate <a name="train-and-evaluate"></a>


We recommend using 8 A100 GPUs for training. GPU memory usage is around 63 GB during pre-training. To launch training:


./tools/dist_train.sh ${CONFIG} ${GPU_NUM}



To evaluate a checkpoint:

./tools/dist_test.sh ${CONFIG} ${CKPT} ${GPU_NUM}



To evaluate and also save the predicted point clouds for visualization:

./tools/dist_test.sh ${CONFIG} ${CKPT} ${GPU_NUM} \
  --cfg-options 'model._viz_pcd_flag=True' 'model._viz_pcd_path=/path/to/output'

License and Citation <a name="license-and-citation"></a>

All assets and code are under the Apache 2.0 license unless specified otherwise.

If this work is helpful for your research, please consider citing the following BibTeX entry.

@inproceedings{yang2024vidar,
  title={Visual Point Cloud Forecasting enables Scalable Autonomous Driving},
  author={Yang, Zetong and Chen, Li and Sun, Yanan and Li, Hongyang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

Related Resources <a name="resources"></a>

We thank the open-source contributors of the following projects, which made this work possible:

