CAPE: Camera View Position Embedding for Multi-View 3D Object Detection (CVPR 2023)

This repository is an official implementation of CAPE.

<div align="center"> <img src="figs/overview.png"/> </div><br/>

CAPE is a simple yet effective method for multi-view 3D object detection. CAPE forms the 3D position embedding in each local camera-view coordinate system rather than in the global coordinate system, which largely reduces the difficulty of learning the view transformation. CAPE also supports temporal modeling by fusing separated queries across multiple frames.
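The key idea above — building the position embedding from frustum points expressed in each camera's own frame, using only the intrinsics and no camera-to-global extrinsics — can be sketched as follows. This is a minimal illustration, not the official implementation; the function name, the example intrinsics `K`, and the depth candidates are all hypothetical, and the MLP that maps these coordinates to an embedding is omitted.

```python
import numpy as np

def camera_view_frustum_points(H, W, depths, K):
    """Sketch: 3D frustum points in the *local camera* frame.

    Unlike a global-frame position embedding, no extrinsic
    (camera-to-ego/world) transform is applied, so the result is
    invariant to the camera's pose. (Hypothetical helper.)
    """
    # Pixel-grid frustum: one homogeneous pixel per (u, v) location.
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    pixels = np.stack([us, vs, np.ones_like(us)], axis=-1)  # (H, W, 3)
    pixels = pixels.reshape(-1, 3).T                        # (3, H*W)

    # Back-project with the inverse intrinsics to unit-depth rays.
    rays = np.linalg.inv(K) @ pixels                        # (3, H*W)

    # Scale each ray by every candidate depth -> (D, H*W, 3) points,
    # still in the camera's local coordinate system.
    points = np.stack([(rays * d).T for d in depths])
    return points

# Example intrinsics (hypothetical values) and two depth candidates.
K = np.array([[1000.0, 0.0, 352.0],
              [0.0, 1000.0, 128.0],
              [0.0,    0.0,   1.0]])
pts = camera_view_frustum_points(4, 8, [1.0, 2.0], K)
print(pts.shape)  # (2, 32, 3)
```

In the actual model these camera-frame coordinates would be fed through a small network to produce the per-pixel position embedding; the point of the sketch is only that the geometry never leaves the local camera frame.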

Preparation

This implementation is built upon PETR; please follow install.md to set up the environment.

Train & inference

```
cd CAPE
```

Train the model with:

```
sh train.sh
```

Evaluate the model with:

```
sh test.sh
```

Main Results

| config | mAP | NDS | config | download |
| --- | --- | --- | --- | --- |
| cape_r50_1408x512_24ep_wocbgs_imagenet_pretrain | 34.7% | 40.6% | config | log / checkpoint |
| capet_r50_704x256_24ep_wocbgs_imagenet_pretrain | 31.8% | 44.2% | config | log / checkpoint |
| capet_VoV99_800x320_24ep_wocbgs_load_dd3d_pretrain | 44.7% | 54.36% | config | log / checkpoint |

Acknowledgement

Many thanks to the authors of mmdetection3d. Special thanks to the authors of PETR.

Citation

If you find this project useful for your research, please consider citing:

@inproceedings{Xiong2023CAPE,
  title={CAPE: Camera View Position Embedding for Multi-View 3D Object Detection},
  author={Xiong, Kaixin and Gong, Shi and Ye, Xiaoqing and Tan, Xiao and Wan, Ji and Ding, Errui and Wang, Jingdong and Bai, Xiang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}

Contact

If you have any questions, feel free to open an issue or contact us at kaixinxiong@hust.edu.cn, gongshi@baidu.com, or yexiaoqing@baidu.com.