Home

Awesome

OV-3DET: Open-Vocabulary Point-Cloud Object Detection without 3D Annotation

OV-3DET: An Open Vocabulary 3D DETector.

Paper | BibTeX

<p align="center"> <img src='Assets/overview.png' align="center" height="300px"> </p>

OV-3DET: Open-Vocabulary Point-Cloud Object Detection without 3D Annotation,
Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka, Kurt Keutzer and Shanghang Zhang,
Accepted to CVPR2023

Features

Installation

See installation instructions.

Dataset preparation

See dataset instructions, or directly download the processed dataset.

Training OV-3DET

Phase 1

Learn to Localize 3D Objects from 2D Pretrained Detector:

# ScanNet
bash scripts/scannet_train_loc.sh
# SUN RGB-D
bash scripts/sunrgbd_train_loc.sh

Phase 2

Learn to Classify 3D Objects from 2D Pretrained vision-language Model:

# ScanNet
bash scripts/scannet_train_dtcc.sh
# SUN RGB-D
bash scripts/sunrgbd_train_dtcc.sh

Evaluate OV-3DET

To evaluate OV-3DET, simply by running:

# ScanNet
bash scripts/evaluate_scannet.sh
# SUN RGB-D
bash scripts/evaluate_sunrgbd.sh

Pretrained Models

We provide the pretrained model weights for both "Phase 1" and "Phase 2".

<table> <tr> <th>Dataset</th> <th>Phase</th> <th>Epochs</th> <th>Model weights</th> </tr> <tr> <td>ScanNet</td> <td>1</td> <td>400</td> <td><a href="https://pan.baidu.com/s/1NxwuIsQZjHLA4Wj_7TUl_A?pwd=mdj0">weights</a></td> </tr> <tr> <td>ScanNet</td> <td>2</td> <td>50</td> <td><a href="https://pan.baidu.com/s/1hdtddyazILxZoFc8Vc2Idw?pwd=oesw">weights</a></td> </tr> <tr> <td>SUN RGB-D</td> <td>1</td> <td>400</td> <td><a href="https://pan.baidu.com/s/10blPxIgvKgRk5UkjNBZpCw?pwd=14wp">weights</a></td> </tr> <tr> <td>SUN RGB-D</td> <td>2</td> <td>50</td> <td><a href="https://pan.baidu.com/s/1ZswaKhN-NYxMzHqhLg_4eQ?pwd=31th">weights</a></td> </tr> </table>

Acknowledgement

This codebase is modified base on 3DETR [1], CLIP [2] and Detic [3], we sincerely appreciate their contributions!

[1] An end-to-end transformer model for 3d object detection. ICCV. 2021.
[2] Learning transferable visual models from natural language supervision. ICML. 2021.
[3] Detecting twenty-thousand classes using image-level supervision. ECCV. 2022.

Citation

If you find this repository helpful, please consider citing our work:

@article{lu2023open,
  title={Open-Vocabulary Point-Cloud Object Detection without 3D Annotation},
  author={Lu, Yuheng and Xu, Chenfeng and Wei, Xiaobao and Xie, Xiaodong and Tomizuka, Masayoshi and Keutzer, Kurt and Zhang, Shanghang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}