<h2 align="center"> <b>【NeurIPS 2024 🇨🇦】ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images</b> </h2>

- We are the first to accomplish Open-Vocabulary 3D Object Detection tasks without using any 3D ground truth data.
- Thank you for starring ImOV3D.
Timing Yang*, Yuanliang Ju*, Li Yi <br> Shanghai Qi Zhi Institute, IIIS Tsinghua University, Shanghai AI Lab<br>
Overall Pipeline
<p align="center"> <img src='img/pipe7.png' align="center" height="400px"> </p> <!-- ## Main Results <p align="center"> <img src='img/mainresults.png' align="center" height="400px"> </p> ## More Ablation Study and Visualization <p align="center"> <img src='img/abl_1.png' align="center" height="250px"> </p> <p align="center"> <img src='img/abl_2_vis.png' align="center" height="400px"> </p> -->

Environment Setup
To set up the project environment, follow these steps:
Create a virtual environment:
conda env create -f environment.yml
After creating the virtual environment, activate it with:
conda activate ImOV3D
PointNet++ Backbone Installation
cd pointnet2
python setup.py install
cd ..
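
The steps above only work when the conda environment is active and the commands are run from the repository root. The following is a minimal preflight sketch, not part of the repository: `preflight` is a hypothetical helper. It assumes conda exposes the active environment name through the `CONDA_DEFAULT_ENV` variable and that `pointnet2/setup.py` sits directly under the repository root, as the commands above imply.

```python
import os
from pathlib import Path


def preflight(repo_root=".", expected_env="ImOV3D", environ=None):
    """Return a list of human-readable problems; an empty list means OK."""
    environ = os.environ if environ is None else environ
    problems = []

    # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"conda env is {active!r}, expected {expected_env!r}")

    # The backbone is built from pointnet2/setup.py (path taken from the steps above).
    setup_py = Path(repo_root) / "pointnet2" / "setup.py"
    if not setup_py.is_file():
        problems.append(f"missing {setup_py}")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Run from the repository root, the script prints nothing when both checks pass. It does not verify that the PointNet++ extension compiled successfully.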
Dataset Preparation
Pretrain Stage
For detailed guidance on setting up the dataset for the pretraining stage, see the dataset instructions.
Adaptation
See Data Preparation for SUNRGBD or ScanNet.
You can also download the data from Baidu.
Format
--[data_name] # Root directory of the dataset
├── [data_name]_2d_bbox_train # Training data with 2D bounding boxes
├── [data_name]_2d_bbox_val # Validation data with 2D bounding boxes
├── [data_name]_pc_bbox_votes_train # Training data with point cloud bounding box votes
├── [data_name]_pc_bbox_votes_val # Validation data with point cloud bounding box votes
├── [data_name]_trainval_train # Training data (2D image + Calib)
└── [data_name]_trainval_eval # Evaluation data (2D image + Calib)
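
A quick way to confirm that a prepared dataset matches this layout is to check for the six sub-directories. The sketch below is illustrative only: `missing_subdirs` is a hypothetical helper, and the suffixes are copied from the tree above. It checks that the directories exist, not that their contents are valid.

```python
from pathlib import Path

# Suffixes copied from the layout above; each is appended to the dataset name.
EXPECTED_SUFFIXES = (
    "_2d_bbox_train",
    "_2d_bbox_val",
    "_pc_bbox_votes_train",
    "_pc_bbox_votes_val",
    "_trainval_train",
    "_trainval_eval",
)


def missing_subdirs(root, data_name):
    """Return the expected sub-directories that are absent under `root`."""
    root = Path(root)
    return [
        data_name + suffix
        for suffix in EXPECTED_SUFFIXES
        if not (root / (data_name + suffix)).is_dir()
    ]
```

For example, `missing_subdirs("data/sunrgbd", "sunrgbd")` returns an empty list when the layout is complete. Both the path and the dataset name in that call are placeholders; substitute whatever `[data_name]` is on your machine.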
Pretrain Weight
| Module | Description |
|---|---|
| PointCloudRender | Finetuned ControlNet |

| Dataset | Description | Logs |
|---|---|---|
| LVIS | Pretrain Stage | SUNRGBD, ScanNet |
| SUNRGBD | Adaptation Stage | SUNRGBD |
| ScanNet | Adaptation Stage | ScanNet |

You can download them from Baidu.
Training and Evaluation
1️⃣ Pretrain
Pretrain ImOV3D on the LVIS dataset:
bash ./scripts/train_lvis.sh
2️⃣ Adaptation
For the SUNRGBD dataset:
bash ./scripts/train_sunrgbd.sh
For the ScanNet dataset:
bash ./scripts/train_scannet.sh
3️⃣ Evaluation
To measure the effectiveness of the trained model, run the evaluation script:
bash ./scripts/eval.sh
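
The three stages above run in a fixed order: pretrain, adapt to one target dataset, then evaluate. The sketch below chains them from Python; it is not part of the repository. `build_commands`, `run_all`, and the `skip_pretrain` flag are hypothetical names, and the script paths are copied from the commands above. It assumes the scripts take no arguments, as shown in this README. How each script locates checkpoints, and whether `eval.sh` needs per-dataset edits, is not covered here, so check the scripts themselves.

```python
import subprocess

# Script paths copied from the commands above.
PRETRAIN_SCRIPT = "./scripts/train_lvis.sh"
ADAPTATION_SCRIPTS = {
    "sunrgbd": "./scripts/train_sunrgbd.sh",
    "scannet": "./scripts/train_scannet.sh",
}
EVAL_SCRIPT = "./scripts/eval.sh"


def build_commands(dataset, skip_pretrain=False):
    """Return the bash commands for one full run, in execution order."""
    key = dataset.lower()
    if key not in ADAPTATION_SCRIPTS:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {sorted(ADAPTATION_SCRIPTS)}"
        )
    scripts = [] if skip_pretrain else [PRETRAIN_SCRIPT]
    scripts += [ADAPTATION_SCRIPTS[key], EVAL_SCRIPT]
    return [["bash", script] for script in scripts]


def run_all(dataset, skip_pretrain=False):
    """Run each stage in turn and stop at the first failure."""
    for command in build_commands(dataset, skip_pretrain):
        subprocess.run(command, check=True)
```

Calling `run_all("scannet")` from the repository root would run all three scripts in sequence. `check=True` makes a failing stage raise `subprocess.CalledProcessError` instead of continuing silently to the next stage.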
Contact
If you have any questions, please feel free to contact us:
- Timing Yang: timingya@usc.edu
- Yuanliang Ju: yuanliang.ju@mail.utoronto.ca
Acknowledgement
Our code is based on ImVoteNet, OV-3DET, Detic, ControlNet, ZoeDepth, surface_normal_uncertainty.
Citation
@article{yang2024imov3d,
title={ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images},
author={Yang, Timing and Ju, Yuanliang and Yi, Li},
journal={NeurIPS 2024},
year={2024}
}