<h2 align="center"> <b>【NeurIPS 2024 🇨🇦】ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images</b> </h2>

ImOV3D Project on arXiv ImOV3D Project Page

Timing Yang*, Yuanliang Ju*, Li Yi <br> Shanghai Qi Zhi Institute, IIIS Tsinghua University, Shanghai AI Lab<br>

Overall Pipeline

<p align="center"> <img src='img/pipe7.png' align="center" height="400px"> </p> <!-- ## Main Results <p align="center"> <img src='img/mainresults.png' align="center" height="400px"> </p> ## More Ablation Study and Visualization <p align="center"> <img src='img/abl_1.png' align="center" height="250px"> </p> <p align="center"> <img src='img/abl_2_vis.png' align="center" height="400px"> </p> -->

Environment Setup

To set up the project environment, follow these steps:

Create a virtual environment:

conda env create -f environment.yml

After creating the virtual environment, activate it with:

conda activate ImOV3D

PointNet++ Backbone Installation

cd pointnet2
python setup.py install
cd ..
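After the build finishes, it can be useful to confirm that the expected packages are importable from the active environment. The helper below is a minimal sketch that uses only the standard library; the module names passed to it (for example `torch` or `pointnet2`) are assumptions, since this README does not state the name under which the backbone is installed.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in names:
        try:
            # find_spec locates a module without executing it; for dotted names it
            # imports the parent package, which may raise if the parent is absent.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing
```

For example, `missing_modules(["torch", "pointnet2"])` returns an empty list when both names resolve. Adjust the names to match what `setup.py` actually installs.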

Dataset Preparation

Pretrain Stage

For detailed guidance on setting up the dataset for the pretraining stage, see the dataset instructions.

Adaptation

See Data Preparation for SUNRGBD or ScanNet.

You can also download the data from Baidu.

Format

--[data_name]  # Root directory of the dataset
  ├── [data_name]_2d_bbox_train       # Training data with 2D bounding boxes
  ├── [data_name]_2d_bbox_val         # Validation data with 2D bounding boxes
  ├── [data_name]_pc_bbox_votes_train # Training data with point cloud bounding box votes
  ├── [data_name]_pc_bbox_votes_val   # Validation data with point cloud bounding box votes
  ├── [data_name]_trainval_train      # Training data (2D image + Calib)
  └── [data_name]_trainval_eval       # Evaluation data (2D image + Calib)
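Before training, the layout above can be checked with a short script. This is a minimal sketch, not part of the repository: the function name `missing_subdirs` is hypothetical, and the value used for `data_name` (such as `sunrgbd`) is an assumption that should match however your dataset root is actually named.

```python
from pathlib import Path

# Sub-directory suffixes copied from the layout listed in this README.
EXPECTED_SUFFIXES = [
    "2d_bbox_train",
    "2d_bbox_val",
    "pc_bbox_votes_train",
    "pc_bbox_votes_val",
    "trainval_train",
    "trainval_eval",
]


def missing_subdirs(root, data_name):
    """Return the expected sub-directory names that do not exist under root."""
    root = Path(root)
    return [
        f"{data_name}_{suffix}"
        for suffix in EXPECTED_SUFFIXES
        if not (root / f"{data_name}_{suffix}").is_dir()
    ]
```

An empty result means all six sub-directories are present; any returned names point to the parts of the dataset that still need to be prepared or downloaded.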

Pretrain Weight

| Module | Description |
|---|---|
| PointCloudRender | Finetuned ControlNet |

| Dataset | Description | Logs |
|---|---|---|
| LVIS | Pretrain Stage | SUNRGBD, ScanNet |
| SUNRGBD | Adaptation Stage | SUNRGBD |
| ScanNet | Adaptation Stage | ScanNet |

You can download them from Baidu.

Training and Evaluation

1️⃣ Pretrain

Pretrain ImOV3D on the LVIS dataset:

bash ./scripts/train_lvis.sh

2️⃣ Adaptation

For the SUNRGBD dataset:

bash ./scripts/train_sunrgbd.sh

For the ScanNet dataset:

bash ./scripts/train_scannet.sh

3️⃣ Evaluation

To evaluate the trained model, run:

bash ./scripts/eval.sh
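The three stages above can also be chained from Python. The sketch below only wraps the script paths listed in this README; the function names are hypothetical, the option to skip pretraining assumes you are starting from the downloadable pretrained weights, and it assumes `eval.sh` is already configured for the chosen dataset, which this README does not specify.

```python
import subprocess

# Script paths as listed in this README.
PRETRAIN_SCRIPT = "./scripts/train_lvis.sh"
ADAPT_SCRIPTS = {
    "sunrgbd": "./scripts/train_sunrgbd.sh",
    "scannet": "./scripts/train_scannet.sh",
}
EVAL_SCRIPT = "./scripts/eval.sh"


def pipeline_commands(dataset, pretrain=True):
    """Build the ordered list of commands: pretrain (optional), adaptation, evaluation."""
    if dataset not in ADAPT_SCRIPTS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {sorted(ADAPT_SCRIPTS)}")
    commands = []
    if pretrain:
        commands.append(["bash", PRETRAIN_SCRIPT])
    commands.append(["bash", ADAPT_SCRIPTS[dataset]])
    commands.append(["bash", EVAL_SCRIPT])
    return commands


def run_pipeline(dataset, pretrain=True):
    """Run each stage in order from the repository root, stopping at the first failure."""
    for command in pipeline_commands(dataset, pretrain):
        subprocess.run(command, check=True)
```

Calling `run_pipeline("scannet", pretrain=False)` would run adaptation on ScanNet followed by evaluation. Using `check=True` means a failed stage raises `subprocess.CalledProcessError` instead of silently continuing to the next one.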

Contact

If you have any questions, please feel free to contact us:

Timing Yang: timingya@usc.edu <br> Yuanliang Ju: yuanliang.ju@mail.utoronto.ca

Acknowledgement

Our code builds on ImVoteNet, OV-3DET, Detic, ControlNet, ZoeDepth, and surface_normal_uncertainty.

Citation

@article{yang2024imov3d,
  title={ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images},
  author={Yang, Timing and Ju, Yuanliang and Yi, Li},
  journal={NeurIPS 2024},
  year={2024}
}