3D-SPS
Code for our CVPR 2022 Oral paper "3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection".
<div align="center"> <img src="docs/framework.png"/> </div><br/>

Dataset
If you would like to access the ScanRefer dataset, please fill out this form. Once your request is accepted, you will receive an email with the download link.
Note: In addition to the language annotations in the ScanRefer dataset, you also need access to the original ScanNet dataset. Please refer to the ScanNet Instructions for more details.
Download the dataset by simply executing the wget command:
wget <download_link>
Data format
"scene_id": [ScanNet scene id, e.g. "scene0000_00"],
"object_id": [ScanNet object id (corresponds to "objectId" in ScanNet aggregation file), e.g. "34"],
"object_name": [ScanNet object name (corresponds to "label" in ScanNet aggregation file), e.g. "coffee_table"],
"ann_id": [description id, e.g. "1"],
"description": [...],
"token": [a list of tokens from the tokenized description]
Setup
The code is now compatible with PyTorch 1.6! Please execute the following command to install PyTorch:
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch
Install the necessary packages listed in requirements.txt:
pip install -r requirements.txt
After all packages are properly installed, please run the following commands to compile the CUDA modules for the PointNet++ backbone:
cd lib/pointnet2
python setup.py install
Before moving on to the next step, please don't forget to set CONF.PATH.BASE in config/default.yaml to the project root path.
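For example, if `config/default.yaml` mirrors the `CONF.PATH.BASE` attribute as nested keys (a hypothetical layout; check the actual file for the exact key names), the entry might look like:

```yaml
# hypothetical key layout; set the value to the absolute path of this repository
PATH:
  BASE: "/path/to/3D-SPS"
```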
Data preparation
- Download the ScanRefer dataset and unzip it under `data/`.
- Download the text embeddings:
  - preprocessed GLoVE embeddings (~990MB) and put them under `data/`.
  - preprocessed CLIP embeddings (~1.29MB) and put them under `data/`.
- Download the ScanNetV2 dataset and put (or link) `scans/` under (or to) `data/scannet/scans/` (please follow the ScanNet Instructions for downloading the ScanNet dataset); one way to create the link is sketched below. After this step, there should be folders containing the ScanNet scene data under `data/scannet/scans/` with names like `scene0000_00`.
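Assuming your local ScanNet copy lives at `/path/to/ScanNet` (a placeholder path), the link could be created like this:

```bash
# run from the project root; /path/to/ScanNet is a placeholder for your local ScanNet copy
mkdir -p data/scannet
ln -s /path/to/ScanNet/scans data/scannet/scans
```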
- Pre-process the ScanNet data. A folder named `scannet_data/` will be generated under `data/scannet/` after running the following command. Roughly 3.8GB of free space is needed for this step:
cd data/scannet/
python batch_load_scannet_data.py
After this step, you can check if the processed scene data is valid by running:
python visualize.py --scene_id scene0000_00
- Download the pre-trained PointNet++ backbone (Google Drive or Baidu Drive, passcode: `likl`).
- (Optional) Pre-process the multiview features from ENet.
a. Download the ENet pretrained weights (1.4MB) and put it under `data/`.
b. Download and decompress the extracted ScanNet frames (~13GB).
c. Change the data paths in `config.py` marked with TODO accordingly.
d. Extract the ENet features:
python script/compute_multiview_features.py
e. Project ENet features from ScanNet frames to point clouds; you need ~36GB to store the generated HDF5 database:
python script/project_multiview_features.py --maxpool
You can check if the projections make sense by projecting the semantic labels from the images to the target point cloud by running:
python script/project_multiview_labels.py --scene_id scene0000_00 --maxpool
Usage
Training
python scripts/train.py --config ./config/default.yaml
For more training options (like using preprocessed multiview features), please see the details in `default.yaml`.
Evaluation
To evaluate the trained ScanRefer models, please download the trained model (Google Drive or Baidu Drive, passcode: `x3vl`), put it in `<folder_name>` under `outputs/`, and run:
python scripts/eval.py --config ./config/default.yaml --folder <folder_name> --reference --no_nms --force
Acknowledgement
We would like to thank the authors of ScanRefer and Group-Free for their open-source release.
License
3D-SPS is released under the MIT license.
<a name="CitingSPS"></a>Citation
Please consider citing 3D-SPS in your publications if it helps your research.
@article{luo20223d,
title={3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection},
author={Luo, Junyu and Fu, Jiahui and Kong, Xianghao and Gao, Chen and Ren, Haibing and Shen, Hao and Xia, Huaxia and Liu, Si},
journal={arXiv preprint arXiv:2204.06272},
year={2022}
}