Multi-Attribute Interactions Matter for 3D Visual Grounding

Installation

  1. The code is compatible with PyTorch 1.10. Follow the instructions below to build the environment:
conda create -n MA2Trans python=3.8
conda activate MA2Trans

conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
  1. To use the PointNet++ visual encoder, you need to compile its CUDA layers:
cd lib/pointnet2
python setup.py install
  1. We adopt bert-base-uncased from Hugging Face. The transformers library used to load it can be installed with pip as follows:
pip install transformers
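
After the steps above, a quick check can confirm that the expected packages are visible in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of this repository, and it assumes the packages register standard distribution metadata.

```python
from importlib import metadata  # standard library in Python 3.8+

# Package names taken from the installation commands above.
PACKAGES = ("torch", "torchvision", "torchaudio", "transformers")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

If a package is reported as not installed even though it imports correctly, reading its `__version__` attribute after import is an alternative check.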

Data Preparation

  1. For the ScanRefer dataset, first obtain the original ScanNet dataset by following the ScanNet Instructions. Each annotation has the following format:
"scene_id": [ScanNet scene id, e.g. "scene0000_00"],
"object_id": [ScanNet object id (corresponds to "objectId" in ScanNet aggregation file), e.g. "34"],
"object_name": [ScanNet object name (corresponds to "label" in ScanNet aggregation file), e.g. "coffee_table"],
"ann_id": [description id, e.g. "1"],
"description": [...],
"token": [a list of tokens from the tokenized description]
  1. For the Nr3D and Sr3D datasets, refer to the data preparation instructions in referit3d.
  2. Follow the data preprocessing steps in vil3dref and set the PROCESSED_DATA_DIR folder according to your setup.
  3. Download the pre-trained weights from this page and put them into the PATH_OF_BERT folder, set according to your setup.
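
The ScanRefer record format listed in the first step can be sanity-checked before training. The sketch below assumes the annotation file is a JSON list of such records; the helper names `missing_fields` and `load_annotations` are hypothetical, and the example description and tokens are made up for illustration.

```python
import json

# Field names as listed in the ScanRefer data format above.
REQUIRED_FIELDS = (
    "scene_id", "object_id", "object_name", "ann_id", "description", "token",
)


def missing_fields(record):
    """Return the required field names that are absent from one record."""
    return [field for field in REQUIRED_FIELDS if field not in record]


def load_annotations(path):
    """Load a JSON list of records and fail early if any record is incomplete."""
    with open(path, "r", encoding="utf-8") as handle:
        records = json.load(handle)
    for index, record in enumerate(records):
        missing = missing_fields(record)
        if missing:
            raise ValueError(f"record {index} is missing fields: {missing}")
    return records


# Illustrative record only; the description and tokens are invented.
example = {
    "scene_id": "scene0000_00",
    "object_id": "34",
    "object_name": "coffee_table",
    "ann_id": "1",
    "description": "the coffee table is in front of the couch.",
    "token": ["the", "coffee", "table", "is", "in", "front",
              "of", "the", "couch", "."],
}
```

Calling `missing_fields(example)` returns an empty list for a complete record, while `load_annotations` raises a `ValueError` naming the first incomplete record it finds.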

Training

python main.py
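
Before launching training, it can help to confirm that the folders configured during data preparation actually exist. This is a minimal sketch, not part of the repository: the helper `missing_directories` is hypothetical, and reading `PROCESSED_DATA_DIR` and `PATH_OF_BERT` from environment variables is an assumption made for illustration, since where the project defines them is not stated here.

```python
import os


def missing_directories(paths):
    """Return the names whose configured path is not an existing directory."""
    return [name for name, path in paths.items() if not os.path.isdir(path)]


if __name__ == "__main__":
    # Assumption: both locations are exposed as environment variables.
    configured = {
        "PROCESSED_DATA_DIR": os.environ.get("PROCESSED_DATA_DIR", ""),
        "PATH_OF_BERT": os.environ.get("PATH_OF_BERT", ""),
    }
    for name in missing_directories(configured):
        print(f"{name} is not an existing directory: {configured[name]!r}")
```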

Evaluation

Citation

If you find this work useful, please consider citing:

@InProceedings{xu2024multi,
author       = {Xu, Can and Han, Yuehui and Xu, Rui and Hui, Le and Xie, Jin and Yang, Jian},
title        = {Multi-Attribute Interactions Matter for 3D Visual Grounding},
booktitle    = {CVPR},
year         = {2024},
}

Acknowledgement

Some of the code is built upon referit3d. We thank the authors for their great work.