LanguageRefer: Spatial-Language Model for 3D Visual Grounding

This is an implementation of the CoRL 2021 paper "LanguageRefer: Spatial-Language Model for 3D Visual Grounding" by Roh et al. [pdf][project]

For a video, examples of ReferIt3D datasets, qualitative results of the model, and a link to the orientation annotation, please visit the project page (https://sites.google.com/view/language-refer).

LR Figure

Instructions to run the code

To run the code, install the prerequisites, set up an environment, and then run the scripts. Following the instructions below step by step should let you run the evaluation code.

WARNING: one of the scripts contains code that modifies your ~/.bashrc file. Please make a copy of your ~/.bashrc file before running it.
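As one way to make that copy, here is a minimal sketch in Python. The helper name backup_file and the timestamped .bak naming are assumptions made for illustration; they are not part of this repository, and a plain cp works just as well.

```python
import shutil
import time
from pathlib import Path


def backup_file(path):
    """Copy `path` to a timestamped sibling file and return the backup's path.

    Hypothetical helper for illustration; not part of the LanguageRefer code.
    """
    path = Path(path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"nothing to back up at {path}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = path.with_name(f"{path.name}.{stamp}.bak")
    # copy2 copies the contents and also tries to preserve file metadata.
    shutil.copy2(path, backup)
    return backup


# Example (run this before install_anaconda.sh):
#   backup_file("~/.bashrc")
```

The backup lands next to the original file, so restoring it is a matter of copying it back over ~/.bashrc.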

We recommend using anaconda3 to set up the environment for the code. The libraries that the code depends on will be installed if you follow the guide below:

  1. Install anaconda3 manually or by running bash ./install_anaconda.sh. (Note that this will modify your ~/.bashrc file.)
  2. Set up an environment by running conda env create -f env.yml.
  3. Activate the environment by running conda activate lr.
  4. Download pre-trained model files for nr3d and sr3d.
  5. Extract files to ./resources/models/nr3d/model.pt and ./resources/models/sr3d/model.pt.
  6. With the conda environment lr activated, evaluate on nr3d by running python eval.py --dataset-name nr3d --pretrain-path $(PROJECT_PATH)/resources/models/nr3d.
  7. To evaluate on sr3d, run python eval.py --dataset-name sr3d --pretrain-path $(PROJECT_PATH)/resources/models/sr3d.
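The evaluation steps above can be sketched as a small Python helper that checks that the checkpoint was extracted to the expected place before assembling the eval.py command. This is only a sketch: build_eval_command and DATASETS are hypothetical names introduced here, and only the paths and the --dataset-name and --pretrain-path flags come from the steps above.

```python
import sys
from pathlib import Path

# Dataset names used in the steps above.
DATASETS = ("nr3d", "sr3d")


def build_eval_command(project_path, dataset_name):
    """Return the eval.py argument list for one dataset.

    Hypothetical helper for illustration; not part of the LanguageRefer code.
    Raises FileNotFoundError if model.pt is not where the steps above put it.
    """
    if dataset_name not in DATASETS:
        raise ValueError(f"expected one of {DATASETS}, got {dataset_name!r}")
    model_dir = Path(project_path) / "resources" / "models" / dataset_name
    checkpoint = model_dir / "model.pt"
    if not checkpoint.is_file():
        raise FileNotFoundError(f"missing checkpoint: {checkpoint}")
    # sys.executable is the interpreter of the currently active environment.
    return [
        sys.executable, "eval.py",
        "--dataset-name", dataset_name,
        "--pretrain-path", str(model_dir),
    ]


# Example, run from the repository root with the lr environment activated:
#   import subprocess
#   subprocess.run(build_eval_command(".", "nr3d"), check=True)
```

Checking for model.pt up front turns a misplaced download into an immediate, readable error instead of a failure somewhere inside the evaluation script.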

Instructions to train a model

Try the command below:

python train.py --experiment-tag $(use-your-own-description) --per-device-train-batch-size $(batch-size)

To see the other available arguments, run python train.py --help.

Citing the paper

If you use "LanguageRefer: Spatial-Language Model for 3D Visual Grounding" in your research, please cite the paper:

@inproceedings{Roh2021Language,
  title={{L}anguage{R}efer: Spatial-Language Model for 3D Visual Grounding},
  author={Junha Roh and Karthik Desingh and Ali Farhadi and Dieter Fox},
  booktitle={Proceedings of the Conference on Robot Learning},
  year={2021},
}