LanguageRefer: Spatial-Language Model for 3D Visual Grounding
This is an implementation of the CoRL 2021 paper "LanguageRefer: Spatial-Language Model for 3D Visual Grounding" by Roh et al. [pdf][project]
For a video, examples from the ReferIt3D datasets, qualitative results of the model, and a link to the orientation annotations, please visit the project page (https://sites.google.com/view/language-refer).
Instructions to run the code
To run the code, you need to install the prerequisites, set up an environment, and then run the evaluation script. If you follow the instructions below step by step, you should be able to run the evaluation code.
WARNING: one of the scripts modifies your ~/.bashrc file. Please make a backup copy of your ~/.bashrc file before running it.
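For example, one simple way to keep a backup (the destination filename here is only an illustrative choice):

```bash
# Copy the current ~/.bashrc to a backup file before running the install script.
cp ~/.bashrc ~/.bashrc.backup
```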
We recommend using anaconda3 to set up the environment for the code.
Here is a list of the important libraries used in the code:
- python==3.8
- pytorch==1.9
These libraries will be installed if you follow the guide below (a consolidated command sketch is also given after the list):
- Install anaconda3 manually or by running `bash ./install_anaconda.sh`. (Note that this will modify your `~/.bashrc` file.)
- Set up an environment by running `conda env create -f env.yml`.
- Activate the environment by running `conda activate lr`.
- Download the pre-trained model files for nr3d and sr3d.
- Extract the files to `./resources/models/nr3d/model.pt` and `./resources/models/sr3d/model.pt`.
- With the conda environment `lr` activated:
  - For nr3d, run `python eval.py --dataset-name nr3d --pretrain-path $(PROJECT_PATH)/resources/models/nr3d`.
  - For sr3d, run `python eval.py --dataset-name sr3d --pretrain-path $(PROJECT_PATH)/resources/models/sr3d`.
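For convenience, the steps above can be collected into a single shell sketch. This is only a sketch of the documented commands: it assumes the repository root as the working directory, is meant to be run interactively, and `$(PROJECT_PATH)` remains a placeholder for your checkout path.

```bash
# 1. Install anaconda3 (WARNING: this modifies ~/.bashrc; back it up first).
bash ./install_anaconda.sh

# 2. Create and activate the conda environment described in env.yml.
conda env create -f env.yml
conda activate lr

# Optional sanity check of the library versions listed above.
python -c "import sys, torch; print(sys.version); print(torch.__version__)"

# 3. Download the pre-trained models and place them at:
#    ./resources/models/nr3d/model.pt and ./resources/models/sr3d/model.pt

# 4. Run the evaluation for each dataset.
python eval.py --dataset-name nr3d --pretrain-path $(PROJECT_PATH)/resources/models/nr3d
python eval.py --dataset-name sr3d --pretrain-path $(PROJECT_PATH)/resources/models/sr3d
```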
Instructions to train a model
Try the command below:
python train.py --experiment-tag $(use-your-own-description) --per-device-train-batch-size $(batch-size)
Please check the available arguments by running `python train.py --help`.
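For example, a hypothetical invocation could look like the line below; the experiment tag and batch size are arbitrary illustrative values, not recommended settings.

```bash
# Example values only; substitute your own description and a batch size that fits your GPU.
python train.py --experiment-tag my-nr3d-run --per-device-train-batch-size 16
```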
Citing the paper
If you use "LanguageRefer: Spatial-Language Model for 3D Visual Grounding" in your research, please cite the paper:
@inproceedings{Roh2021Language,
  title={{L}anguage{R}efer: Spatial-Language Model for 3D Visual Grounding},
  author={Junha Roh and Karthik Desingh and Ali Farhadi and Dieter Fox},
  booktitle={Proceedings of the Conference on Robot Learning},
  year={2021},
}