X-Trans2Cap

[CVPR2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning [arXiv Paper]

Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Shuguang Cui, Zhen Li*

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Yuan_2022_CVPR,
    author    = {Yuan, Zhihao and Yan, Xu and Liao, Yinghong and Guo, Yao and Li, Guanbin and Cui, Shuguang and Li, Zhen},
    title     = {X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {8563-8573}
}

Prerequisites

Installation

Data

ScanRefer

If you would like to access the ScanRefer dataset, please fill out this form. Once your request is accepted, you will receive an email with the download link.

Note: In addition to the language annotations in the ScanRefer dataset, you also need access to the original ScanNet dataset. Please refer to the ScanNet Instructions for more details.

Download the dataset by executing the wget command with your download link:

wget <download_link>

Run this command to organize the ScanRefer data:

python scripts/organize_data.py
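
As a sanity check after organizing, you can verify that the ScanRefer language files are readable. The data/scanrefer root below is an assumption about your local layout; the ScanRefer_filtered_{train,val}.json names follow the official ScanRefer release:

    import json
    from pathlib import Path

    # Hypothetical data root; adjust to wherever organize_data.py placed the files.
    DATA_ROOT = Path("data/scanrefer")

    for split in ("train", "val"):
        f = DATA_ROOT / f"ScanRefer_filtered_{split}.json"
        assert f.exists(), f"missing {f}; re-run scripts/organize_data.py"
        with f.open() as fp:
            anns = json.load(fp)
        print(f"{split}: {len(anns)} object descriptions")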

Processed 2D Features

You can download the processed 2D image features from OneDrive. The feature extraction code is borrowed from bottom-up-attention.pytorch.

After downloading, change the data paths in lib/config.py to point to your local copies.
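
The exact attribute names in lib/config.py are specific to this repo; as a hedged illustration (following the EasyDict-style config common to ScanRefer-based code), the edit usually amounts to repointing a few path constants:

    import os
    from easydict import EasyDict

    CONF = EasyDict()
    CONF.PATH = EasyDict()
    # All names below are illustrative; match them to the actual fields in lib/config.py.
    CONF.PATH.BASE = "/home/user/X-Trans2Cap"                           # repo root
    CONF.PATH.DATA = os.path.join(CONF.PATH.BASE, "data")               # organized ScanRefer/ScanNet data
    CONF.PATH.IMAGE_FEAT = os.path.join(CONF.PATH.DATA, "2d_features")  # downloaded 2D features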

Training

Run this command to train the model:

python scripts/train.py --config config/xtrans_scanrefer.yaml
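
If you want to inspect or adjust hyperparameters before launching, the yaml config can be loaded with PyYAML; a minimal sketch (the key names inside the file are repo-specific, so check it before changing anything):

    import yaml

    with open("config/xtrans_scanrefer.yaml") as f:
        cfg = yaml.safe_load(f)

    # Print every setting the training script will see.
    print(yaml.dump(cfg, default_flow_style=False))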

To further optimize the CIDEr metric, run:

python scripts/train.py --config config/xtrans_scanrefer_rl.yaml
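
Here, CIDEr optimization means fine-tuning the captioner with a reinforcement-style objective on the CIDEr metric. A common formulation is self-critical sequence training (SCST), sketched below with dummy tensors; this illustrates the general technique, not this repo's exact implementation:

    import torch

    def scst_loss(sample_logprobs: torch.Tensor,
                  sample_cider: torch.Tensor,
                  greedy_cider: torch.Tensor) -> torch.Tensor:
        # Advantage: CIDEr of the sampled caption minus the greedy (baseline) caption.
        reward = sample_cider - greedy_cider
        # Policy gradient: raise log-probs of captions that beat the baseline.
        return -(reward.detach() * sample_logprobs).mean()

    # Dummy batch of 4 captions; real scores would come from a CIDEr scorer.
    logprobs = torch.randn(4, requires_grad=True)
    loss = scst_loss(logprobs,
                     torch.tensor([0.9, 0.5, 0.7, 0.3]),
                     torch.tensor([0.6, 0.6, 0.6, 0.6]))
    loss.backward()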

Our code also supports training on the Nr3D/Sr3D datasets. Please organize the data in the same way as ScanRefer, and change the dataset argument in the config file.

Evaluation
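
Run this command to evaluate a trained model: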

python scripts/eval.py --config config/xtrans_scanrefer.yaml --use_pretrained xtrans_scanrefer_rl --force