X-Trans2Cap
[CVPR 2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning [arXiv Paper]
Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Shuguang Cui, Zhen Li*
Citation
If you find our work useful in your research, please consider citing:
@InProceedings{Yuan_2022_CVPR,
author = {Yuan, Zhihao and Yan, Xu and Liao, Yinghong and Guo, Yao and Li, Guanbin and Cui, Shuguang and Li, Zhen},
title = {X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {8563-8573}
}
Prerequisites
- Python 3.6.9 (e.g., conda create -n xtrans_env python=3.6.9)
- PyTorch 1.7.1 (e.g., conda install pytorch==1.7.1 cudatoolkit=11.0 -c pytorch)
- Install other common packages (numpy, transformers, etc.)
Installation
- Clone the repository:
git clone https://github.com/CurryYuan/X-Trans2Cap.git
- To use the PointNet++ visual encoder, compile its CUDA layers:
Note: this compilation requires gcc 5.4 or later.
cd lib/pointnet2
python setup.py install
Data
ScanRefer
If you would like to access the ScanRefer dataset, please fill out this form. Once your request is accepted, you will receive an email with the download link.
Note: In addition to the language annotations in the ScanRefer dataset, you also need access to the original ScanNet dataset. Please refer to the ScanNet Instructions for more details.
Download the dataset by simply executing the wget command:
wget <download_link>
Run this command to organize the ScanRefer data:
python scripts/organize_data.py
Processed 2D Features
You can download the processed 2D image features from OneDrive. The feature extraction code is borrowed from bottom-up-attention.pytorch.
Change the data path in lib/config.py.
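As a rough sketch of what that edit might look like (the attribute names below are assumptions for illustration, not the actual variables in lib/config.py; adapt them to the file you find in the repository):

```python
# Hypothetical sketch of the kind of path settings kept in lib/config.py.
# All attribute names here are assumptions; check the actual file.
import os

class CONF:
    # Root of the organized ScanRefer / ScanNet data
    DATA_ROOT = "/path/to/ScanRefer"
    # Extracted 2D image features downloaded from OneDrive
    FEATURE_2D = "/path/to/2d_features"

# Sanity check: data paths should be absolute so scripts work from any cwd
assert os.path.isabs(CONF.DATA_ROOT)
```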
Training
Run this command to train the model:
python scripts/train.py --config config/xtrans_scanrefer.yaml
Run CIDEr optimization:
python scripts/train.py --config config/xtrans_scanrefer_rl.yaml
Our code also supports training on the Nr3D/Sr3D datasets. Please organize the data in the same way as ScanRefer, and change the dataset argument in the config file.
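A minimal sketch of that change (only the `dataset` key is taken from the instructions above; the file name and any surrounding YAML structure are assumptions):

```yaml
# Hypothetical excerpt of a training config such as config/xtrans_scanrefer.yaml.
# Only the `dataset` key is grounded in the text above; everything else in
# the real config file should be left as-is.
dataset: Nr3D   # or: Sr3D / ScanRefer
```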
Evaluation
python scripts/eval.py --config config/xtrans_scanrefer.yaml --use_pretrained xtrans_scanrefer_rl --force