Long-Range-Grouping-Transformer

Official PyTorch implementation of the paper:

Long-Range Grouping Transformer for Multi-View 3D Reconstruction

Authors: Liying Yang, Zhenwei Zhu, Xuxin Lin, Jian Nong, Yanyan Liang.

<img src="./imgs/LGA.gif" width="300"/> <img src="./imgs/SGA.gif" width="300"/> <img src="./imgs/FRA.gif" width="300"/> <img src="./imgs/TGA.gif" width="300"/>

Performance

Each cell reports IoU / F-Score (higher is better).

| Methods | 1 view | 2 views | 3 views | 4 views | 5 views | 8 views | 12 views | 16 views | 20 views |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3D-R2N2 | 0.560 / 0.351 | 0.603 / 0.368 | 0.617 / 0.372 | 0.625 / 0.378 | 0.634 / 0.382 | 0.635 / 0.383 | 0.636 / 0.382 | 0.636 / 0.382 | 0.636 / 0.383 |
| AttSets | 0.642 / 0.395 | 0.662 / 0.418 | 0.670 / 0.426 | 0.675 / 0.430 | 0.677 / 0.432 | 0.685 / 0.444 | 0.688 / 0.445 | 0.692 / 0.447 | 0.693 / 0.448 |
| Pix2Vox++ | 0.670 / 0.436 | 0.695 / 0.452 | 0.704 / 0.455 | 0.708 / 0.457 | 0.711 / 0.458 | 0.715 / 0.459 | 0.717 / 0.460 | 0.718 / 0.461 | 0.719 / 0.462 |
| GARNet | 0.673 / 0.418 | 0.705 / 0.455 | 0.716 / 0.468 | 0.722 / 0.475 | 0.726 / 0.479 | 0.731 / 0.486 | 0.734 / 0.489 | 0.736 / 0.491 | 0.737 / 0.492 |
| GARNet+ | 0.655 / 0.399 | 0.696 / 0.446 | 0.712 / 0.465 | 0.719 / 0.475 | 0.725 / 0.481 | 0.733 / 0.491 | 0.737 / 0.498 | 0.740 / 0.501 | 0.742 / 0.504 |
| EVolT | - / - | - / - | - / - | 0.609 / 0.358 | - / - | 0.698 / 0.448 | 0.720 / 0.475 | 0.729 / 0.486 | 0.735 / 0.492 |
| LegoFormer | 0.519 / 0.282 | 0.644 / 0.392 | 0.679 / 0.428 | 0.694 / 0.444 | 0.703 / 0.453 | 0.713 / 0.464 | 0.717 / 0.470 | 0.719 / 0.472 | 0.721 / 0.472 |
| 3D-C2FT | 0.629 / 0.371 | 0.678 / 0.424 | 0.695 / 0.443 | 0.702 / 0.452 | 0.702 / 0.458 | 0.716 / 0.468 | 0.720 / 0.475 | 0.723 / 0.477 | 0.724 / 0.479 |
| 3D-RETR <br> <font size=2>(3 view)</font> | 0.674 / - | 0.707 / - | 0.716 / - | 0.720 / - | 0.723 / - | 0.727 / - | 0.729 / - | 0.730 / - | 0.731 / - |
| 3D-RETR* | 0.680 / - | 0.701 / - | 0.716 / - | 0.725 / - | 0.736 / - | 0.739 / - | 0.747 / - | 0.755 / - | 0.757 / - |
| UMIFormer | 0.6802 / 0.4281 | 0.7384 / 0.4919 | 0.7518 / 0.5067 | 0.7573 / 0.5127 | 0.7612 / 0.5168 | 0.7661 / 0.5213 | 0.7682 / 0.5232 | 0.7696 / 0.5245 | 0.7702 / 0.5251 |
| UMIFormer+ | 0.5672 / 0.3177 | 0.7115 / 0.4568 | 0.7447 / 0.4947 | 0.7588 / 0.5104 | 0.7681 / 0.5216 | 0.7790 / 0.5348 | 0.7843 / 0.5415 | 0.7873 / 0.5451 | 0.7886 / 0.5466 |
| LRGT (Ours) | 0.6962 / 0.4461 | 0.7462 / 0.5005 | 0.7590 / 0.5148 | 0.7653 / 0.5214 | 0.7692 / 0.5257 | 0.7744 / 0.5311 | 0.7766 / 0.5337 | 0.7781 / 0.5347 | 0.7786 / 0.5353 |
| LRGT+ (Ours) | 0.5847 / 0.3378 | 0.7145 / 0.4618 | 0.7476 / 0.4989 | 0.7625 / 0.5161 | 0.7719 / 0.5271 | 0.7833 / 0.5403 | 0.7888 / 0.5467 | 0.7912 / 0.5497 | 0.7922 / 0.5510 |
\* The results in this row are derived from models trained individually for each number of input views.
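For reference, the IoU numbers above compare binarized predicted occupancy grids against ground-truth voxels. A minimal sketch of such a computation is shown below; the 0.5 binarization threshold and the toy grid are illustrative assumptions, not taken from this repository.

```python
import numpy as np

def voxel_iou(pred, gt, threshold=0.5):
    """Intersection-over-Union between a predicted occupancy grid
    (probabilities in [0, 1]) and a binary ground-truth grid."""
    pred_bin = pred >= threshold          # binarize predictions
    gt_bin = gt.astype(bool)
    intersection = np.logical_and(pred_bin, gt_bin).sum()
    union = np.logical_or(pred_bin, gt_bin).sum()
    return intersection / union if union > 0 else 1.0

# Toy 2x2x2 example: 4 voxels predicted occupied, 2 of them correct.
pred = np.array([[[0.9, 0.1], [0.8, 0.2]],
                 [[0.7, 0.3], [0.6, 0.4]]])
gt = np.zeros((2, 2, 2))
gt[0, 0, 0] = 1
gt[0, 1, 0] = 1
print(voxel_iou(pred, gt))  # intersection 2 / union 4 -> 0.5
```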

TODO

The code and pretrained models are coming soon.

Installation

The environment was tested on Ubuntu 16.04.5 LTS and Ubuntu 20.04.5 LTS. We trained LRGT on 2 Tesla V100s for about 1 day and LRGT+ on 8 Tesla V100s for about 2.5 days.

Clone the code repository

```
git clone https://github.com/LiyingCV/Long-Range-Grouping-Transformer.git
```

Create a new environment from environment.yml

```
conda env create -f environment.yml
conda activate lrgt
```

Or install the Python dependencies manually:

```
cd Long-Range-Grouping-Transformer
conda create -n lrgt python=3.6
conda activate lrgt
pip install -r requirements.txt
```

Demo

<img src="imgs/visualize.gif" width="900"/>

Datasets

We use the ShapeNet and Pix3D datasets in our experiments, which are available below:

Get Started

Training

We provide a training script, which you can run as follows:

```
sh train.sh
```

We use torch.distributed for multi-GPU training, so you can change CUDA_VISIBLE_DEVICES and nproc_per_node to train on more devices or on a single device.
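As an illustration, a two-GPU launch inside train.sh might look like the following; the entry-point script name (`runner.py`) and its arguments are assumptions, not taken from this repository.

```shell
# Hypothetical launch command; adjust CUDA_VISIBLE_DEVICES and
# --nproc_per_node together so they name the same number of GPUs.
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
    --nproc_per_node=2 runner.py

# Single-GPU variant:
# CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 runner.py
```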

Evaluation

We provide a testing script, which you can run as follows:

```
sh test.sh
```

Citation

If you find our code or paper useful in your research, please consider citing:

@InProceedings{Yang_2023_ICCV,
    author    = {Yang, Liying and Zhu, Zhenwei and Lin, Xuxin and Nong, Jian and Liang, Yanyan},
    title     = {Long-Range Grouping Transformer for Multi-View 3D Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {18257-18267}
}

Further Information

Please check out other works on multi-view reconstruction from our group: