UMIFormer

This repository contains the source code for the paper UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction.

Highlight

Architecture

Performance

Each cell reports IoU / F-Score@1% on ShapeNet for the given number of input views.

| Methods | 1 view | 2 views | 3 views | 4 views | 5 views | 8 views | 12 views | 16 views | 20 views |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3D-R2N2 | 0.560 / 0.351 | 0.603 / 0.368 | 0.617 / 0.372 | 0.625 / 0.378 | 0.634 / 0.382 | 0.635 / 0.383 | 0.636 / 0.382 | 0.636 / 0.382 | 0.636 / 0.383 |
| AttSets | 0.642 / 0.395 | 0.662 / 0.418 | 0.670 / 0.426 | 0.675 / 0.430 | 0.677 / 0.432 | 0.685 / 0.444 | 0.688 / 0.445 | 0.692 / 0.447 | 0.693 / 0.448 |
| Pix2Vox++ | 0.670 / 0.436 | 0.695 / 0.452 | 0.704 / 0.455 | 0.708 / 0.457 | 0.711 / 0.458 | 0.715 / 0.459 | 0.717 / 0.460 | 0.718 / 0.461 | 0.719 / 0.462 |
| GARNet | 0.673 / 0.418 | 0.705 / 0.455 | 0.716 / 0.468 | 0.722 / 0.475 | 0.726 / 0.479 | 0.731 / 0.486 | 0.734 / 0.489 | 0.736 / 0.491 | 0.737 / 0.492 |
| GARNet+ | 0.655 / 0.399 | 0.696 / 0.446 | 0.712 / 0.465 | 0.719 / 0.475 | 0.725 / 0.481 | 0.733 / 0.491 | 0.737 / 0.498 | 0.740 / 0.501 | 0.742 / 0.504 |
| EVolT | - / - | - / - | - / - | 0.609 / 0.358 | - / - | 0.698 / 0.448 | 0.720 / 0.475 | 0.729 / 0.486 | 0.735 / 0.492 |
| LegoFormer | 0.519 / 0.282 | 0.644 / 0.392 | 0.679 / 0.428 | 0.694 / 0.444 | 0.703 / 0.453 | 0.713 / 0.464 | 0.717 / 0.470 | 0.719 / 0.472 | 0.721 / 0.472 |
| 3D-C2FT | 0.629 / 0.371 | 0.678 / 0.424 | 0.695 / 0.443 | 0.702 / 0.452 | 0.702 / 0.458 | 0.716 / 0.468 | 0.720 / 0.475 | 0.723 / 0.477 | 0.724 / 0.479 |
| 3D-RETR <br> <font size=2>(3 view)</font> | 0.674 / - | 0.707 / - | 0.716 / - | 0.720 / - | 0.723 / - | 0.727 / - | 0.729 / - | 0.730 / - | 0.731 / - |
| 3D-RETR* | 0.680 / - | 0.701 / - | 0.716 / - | 0.725 / - | 0.736 / - | 0.739 / - | 0.747 / - | 0.755 / - | 0.757 / - |
| UMIFormer | 0.6802 / 0.4281 | 0.7384 / 0.4919 | 0.7518 / 0.5067 | 0.7573 / 0.5127 | 0.7612 / 0.5168 | 0.7661 / 0.5213 | 0.7682 / 0.5232 | 0.7696 / 0.5245 | 0.7702 / 0.5251 |
| UMIFormer+ | 0.5672 / 0.3177 | 0.7115 / 0.4568 | 0.7447 / 0.4947 | 0.7588 / 0.5104 | 0.7681 / 0.5216 | 0.7790 / 0.5348 | 0.7843 / 0.5415 | 0.7873 / 0.5451 | 0.7886 / 0.5466 |
* The results in this row are derived from models trained individually for each number of input views.

Demo

Cite this work

@article{zhu2023umiformer,
  title={UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction},
  author={Zhu, Zhenwei and Yang, Liying and Li, Ning and Jiang, Chaohao and Liang, Yanyan},
  journal={arXiv preprint arXiv:2302.13987},
  year={2023}
}

Datasets

We use the ShapeNet dataset in our experiments, which is available below:

Pretrained Models

The pretrained models on ShapeNet are available as follows:

Please download them and put them into ./pths/.
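
As a quick sanity check, a downloaded checkpoint can be loaded with PyTorch before running any experiments. This is only a sketch; the file name below is hypothetical and should be replaced with the checkpoint you actually downloaded.

```python
import torch

# Sanity check: make sure the downloaded checkpoint deserializes correctly.
# The file name is hypothetical; substitute the checkpoint you actually downloaded.
checkpoint = torch.load('./pths/umiformer.pth', map_location='cpu')
print(type(checkpoint))
if isinstance(checkpoint, dict):
    # Checkpoint dictionaries usually reveal their contents through their keys.
    print(list(checkpoint.keys()))
```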

Prerequisites

Clone the Code Repository

git clone https://github.com/GaryZhu1996/UMIFormer

Install Python Dependencies

cd UMIFormer
conda env create -f environment.yml
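
Once the new environment is activated, a quick check that PyTorch sees the GPUs can save time before launching distributed training. This sketch assumes environment.yml installs a CUDA-enabled PyTorch build, which is an assumption rather than something stated above.

```python
# Minimal environment check, assuming environment.yml provides a CUDA-enabled PyTorch build.
import torch

print('PyTorch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('Visible GPUs:', torch.cuda.device_count())
```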

Modify the Dataset Paths

Modify __C.DATASETS.SHAPENET.RENDERING_PATH and __C.DATASETS.SHAPENET.VOXEL_PATH in config.py so that they point to the correct paths of the ShapeNet dataset.
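
For illustration, the two options might end up looking like the sketch below. The EasyDict scaffolding and the path templates are assumptions about how config.py and the local ShapeNet copies are organized; only the two option names come from this README.

```python
# Minimal sketch of the two settings to edit in config.py.
# The EasyDict scaffolding and the path templates are assumptions;
# only the option names RENDERING_PATH and VOXEL_PATH come from this README.
from easydict import EasyDict as edict

__C = edict()
__C.DATASETS = edict()
__C.DATASETS.SHAPENET = edict()

# Point these at your local copies of the ShapeNet renderings and voxel models.
__C.DATASETS.SHAPENET.RENDERING_PATH = '/path/to/ShapeNetRendering/%s/%s/rendering/%02d.png'
__C.DATASETS.SHAPENET.VOXEL_PATH = '/path/to/ShapeNetVox32/%s/%s/model.binvox'
```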

3D Reconstruction Model

For training, please use the following command:

CUDA_VISIBLE_DEVICES=gpu_ids python -m torch.distributed.launch --nproc_per_node=num_of_gpu runner.py

For testing, please follow the steps below:

  1. Update the setting of __C.CONST.WEIGHTS in config.py to the path of the reconstruction model (a sketch of the relevant settings follows these steps);
  2. Run the following command to evaluate the model with the number of input views defined by __C.CONST.N_VIEWS_RENDERING in config.py:
CUDA_VISIBLE_DEVICES=gpu_ids python -m torch.distributed.launch --nproc_per_node=num_of_gpu runner.py --test
  3. Run the following command to evaluate the model with the various numbers of input views mentioned in the paper:
CUDA_VISIBLE_DEVICES=gpu_ids python -m torch.distributed.launch --nproc_per_node=num_of_gpu runner.py --batch_test
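
For reference, the two test-related settings mentioned above might be set as in the sketch below. The EasyDict scaffolding, the weight file name, and the view count are hypothetical placeholders; only the option names come from this README.

```python
# Hypothetical illustration of the test-related settings in config.py.
# Only the option names WEIGHTS and N_VIEWS_RENDERING come from this README;
# the scaffolding, file name, and view count are placeholders.
from easydict import EasyDict as edict

__C = edict()
__C.CONST = edict()

__C.CONST.WEIGHTS = './pths/umiformer.pth'  # path to the reconstruction model to evaluate
__C.CONST.N_VIEWS_RENDERING = 8             # number of input views used by the --test evaluation
```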

Our Other Works on Multi-View 3D Reconstruction

@article{zhu2023garnet,
  title={GARNet: Global-Aware Multi-View 3D Reconstruction Network and the Cost-Performance Tradeoff},
  author={Zhu, Zhenwei and Yang, Liying and Lin, Xuxin and Yang, Lin and Liang, Yanyan},
  journal={Pattern Recognition},
  pages={109674},
  year={2023},
  publisher={Elsevier}
}
@InProceedings{Yang_2023_ICCV,
    author    = {Yang, Liying and Zhu, Zhenwei and Lin, Xuxin and Nong, Jian and Liang, Yanyan},
    title     = {Long-Range Grouping Transformer for Multi-View 3D Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {18257-18267}
}

License

This project is open-sourced under the MIT license.