This repo supplements our survey, 3D Vision with Transformers: A Survey.

Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

This repo lists all the 3D computer vision papers with Transformers that are presented in our paper, and we aim to update it frequently with the latest relevant papers.

<p align="center"> <img src="https://user-images.githubusercontent.com/14073587/183882596-ada49e17-bbd5-4b09-962b-e0ff1d8291c0.png" width="600"> </p>

Contents

Object Classification

Group-in-Group Relation-Based Transformer for 3D Point Cloud Learning [RS 2022][PDF] <br>

Masked Autoencoders for Point Cloud Self-supervised Learning [ECCV 2022][PDF][Code] <br>

3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification [T-ITS 2022][PDF] <br>

LFT-Net: Local Feature Transformer Network for Point Clouds Analysis [T-ITS 2022][PDF] <br>

Sewer defect detection from 3D point clouds using a transformer-based deep learning model [Automation in Construction 2022][PDF] <br>

3D Medical Point Transformer: Introducing Convolution to Attention Networks for Medical Point Cloud Analysis [arXiv 2021][PDF][Code] <br>

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [CVPR 2022][PDF][Code] <br>

CpT: Convolutional Point Transformer for 3D Point Cloud Processing [ACCVW 2022][PDF] <br>

PatchFormer: An Efficient Point Transformer With Patch Attention [CVPR 2022][PDF] <br>

PVT: Point-Voxel Transformer for Point Cloud Learning [arXiv 2021][PDF][Code] <br>

Adaptive Wavelet Transformer Network for 3D Shape Representation Learning [ICLR 2022][PDF] <br>

Point Cloud Learning with Transformer [arXiv 2021][PDF] <br>

3CROSSNet: Cross-Level Cross-Scale Cross-Attention Network for Point Cloud Representation [RA-L 2022][PDF] <br>

Dual Transformer for Point Cloud Analysis [IEEE Trans Multimedia][PDF] <br>

Centroid Transformers: Learning to Abstract with Attention [arXiv 2021][PDF] <br>

PCT: Point Cloud Transformer [CVM 2021][PDF][Code] <br>

Point Transformer [ICCV 2021][PDF][Code] <br>

Point Transformer [IEEE Access 2021][PDF][Code] <br>

Modeling Point Clouds with Self-Attention and Gumbel Subset Sampling [CVPR 2019][PDF] <br>

Attentional ShapeContextNet for Point Cloud Recognition [CVPR 2018][PDF][Code] <br>

3D Object Detection

Bridged Transformer for Vision and Point Cloud 3D Object Detection [CVPR 2022][PDF] <br>

Multimodal Token Fusion for Vision Transformers [CVPR 2022][PDF][Code] <br>

CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection [CVPR 2022][PDF] <br>

Focused Decoding Enables 3D Anatomical Detection by Transformers [arXiv 2022][PDF][Code] <br>

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection [arXiv 2022][PDF][Code] <br>

TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [CVPR 2022][PDF][Code] <br>

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds [CVPR 2022][PDF][Code] <br>

VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention [CVPR 2022][PDF][Code] <br>

Point Density-Aware Voxels for LiDAR 3D Object Detection [CVPR 2022][PDF][Code] <br>

PETR: Position Embedding Transformation for Multi-View 3D Object Detection [ECCV 2022][PDF][Code] <br>

ARM3D: Attention-based relation module for indoor 3D object detection [Comput. Vis.][PDF][Code] <br>

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer [CVPR 2022][PDF][Code] <br>

Attention-based Proposals Refinement for 3D Object Detection [IV 2022][PDF][Code] <br>

Embracing Single Stride 3D Object Detector with Sparse Transformer [CVPR 2022][PDF][Code] <br>

Fast Point Transformer [CVPR 2022][PDF][Code] <br>

BoxeR: Box-Attention for 2D and 3D Transformers [CVPR 2022][PDF][Code] <br>

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [CoRL 2021][PDF][Code] <br>

An End-to-End Transformer Model for 3D Object Detection [ICCV 2021][PDF][Code] <br>

Voxel Transformer for 3D Object Detection [ICCV 2021][PDF][Code] <br>

Improving 3D Object Detection with Channel-wise Transformer [ICCV 2021][PDF][Code] <br>

M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers [WACV 2022][PDF][Code] <br>

Group-Free 3D Object Detection via Transformers [ICCV 2021][PDF][Code] <br>

SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [ICCVW 2021][PDF][Code] <br>

3D Object Detection with Pointformer [CVPR 2021][PDF][Code] <br>

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving [IEEE Trans. Circuits Syst.][PDF] <br>

MLCVNet: Multi-Level Context VoteNet for 3D Object Detection [CVPR 2020][PDF][Code] <br>

LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention [CVPR 2020][PDF][Code] <br>

SCANet: Spatial-Channel Attention Network for 3D Object Detection [ICASSP 2019][PDF][Code] <br>

3D Segmentation

For part segmentation, see Object Classification.

Complete Scenes Segmentation

Stratified Transformer for 3D Point Cloud Segmentation [CVPR 2022][PDF][Code] <br>

Multimodal Token Fusion for Vision Transformers [CVPR 2022][PDF][Code] <br>

Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation [AAAI 2022][PDF] <br>

Fast Point Transformer [CVPR 2022][PDF][Code] <br>

Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation [CVPR 2022][PDF] <br>

Point Cloud Video Segmentation

Point Spatio-Temporal Transformer Networks for Point Cloud Video Modeling [TPAMI 2022][PDF] <br>

Spatial-Temporal Transformer for 3D Point Cloud Sequences [WACV 2022][PDF] <br>

Point 4D transformer networks for spatio-temporal modeling in point cloud videos [CVPR 2021][PDF][Code] <br>

Medical Imaging Segmentation

Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images [MICCAI BrainLes 2022][PDF][Code] <br>

D-Former: A U-shaped Dilated Transformer for 3D Medical Image Segmentation [Neural Comput Appl 2022][PDF] <br>

A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation [MICCAI 2022][PDF][Code] <br>

T-AutoML: Automated Machine Learning for Lesion Segmentation using Transformers in 3D Medical Imaging [ICCV 2021][PDF] <br>

AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation [WACV 2022][PDF] <br>

BiTr-Unet: A CNN-Transformer Combined Network for MRI Brain Tumor Segmentation [MICCAI BrainLes 2022][PDF] <br>

nnFormer: Interleaved Transformer for Volumetric Segmentation [arXiv 2021][PDF][Code] <br>

UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation [MICCAI 2021][PDF][Code] <br>

Medical Image Segmentation Using Squeeze-and-Expansion Transformers [IJCAI 2021][PDF][Code] <br>

UNETR: Transformers for 3D Medical Image Segmentation [WACV 2022][PDF][Code] <br>

TransBTS: Multimodal Brain Tumor Segmentation Using Transformer [MICCAI 2021][PDF][Code] <br>

SpecTr: Spectral Transformer for Hyperspectral Pathology Image Segmentation [arXiv 2021][PDF][Code] <br>

CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [MICCAI 2021][PDF][Code] <br>

Convolution-Free Medical Image Segmentation Using Transformers [MICCAI 2021][PDF] <br>

TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation [MICCAI 2021][PDF][Code] <br>

3D Point Cloud Completion

Learning Local Displacements for Point Cloud Completion [CVPR 2022][PDF][Code] <br>

AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation [CVPR 2022][PDF][Code] <br>

PointAttN: You Only Need Attention for Point Cloud Completion [arXiv 2022][PDF][Code] <br>

Point cloud completion on structured feature map with feedback network [CVM 2022][PDF] <br>

ShapeFormer: Transformer-based Shape Completion via Sparse Representation [CVPR 2022][PDF][Code] <br>

A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion [ICLR 2022][PDF][Code] <br>

MFM-Net: Unpaired Shape Completion Network with Multi-stage Feature Matching [arXiv 2021][PDF] <br>

PCTMA-Net: Point Cloud Transformer with Morphing Atlas-based Point Generation Network for Dense Point Cloud Completion [IROS 2021][PDF][Code] <br>

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers [ICCV 2021][PDF][Code] <br>

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer [ICCV 2021][PDF][Code] <br>

3D Pose Estimation

Permutation-Invariant Relational Network for Multi-person 3D Pose Estimation [arXiv 2022][PDF] <br>

Zero-Shot Category-Level Object Pose Estimation [ECCV 2022][PDF][Code] <br>

Efficient Virtual View Selection for 3D Hand Pose Estimation [AAAI 2022][PDF][Code] <br>

Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World [ECCV 2022][PDF][Code] <br>

CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation [arXiv 2022][PDF][Code] <br>

RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers [ECCV 2022][PDF] <br>

P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation [ECCV 2022][PDF][Code] <br>

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video [CVPR 2022][PDF][Code] <br>

6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-based Instance Representation Learning [TIP 2022][PDF] <br>

Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation [CVPR 2022][PDF][Code] <br>

Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation [IEEE Trans. Multimed. 2022][PDF][Code] <br>

3D Human Pose Estimation with Spatial and Temporal Transformers [ICCV 2021][PDF][Code] <br>

End-to-End Human Pose and Mesh Reconstruction with Transformers [CVPR 2021][PDF][Code] <br>

PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation [WACV 2021][PDF][Code] <br>

HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation [ACM MM 2020][PDF] <br>

Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation [ECCV 2020][PDF] <br>

Epipolar Transformer for Multi-view Human Pose Estimation [CVPRW 2020][PDF][Code] <br>

Other Tasks

3D Tracking

PTTR: Relational 3D Point Cloud Object Tracking with Transformer [CVPR 2022][PDF][Code] <br>

3D Object Tracking with Transformer [BMVC 2021][PDF] <br>

3D Motion Prediction

HR-STAN: High-Resolution Spatio-Temporal Attention Network for 3D Human Motion Prediction [CVPRW 2022][PDF] <br>

GIMO: Gaze-Informed Human Motion Prediction in Context [ECCV 2022][PDF][Code] <br>

Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformer [ICCVW 2021][PDF][Code] <br>

Learning Progressive Joint Propagation for Human Motion Prediction [ECCV 2020][PDF] <br>

History Repeats Itself: Human Motion Prediction via Motion Attention [ECCV 2020][PDF][Code] <br>

A Spatio-Temporal Transformer for 3D Human Motion Prediction [3DV 2021][PDF][Code] <br>

3D Reconstruction

VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single and Multi-View 3D Reconstruction [arXiv 2022][PDF] <br>

THUNDR: Transformer-Based 3D Human Reconstruction with Markers [ICCV 2021][PDF] <br>

Multi-View 3D Reconstruction with Transformer [ICCV 2021][PDF] <br>

Point Cloud Registration

REGTR: End-to-End Point Cloud Correspondences with Transformer [CVPR 2022][PDF][Code] <br>

LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction [CVPR 2021][PDF][Code] <br>

Robust Point Cloud Registration Framework Based on Deep Graph Matching [CVPR 2021][PDF][Code] <br>

Deep Closest Point: Learning Representations for Point Cloud Registration [ICCV 2019][PDF][Code] <br>

Citation

If you find the listing or the survey useful for your work, please cite our paper:

@misc{lahoud20223d,
      title={3D Vision with Transformers: A Survey}, 
      author={Lahoud, Jean and Cao, Jiale and Khan, Fahad Shahbaz and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Salman and Yang, Ming-Hsuan},
      year={2022},
      eprint={2208.04309},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}