This repo supplements our 3D Vision with Transformers Survey

Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

This repo lists all the Transformer-based 3D computer vision papers covered in our survey, and we aim to update it frequently with the latest relevant papers.

Content

Object Classification
3D Object Detection
3D Segmentation 
- Complete Scenes Segmentation 
- Point Cloud Video Segmentation 
- Medical Imaging Segmentation
3D Point Cloud Completion
3D Pose Estimation
Other Tasks 
- 3D Tracking 
- 3D Motion Prediction 
- 3D Reconstruction 
- Point Cloud Registration

Object Classification

Group-in-Group Relation-Based Transformer for 3D Point Cloud Learning [RS 2022][PDF]

Masked Autoencoders for Point Cloud Self-supervised Learning [ECCV 2022][PDF][Code]

3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification [T-ITS 2022][PDF]

LFT-Net: Local Feature Transformer Network for Point Clouds Analysis [T-ITS 2022][PDF]

Sewer defect detection from 3D point clouds using a transformer-based deep learning model [Automation in Construction 2022][PDF]

3d medical point transformer: Introducing convolution to attention networks for medical point cloud analysis [arXiv 2021][PDF][Code]

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [CVPR 2022][PDF][Code]

CpT: Convolutional Point Transformer for 3D Point Cloud Processing [ACCVW 2022][PDF]

PatchFormer: An Efficient Point Transformer With Patch Attention [CVPR 2022][PDF]

PVT: Point-Voxel Transformer for Point Cloud Learning [arXiv 2021][PDF][Code]

Adaptive Wavelet Transformer Network for 3D Shape Representation Learning [ICLR 2021][PDF]

Point cloud learning with transformer [arXiv 2021][PDF]

3crossnet: Cross-level cross-scale cross-attention network for point cloud representation [RA-L 2022][PDF]

Dual Transformer for Point Cloud Analysis [IEEE Trans Multimedia][PDF]

Centroid transformers: Learning to abstract with attention [arXiv 2021][PDF]

PCT: Point cloud transformer [CVM 2021][PDF][Code]

Point Transformer [ICCV 2021][PDF][Code]

Point Transformer [IEEE Access 2021][PDF][Code]

Modeling point clouds with self-attention and gumbel subset sampling [CVPR 2019][PDF]

Attentional shapecontextnet for point cloud recognition [CVPR 2018][PDF][Code]

3D Object Detection

Bridged Transformer for Vision and Point Cloud 3D Object Detection [CVPR 2022][PDF]

Multimodal Token Fusion for Vision Transformers [CVPR 2022][PDF][Code]

CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection [CVPR 2022][PDF]

Focused Decoding Enables 3D Anatomical Detection by Transformers [arXiv 2022][PDF][Code]

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection [arXiv 2022][PDF][Code]

TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [CVPR 2022][PDF][Code]

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds [CVPR 2022][PDF][Code]

VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention [CVPR 2022][PDF][Code]

Point Density-Aware Voxels for LiDAR 3D Object Detection [CVPR 2022][PDF][Code]

PETR: Position Embedding Transformation for Multi-View 3D Object Detection [ECCV 2022][PDF][Code]

ARM3D: Attention-based relation module for indoor 3D object detection [Comput. Vis.][PDF][Code]

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer [CVPR 2022][PDF][Code]

Attention-based Proposals Refinement for 3D Object Detection [IV 2022][PDF][Code]

Embracing Single Stride 3D Object Detector with Sparse Transformer [CVPR 2022][PDF][Code]

Fast Point Transformer [CVPR 2022][PDF][Code]

BoxeR: Box-Attention for 2D and 3D Transformers [CVPR 2022][PDF][Code]

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [CoRL 2021][PDF][Code]

An End-to-End Transformer Model for 3D Object Detection [ICCV 2021][PDF][Code]

Voxel Transformer for 3D Object Detection [ICCV 2021][PDF][Code]

Improving 3D Object Detection with Channel-wise Transformer [ICCV 2021][PDF][Code]

M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers [WACV 2022][PDF][Code]

Group-Free 3D Object Detection via Transformers [ICCV 2021][PDF][Code]

SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [ICCVW 2021][PDF][Code]

3D object detection with pointformer [CVPR 2021][PDF][Code]

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving [IEEE Trans. Circuits Syst.][PDF]

MLCVNet: Multi-Level Context VoteNet for 3D Object Detection [CVPR 2020][PDF][Code]

LiDAR-based online 3d video object detection with graph-based message passing and spatiotemporal transformer attention [CVPR 2020][PDF][Code]

SCANet: Spatial-channel attention network for 3d object detection [ICASSP 2019][PDF][Code]

3D Segmentation

For part segmentation, see Object Classification.

Complete Scenes Segmentation

Stratified Transformer for 3D Point Cloud Segmentation [CVPR 2022][PDF][Code]

Multimodal Token Fusion for Vision Transformers [CVPR 2022][PDF][Code]

Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation [AAAI 2022][PDF]

Fast Point Transformer [CVPR 2022][PDF][Code]

Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation [CVPR 2022][PDF]

Point Cloud Video Segmentation

Point Spatio-Temporal Transformer Networks for Point Cloud Video Modeling [TPAMI 2022][PDF]

Spatial-Temporal Transformer for 3D Point Cloud Sequences [WACV 2022][PDF]

Point 4D transformer networks for spatio-temporal modeling in point cloud videos [CVPR 2021][PDF][Code]

Medical Imaging Segmentation

Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images [MICCAI BrainLes 2022][PDF][Code]

D-Former: A U-shaped Dilated Transformer for 3D Medical Image Segmentation [Neural Comput Appl 2022][PDF]

A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation [MICCAI 2022][PDF][Code]

T-AutoML: Automated Machine Learning for Lesion Segmentation using Transformers in 3D Medical Imaging [ICCV 2021][PDF]

After-unet: Axial fusion transformer unet for medical image segmentation [WACV 2022][PDF]

Bitr-unet: a cnn-transformer combined network for mri brain tumor segmentation [MICCAI BrainLes 2022][PDF]

nnformer: Interleaved transformer for volumetric segmentation [arXiv 2021][PDF][Code]

UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation [MICCAI 2022][PDF][Code]

Medical image segmentation using squeeze-and-expansion transformers [IJCAI 2021][PDF][Code]

Unetr: Transformers for 3d medical image segmentation [WACV 2022][PDF][Code]

Transbts: Multimodal brain tumor segmentation using transformer [MICCAI 2021][PDF][Code]

Spectr: Spectral transformer for hyperspectral pathology image segmentation [arXiv 2021][PDF][Code]

Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation [MICCAI 2021][PDF][Code]

Convolution-free medical image segmentation using transformers [MICCAI 2021][PDF]

Transfuse: Fusing transformers and cnns for medical image segmentation [MICCAI 2021][PDF][Code]

3D Point Cloud Completion

Learning Local Displacements for Point Cloud Completion [CVPR 2022][PDF][Code]

AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation [CVPR 2022][PDF][Code]

PointAttN: You Only Need Attention for Point Cloud Completion [arXiv 2022][PDF][Code]

Point cloud completion on structured feature map with feedback network [CVM 2022][PDF]

ShapeFormer: Transformer-based Shape Completion via Sparse Representation [CVPR 2022][PDF][Code]

A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion [ICLR 2021][PDF][Code]

MFM-Net: Unpaired Shape Completion Network with Multi-stage Feature Matching [arXiv 2021][PDF]

PCTMA-Net: Point Cloud Transformer with Morphing Atlas-based Point Generation Network for Dense Point Cloud Completion [IROS 2021][PDF][Code]

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers [ICCV 2021][PDF][Code]

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer [ICCV 2021][PDF][Code]

3D Pose Estimation

Permutation-Invariant Relational Network for Multi-person 3D Pose Estimation [arXiv 2022][PDF]

Zero-Shot Category-Level Object Pose Estimation [ECCV 2022][PDF][Code]

Efficient Virtual View Selection for 3D Hand Pose Estimation [AAAI 2022][PDF][Code]

Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World [ECCV 2022][PDF][Code]

CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation [arXiv 2022][PDF][Code]

RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers [ECCV 2022][PDF]

P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation [ECCV 2022][PDF][Code]

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video [CVPR 2022][PDF][Code]

6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-based Instance Representation Learning [TIP 2022][PDF]

Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation [CVPR 2022][PDF][Code]

Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation [IEEE Trans. Multimed. 2022][PDF][Code]

3D Human Pose Estimation with Spatial and Temporal Transformers [ICCV 2021][PDF][Code]

End-to-End Human Pose and Mesh Reconstruction with Transformers [CVPR 2021][PDF][Code]

PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation [WACV 2021][PDF][Code]

HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation [ACM MM 2020][PDF]

Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation [ECCV 2020][PDF]

Epipolar Transformer for Multi-view Human Pose Estimation [CVPRW 2020][PDF][Code]

Other Tasks

3D Tracking

Pttr: Relational 3d point cloud object tracking with transformer [CVPR 2022][PDF][Code]

3d object tracking with transformer [BMVC 2021][PDF]

3D Motion Prediction

Hr-stan: High-resolution spatio-temporal attention network for 3d human motion prediction [CVPRW 2022][PDF]

Gimo: Gaze-informed human motion prediction in context [ECCV 2022][PDF][Code]

Pose transformers (potr): Human motion prediction with non-autoregressive transformer [ICCVW 2021][PDF][Code]

Learning progressive joint propagation for human motion prediction [ECCV 2020][PDF]

History repeats itself: Human motion prediction via motion attention [ECCV 2020][PDF][Code]

A spatio-temporal transformer for 3d human motion prediction [3DV 2021][PDF][Code]

3D Reconstruction

Vpfusion: Joint 3d volume and pixel-aligned feature fusion for single and multi-view 3d reconstruction [arXiv 2022][PDF]

Thundr: Transformer-based 3d human reconstruction with markers [ICCV 2021][PDF]

Multi-view 3d reconstruction with transformer [ICCV 2021][PDF]

LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction [CVPR 2021][PDF][Code]

Point Cloud Registration

Regtr: End-to-end point cloud correspondences with transformer [CVPR 2022][PDF][Code]

Robust point cloud registration framework based on deep graph matching [CVPR 2021][PDF][Code]

Deep closest point: Learning representations for point cloud registration [ICCV 2019][PDF][Code]

Citation

If you find the listing or the survey useful for your work, please cite our paper:

@misc{lahoud20223d,
      title={3D Vision with Transformers: A Survey}, 
      author={Lahoud, Jean and Cao, Jiale and Khan, Fahad Shahbaz and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Salman and Yang, Ming-Hsuan},
      year={2022},
      eprint={2208.04309},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}