Awesome
This repo supplements our 3D Vision with Transformers Survey
Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang
This repo lists all the 3D computer vision papers with Transformers that are presented in our paper, and we aim to update it frequently with the latest relevant papers.
<p align="center"> <img src="https://user-images.githubusercontent.com/14073587/183882596-ada49e17-bbd5-4b09-962b-e0ff1d8291c0.png" width="600"> </p>
Content
- Object Classification<br>
- 3D Object Detection<br>
- 3D Segmentation<br>
- 3D Point Cloud Completion<br>
- 3D Pose Estimation<br>
- Other Tasks<br>
  - 3D Tracking<br>
  - 3D Motion Prediction<br>
  - 3D Reconstruction<br>
  - Point Cloud Registration<br>
Object Classification
Group-in-Group Relation-Based Transformer for 3D Point Cloud Learning [RS 2022][PDF] <br>
Masked Autoencoders for Point Cloud Self-supervised Learning [ECCV 2022][PDF][Code] <br>
3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification [T-ITS 2022][PDF] <br>
LFT-Net: Local Feature Transformer Network for Point Clouds Analysis [T-ITS 2022][PDF] <br>
Sewer defect detection from 3D point clouds using a transformer-based deep learning model [Automation in Construction 2022][PDF] <br>
3d medical point transformer: Introducing convolution to attention networks for medical point cloud analysis [arXiv 2021][PDF][Code] <br>
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [CVPR 2022][PDF][Code] <br>
CpT: Convolutional Point Transformer for 3D Point Cloud Processing [ACCVW 2022][PDF] <br>
PatchFormer: An Efficient Point Transformer With Patch Attention [CVPR 2022][PDF] <br>
PVT: Point-Voxel Transformer for Point Cloud Learning [arXiv 2021][PDF][Code] <br>
Adaptive Wavelet Transformer Network for 3D Shape Representation Learning [ICLR 2021][PDF] <br>
Point cloud learning with transformer [arXiv 2021][PDF] <br>
3crossnet: Cross-level cross-scale cross-attention network for point cloud representation [RA-L 2022][PDF] <br>
Dual Transformer for Point Cloud Analysis [IEEE Trans Multimedia][PDF] <br>
Centroid transformers: Learning to abstract with attention [arXiv 2021][PDF] <br>
PCT: Point cloud transformer [CVM 2021][PDF][Code] <br>
Point Transformer [ICCV 2021][PDF][Code] <br>
Point Transformer [IEEE Access 2021][PDF][Code] <br>
Modeling point clouds with self-attention and gumbel subset sampling [CVPR 2019][PDF] <br>
Attentional shapecontextnet for point cloud recognition [CVPR 2018][PDF][Code] <br>
3D Object Detection
Bridged Transformer for Vision and Point Cloud 3D Object Detection [CVPR 2022][PDF] <br>
Multimodal Token Fusion for Vision Transformers [CVPR 2022][PDF][Code] <br>
CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection [CVPR 2022][PDF] <br>
Focused Decoding Enables 3D Anatomical Detection by Transformers [arXiv 2022][PDF][Code] <br>
MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection [arXiv 2022][PDF][Code] <br>
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [CVPR 2022][PDF][Code] <br>
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds [CVPR 2022][PDF][Code] <br>
VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention [CVPR 2022][PDF][Code] <br>
Point Density-Aware Voxels for LiDAR 3D Object Detection [CVPR 2022][PDF][Code] <br>
PETR: Position Embedding Transformation for Multi-View 3D Object Detection [ECCV 2022][PDF][Code] <br>
ARM3D: Attention-based relation module for indoor 3D object detection [Comput. Vis.][PDF][Code] <br>
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer [CVPR 2022][PDF][Code] <br>
Attention-based Proposals Refinement for 3D Object Detection [IV 2022][PDF][Code] <br>
Embracing Single Stride 3D Object Detector with Sparse Transformer [CVPR 2022][PDF][Code] <br>
Fast Point Transformer [CVPR 2022][PDF][Code] <br>
BoxeR: Box-Attention for 2D and 3D Transformers [CVPR 2022][PDF][Code] <br>
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [CoRL 2021][PDF][Code] <br>
An End-to-End Transformer Model for 3D Object Detection [ICCV 2021][PDF][Code] <br>
Voxel Transformer for 3D Object Detection [ICCV 2021][PDF][Code] <br>
Improving 3D Object Detection with Channel-wise Transformer [ICCV 2021][PDF][Code] <br>
M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers [WACV 2022][PDF][Code] <br>
Group-Free 3D Object Detection via Transformers [ICCV 2021][PDF][Code] <br>
SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [ICCVW 2021][PDF][Code] <br>
3D object detection with pointformer [CVPR 2021][PDF][Code] <br>
Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving [IEEE Trans. Circuits Syst.][PDF] <br>
MLCVNet: Multi-Level Context VoteNet for 3D Object Detection [CVPR 2020][PDF][Code] <br>
LiDAR-based online 3d video object detection with graph-based message passing and spatiotemporal transformer attention [CVPR 2020][PDF][Code] <br>
SCANet: Spatial-channel attention network for 3d object detection [ICASSP 2019][PDF][Code] <br>
3D Segmentation
For part segmentation, check Object Classification
Complete Scenes Segmentation
Stratified Transformer for 3D Point Cloud Segmentation [CVPR 2022][PDF][Code] <br>
Multimodal Token Fusion for Vision Transformers [CVPR 2022][PDF][Code] <br>
Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation [AAAI 2022][PDF] <br>
Fast Point Transformer [CVPR 2022][PDF][Code] <br>
Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation [CVPR 2022][PDF] <br>
Point Cloud Video Segmentation
Point Spatio-Temporal Transformer Networks for Point Cloud Video Modeling [TPAMI 2022][PDF] <br>
Spatial-Temporal Transformer for 3D Point Cloud Sequences [WACV 2022][PDF] <br>
Point 4D transformer networks for spatio-temporal modeling in point cloud videos [CVPR 2021][PDF][Code] <br>
Medical Imaging Segmentation
Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images [MICCAI BrainLes 2022][PDF][Code] <br>
D-Former: A U-shaped Dilated Transformer for 3D Medical Image Segmentation [Neural Comput Appl 2022][PDF] <br>
A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation [MICCAI 2022][PDF][Code] <br>
T-AutoML: Automated Machine Learning for Lesion Segmentation using Transformers in 3D Medical Imaging [ICCV 2021][PDF] <br>
After-unet: Axial fusion transformer unet for medical image segmentation [WACV 2022][PDF] <br>
Bitr-unet: a cnn-transformer combined network for mri brain tumor segmentation [MICCAI BrainLes 2022][PDF] <br>
nnformer: Interleaved transformer for volumetric segmentation [arXiv 2021][PDF][Code] <br>
UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation [MICCAI 2021][PDF][Code] <br>
Medical image segmentation using squeeze-and-expansion transformers [IJCAI 2021][PDF][Code] <br>
Unetr: Transformers for 3d medical image segmentation [WACV 2022][PDF][Code] <br>
Transbts: Multimodal brain tumor segmentation using transformer [MICCAI 2021][PDF][Code] <br>
Spectr: Spectral transformer for hyperspectral pathology image segmentation [arXiv 2021][PDF][Code] <br>
Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation [MICCAI 2021][PDF][Code] <br>
Convolution-free medical image segmentation using transformers [MICCAI 2021][PDF] <br>
Transfuse: Fusing transformers and cnns for medical image segmentation [MICCAI 2021][PDF][Code] <br>
3D Point Cloud Completion
Learning Local Displacements for Point Cloud Completion [CVPR 2022][PDF][Code] <br>
AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation [CVPR 2022][PDF][Code] <br>
PointAttN: You Only Need Attention for Point Cloud Completion [arXiv 2022][PDF][Code] <br>
Point cloud completion on structured feature map with feedback network [CVM 2022][PDF] <br>
ShapeFormer: Transformer-based Shape Completion via Sparse Representation [CVPR 2022][PDF][Code] <br>
A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion [ICLR 2021][PDF][Code] <br>
MFM-Net: Unpaired Shape Completion Network with Multi-stage Feature Matching [arXiv 2021][PDF] <br>
PCTMA-Net: Point Cloud Transformer with Morphing Atlas-based Point Generation Network for Dense Point Cloud Completion [IROS 2021][PDF][Code] <br>
PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers [ICCV 2021][PDF][Code] <br>
SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer [ICCV 2021][PDF][Code] <br>
3D Pose Estimation
Permutation-Invariant Relational Network for Multi-person 3D Pose Estimation [arXiv 2022][PDF] <br>
Zero-Shot Category-Level Object Pose Estimation [ECCV 2022][PDF][Code] <br>
Efficient Virtual View Selection for 3D Hand Pose Estimation [AAAI 2022][PDF][Code] <br>
Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World [ECCV 2022][PDF][Code] <br>
CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation [arXiv 2022][PDF][Code] <br>
RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers [ECCV 2022][PDF] <br>
P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation [ECCV 2022][PDF][Code] <br>
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video [CVPR 2022][PDF][Code] <br>
6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-based Instance Representation Learning [TIP 2022][PDF] <br>
Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation [CVPR 2022][PDF][Code] <br>
Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation [IEEE Trans. Multimed. 2022][PDF][Code] <br>
3D Human Pose Estimation with Spatial and Temporal Transformers [ICCV 2021][PDF][Code] <br>
End-to-End Human Pose and Mesh Reconstruction with Transformers [CVPR 2021][PDF][Code] <br>
PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation [WACV 2021][PDF][Code] <br>
HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation [ACM MM 2020][PDF] <br>
Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation [ECCV 2020][PDF] <br>
Epipolar Transformer for Multi-view Human Pose Estimation [CVPRW 2020][PDF][Code] <br>
Other Tasks
3D Tracking
Pttr: Relational 3d point cloud object tracking with transformer [CVPR 2022][PDF][Code] <br>
3d object tracking with transformer [BMVC 2021][PDF] <br>
3D Motion Prediction
Hr-stan: High-resolution spatio-temporal attention network for 3d human motion prediction [CVPRW 2022][PDF] <br>
Gimo: Gaze-informed human motion prediction in context [ECCV 2022][PDF][Code] <br>
Pose transformers (potr): Human motion prediction with non-autoregressive transformer [ICCVW 2021][PDF][Code] <br>
Learning progressive joint propagation for human motion prediction [ECCV 2020][PDF] <br>
History repeats itself: Human motion prediction via motion attention [ECCV 2020][PDF][Code] <br>
A spatio-temporal transformer for 3d human motion prediction [3DV 2021][PDF][Code] <br>
3D Reconstruction
Vpfusion: Joint 3d volume and pixel-aligned feature fusion for single and multi-view 3d reconstruction [arXiv 2022][PDF] <br>
Thundr: Transformer-based 3d human reconstruction with markers [ICCV 2021][PDF] <br>
Multi-view 3d reconstruction with transformer [ICCV 2021][PDF] <br>
LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction [CVPR 2021][PDF][Code] <br>
Point Cloud Registration
Regtr: End-to-end point cloud correspondences with transformer [CVPR 2022][PDF][Code] <br>
Robust point cloud registration framework based on deep graph matching [CVPR 2021][PDF][Code] <br>
Deep closest point: Learning representations for point cloud registration [ICCV 2019][PDF][Code] <br>
Citation
If you find the listing or the survey useful for your work, please cite our paper:
@misc{lahoud20223d,
title={3D Vision with Transformers: A Survey},
author={Lahoud, Jean and Cao, Jiale and Khan, Fahad Shahbaz and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Salman and Yang, Ming-Hsuan},
year={2022},
eprint={2208.04309},
archivePrefix={arXiv},
primaryClass={cs.CV}
}