Ultimate-Awesome-Transformer-Attention

This repo contains a comprehensive list of Vision Transformer & Attention papers, along with their code and related websites. <br> This list is maintained by Min-Hung Chen and is actively updated.

If you find any missing papers, feel free to create a pull request, open an issue, or email me. <br> Contributions in any form that make this list more comprehensive are welcome.

If you find this repository useful, please consider citing and ★STARing this list. <br> Feel free to share this list with others!

[Update: January, 2024] Added all the related papers from NeurIPS 2023! <br>
[Update: December, 2023] Added all the related papers from ICCV 2023! <br>
[Update: September, 2023] Split the multi-modal paper list to README_multimodal.md <br>
[Update: June, 2023] Added all the related papers from ICML 2023! <br>
[Update: June, 2023] Added all the related papers from CVPR 2023! <br>
[Update: February, 2023] Added all the related papers from ICLR 2023! <br>
[Update: December, 2022] Added attention-free papers from Networks Beyond Attention (GitHub) made by Jianwei Yang <br>
[Update: November, 2022] Added all the related papers from NeurIPS 2022! <br>
[Update: October, 2022] Split the 2nd half of the paper list to README_2.md <br>
[Update: October, 2022] Added all the related papers from ECCV 2022! <br>
[Update: September, 2022] Added the Transformer tutorial slides made by Lucas Beyer! <br>
[Update: June, 2022] Added all the related papers from CVPR 2022!


Overview

------ (The multi-modal papers have been moved to README_multimodal.md) ------

------ (The 2nd half of the paper list has been moved to README_2.md) ------


Citation

If you find this repository useful, please consider citing this list:

@misc{chen2022transformerpaperlist,
    title = {Ultimate awesome paper list: transformer and attention},
    author = {Chen, Min-Hung},
    journal = {GitHub repository},
    url = {https://github.com/cmhungsteve/Awesome-Transformer-Attention},
    year = {2022},
}

Survey

[Back to Overview]

Image Classification / Backbone

Replace Conv w/ Attention

Pure Attention

Conv-stem + Attention

Conv + Attention

[Back to Overview]

Vision Transformer

General Vision Transformer

Efficient Vision Transformer

Conv + Transformer

Training + Transformer

Robustness + Transformer

Model Compression + Transformer

[Back to Overview]

Attention-Free

MLP-Series

Other Attention-Free

[Back to Overview]

Analysis for Transformer

[Back to Overview]

Detection

Object Detection

[Back to Overview]

3D Object Detection

[Back to Overview]

Multi-Modal Detection

[Back to Overview]

HOI Detection

[Back to Overview]

Salient Object Detection

[Back to Overview]

Other Detection Tasks

[Back to Overview]

Segmentation

Semantic Segmentation

[Back to Overview]

Depth Estimation

[Back to Overview]

Object Segmentation

[Back to Overview]

Other Segmentation Tasks

[Back to Overview]

Video (High-level)

Action Recognition

[Back to Overview]

Action Detection/Localization

[Back to Overview]

Action Prediction/Anticipation

[Back to Overview]

Video Object Segmentation

[Back to Overview]

Video Instance Segmentation

[Back to Overview]

Other Video Tasks

[Back to Overview]


References