RWKV: Reinventing RNNs for the Transformer Era | arXiv [May'23] | EleutherAI
Patches Are All You Need? | TMLR'23 | CMU
Separable Self-Attention for Mobile Vision Transformers | TMLR'23 | Apple
Parameter-efficient Fine-tuning for Vision Transformers | AAAI'23 | MSR & UCSC |
EfficientFormer: Vision Transformers at MobileNet Speed | NeurIPS'22 | Snap Inc |
Neighborhood Attention Transformer | CVPR'23 | Meta AI |
Training Compute-Optimal Large Language Models | NeurIPS'22 | DeepMind
CMT: Convolutional Neural Networks Meet Vision Transformers | CVPR'22 | Huawei Noah's Ark Lab
Patch Slimming for Efficient Vision Transformers | CVPR'22 | Huawei Noah's Ark Lab
Lite Vision Transformer with Enhanced Self-Attention | CVPR'22 | Johns Hopkins University & Adobe
TubeDETR: Spatio-Temporal Video Grounding with Transformers | CVPR'22 (Oral) | CNRS & Inria |
Beyond Fixation: Dynamic Window Visual Transformer | CVPR'22 | University of Technology Sydney & RMIT University
BEiT: BERT Pre-Training of Image Transformers | ICLR'22 (Oral) | MSR |
How Do Vision Transformers Work? | ICLR'22 (Spotlight) | NAVER AI |
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers | ICLR'22 | Google Research
Tuformer: Data-Driven Design of Expressive Transformer by Tucker Tensor Representation | ICLR'22 | University of Maryland
DictFormer: Tiny Transformer with Shared Dictionary | ICLR'22 | Samsung Research |
QuadTree Attention for Vision Transformers | ICLR'22 | Alibaba AI Lab |
Expediting Vision Transformers via Token Reorganization | ICLR'22 (Spotlight) | UC San Diego & Tencent AI Lab |
UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning | ICLR'22 | SIAT-SenseTime |
Hierarchical Transformers Are More Efficient Language Models | NAACL'22 | Google Research & University of Warsaw
Transformer in Transformer | NeurIPS'21 | Huawei Noah's Ark Lab
Long-Short Transformer: Efficient Transformers for Language and Vision | NeurIPS'21 | NVIDIA |
Memory-efficient Transformers via Top-k Attention | EMNLP Workshop '21 | Allen AI |
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | ICCV'21 (Best Paper) | MSR
Rethinking Spatial Dimensions of Vision Transformers | ICCV'21 | NAVER AI |
What Makes for Hierarchical Vision Transformers | arXiv [Sept'21] | HUST
AutoAttend: Automated Attention Representation Search | ICML'21 | Tsinghua University |
Rethinking Attention with Performers | ICLR'21 (Oral) | Google
LambdaNetworks: Modeling Long-Range Interactions without Attention | ICLR'21 | Google Research
HyperGrid Transformers | ICLR'21 | Google Research |
LocalViT: Bringing Locality to Vision Transformers | arXiv [April'21] | ETH Zurich
Compressive Transformers for Long-Range Sequence Modelling | ICLR'20 | DeepMind
Improving Transformer Models by Reordering their Sublayers | ACL'20 | FAIR & Allen AI
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned | ACL'19 | Yandex |