# Awesome Transformer Architecture Search

<p align="center"> <img width="250" src="https://camo.githubusercontent.com/1131548cf666e1150ebd2a52f44776d539f06324/68747470733a2f2f63646e2e7261776769742e636f6d2f73696e647265736f726875732f617765736f6d652f6d61737465722f6d656469612f6c6f676f2e737667" alt="Awesome!"> </p>

To keep track of the large number of recent papers that look at the intersection of Transformers and Neural Architecture Search (NAS), we have created this awesome list of curated papers and resources, inspired by awesome-autodl, awesome-architecture-search, and awesome-computer-vision. Papers are divided into the following categories:

  1. General Transformer search
  2. Domain-specific, applied Transformer search (divided into NLP, Vision, ASR)
  3. Transformers Knowledge: Insights / Searchable parameters / Attention
  4. Transformer Surveys
  5. Foundation Models
  6. Misc Resources

This repository is maintained by Yash Mehta. Please feel free to reach out, create a pull request, or open an issue to add papers. See this Google Doc for a comprehensive list of ICML 2023 papers on foundation models and large language models.

## General Transformer Search

| Title | Venue | Group |
| --- | --- | --- |
| Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models | NeurIPS'22 | MSR |
| Training Free Transformer Architecture Search | CVPR'22 | Tencent & Xiamen University |
| LiteTransformerSearch: Training-free On-device Search for Efficient Autoregressive Language Models | AutoML Conference 2022 (Workshop Track) | MSR |
| Searching the Search Space of Vision Transformer | NeurIPS'21 | MSRA, Stony Brook University |
| UniNet: Unified Architecture Search with Convolutions, Transformer and MLP | ECCV'22 | SenseTime |
| Analyzing and Mitigating Interference in Neural Architecture Search | ICML'22 | Tsinghua, MSR |
| BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search | ICCV'21 | Sun Yat-sen University |
| Memory-Efficient Differentiable Transformer Architecture Search | ACL-IJCNLP'21 | MSR, Peking University |
| Finding Fast Transformers: One-Shot Neural Architecture Search by Component Composition | arxiv [Aug'20] | Google Research |
| AutoTrans: Automating Transformer Design via Reinforced Architecture Search | NLPCC'21 | Fudan University |
| NASABN: A Neural Architecture Search Framework for Attention-Based Networks | IJCNN'20 | Chinese Academy of Sciences |
| NAT: Neural Architecture Transformer for Accurate and Compact Architectures | NeurIPS'19 | Tencent AI |
| The Evolved Transformer | ICML'19 | Google Brain |

## Domain Specific Transformer Search

### Vision

| Title | Venue | Group |
| --- | --- | --- |
| αNAS: Neural Architecture Search using Property Guided Synthesis | ACM Programming Languages'22 | MIT, Google |
| NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training | ICLR'22 | Meta Reality Labs |
| AutoFormer: Searching Transformers for Visual Recognition | ICCV'21 | MSR |
| GLiT: Neural Architecture Search for Global and Local Image Transformer | ICCV'21 | University of Sydney |
| Searching for Efficient Multi-Stage Vision Transformers | ICCV'21 workshop | MIT |
| HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers | CVPR'21 | Bytedance Inc. |

### Natural Language Processing

| Title | Venue | Group |
| --- | --- | --- |
| AutoBERT-Zero: Evolving the BERT backbone from scratch | AAAI'22 | Huawei Noah's Ark Lab |
| Primer: Searching for Efficient Transformers for Language Modeling | NeurIPS'21 | Google |
| AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models | ACL'21 | Tsinghua, Huawei Noah's Ark |
| NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search | KDD'21 | MSR, Tsinghua University |
| HAT: Hardware-Aware Transformers for Efficient Natural Language Processing | ACL'20 | MIT |

### Automatic Speech Recognition

| Title | Venue | Group |
| --- | --- | --- |
| SFA: Searching faster architectures for end-to-end automatic speech recognition models | Computer Speech and Language'23 | Chinese Academy of Sciences |
| LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search | ICASSP'21 | MSR |
| Efficient Gradient-Based Neural Architecture Search For End-to-End ASR | ICMI-MLMI'21 | NPU, Xi'an |
| Evolved Speech-Transformer: Applying Neural Architecture Search to End-to-End Automatic Speech Recognition | INTERSPEECH'20 | VUNO Inc. |

## Transformers Knowledge: Insights, Searchable Parameters, Attention

| Title | Venue | Group |
| --- | --- | --- |
| RWKV: Reinventing RNNs for the Transformer Era | arxiv [May'23] | EleutherAI |
| Patches Are All You Need? | TMLR'23 | CMU |
| Separable Self-attention for Mobile Vision Transformers | TMLR'23 | Apple |
| Parameter-efficient Fine-tuning for Vision Transformers | AAAI'23 | MSR & UCSC |
| EfficientFormer: Vision Transformers at MobileNet Speed | NeurIPS'22 | Snap Inc |
| Neighborhood Attention Transformer | CVPR'23 | Meta AI |
| Training Compute Optimal Large Language Models | NeurIPS'22 | DeepMind |
| CMT: Convolutional Neural Networks meet Vision Transformers | CVPR'22 | Huawei Noah's Ark Lab |
| Patch Slimming for Efficient Vision Transformers | CVPR'22 | Huawei Noah's Ark Lab |
| Lite Vision Transformer with Enhanced Self-Attention | CVPR'22 | Johns Hopkins University, Adobe |
| TubeDETR: Spatio-Temporal Video Grounding with Transformers | CVPR'22 (Oral) | CNRS & Inria |
| Beyond Fixation: Dynamic Window Visual Transformer | CVPR'22 | UT Sydney & RMIT University |
| BEiT: BERT Pre-Training of Image Transformers | ICLR'22 (Oral) | MSR |
| How Do Vision Transformers Work? | ICLR'22 (Spotlight) | NAVER AI |
| Scale Efficiently: Insights from Pretraining and FineTuning Transformers | ICLR'22 | Google Research |
| Tuformer: Data-Driven Design of Expressive Transformer by Tucker Tensor Representation | ICLR'22 | University of Maryland |
| DictFormer: Tiny Transformer with Shared Dictionary | ICLR'22 | Samsung Research |
| QuadTree Attention for Vision Transformers | ICLR'22 | Alibaba AI Lab |
| Expediting Vision Transformers via Token Reorganization | ICLR'22 (Spotlight) | UC San Diego & Tencent AI Lab |
| UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning | ICLR'22 | SIAT-SenseTime |
| Hierarchical Transformers Are More Efficient Language Models | NAACL'22 | Google Research, University of Warsaw |
| Transformer in Transformer | NeurIPS'21 | Huawei Noah's Ark |
| Long-Short Transformer: Efficient Transformers for Language and Vision | NeurIPS'21 | NVIDIA |
| Memory-efficient Transformers via Top-k Attention | EMNLP Workshop '21 | Allen AI |
| Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | ICCV'21 (Best Paper) | MSR |
| Rethinking Spatial Dimensions of Vision Transformers | ICCV'21 | NAVER AI |
| What Makes for Hierarchical Vision Transformers | arxiv [Sept'21] | HUST |
| AutoAttend: Automated Attention Representation Search | ICML'21 | Tsinghua University |
| Rethinking Attention with Performers | ICLR'21 (Oral) | Google |
| LambdaNetworks: Modeling Long-Range Interactions without Attention | ICLR'21 | Google Research |
| HyperGrid Transformers | ICLR'21 | Google Research |
| LocalViT: Bringing Locality to Vision Transformers | arxiv [April'21] | ETH Zurich |
| Compressive Transformers for Long-Range Sequence Modelling | ICLR'20 | DeepMind |
| Improving Transformer Models by Reordering their Sublayers | ACL'20 | FAIR, Allen AI |
| Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned | ACL'19 | Yandex |

## Transformer Surveys

| Title | Venue | Group |
| --- | --- | --- |
| Transformers in Vision: A Survey | ACM Computing Surveys'22 | MBZ University of AI |
| A Survey of Vision Transformers | TPAMI'22 | CAS |
| Efficient Transformers: A Survey | ACM Computing Surveys'22 | Google Research |
| Neural Architecture Search for Transformers: A Survey | IEEE Xplore [Sep'22] | Iowa State University |

## Foundation Models

| Title | Venue | Group |
| --- | --- | --- |
| Neural Architecture Search for Parameter-Efficient Fine-tuning of Large Pre-trained Language Models | arxiv'23 | Amazon Alexa AI |

## Misc Resources