A Roadmap for Transfer Learning
Introduction
This repo is a collection of AWESOME papers and code related to transfer learning, pre-training, domain adaptation, etc. Feel free to star and fork, and to let us know about missing papers (via issue or pull request).
This repo also accompanies our latest survey, Transferability in Deep Learning.
Survey | Library | Website | Chinese Introduction
- Introduction
- Pre-Training Models
- Supervised Pre-Training
- Unsupervised Pre-Training
- Task Adaptation
- Domain Adaptation
- Evaluation
Pre-Training Models
Resources
- Transformers: State-of-the-art Natural Language Processing [Library] [pdf]
- PyTorch Image Models [Library]
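For readers new to these libraries, here is a minimal sketch of loading a pre-trained backbone with each; the checkpoint names ("resnet50", "bert-base-uncased") are common examples, not recommendations from this list:

```python
# A minimal sketch, assuming `timm` and `transformers` are installed.
import timm
import torch
from transformers import AutoModel, AutoTokenizer

# PyTorch Image Models: a pre-trained vision backbone.
vision_model = timm.create_model("resnet50", pretrained=True).eval()
logits = vision_model(torch.randn(1, 3, 224, 224))   # (1, 1000) ImageNet logits

# Transformers: a pre-trained language backbone.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
language_model = AutoModel.from_pretrained("bert-base-uncased")
outputs = language_model(**tokenizer("transfer learning", return_tensors="pt"))
print(outputs.last_hidden_state.shape)               # (1, seq_len, 768)
```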
Survey
- On the Opportunities and Risks of Foundation Models [pdf]
- Pre-Trained Models: Past, Present and Future [pdf]
- Pre-trained Models for Natural Language Processing: A Survey [pdf]
Paper
- ViT - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [ICLR 2021] [Code]
- Do Better ImageNet Models Transfer Better? [CVPR 2019]
- GroupNorm - Group Normalization [ECCV 2018]
- Transformer - Attention Is All You Need [NIPS 2017]
- LayerNorm - Layer Normalization [arXiv 21 Jul 2016]
- ResNet - Deep Residual Learning for Image Recognition [CVPR 2016 Best]
- BatchNorm - Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift [ICML 2015]
Supervised Pre-Training
Paper
- Exploring the Limits of Large Scale Pre-training [arXiv 5 Oct 2021]
- Do Adversarially Robust ImageNet Models Transfer Better? [NIPS 2020]
- BiT - Big Transfer (BiT): General Visual Representation Learning [ECCV 2020] [Code]
- Billion-scale Semi-supervised Learning for Image Classification [arXiv 2 May 2019] [Code]
- SIN - ImageNet-trained CNNs are biased towards texture: increasing shape bias improves accuracy and robustness [ICLR 2019] [Code]
- DAT - Domain Adaptive Transfer Learning with Specialist Models [arXiv 16 Nov 2018]
- WSP - Exploring the Limits of Weakly Supervised Pretraining [ECCV 2018] [Code]
Meta-Learning
Resources
- learn2learn [Library]
Survey
- Meta-Learning in Neural Networks: A Survey [TPAMI 2021]
Paper
- Omni-Training - Omni-Training for Data-Efficient Deep Learning [arXiv 14 Oct 2021]
- HSML - Hierarchically Structured Meta-learning [ICML 2019]
- Meta-Transfer Learning for Few-Shot Learning [CVPR 2019]
- LEO - Meta-Learning with Latent Embedding Optimization [ICLR 2019]
- A Closer Look at Few-shot Classification [ICLR 2019]
- MAML - Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks [ICML 2017] [Tensorflow] [Pytorch] (see the sketch after this list)
- Meta Networks [ICML 2017]
- MANN - Meta-Learning with Memory-Augmented Neural Networks [ICML 2016]
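For orientation, a minimal sketch of the MAML inner/outer loop on a hypothetical linear-regression task family; the one-parameter model, task sampler, and step counts are toy placeholders, not the paper's setup:

```python
# Toy second-order MAML: adapt per task on a support set, then
# backpropagate the query loss through the adaptation step itself.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
w = torch.zeros(1, requires_grad=True)        # meta-parameter of y = w * x
meta_opt = torch.optim.SGD([w], lr=1e-2)
inner_lr = 1e-2

def sample_task():
    a = torch.randn(1)                        # hypothetical task: y = a * x
    x = torch.randn(32, 1)
    return (x[:16], a * x[:16]), (x[16:], a * x[16:])   # (support, query)

for step in range(100):
    meta_opt.zero_grad()
    for _ in range(4):                        # tasks per meta-batch
        (x_s, y_s), (x_q, y_q) = sample_task()
        inner_loss = F.mse_loss(w * x_s, y_s)
        (g,) = torch.autograd.grad(inner_loss, w, create_graph=True)
        w_adapted = w - inner_lr * g          # one inner-loop gradient step
        F.mse_loss(w_adapted * x_q, y_q).backward()  # meta-grad accumulates in w.grad
    meta_opt.step()
```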
Causal Learning
Survey
- Toward Causal Representation Learning [Proceedings of the IEEE 2021]
Paper
- RIM - Recurrent Independent Mechanisms [ICLR 2021]
- IRM - Invariant Risk Minimization [arXiv 5 Jul 2019] [Code] [TLlib]
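The IRMv1 penalty from the paper above is compact enough to sketch here; the binary task and random logits are placeholders:

```python
# IRMv1 penalty: the gradient of an environment's risk w.r.t. a frozen
# scalar "classifier" w = 1.0 measures how far the representation is
# from being simultaneously optimal across environments.
import torch
import torch.nn.functional as F

def irm_penalty(logits, labels):
    scale = torch.ones(1, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * scale, labels)
    (grad,) = torch.autograd.grad(loss, scale, create_graph=True)
    return (grad ** 2).sum()

penalty = irm_penalty(torch.randn(16, 1, requires_grad=True),
                      torch.rand(16, 1).round())
```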
Unsupervised Pre-Training
Survey
- Self-supervised Learning: Generative or Contrastive [TKDE 2021]
Generative Learning
Paper
- MAE - Masked Autoencoders Are Scalable Vision Learners [arXiv 11 Nov 2021]
- Strategies for Pre-training Graph Neural Networks [ICLR 2020] [Pytorch]
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations [ICLR 2020] [Tensorflow]
- T5 - Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [JMLR 2020] [Tensorflow]
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension [ACL 2020] [Pytorch]
- GPT-3 - Language Models are Few-Shot Learners [NIPS 2020]
- SpanBERT: Improving Pre-training by Representing and Predicting Spans [TACL 2020]
- XLNet: Generalized Autoregressive Pretraining for Language Understanding [NIPS 2019] [Tensorflow]
- XLM - Cross-lingual Language Model Pretraining [NIPS 2019] [Pytorch]
- GPT-2 - Language Models are Unsupervised Multitask Learners [2019] [Code]
- RoBERTa: A Robustly Optimized BERT Pretraining Approach [arXiv 26 Jul 2019] [Pytorch]
- ERNIE: Enhanced Representation through Knowledge Integration [arXiv 19 Apr 2019] [Code]
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [NAACL 2019] [Tensorflow] (see the masking sketch after this list)
- GPT - Improving Language Understanding by Generative Pre-Training [OpenAI 2018]
- ULMFiT - Universal Language Model Fine-tuning for Text Classification [ACL 2018] [project]
- ELMo - Deep contextualized word representations [NAACL 2018] [project]
- Deep Learning of Representations for Unsupervised and Transfer Learning [ICML 2012 workshop]
- Extracting and Composing Robust Features with Denoising Autoencoders [ICML 2008]
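A minimal sketch of the masked-prediction objective behind the BERT-style models above; the tiny encoder, toy vocabulary, and always-mask corruption are simplifications (BERT also randomly keeps or replaces some selected tokens):

```python
# Mask ~15% of token positions, reconstruct them, score only masked slots.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, mask_id, mask_prob = 1000, 0, 0.15
embed = nn.Embedding(vocab_size, 64)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(64, 4, batch_first=True), num_layers=2)
head = nn.Linear(64, vocab_size)

tokens = torch.randint(1, vocab_size, (8, 32))      # toy batch of token ids
mask = torch.rand(tokens.shape) < mask_prob         # positions to corrupt
corrupted = tokens.masked_fill(mask, mask_id)

logits = head(encoder(embed(corrupted)))            # predict every position
loss = F.cross_entropy(logits[mask], tokens[mask])  # train only on masked ones
loss.backward()
```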
Contrastive Learning
Resources
- Lightly: A python library for self-supervised learning on images [Library]
Paper
- CLIP - Learning Transferable Visual Models From Natural Language Supervision [arXiv 26 Feb 2021] [Pytorch]
- MoCo v3 - An Empirical Study of Training Self-Supervised Vision Transformers [ICCV 2021 Oral] [Pytorch]
- SimSiam - Exploring Simple Siamese Representation Learning [CVPR 2021] [Pytorch]
- BYOL - Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning [NIPS 2020]
- CMC - Contrastive Multiview Coding [ECCV 2020] [Pytorch]
- SimCLR - A Simple Framework for Contrastive Learning of Visual Representations [ICML 2020] [Tensorflow] [Pytorch] (see the InfoNCE sketch after this list)
- MoCo - Momentum Contrast for Unsupervised Visual Representation Learning [CVPR 2020] [Pytorch]
- DGI - Deep Graph Infomax [ICLR 2019] [Pytorch]
- Deep InfoMax - Learning deep representations by mutual information estimation and maximization [ICLR 2019] [Code]
- CPC - Representation Learning with Contrastive Predictive Coding [arXiv 10 Jul 2018] [Keras]
- InstDisc - Unsupervised Feature Learning via Non-Parametric Instance Discrimination [CVPR 2018] [Pytorch]
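Most methods above optimize some form of the InfoNCE objective; a minimal sketch (simplified relative to SimCLR's NT-Xent, which also uses within-view negatives):

```python
# InfoNCE: matching rows of z1 and z2 are positive pairs; all other
# rows in the batch serve as negatives.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature        # (N, N) cosine similarities
    targets = torch.arange(z1.size(0))        # positives on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
```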
Task Adaptation
Paper
- Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning [ACL 2021 Outstanding]
- How transferable are features in deep neural networks? [NIPS 2014]
- DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition [ICML 2014]
- OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks [arXiv 21 Dec 2013]
Catastrophic Forgetting
Resources
- TLlib [Library]
Paper
- Bi-tuning of pre-trained representations [arXiv 12 Nov 2020] [TLlib]
- StochNorm - Stochastic Normalization [NIPS 2020] [Pytorch] [TLlib]
- Co-Tuning for Transfer Learning [NIPS 2020] [Pytorch] [TLlib]
- Domain Adaptive Tuning - Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks [ACL 2020] [Pytorch]
- SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization [ACL 2020] [Pytorch]
- TRADES - Theoretically Principled Trade-off between Robustness and Accuracy [ICML 2019] [Pytorch]
- DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks [ICLR 2019] [Pytorch] [TLlib]
- SiATL - An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models [NAACL 2019] [Pytorch]
- SpotTune: Transfer Learning through Adaptive Fine-tuning [CVPR 2019] [Pytorch]
- ULMFiT - Universal Language Model Fine-tuning for Text Classification [ACL 2018]
- L2SP - Explicit Inductive Bias for Transfer Learning with Convolutional Networks [ICML 2018] [TLlib] (see the sketch after this list)
- LWF - Learning without Forgetting [TPAMI 2018] [TLlib]
- EWC - Overcoming catastrophic forgetting in neural networks [PNAS 2017]
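As a concrete anchor for this section, a minimal sketch of the L2-SP idea listed above: during fine-tuning, penalize distance from the pre-trained weights rather than from zero. The toy model and coefficient are placeholders:

```python
import torch

def l2_sp_penalty(model, anchor, alpha=1e-2):
    # Pull fine-tuned parameters back toward the pre-trained starting point.
    return alpha * sum((p - anchor[n]).pow(2).sum()
                       for n, p in model.named_parameters())

model = torch.nn.Linear(10, 2)                # stand-in for a pre-trained net
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
# During fine-tuning, add the penalty to the task loss at every step:
loss = model(torch.randn(4, 10)).sum() + l2_sp_penalty(model, anchor)
loss.backward()
```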
Negative Transfer
Paper
- When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset [ICAIL 2021] [Code]
- Zoo-Tuning: Adaptive Transfer from A Zoo of Models [ICML 2021] [Pytorch]
- LogME: Practical Assessment of Pre-trained Models for Transfer Learning [ICML 2021] [Pytorch]
- LEEP: A New Measure to Evaluate Transferability of Learned Representations [ICML 2020]
- Rethinking ImageNet Pre-training [ICCV 2019]
- Characterizing and Avoiding Negative Transfer [CVPR 2019]
- BSS - Catastrophic Forgetting Meets Negative Transfer: Batch Spectral Shrinkage for Safe Transfer Learning [NIPS 2019] [Pytorch] [TLlib]
- L2T-ww - Learning What and Where to Transfer [ICML 2019] [Pytorch]
- Taskonomy: Disentangling Task Transfer Learning [CVPR 2018 Best] [Code]
- To Transfer or Not To Transfer [NIPS 2005 Workshop]
Parameter Efficiency
Resources
- AdapterHub: A Framework for Adapting Transformers [Library] [EMNLP 2020]
Paper
- Diff Pruning - Parameter-Efficient Transfer Learning with Diff Pruning [ACL 2021] [Pytorch]
- PALs - BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning [ICML 2019] [Pytorch]
- Adapter Tuning - Parameter-Efficient Transfer Learning for NLP [ICML 2019] [Tensorflow] (see the adapter sketch after this list)
- Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks [arXiv 31 Dec 2019] [Pytorch]
- Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights [ECCV 2018] [Pytorch]
- Residual Adapter - Learning multiple visual domains with residual adapters [NIPS 2017]
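A minimal sketch of a bottleneck adapter in the spirit of Adapter Tuning above; the dimensions and zero-init trick are common choices, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck MLP inserted into a frozen backbone."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)        # start as the identity map
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

h = Adapter(dim=768)(torch.randn(8, 10, 768))  # drop-in on transformer features
```

Only the adapters (and typically layer norms plus the task head) are trained; the backbone stays frozen, so each new task costs a small fraction of the original parameter count.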
Data Efficiency
Resources
- Pretrain, Prompt, Predict [Paper List]
- Few-Shot Papers [Paper List]
- few-shot [Library]
- OpenPrompt [Library]
Survey
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing [arXiv 28 Jul 2021]
- Generalizing from a Few Examples: A Survey on Few-Shot Learning [arXiv 10 Apr 2019]
Paper
- Instruction Tuning - Finetuned Language Models Are Zero-Shot Learners [arXiv 3 Sep 2021] [Tensorflow]
- Prefix-Tuning: Optimizing Continuous Prompts for Generation [ACL 2021] [Pytorch]
- PET-TC - Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference [EACL 2021] [Pytorch]
- GPT-3 - Language Models are Few-Shot Learners [NIPS 2020]
- A Closer Look at Few-shot Classification [ICLR 2019]
- ProtoNet - Prototypical Networks for Few-shot Learning [NIPS 2017] [Pytorch] (see the prototype sketch after this list)
- Matching Net - Matching Networks for One Shot Learning [NIPS 2016] [Tensorflow]
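A minimal sketch of ProtoNet-style classification: class prototypes are mean support embeddings, and queries go to the nearest prototype. The linear embedding and the 5-way/2-shot episode are toy stand-ins:

```python
import torch

def proto_classify(embed, support_x, support_y, query_x, n_classes):
    z_s, z_q = embed(support_x), embed(query_x)
    prototypes = torch.stack(
        [z_s[support_y == c].mean(dim=0) for c in range(n_classes)])
    return -torch.cdist(z_q, prototypes) ** 2   # negative squared distance as logits

embed = torch.nn.Linear(32, 16)                 # toy feature extractor
support_y = torch.arange(10) % 5                # 5 classes, 2 shots each
logits = proto_classify(embed, torch.randn(10, 32), support_y,
                        torch.randn(4, 32), n_classes=5)
```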
Domain Adaptation
Resources
- TLlib [Library]
- awesome-domain-adaptation [Paper list]
Survey
- A Review of Single-Source Deep Unsupervised Visual Domain Adaptation [arXiv 1 Sep 2020]
- Transfer Adaptation Learning: A Decade Survey [arXiv 12 Mar 2019]
- A Survey on Transfer Learning [TKDE 2010]
Theory
Paper
- MDD - Bridging Theory and Algorithm for Domain Adaptation [ICML 2019] [Pytorch] [TLlib]
- Unsupervised Domain Adaptation Based on Source-guided Discrepancy [AAAI 2019]
- A theory of learning from different domains [Machine Learning 2010] (its bound is restated after this list)
- Domain Adaptation: Learning Bounds and Algorithms [COLT 2009]
- Learning Bounds for Domain Adaptation [NIPS 2007]
- Analysis of Representations for Domain Adaptation [NIPS 2006]
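These analyses largely share one shape, stated in "A theory of learning from different domains": for any hypothesis $h \in \mathcal{H}$,

$$
\epsilon_T(h) \le \epsilon_S(h) + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) + \lambda,
\qquad
\lambda = \min_{h' \in \mathcal{H}} \big[\, \epsilon_S(h') + \epsilon_T(h') \,\big],
$$

i.e. target risk is bounded by source risk, a discrepancy between the two marginal distributions, and the error of the best joint hypothesis; methods like MDD above replace the middle term with a discrepancy that is tractable to minimize.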
Statistics Matching
Paper
- RSP - Representation Subspace Distance for Domain Adaptation Regression [ICML 2021] [Pytorch]
- TransNorm - Transferable Normalization: Towards Improving Transferability of Deep Neural Networks [NIPS 2019] [Pytorch]
- CAN - Contrastive Adaptation Network for Unsupervised Domain Adaptation [CVPR 2019] [Pytorch]
- AdaBN - Adaptive Batch Normalization for practical domain adaptation [Pattern Recognition 2018]
- DeepJDOT: Deep Joint distribution optimal transport for unsupervised domain adaptation [ECCV 2018] [Keras]
- JDOT - Joint Distribution Optimal Transportation for Domain Adaptation [NIPS 2017] [Python] [Python Optimal Transport Library]
- CMD - Central Moment Discrepancy for Unsupervised Domain Adaptation [ICLR 2017] [Code]
- JAN - Deep Transfer Learning with Joint Adaptation Networks [ICML 2017] [TLlib]
- Deep CORAL: Correlation Alignment for Deep Domain Adaptation [ECCV 2016] [TLlib]
- DAN - Learning Transferable Features with Deep Adaptation Networks [ICML 2015] [TLlib]
- DDC - Deep Domain Confusion: Maximizing for Domain Invariance [arXiv 2014]
- MMD - Optimal kernel choice for large-scale two-sample tests [NIPS 2012]
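A minimal sketch of a Gaussian-kernel MMD between source and target features, the statistic behind DDC/DAN above; the single bandwidth and biased estimator are simplifications (DAN uses multi-kernel MMD):

```python
import torch

def gaussian_mmd(x, y, sigma=1.0):
    # Biased estimator for brevity; the kernel bandwidth is a free choice.
    k = lambda a, b: torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

mmd = gaussian_mmd(torch.randn(64, 256),        # source features
                   torch.randn(64, 256) + 0.5)  # shifted target features
```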
Domain Adversarial Learning
Paper
- BSP - Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation [ICML 2019] [Pytorch] [TLlib]
- CDAN - Conditional Adversarial Domain Adaptation [NIPS 2018] [Pytorch(official)] [Pytorch(third party)] [TLlib]
- ADDA - Adversarial Discriminative Domain Adaptation [CVPR2017] [Tensorflow(Official)] [Pytorch] [TLlib]
- DSN - Domain Separation Networks [NIPS 2016]
- DANN - Domain-Adversarial Training of Neural Networks [JMLR 2016] [TLlib] (see the gradient-reversal sketch after this list)
- Simultaneous Deep Transfer Across Domains and Tasks [ICCV 2015]
- DANN - Unsupervised Domain Adaptation by Backpropagation [ICML 2015] [Caffe(Official)] [Tensorflow] [Pytorch]
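A minimal sketch of the gradient reversal layer at the heart of DANN above: the forward pass is the identity, the backward pass flips the sign, so minimizing the domain loss trains the discriminator while training the features to fool it:

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, coeff):
        ctx.coeff = coeff
        return x.view_as(x)                    # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.coeff * grad_output, None  # reversed gradient to features

features = torch.randn(8, 128, requires_grad=True)
domain_head = torch.nn.Linear(128, 2)          # toy domain discriminator
domain_logits = domain_head(GradReverse.apply(features, 1.0))
```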
Paper for Application
- D-adapt - Decoupled Adaptation for Cross-Domain Object Detection [arXiv 6 Oct 2021] [TLlib]
- ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation [CVPR 2019 Oral] [Pytorch] [TLlib]
- SWDA - Strong-Weak Distribution Alignment for Adaptive Object Detection [CVPR 2019] [Pytorch]
- DA-Faster - Domain Adaptive Faster R-CNN for Object Detection in the Wild [CVPR 2018] [Caffe2] [Caffe]
- AdaptSeg - Learning to Adapt Structured Output Space for Semantic Segmentation [CVPR 2018] [Pytorch]
- FCN-wild - FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation [arXiv 8 Dec 2016]
Hypothesis Adversarial Learning
Paper
- RegDA - Regressive Domain Adaptation for Unsupervised Keypoint Detection [CVPR 2021] [TLlib]
- MDD - Bridging Theory and Algorithm for Domain Adaptation [ICML 2019] [Pytorch] [TLlib]
- SWD - Sliced Wasserstein Discrepancy for Unsupervised Domain Adaptation [CVPR 2019]
- MCD - Maximum Classifier Discrepancy for Unsupervised Domain Adaptation [CVPR 2018] [Pytorch(Official)] [TLlib]
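A minimal sketch of the classifier-discrepancy term from MCD above: two task classifiers disagree on target samples; the classifiers are trained to maximize, and the feature extractor to minimize, this gap:

```python
import torch
import torch.nn.functional as F

f1, f2 = torch.nn.Linear(64, 10), torch.nn.Linear(64, 10)  # two toy classifiers
target_features = torch.randn(8, 64)
p1 = F.softmax(f1(target_features), dim=1)
p2 = F.softmax(f2(target_features), dim=1)
discrepancy = (p1 - p2).abs().mean()           # L1 gap between predictions
```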
Domain Translation
Paper
- Diversify and Match: A Domain Adaptive Representation Learning Paradigm for Object Detection [CVPR 2019] [Pytorch]
- CyCADA: Cycle-Consistent Adversarial Domain Adaptation [ICML 2018] [Pytorch(official)] [TLlib]
- Using simulation and domain adaptation to improve efficiency of deep robotic grasping [ICRA 2018]
- GTA - Generate To Adapt: Aligning Domains using Generative Adversarial Networks [CVPR 2018] [Pytorch(Official)]
- PersonGAN - Person Transfer GAN to Bridge Domain Gap for Person Re-Identification [CVPR 2018]
- Unsupervised Machine Translation Using Monolingual Corpora Only [ICLR 2018]
- DTN - Unsupervised Cross-Domain Image Generation [ICLR 2017] [TensorFlow]
- CycleGAN - Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks [ICCV 2017] [Pytorch(Official)] (see the cycle-consistency sketch after this list)
- PixelDA - Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks [CVPR 2017] [Tensorflow(Official)] [Pytorch]
- Learning from Simulated and Unsupervised Images through Adversarial Training [CVPR 2017 Oral] [Tensorflow]
- CoGAN - Coupled Generative Adversarial Networks [NIPS 2016] [Pytorch(Official)]
- GAN - Generative Adversarial Nets [NIPS 2014]
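A minimal sketch of the cycle-consistency term popularized by CycleGAN above; the two linear maps are toy stand-ins for the real generators:

```python
import torch

G = torch.nn.Linear(64, 64)                    # toy generator: source -> target
F_inv = torch.nn.Linear(64, 64)                # toy generator: target -> source
x_s, x_t = torch.randn(8, 64), torch.randn(8, 64)

# Translating there and back should recover the input (L1 reconstruction).
cycle_loss = ((F_inv(G(x_s)) - x_s).abs().mean()
              + (G(F_inv(x_t)) - x_t).abs().mean())
```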
Semi-Supervised Learning
Paper
- Cycle Self-Training for Domain Adaptation [NIPS 2021] [Pytorch]
- Adapting ImageNet-scale models to complex distribution shifts with self-learning [arXiv 27 Apr 2021]
- MCC - Minimum Class Confusion for Versatile Domain Adaptation [ECCV 2020] [TLlib]
- MME - Semi-supervised Domain Adaptation via Minimax Entropy [ICCV 2019] [Pytorch]
- MMT - Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification [ICLR 2020] [Pytorch] [TLlib]
- GCE - Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels [NIPS 2018]
- CBST - Unsupervised Domain Adaptation for Semantic Segmentation via Class-Balanced Self-Training [ECCV 2018] [Pytorch] (see the pseudo-labeling sketch after this list)
- DIRT-T - A DIRT-T Approach to Unsupervised Domain Adaptation [ICLR 2018] [Tensorflow(Official)]
- Self-Ensemble - Self-Ensembling for Visual Domain Adaptation [ICLR 2018] [TLlib]
- ATT - Asymmetric Tri-training for Unsupervised Domain Adaptation [ICML 2017] [TensorFlow]
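A minimal sketch of confidence-thresholded pseudo-labeling, the mechanism shared by the self-training methods above; the toy classifier and threshold are placeholders (CBST additionally balances the threshold per class):

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, unlabeled_x, threshold=0.95):
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_x), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        keep = conf >= threshold               # trust only confident predictions
    if not keep.any():                         # early on, nothing may qualify
        return unlabeled_x.new_zeros(())
    return F.cross_entropy(model(unlabeled_x[keep]), pseudo_y[keep])

model = torch.nn.Linear(64, 10)                # toy classifier
loss = pseudo_label_loss(model, torch.randn(32, 64))
```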
Evaluation
Cross-Task Evaluation
- VTAB - The Visual Task Adaptation Benchmark [pdf] [Code] [Download]
- GLUE - General Language Understanding Evaluation [ICLR 2019] [Website]
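GLUE, for instance, is also scripted in common dataset hubs; a minimal loading sketch via the Hugging Face `datasets` package (a convenience choice on our part, not the benchmark's official distribution):

```python
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")            # one GLUE task
print(sst2["train"][0])                        # {'sentence': ..., 'label': ..., 'idx': ...}
```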
Cross-Domain Evaluation
- ImageNet-R - The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization [ICCV 2021] [Download]
- ImageNet-Sketch - Learning Robust Global Representations by Penalizing Local Predictive Power [NIPS 2019] [Download]
- DomainNet - Moment Matching for Multi-Source Domain Adaptation [ICCV 2019] [Website]
- XNLI: Evaluating Cross-lingual Sentence Representations [EMNLP 2018] [Download]