awesome-mixed-sample-data-augmentation

This repo is a curated collection of resources on mixed sample data augmentation, including papers, code, and more.


Basic Method

We introduce the basic form of mixed sample data augmentation, first proposed in mixup: Beyond Empirical Risk Minimization [ICLR2018] [code].

Formulation

In mixup, virtual feature-target training samples are constructed as

x̃ = λx_i + (1 − λ)x_j
ỹ = λy_i + (1 − λ)y_j

where (x_i, y_i) and (x_j, y_j) are two feature-target samples drawn at random from the training data and λ ∈ [0, 1]. The mixing coefficient is sampled as λ ∼ Beta(α, α), where the hyper-parameter α controls the strength of interpolation between feature-target pairs.
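
To build intuition for α, one can inspect samples of λ ∼ Beta(α, α). The snippet below is a minimal numpy illustration (not from the original paper): small α pushes λ toward 0 or 1, so mixed samples stay close to one of the originals, while α = 1 makes λ uniform on [0, 1].

import numpy as np

# Illustrative only: compare λ ~ Beta(α, α) for different α values.
for alpha in (0.2, 1.0, 4.0):
    lam = np.random.beta(alpha, alpha, size=100_000)
    # Small α concentrates λ near 0 or 1; large α concentrates it around 0.5.
    print(f"alpha={alpha}: mean={lam.mean():.2f}, "
          f"fraction near 0 or 1 = {np.mean((lam < 0.1) | (lam > 0.9)):.2f}")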

Training Pipeline

The basic training pipeline is illustrated in the following figure.

Core Code

The few lines of code needed to implement mixup training in PyTorch:

import numpy as np

# y1 and y2 are assumed to be one-hot (or soft) label vectors so they can be mixed linearly.
for (x1, y1), (x2, y2) in zip(loader1, loader2):
    lam = np.random.beta(alpha, alpha)   # mixing coefficient λ ~ Beta(α, α)
    x = lam * x1 + (1. - lam) * x2       # mixed input
    y = lam * y1 + (1. - lam) * y2       # mixed target
    optimizer.zero_grad()
    loss(net(x), y).backward()           # loss must accept soft targets
    optimizer.step()
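
In practice, many implementations avoid a second data loader and one-hot targets by mixing each batch with a randomly permuted copy of itself and combining the two cross-entropy losses, which is equivalent to training against the mixed targets. Below is a minimal sketch of this common variant (the function name mixup_step is our own, for illustration only):

import numpy as np
import torch
import torch.nn.functional as F

def mixup_step(net, optimizer, x, y, alpha):
    # Mix the batch with a shuffled copy of itself; labels stay as class indices.
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1. - lam) * x[index]
    optimizer.zero_grad()
    out = net(mixed_x)
    # By linearity, this equals cross-entropy against the mixed one-hot targets.
    loss = lam * F.cross_entropy(out, y) + (1. - lam) * F.cross_entropy(out, y[index])
    loss.backward()
    optimizer.step()
    return loss.item()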

Application

Classification Tasks (Image, Text, Audio, ...)

Semi-Supervised Learning

Object Detection and Localization

Natural Language Processing

Image Segmentation

Super Resolution

Novelty Detection

Generative Adversarial Networks

Domain Adaptation

Few-shot Learning

Machine Learning

Analysis