awesome-mixed-sample-data-augmentation

This repo is a curated collection of resources on mixed sample data augmentation, including papers, code, and more.


Basic Method

We introduce the basic form of mixed sample data augmentation, first proposed in mixup: Beyond Empirical Risk Minimization [ICLR2018] [code].

Formulation

In mixup, virtual feature-target training samples are constructed as

x̃ = λx_i + (1 − λ)x_j
ỹ = λy_i + (1 − λ)y_j

where (x_i, y_i) and (x_j, y_j) are two feature-target samples drawn at random from the training data and λ ∈ [0, 1]. The mixing coefficient is sampled as λ ∼ Beta(α, α), where the hyper-parameter α controls the strength of interpolation between feature-target pairs.
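
To build intuition for α, one can inspect samples of λ ∼ Beta(α, α). The snippet below is a minimal numpy illustration (not from the original paper): small α pushes λ toward 0 or 1, so mixed samples stay close to one of the originals, while α = 1 makes λ uniform on [0, 1].

import numpy as np

# Illustrative only: compare λ ~ Beta(α, α) for different α values.
for alpha in (0.2, 1.0, 4.0):
    lam = np.random.beta(alpha, alpha, size=100_000)
    # Small α concentrates λ near 0 or 1; large α concentrates it around 0.5.
    print(f"alpha={alpha}: mean={lam.mean():.2f}, "
          f"fraction near 0 or 1 = {np.mean((lam < 0.1) | (lam > 0.9)):.2f}")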

Training Pipeline

The basic training pipeline is illustrated in the following figure.

Core Code

The few lines of code needed to implement mixup training in PyTorch:

import numpy as np

# y1 and y2 are assumed to be one-hot (or soft) label vectors so they can be mixed linearly.
for (x1, y1), (x2, y2) in zip(loader1, loader2):
    lam = np.random.beta(alpha, alpha)   # mixing coefficient λ ~ Beta(α, α)
    x = lam * x1 + (1. - lam) * x2       # mixed input
    y = lam * y1 + (1. - lam) * y2       # mixed target
    optimizer.zero_grad()
    loss(net(x), y).backward()           # loss must accept soft targets
    optimizer.step()
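
In practice, many implementations avoid a second data loader and one-hot targets by mixing each batch with a randomly permuted copy of itself and combining the two cross-entropy losses, which is equivalent to training against the mixed targets. Below is a minimal sketch of this common variant (the function name mixup_step is our own, for illustration only):

import numpy as np
import torch
import torch.nn.functional as F

def mixup_step(net, optimizer, x, y, alpha):
    # Mix the batch with a shuffled copy of itself; labels stay as class indices.
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1. - lam) * x[index]
    optimizer.zero_grad()
    out = net(mixed_x)
    # By linearity, this equals cross-entropy against the mixed one-hot targets.
    loss = lam * F.cross_entropy(out, y) + (1. - lam) * F.cross_entropy(out, y[index])
    loss.backward()
    optimizer.step()
    return loss.item()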

Application

Classification Tasks (Image, Text, Audio, ...)

Semi-Supervised Learning

Object Detection and Localization

Natural Language Processing

Image Segmentation

Super Resolution

Novelty Detection

Generative Adversarial Networks

Domain Adaptation

Few-shot Learning

Machine Learning

Analysis