To Smooth or Not? When Label Smoothing Meets Noisy Labels
This repository is the official PyTorch implementation of "To Smooth or Not? When Label Smoothing Meets Noisy Labels", accepted at ICML 2022 (Oral).
Plug-in implementation of (Generalized) Label Smoothing in PyTorch
import torch
import torch.nn.functional as F

def loss_gls(logits, labels, smooth_rate=0.1):
    # logits: model prediction logits before the softmax, with size [batch_size, classes]
    # labels: the (noisy) labels for evaluation, with size [batch_size]
    # smooth_rate: can be either positive or negative;
    # smooth_rate candidates we adopted in the paper: [0.8, 0.6, 0.4, 0.2, 0.0, -0.2, -0.4, -0.6, -0.8, -1.0, -2.0, -4.0, -6.0, -8.0].
    confidence = 1. - smooth_rate
    logprobs = F.log_softmax(logits, dim=-1)
    # negative log-likelihood of the given label, size [batch_size]
    nll_loss = -logprobs.gather(dim=-1, index=labels.unsqueeze(1))
    nll_loss = nll_loss.squeeze(1)
    # average negative log-probability over all classes, size [batch_size]
    smooth_loss = -logprobs.mean(dim=-1)
    loss = confidence * nll_loss + smooth_rate * smooth_loss
    # average over the batch (same value as torch.sum(loss) / batch_size)
    return loss.mean()
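One way to read the plug-in above is as cross-entropy against a soft target that places weight 1 - smooth_rate on the given label and spreads smooth_rate / K uniformly over all K classes, which also covers negative rates. The sketch below checks this numerically on random data. The helper `soft_target_ce`, the batch size, and the number of classes are illustrative assumptions and are not part of the repository.

```python
import torch
import torch.nn.functional as F


def loss_gls(logits, labels, smooth_rate=0.1):
    # Same computation as the plug-in above, written compactly.
    logprobs = F.log_softmax(logits, dim=-1)
    nll_loss = -logprobs.gather(dim=-1, index=labels.unsqueeze(1)).squeeze(1)
    smooth_loss = -logprobs.mean(dim=-1)
    return ((1.0 - smooth_rate) * nll_loss + smooth_rate * smooth_loss).mean()


def soft_target_ce(logits, labels, smooth_rate):
    # Hypothetical helper: cross-entropy against explicit soft targets,
    # r / K on every class plus an extra (1 - r) on the given label.
    num_classes = logits.size(-1)
    targets = torch.full_like(logits, smooth_rate / num_classes)
    extra = torch.full_like(logits[:, :1], 1.0 - smooth_rate)
    targets.scatter_add_(1, labels.unsqueeze(1), extra)
    return -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()


torch.manual_seed(0)
logits = torch.randn(8, 10)          # assumed batch: 8 samples, 10 classes
labels = torch.randint(0, 10, (8,))  # assumed (noisy) labels

for rate in (0.4, 0.0, -0.4):
    a = loss_gls(logits, labels, rate)
    b = soft_target_ce(logits, labels, rate)
    print(f"smooth_rate={rate:+.1f}  loss_gls={a.item():.4f}  soft_target_ce={b.item():.4f}")
```

With smooth_rate = 0 both functions reduce to the standard cross-entropy loss. With a negative rate the soft target puts more than 1 on the given label and negative weight on the other classes.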
Required Packages & Environment
We recommend that readers build a virtual environment and install the required packages listed in requirements.txt.
Experiments on synthetic noisy CIFAR dataset
Direct training on CIFAR-10
For Vanilla Loss and PLS (positive label smoothing), direct training works better when learning with symmetric noisy labels under noise rate 0.2. Run the code below to reproduce our results:
CUDA_VISIBLE_DEVICES=0 python3 main_GLS_direct_train.py --noise_type symmetric --noise_rate 0.2
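For readers unfamiliar with the symmetric noise model, the sketch below shows one common way such labels can be generated: each label is flipped with probability noise_rate to a different class chosen uniformly at random. This is an illustrative assumption, not the repository's own noise-generation code, and the function name `add_symmetric_noise` is hypothetical. Some implementations instead resample uniformly over all classes, including the original one.

```python
import random


def add_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    # Hypothetical helper: flip each label with probability `noise_rate`
    # to one of the other classes, chosen uniformly at random.
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            y = rng.choice([c for c in range(num_classes) if c != y])
        noisy.append(y)
    return noisy


clean = [i % 10 for i in range(10000)]  # assumed toy labels for 10 classes
noisy = add_symmetric_noise(clean, num_classes=10, noise_rate=0.2)
flipped = sum(c != n for c, n in zip(clean, noisy)) / len(clean)
print(f"fraction of flipped labels: {flipped:.3f}")
```

Under this convention, the fraction of flipped labels should be close to the chosen noise rate.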
Warm-up with CE loss
When noise rates are large, warming up with CE loss helps PLS and NLS (negative label smoothing) reach better performance. Run the code below to generate the warm-up model:
CUDA_VISIBLE_DEVICES=0 python3 main_warmup.py --noise_type symmetric --noise_rate 0.2
After the warm-up, proceed with GLS:
CUDA_VISIBLE_DEVICES=0 python3 main_GLS_load.py --noise_type symmetric --noise_rate 0.2
Experiments on CIFAR-N dataset
You may want to refer to the "CIFAR-N GitHub Page" and modify the file loss.py by following the loss_gls plug-in implementation specified above.
Details of key arguments:
In experiments, we formulate GLS as wa * Vanilla Loss + wb * GLS.
- --lr: learning rate
- --noise_rate: the error rate in the symmetric noise model
- --n_epoch: number of epochs
- --wa: the weight of Vanilla Loss (default is 0)
- --wb: the weight of GLS (default is 1)
- --smooth_rate: the smooth rate in GLS
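The weighted formulation above can be sketched as follows. This is a minimal illustration, assuming that Vanilla Loss means standard cross-entropy and that both terms are averaged over the batch. The function name `combined_loss` and the toy tensors are hypothetical and are not taken from the repository's training scripts.

```python
import torch
import torch.nn.functional as F


def combined_loss(logits, labels, smooth_rate, wa=0.0, wb=1.0):
    # Hypothetical helper: wa * Vanilla Loss + wb * GLS, per sample,
    # then averaged over the batch.
    logprobs = F.log_softmax(logits, dim=-1)
    # Vanilla Loss is assumed to be the standard cross-entropy term.
    vanilla = -logprobs.gather(dim=-1, index=labels.unsqueeze(1)).squeeze(1)
    # GLS term, following the plug-in implementation.
    gls = (1.0 - smooth_rate) * vanilla + smooth_rate * (-logprobs.mean(dim=-1))
    return (wa * vanilla + wb * gls).mean()


torch.manual_seed(0)
logits = torch.randn(4, 10, requires_grad=True)  # assumed toy batch
labels = torch.randint(0, 10, (4,))

# Defaults (wa=0, wb=1) reduce to plain GLS with the chosen smooth rate.
loss = combined_loss(logits, labels, smooth_rate=-0.4)
loss.backward()
print(loss.item(), logits.grad.shape)
```

Setting wa=1 and wb=0 recovers the Vanilla Loss alone, so the two weights interpolate between standard training and GLS.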
Citation
If you use our code, please cite the following paper:
@inproceedings{Wei2022ToSO,
title={To Smooth or Not? When Label Smoothing Meets Noisy Labels},
author={Jiaheng Wei and Hangyu Liu and Tongliang Liu and Gang Niu and Yang Liu},
booktitle={ICML},
year={2022}
}