[Re] Can gradient clipping mitigate label noise?

This is an unofficial PyTorch implementation of the ICLR 2020 paper "Can gradient clipping mitigate label noise?" by Menon et al. The paper studies the robustness of gradient clipping to symmetric label noise, and proposes partially Huberised (PHuber) versions of standard losses, which perform well in the presence of label noise.
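For readers new to the topic, here is a minimal, framework-agnostic sketch of gradient clipping by norm in plain Python; it is not code from this repository. A gradient g is rescaled to g · min(1, c / ‖g‖) for a threshold c, which keeps its direction but caps its magnitude. In PyTorch the corresponding utility is `torch.nn.utils.clip_grad_norm_`; the name `clip_by_norm` below is illustrative only.

```python
import math
from typing import List, Sequence


def clip_by_norm(grad: Sequence[float], max_norm: float) -> List[float]:
    """Rescale `grad` so that its Euclidean norm is at most `max_norm`.

    Illustrative helper (not part of this repository): the direction of the
    gradient is preserved and only its magnitude is reduced.
    """
    if max_norm <= 0:
        raise ValueError("max_norm must be positive")
    norm = math.sqrt(sum(g * g for g in grad))
    # Gradients already inside the ball of radius max_norm are left unchanged.
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


# A gradient of norm 5 clipped to norm 1, same direction: roughly [0.6, 0.8].
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
```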

For the experiments, the following losses are also implemented:

This repository reproduces all the experiments of the original paper, as part of our participation in the ML Reproducibility Challenge 2020. Our report can be found on OpenReview and in the ReScience C journal.

Dependencies

This project requires Python >= 3.8. Dependencies can be installed with:

pip install -r requirements.txt

Training

This project uses Hydra to configure experiments. Configurations can be overridden through config files (in conf/) and the command line. For more information, check out the Hydra documentation.

With Hydra, configurations can be fully customized directly through the command line. To find out more about the configuration options, run:

python3 train.py --help

To run the experiments from the paper based on real-world datasets (72 different configurations), only 5 arguments need to be provided:

Note: When choosing a dataset and model, the hyper-parameters (e.g. number of epochs, batch size, optimizer, learning rate scheduler, ...) are automatically changed to those used by the authors in their experiments. If needed, these hyper-parameters can also be overridden through command line arguments.
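If a sweep needs to be scripted outside Hydra's `--multirun`, the command lines can also be generated programmatically. The sketch below only uses overrides that appear in the examples in this README (`dataset`, `model`, `loss`, `loss.tau`, `dataset.train.corrupt_prob`) and the dataset-to-model pairing shown there (LeNet for MNIST, ResNet-50 for CIFAR-10/100). The helper `build_command` is hypothetical, and configurations that need other overrides (for example gradient clipping) are left out because those flags are not documented here.

```python
import itertools
import shlex
from typing import List, Optional

# Dataset -> model pairing used in this README's examples and results.
MODEL_FOR_DATASET = {"mnist": "lenet", "cifar10": "resnet50", "cifar100": "resnet50"}


def build_command(
    dataset: str, loss: str, corrupt_prob: float, tau: Optional[float] = None
) -> List[str]:
    """Return the argv list for one train.py run (hypothetical helper)."""
    cmd = [
        "python3",
        "train.py",
        f"dataset={dataset}",
        f"model={MODEL_FOR_DATASET[dataset]}",
        f"loss={loss}",
    ]
    if tau is not None:
        # Only meaningful for the PHuber losses.
        cmd.append(f"loss.tau={tau}")
    cmd.append(f"dataset.train.corrupt_prob={corrupt_prob}")
    return cmd


# Print one command per (dataset, corruption probability) pair for plain CE.
for dataset, rho in itertools.product(MODEL_FOR_DATASET, [0.0, 0.2, 0.4, 0.6]):
    print(shlex.join(build_command(dataset, "ce", rho)))
```

Each list can be passed directly to `subprocess.run(cmd, check=True)` if the runs should be launched from Python.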

Examples

Training LeNet on MNIST using cross-entropy loss and no label corruption:

python3 train.py dataset=mnist model=lenet loss=ce dataset.train.corrupt_prob=0.0

Training a ResNet-50 on CIFAR-10 using the partially Huberised cross-entropy loss (PHuber-CE) with τ=2, and label corruption probability ρ of 0.2:

python3 train.py dataset=cifar10 model=resnet50 loss=phuber_ce loss.tau=2 dataset.train.corrupt_prob=0.2
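To illustrate what `loss.tau` controls, here is a minimal single-example sketch of PHuber-CE in plain Python, written from the loss's definition rather than taken from this repository (the names `softmax` and `phuber_ce` are illustrative). With p the softmax probability of the labelled class, -log p is kept for p > 1/τ and replaced below that by its tangent line at p = 1/τ, namely -τp + log τ + 1, so the derivative with respect to p never exceeds τ in magnitude. The sketch assumes τ > 1.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def phuber_ce(logits: Sequence[float], label: int, tau: float) -> float:
    """Partially Huberised cross-entropy for a single example (sketch).

    Below p = 1/tau, -log(p) is replaced by its tangent line at p = 1/tau,
    so the loss is linear in p there with slope -tau.
    """
    if tau <= 1:
        raise ValueError("this sketch assumes tau > 1")
    p = softmax(logits)[label]  # predicted probability of the given label
    if p <= 1.0 / tau:
        return -tau * p + math.log(tau) + 1.0
    return -math.log(p)
```

For logits [2.0, 0.0, 0.0] and τ = 2, an example labelled 0 (p ≈ 0.79) receives the ordinary cross-entropy, while one labelled 1 (p ≈ 0.11) falls on the linear branch and is penalised less than cross-entropy would penalise it, which limits the pull of confidently mislabelled examples.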

Training a ResNet-50 on CIFAR-100 using the Generalized Cross Entropy loss (GCE) and label corruption probability ρ of 0.6, with mixed precision:

python3 train.py dataset=cifar100 model=resnet50 loss=gce dataset.train.corrupt_prob=0.6 mixed_precision=true
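Similarly, here is a minimal sketch of GCE and of a partially Huberised GCE in plain Python; again this is not the repository's code, and the function names are illustrative. GCE is (1 - p^q)/q for q in (0, 1]; the default q = 0.7 below is an assumption (a value commonly used with GCE), not necessarily this repository's setting. The PHuber variant applies the same tangent-line construction as PHuber-CE: since |d/dp (1 - p^q)/q| = p^(q-1), the gradient magnitude exceeds τ exactly when p < p₀ = τ^(1/(q-1)), and below p₀ the loss is linearised. The paper's exact parameterisation of PHuber-GCE may differ.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gce(logits: Sequence[float], label: int, q: float = 0.7) -> float:
    """Generalized cross-entropy (1 - p**q) / q for one example."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    p = softmax(logits)[label]
    return (1.0 - p ** q) / q


def phuber_gce(logits: Sequence[float], label: int, tau: float, q: float = 0.7) -> float:
    """GCE linearised wherever its derivative w.r.t. p would exceed tau in magnitude."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be in (0, 1) for the threshold to be defined")
    if tau <= 1.0:
        raise ValueError("this sketch assumes tau > 1")
    p = softmax(logits)[label]
    # Threshold where p**(q - 1) == tau; the gradient magnitude is larger below it.
    p0 = tau ** (1.0 / (q - 1.0))
    if p < p0:
        # Tangent line of (1 - p**q) / q at p0, whose slope is -tau.
        return (1.0 - p0 ** q) / q - tau * (p - p0)
    return (1.0 - p ** q) / q
```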

Training LeNet on MNIST using cross-entropy loss, and varying label corruption probability ρ (0.0, 0.2, 0.4 and 0.6). This uses Hydra's multi-run flag for parameter sweeps:

python3 train.py --multirun dataset=mnist model=lenet loss=ce dataset.train.corrupt_prob=0.0,0.2,0.4,0.6
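The paper studies symmetric label noise, and `dataset.train.corrupt_prob` sets the corruption probability ρ of the training labels. Below is a minimal sketch of one common way to inject such noise, in plain Python; it is an assumption about the procedure, not the repository's implementation. Each label is, with probability ρ, replaced by a class drawn uniformly at random from all classes, so the draw can return the original label (some implementations exclude it). The function name `corrupt_labels` and its `seed` parameter are illustrative.

```python
import random
from typing import List, Sequence


def corrupt_labels(
    labels: Sequence[int], num_classes: int, corrupt_prob: float, seed: int = 0
) -> List[int]:
    """Symmetric label noise (sketch).

    With probability corrupt_prob, each label is replaced by a class drawn
    uniformly from all num_classes classes; the draw may equal the original.
    """
    if not 0.0 <= corrupt_prob <= 1.0:
        raise ValueError("corrupt_prob must be in [0, 1]")
    rng = random.Random(seed)  # dedicated generator for reproducibility
    noisy = []
    for y in labels:
        if rng.random() < corrupt_prob:
            noisy.append(rng.randrange(num_classes))
        else:
            noisy.append(y)
    return noisy


clean = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
noisy = corrupt_labels(clean, num_classes=10, corrupt_prob=0.6, seed=1)
```

With this convention, the expected fraction of labels that actually change is ρ(K − 1)/K for K classes, e.g. 0.54 for ρ = 0.6 and K = 10.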

Run metrics and saved models

By default, run metrics are logged to TensorBoard. In addition, the saved models, training parameters and training log can be found in the run's directory, in outputs/.

Evaluation

To evaluate a trained model using eval.py, you need to provide:

For example, to evaluate a LeNet model trained on MNIST saved as models/lenet.pt, run:

python3 eval.py dataset=mnist model=lenet checkpoint=models/lenet.pt

By default, trained models are only evaluated on the test set. This can be modified by overriding the dataset.train.use, dataset.val.use and dataset.test.use arguments.

To find out more about the configuration options for evaluation, use the --help flag.

Results

MNIST with LeNet-5

| Loss function | ρ = 0.0 | ρ = 0.2 | ρ = 0.4 | ρ = 0.6 |
|---|---|---|---|---|
| CE | 99.1±0.1 | 98.8±0.0 | 98.6±0.0 | 98.0±0.1 |
| CE + clipping | 97.0±0.0 | 96.5±0.0 | 95.7±0.1 | 94.7±0.1 |
| Linear | 95.0±3.5 | 98.5±0.1 | 98.2±0.0 | 97.6±0.0 |
| GCE | 98.8±0.0 | 98.7±0.0 | 98.5±0.0 | 98.1±0.0 |
| PHuber-CE τ=10 | 99.0±0.0 | 98.8±0.1 | 98.5±0.1 | 97.6±0.0 |
| PHuber-GCE τ=10 | 98.9±0.0 | 98.7±0.0 | 98.4±0.0 | 98.0±0.0 |

CIFAR-10 with ResNet-50

| Loss function | ρ = 0.0 | ρ = 0.2 | ρ = 0.4 | ρ = 0.6 |
|---|---|---|---|---|
| CE | 95.8±0.1 | 84.0±0.3 | 67.8±0.3 | 44.0±0.2 |
| CE + clipping | 89.3±0.0 | 82.6±1.6 | 78.7±0.2 | 67.6±0.1 |
| Linear | 94.1±0.1 | 91.4±0.5 | 86.0±2.4 | 58.6±5.2 |
| GCE | 95.3±0.0 | 92.5±0.1 | 82.4±0.1 | 53.3±0.3 |
| PHuber-CE τ=2 | 94.8±0.0 | 92.8±0.2 | 87.8±0.2 | 73.2±0.2 |
| PHuber-GCE τ=10 | 95.4±0.1 | 92.2±0.2 | 81.5±0.2 | 54.3±0.5 |

CIFAR-100 with ResNet-50

| Loss function | ρ = 0.0 | ρ = 0.2 | ρ = 0.4 | ρ = 0.6 |
|---|---|---|---|---|
| CE | 75.4±0.3 | 62.2±0.4 | 45.8±0.9 | 26.7±0.1 |
| CE + clipping | 23.5±0.2 | 20.4±0.4 | 16.2±0.5 | 12.9±0.1 |
| Linear | 13.7±0.7 | 8.2±0.3 | 5.9±0.7 | 3.9±0.3 |
| GCE | 73.3±0.2 | 68.5±0.3 | 59.5±0.5 | 40.3±0.4 |
| PHuber-CE τ=10 | 60.6±1.1 | 54.8±1.2 | 43.1±1.1 | 24.3±0.8 |
| PHuber-GCE τ=10 | 72.7±0.1 | 68.4±0.1 | 60.2±0.2 | 42.2±0.4 |
| PHuber-CE τ=50 | 75.4±0.2 | 65.9±0.2 | 49.1±0.2 | 26.9±0.0 |

Pretrained models

For each configuration, the models obtained during the first trial are available on Google Drive:

Synthetic experiments

This repo also reproduces the experiments from the paper based on synthetic datasets. These experiments use simple linear models, which are implemented using NumPy and SciPy.

To reproduce the first synthetic experiment (Fig. 2a from the paper), run:

python3 synthetic_1.py

To reproduce the second synthetic experiment (fig. 2b from the paper), run:

python3 synthetic_2.py

Project structure

The codebase is separated into 3 parts:

phuber/

This directory contains all the code related to the deep learning experiments on MNIST, CIFAR-10 and CIFAR-100, using PyTorch.
This includes:

synthetic/

This directory contains all the code related to experiments on synthetic data with linear models, using NumPy and SciPy.

conf/

This directory contains all the Hydra config files for both types of experiments:

Citation

If you find any piece of our code or report useful, please cite:

@inproceedings{mizrahi2021re,
title={[Re] Can gradient clipping mitigate label noise?},
author={David Mizrahi and O{\u{g}}uz Kaan Y{\"u}ksel and Aiday Marlen Kyzy},
booktitle={ML Reproducibility Challenge 2020},
year={2021},
url={https://openreview.net/forum?id=TM_SgwWJA23}
}