Home

Awesome

Clusterability as an Alternative to Anchor Points When Learning with Noisy Labels

This code is a PyTorch implementation of the paper:

[1] Zhaowei Zhu, Yiwen Song, and Yang Liu, "Clusterability as an Alternative to Anchor Points When Learning with Noisy Labels," https://arxiv.org/abs/2102.05291.

Prerequisites

Python 3.6.6

PyTorch 1.3.0

Torchvision 0.4.1

Datasets will be downloaded to ./data/.

Run HOC + forward loss correction

On CIFAR-10 with instance 0.6 noise.

export CUDA_VISIBLE_DEVICES=0 && nohup python -u main.py --pre_type image --dataset cifar10 --loss fw --label_file_path ./data/IDN_0.6_C10_0.pt> ./out/test10.out &

On CIFAR-10 with real-world human-annotated labels

export CUDA_VISIBLE_DEVICES=0 && nohup python -u main.py --pre_type image --dataset cifar10 --loss fw --label_file_path ./data/noise_label_human.pt> ./out/test10.out &

On CIFAR-100 with instance 0.6 noise.

export CUDA_VISIBLE_DEVICES=1 && nohup python -u main.py --pre_type image --dataset cifar100 --loss fw --label_file_path ./data/IDN_0.6_C100_0.pt> ./out/test100.out &

Real-world human-annotated CIFAR-10

We collected them from Amazon Mechanical Turk (MTurk) and students at UC Santa Cruz in February 2020. We only collected one annotation for each image at the cost of ¢10 per image. The label file is available at ./data/noise_label_human.pt.

Minimal implementation of HOC

G: the number of rounds needed to estimate the consensus probabilities (See details in Algorithm 1 [1]) max_iter: the maximum number of iterations to get an estimate of T

CUDA_VISIBLE_DEVICES=0 python main_min.py --G 50 --max_iter 1500