A Topological Filter for Learning with Label Noise (NeurIPS 2020, Paper)
Requirements
- PyTorch 0.4.1 (have not tested on other versions)
- Python 3.6 (for the purpose of compiling C++ code. Other 3.x versions should also work.)
- scipy 1.1.0 (this is due to the computation of distribution mode)
- termcolor, etc (which can be easily installed with pip)
Usage
- Compile the C++ code for computing the connected components. In the folder ref, run ./compile_pers_lib.sh (by default it requires Python 3.6; if you are using another Python version, modify the command inside compile_pers_lib.sh).
- Run train.py with a command like the one below:
python train.py --every 5 --start_clean 30 --k_cc 4 --k_outlier 32 --seed 77 --type uniform --noise 0.4 --patience 65 --gpus 0 --dataset cifar10 --zeta 0.5
- For the point cloud dataset, run the command with the pc argument:
python train.py --gpus 2 --every 5 --start_clean 10 --k_outlier 30 --k_cc 100 --noise 0.8 --type uniform --patience 60 --seed 77 --dataset pc --net pc --milestone 35 --zeta 2
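When running several configurations, it can help to assemble the command line programmatically. The following is a minimal sketch, not part of the repository: build_train_command is a hypothetical helper, and it only uses the flags that appear in the commands above.

```python
import shlex

def build_train_command(**options):
    """Return the argument list for train.py (hypothetical helper, not in the repo)."""
    cmd = ["python", "train.py"]
    for name, value in options.items():
        # Each keyword becomes a "--name value" pair, in the order given.
        cmd += ["--" + name, str(value)]
    return cmd

# Reproduces the CIFAR-10 command shown above.
cifar10_cmd = build_train_command(
    every=5, start_clean=30, k_cc=4, k_outlier=32, seed=77,
    type="uniform", noise=0.4, patience=65, gpus=0, dataset="cifar10", zeta=0.5,
)
print(" ".join(shlex.quote(part) for part in cifar10_cmd))

# The same settings with a few different seeds (the seed values are arbitrary examples).
seed_sweep = [
    build_train_command(
        every=5, start_clean=30, k_cc=4, k_outlier=32, seed=seed,
        type="uniform", noise=0.4, patience=65, gpus=0, dataset="cifar10", zeta=0.5,
    )
    for seed in (77, 78, 79)
]
```

Each list can be passed to subprocess.run(cmd, check=True) from the repository root to launch the run.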
Here the major parameters are:
- every: the frequency of data collection.
- start_clean: when to start data collection.
- k_cc: the parameter for computing the KNN graph when finding the largest connected component.
- k_outlier: the parameter for computing the KNN graph when applying zeta filtering.
- seed: the random seed.
- type: the noise type. Options include uniform and asym.
- noise: the noise level.
- patience: this is a trick to save training time. If we observe no obvious improvement in validation accuracy for N consecutive epochs, we stop the training.
- gpus: which GPU to run on.
- dataset: which dataset to use. Options include cifar10, cifar100 and pc. The pc dataset can be downloaded from https://github.com/charlesq34/pointnet
- zeta: the parameter for zeta filtering. Note that when zeta is set to a value > 1.0, we use majority voting to remove the outliers. This sometimes achieves better performance.
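To make the roles of k_outlier and zeta more concrete, here is a minimal NumPy sketch of a KNN-based outlier filter. It is an illustration under assumptions, not the repository's implementation: it assumes Euclidean distance on a feature matrix, that a point is kept when at least a zeta fraction of its k_outlier nearest neighbours share its label, and that zeta > 1.0 switches to keeping a point only when its label matches the majority label among its neighbours. The function names are hypothetical.

```python
import numpy as np

def knn_indices(features, k):
    """Indices of the k nearest neighbours (Euclidean) of every row, excluding itself."""
    k = min(k, features.shape[0] - 1)
    diff = features[:, None, :] - features[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)  # squared distances
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]

def zeta_filter(features, labels, k_outlier, zeta):
    """Boolean mask of points to keep (assumed rule, see the text above)."""
    labels = np.asarray(labels)
    neigh_labels = labels[knn_indices(features, k_outlier)]  # shape (n, k)
    if zeta > 1.0:
        # Majority voting: keep a point only if its label is the most
        # common label among its neighbours.
        n_classes = labels.max() + 1
        votes = np.stack(
            [(neigh_labels == c).sum(axis=1) for c in range(n_classes)], axis=1
        )
        return votes.argmax(axis=1) == labels
    # Otherwise keep a point if at least a zeta fraction of its neighbours agree.
    agree = (neigh_labels == labels[:, None]).mean(axis=1)
    return agree >= zeta

# Toy data: two well-separated clusters; point 3 sits in the first cluster
# but carries the label of the second one.
features = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
                     [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1, 1, 1])
keep_fraction = zeta_filter(features, labels, k_outlier=3, zeta=0.5)
keep_majority = zeta_filter(features, labels, k_outlier=3, zeta=2)
```

On this toy data both variants drop only the mislabeled point. The brute-force distance matrix is fine for a small example but would need a proper nearest-neighbour index for full datasets.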
Practical tips: For extremely noisy scenarios (noise level >= 0.8), we observe that setting a larger k_cc works better.
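The effect of k_cc can be illustrated with a small sketch of the largest-connected-component step. This is a simplified illustration, not the compiled C++ code in ref: it assumes a Euclidean KNN graph over feature vectors, assumes the graph is restricted to one label at a time, and uses scipy.sparse.csgraph.connected_components to find the components. The function names are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def knn_graph(features, k):
    """Sparse adjacency matrix linking every point to its k nearest neighbours."""
    n = features.shape[0]
    k = min(k, n - 1)
    diff = features[:, None, :] - features[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)  # squared distances
    np.fill_diagonal(dist, np.inf)               # no self-loops
    neigh = np.argsort(dist, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    return csr_matrix((np.ones(n * k), (rows, neigh.ravel())), shape=(n, n))

def largest_component_mask(features, labels, k_cc):
    """Keep, for each label, the points in the largest connected component
    of the KNN graph restricted to that label (assumed procedure)."""
    labels = np.asarray(labels)
    graph = knn_graph(features, k_cc)
    keep = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        sub = graph[idx][:, idx]  # edges between same-label points only
        _, comp = connected_components(sub, directed=False)
        biggest = np.bincount(comp).argmax()
        keep[idx[comp == biggest]] = True
    return keep

# Toy data: point 3 lies in the first cluster but is labelled as the second.
features = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
                     [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1, 1, 1])
mask_small_k = largest_component_mask(features, labels, k_cc=3)
mask_large_k = largest_component_mask(features, labels, k_cc=7)
```

With k_cc=3 the mislabeled point is isolated from the other points of its label and is dropped; with k_cc=7 the graph is dense enough that every point joins a single component and nothing is removed. A larger k_cc therefore makes the graph denser and the components less fragmented, at the cost of a weaker filter.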
Our code will be further improved to make it cleaner and easier to use.
Reference:
@inproceedings{wu2020topological,
title={A Topological Filter for Learning with Label Noise},
author={Wu, Pengxiang and Zheng, Songzhu and Goswami, Mayank and Metaxas, Dimitris and Chen, Chao},
booktitle={Advances in Neural Information Processing Systems},
year={2020}
}