<div align="center">

# MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition

Shuang Li, Kaixiong Gong, et al.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021. [CVPR 2021 PDF]

</div>

This repository contains the code of our CVPR 2021 work "MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition".
## Abstract
Real-world training data usually exhibits a long-tailed distribution, where a few majority classes have significantly more samples than the remaining minority classes. This imbalance degrades the performance of typical supervised learning algorithms designed for balanced training sets. In this paper, we address this issue by augmenting minority classes with the recently proposed implicit semantic data augmentation (ISDA) algorithm, which produces diversified augmented samples by translating deep features along many semantically meaningful directions. Importantly, since ISDA estimates class-conditional statistics to obtain the semantic directions, it is ineffective on minority classes due to their insufficient training data. To this end, we propose a novel approach that automatically learns transformed semantic directions with meta-learning. Specifically, the augmentation strategy is dynamically optimized during training to minimize the loss on a small balanced validation set, which is approximated via a meta update step. Extensive empirical results on CIFAR-LT-10/100, ImageNet-LT, and iNaturalist2017/2018 validate the effectiveness of our method.
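For intuition, ISDA can be realized as a surrogate loss: an upper bound of the expected cross-entropy under Gaussian perturbations of the deep features, obtained by adding a covariance-dependent quadratic term to each logit. Below is a minimal PyTorch sketch of this augmented loss, assuming full per-class covariance matrices; in MetaSAug these statistics are meta-learned rather than empirically estimated, and the function name and signature here are illustrative, not the repository's API.

```python
import torch
import torch.nn.functional as F

def isda_augmented_loss(features, labels, fc_weight, fc_bias, cov, lam):
    """ISDA-style upper-bound loss (sketch).

    features:  (N, D) deep features from the backbone
    labels:    (N,)   class indices
    fc_weight: (C, D) classifier weights, fc_bias: (C,)
    cov:       (C, D, D) per-class covariance estimates
               (meta-learned in MetaSAug; illustrative here)
    lam:       augmentation strength lambda
    """
    logits = features @ fc_weight.t() + fc_bias       # (N, C)
    w_y = fc_weight[labels]                           # (N, D)
    diff = fc_weight.unsqueeze(0) - w_y.unsqueeze(1)  # (N, C, D): w_j - w_y
    sigma_y = cov[labels]                             # (N, D, D)
    # quadratic term (w_j - w_y)^T Sigma_y (w_j - w_y) for every class j
    quad = torch.einsum('ncd,nde,nce->nc', diff, sigma_y, diff)
    aug_logits = logits + 0.5 * lam * quad            # zero for j == y
    return F.cross_entropy(aug_logits, labels)
```

Because the perturbation enters only through this quadratic correction to the logits, no augmented samples are ever materialized, which is what makes the augmentation "implicit".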
<p align="center"> <img src="assets/illustration.png" alt="drawing" width="800"/> </p>

If you find this idea or code useful for your research, please consider citing our paper:
```bibtex
@inproceedings{li2021metasaug,
  title={MetaSAug: Meta semantic augmentation for long-tailed visual recognition},
  author={Li, Shuang and Gong, Kaixiong and Liu, Chi Harold and Wang, Yulin and Qiao, Feng and Cheng, Xinjing},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5212--5221},
  year={2021}
}
```
## Prerequisites
- PyTorch >= 1.2.0
- Python 3
- torchvision
- PIL
- argparse
- numpy
## Evaluation
We provide several trained models of MetaSAug for evaluation.
Testing on CIFAR-LT-10/100:

```bash
sh scripts/MetaSAug_CE_test.sh
sh scripts/MetaSAug_LDAM_test.sh
```
Testing on ImageNet-LT and iNaturalist18:

```bash
sh ImageNet_iNat/test.sh
```
The trained models are available on Google Drive.
## Getting Started
### Datasets
- Long-tailed CIFAR10/100: The long-tailed version of CIFAR10/100. The code for converting to the long-tailed version is in data_utils.py (see the sketch after this list).
- ImageNet-LT: The long-tailed version of ImageNet. [Long-tailed annotations]
- iNaturalist2017: A natural long-tailed dataset.
- iNaturalist2018: A natural long-tailed dataset.
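The conversion follows the standard exponential class-size profile controlled by the imbalance factor (the `--imb_factor` flag in the training commands below; under the usual convention `imb_factor = n_min / n_max`, so `0.05` corresponds to an imbalance ratio of 20). A minimal sketch of that profile, assuming the common formulation; the repository's `data_utils.py` is the authoritative version:

```python
def img_num_per_cls(num_classes, img_max, imb_factor):
    """Exponential long-tail profile: class i keeps
    img_max * imb_factor ** (i / (num_classes - 1)) images."""
    return [int(img_max * imb_factor ** (i / (num_classes - 1.0)))
            for i in range(num_classes)]

# e.g. CIFAR-100-LT with imbalance ratio 100 (imb_factor = 0.01):
# img_num_per_cls(100, 500, 0.01)  # -> [500, ..., 5] images per class
```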
### Training
Training on CIFAR-LT-10/100. For example, MetaSAug with the LDAM loss on CIFAR-LT-100:

```bash
python3.6 MetaSAug_LDAM_train.py --gpu 0 --lr 0.1 --lam 0.75 --imb_factor 0.05 --dataset cifar100 --num_classes 100 --save_name MetaSAug_cifar100_LDAM_imb0.05 --idx 1
```

Or run the script:

```bash
sh scripts/MetaSAug_LDAM_train.sh
```
Training on ImageNet-LT:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 --master_port 53212 train.py --lr 0.0003 --meta_lr 0.1 --workers 0 --batch_size 256 --epochs 20 --dataset ImageNet_LT --num_classes 1000 --data_root ../ImageNet
```

Or run the script:

```bash
sh ImageNet_iNat/scripts/train.sh
```
Note: Training on large-scale datasets such as ImageNet-LT and iNaturalist2017/2018 uses multiple GPUs for speed. To obtain more generalizable representations, the network is trained with the vanilla CE loss in the early stage. For convenience, training therefore starts from pre-trained models, e.g., ImageNet-LT, iNat18 (both from the cRT project).
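The meta update described in the abstract can be pictured as a one-step lookahead: take a differentiable virtual SGD step on the classifier using the augmented training loss, evaluate the plain CE loss on a small balanced meta batch, and backpropagate through the virtual step into the class-wise covariances. A schematic PyTorch sketch, reusing the `isda_augmented_loss` sketch above; names such as `meta_update_cov` and `model.backbone` are hypothetical, not the repository's API:

```python
import torch
import torch.nn.functional as F

def meta_update_cov(model, cov, train_batch, meta_batch, lr, meta_lr, lam):
    """One meta step (sketch). cov is a (C, D, D) leaf tensor
    with requires_grad=True holding the class-wise covariances."""
    x, y = train_batch
    feats = model.backbone(x)
    train_loss = isda_augmented_loss(feats, y, model.fc.weight,
                                     model.fc.bias, cov, lam)
    # differentiable virtual SGD step on the classifier
    grads = torch.autograd.grad(train_loss,
                                [model.fc.weight, model.fc.bias],
                                create_graph=True)
    w_new = model.fc.weight - lr * grads[0]
    b_new = model.fc.bias - lr * grads[1]
    # balanced meta batch evaluated with the virtually updated classifier
    xm, ym = meta_batch
    meta_loss = F.cross_entropy(model.backbone(xm) @ w_new.t() + b_new, ym)
    # gradient flows through the virtual step back into the covariances
    cov_grad, = torch.autograd.grad(meta_loss, cov)
    with torch.no_grad():
        cov -= meta_lr * cov_grad
    return train_loss.detach(), meta_loss.detach()
```

The `--meta_lr` flag in the ImageNet-LT command above plays the role of `meta_lr` here: the step size for the covariance update, separate from the network learning rate.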
## Results and models
### CIFAR-LT-10

| Model | Imb. ratio | Top-1 Error (%) | Download | Model | Imb. ratio | Top-1 Error (%) | Download |
|---|---|---|---|---|---|---|---|
MetaSAug+LDAM | 200 | 22.65 | ResNet32 | MetaSAug+CE | 200 | 23.11 | ResNet32 |
MetaSAug+LDAM | 100 | 19.34 | ResNet32 | MetaSAug+CE | 100 | 19.46 | ResNet32 |
MetaSAug+LDAM | 50 | 15.66 | ResNet32 | MetaSAug+CE | 50 | 15.97 | ResNet32 |
MetaSAug+LDAM | 20 | 11.90 | ResNet32 | MetaSAug+CE | 20 | 12.36 | ResNet32 |
MetaSAug+LDAM | 10 | 10.32 | ResNet32 | MetaSAug+CE | 10 | 10.56 | ResNet32 |
### CIFAR-LT-100

| Model | Imb. ratio | Top-1 Error (%) | Download | Model | Imb. ratio | Top-1 Error (%) | Download |
|---|---|---|---|---|---|---|---|
MetaSAug+LDAM | 200 | 56.91 | ResNet32 | MetaSAug+CE | 200 | 60.06 | ResNet32 |
MetaSAug+LDAM | 100 | 51.99 | ResNet32 | MetaSAug+CE | 100 | 53.13 | ResNet32 |
MetaSAug+LDAM | 50 | 47.73 | ResNet32 | MetaSAug+CE | 50 | 48.10 | ResNet32 |
MetaSAug+LDAM | 20 | 42.47 | ResNet32 | MetaSAug+CE | 20 | 42.15 | ResNet32 |
MetaSAug+LDAM | 10 | 38.72 | ResNet32 | MetaSAug+CE | 10 | 38.27 | ResNet32 |
### ImageNet-LT

| Model | Top-1 Error (%) | Download |
|---|---|---|
MetaSAug | 52.33 | ResNet50 |
### iNaturalist18

| Model | Top-1 Error (%) | Download |
|---|---|---|
MetaSAug | 30.50 | ResNet50 |
## Acknowledgements
Some code in this project is adapted from Meta-class-weight and cRT. We thank the authors for their excellent projects.