Bag of tricks for long-tailed visual recognition with deep convolutional neural networks
This repository is the official PyTorch implementation of Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks, which provides practical and effective tricks used in long-tailed image classification.
- We recommend installing the browser extension Github Sort Content, which lets you sort table columns on GitHub. With it, you can easily find the most effective trick for each dataset in trick_gallery.md. See this issue on Stack Overflow for more information.
Development log
- **2022-01-05**: Add DiVE (ICCV 2021) in trick_gallery.md, which is a knowledge distillation method.
- **2021-11-08**: Add InfluenceBalancedLoss (ICCV 2021) in trick_gallery.md, which belongs to two-stage training.
- **2021-05-19**: Add configs and experimental results of BBN-style sampling (CVPR 2020) in trick_gallery.md, which consists of a uniform sampler and a reverse sampler.
Trick gallery
Brief introduction
We divide the long-tail related tricks into four families: re-weighting, re-sampling, mixup training, and two-stage training. For more details on these four trick families, see the original paper.
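As an illustration of the re-weighting family, here is a minimal, hypothetical sketch of class-balanced re-weighting based on the effective number of samples (Cui et al., CVPR 2019). It is not this repo's implementation, and the class histogram below is only an approximate CIFAR-10-LT-100 example.

```python
import torch

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Class-balanced weights w_i = (1 - beta) / (1 - beta^{n_i}),
    derived from the effective number of samples (Cui et al., CVPR 2019)."""
    counts = torch.tensor(samples_per_class, dtype=torch.float)
    effective_num = 1.0 - torch.pow(beta, counts)
    weights = (1.0 - beta) / effective_num
    # Normalize so the weights sum to the number of classes.
    return weights / weights.sum() * len(samples_per_class)

# Approximate CIFAR-10-LT-100 class histogram (head -> tail classes).
weights = class_balanced_weights([5000, 2997, 1797, 1077, 646, 387, 232, 139, 83, 50])
# Plug the weights into a standard cross-entropy loss.
criterion = torch.nn.CrossEntropyLoss(weight=weights)
```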
Detailed information:
- Trick gallery: tricks, corresponding results, experimental settings, and running commands are listed in trick_gallery.md.
Main requirements
```
torch >= 1.4.0
torchvision >= 0.5.0
tensorboardX >= 2.1
tensorflow >= 1.14.0  # convert long-tailed CIFAR datasets from tfrecords to jpgs
Python 3
apex
```
- We provide the detailed requirements in requirements.txt. You can run
```bash
pip install -r requirements.txt
```
to create the same running environment as ours.
- We recommend installing apex to save GPU memory:
```bash
pip install -U pip
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
- If apex is not installed, the distributed training with DistributedDataParallel in our code cannot be used.
Preparing the datasets
We provide three datasets in this repo: long-tailed CIFAR (CIFAR-LT), long-tailed ImageNet (ImageNet-LT), and iNaturalist 2018 (iNat18).
The detailed information of these datasets is shown as follows:
<table>
<thead>
<tr><th align="center" rowspan="3">Datasets</th><th align="center" colspan="2">CIFAR-10-LT</th><th align="center" colspan="2">CIFAR-100-LT</th><th align="center" rowspan="3">ImageNet-LT</th><th align="center" rowspan="3">iNat18</th></tr>
<tr><td align="center" colspan="4"><b>Imbalance factor</b></td></tr>
<tr><td align="center"><b>100</b></td><td align="center"><b>50</b></td><td align="center"><b>100</b></td><td align="center"><b>50</b></td></tr>
</thead>
<tbody>
<tr><td align="center">Training images</td><td align="center">12,406</td><td align="center">13,996</td><td align="center">10,847</td><td align="center">12,608</td><td align="center">115,846</td><td align="center">437,513</td></tr>
<tr><td align="center">Classes</td><td align="center">10</td><td align="center">10</td><td align="center">100</td><td align="center">100</td><td align="center">1,000</td><td align="center">8,142</td></tr>
<tr><td align="center">Max images</td><td align="center">5,000</td><td align="center">5,000</td><td align="center">500</td><td align="center">500</td><td align="center">1,280</td><td align="center">1,000</td></tr>
<tr><td align="center">Min images</td><td align="center">50</td><td align="center">100</td><td align="center">5</td><td align="center">10</td><td align="center">5</td><td align="center">2</td></tr>
<tr><td align="center">Imbalance factor</td><td align="center">100</td><td align="center">50</td><td align="center">100</td><td align="center">50</td><td align="center">256</td><td align="center">500</td></tr>
</tbody>
</table>

<font size=2>- `Max images` and `Min images` represent the numbers of training images in the largest and smallest classes, respectively.</font>
<font size=2>- `CIFAR-10-LT-100` means the long-tailed CIFAR-10 dataset with the imbalance factor $\beta = 100$.</font>
<font size=2>- `Imbalance factor` is defined as $\beta = \frac{\text{Max images}}{\text{Min images}}$.</font>
- Data format

  The annotation of a dataset is a dict consisting of two fields: `annotations` and `num_classes`. The field `annotations` is a list of dicts, each with the keys `image_id`, `fpath`, `im_height`, `im_width`, and `category_id`. Here is an example:
```python
{
    'annotations': [
        {
            'image_id': 1,
            'fpath': '/data/iNat18/images/train_val2018/Plantae/7477/3b60c9486db1d2ee875f11a669fbde4a.jpg',
            'im_height': 600,
            'im_width': 800,
            'category_id': 7477
        },
        ...
    ],
    'num_classes': 8142
}
```
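As a small illustration (not part of the repo), an annotation file in this format can be inspected with a few lines; the file name below is only an example:

```python
import json

# Hypothetical usage: load a generated annotation file and inspect
# the fields described above.
with open("iNat18_train.json") as f:
    anno = json.load(f)

print(anno["num_classes"])              # e.g. 8142 for iNat18
print(len(anno["annotations"]))         # number of training images
print(anno["annotations"][0]["fpath"])  # path of the first image
```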
- CIFAR-LT

  Cao et al. (NeurIPS 2019) followed the method of Cui et al. (CVPR 2019) to generate CIFAR-LT randomly. They modified the CIFAR datasets provided by PyTorch, as this file shows. A sketch of the sampling rule follows.
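For intuition, here is a minimal sketch of how the per-class image counts of CIFAR-LT are derived, assuming the exponential profile from Cui et al. (CVPR 2019); it is not the repo's exact script:

```python
def longtail_counts(n_max, num_classes, imb_factor):
    """Per-class counts n_i = n_max * beta^(-i / (C - 1)), so class 0
    keeps n_max images and the last class keeps n_max / beta images."""
    return [int(n_max * imb_factor ** (-i / (num_classes - 1)))
            for i in range(num_classes)]

# CIFAR-10-LT with imbalance factor 100: [5000, 2997, ..., 50]
print(longtail_counts(5000, 10, 100))
# CIFAR-100-LT with imbalance factor 50: 100 classes, 500 down to 10 images
print(longtail_counts(500, 100, 50))
```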
- ImageNet-LT

  You can use the following steps to convert from the original images of ImageNet-LT.

  1. Download the original ILSVRC-2012. Suppose you have downloaded and reorganized it at path `/downloaded/ImageNet/`, which should contain two sub-directories: `/downloaded/ImageNet/train` and `/downloaded/ImageNet/val`.
  2. Download the train/test split files (`ImageNet_LT_train.txt` and `ImageNet_LT_test.txt`) from GoogleDrive or Baidu Netdisk (password: cj0g). Suppose you have downloaded them to path `/downloaded/ImageNet-LT/`.
  3. Run tools/convert_from_ImageNet.py, and you will get two JSON files: `ImageNet_LT_train.json` and `ImageNet_LT_val.json`.

```bash
# Convert from the original format of ImageNet-LT
python tools/convert_from_ImageNet.py --input_path /downloaded/ImageNet-LT/ --image_path /downloaded/ImageNet/ --output_path ./
```
- iNat18

  You can use the following steps to convert from the original format of iNaturalist 2018.

  1. Download the images and annotations from iNaturalist 2018 first. Suppose you have downloaded them to path `/downloaded/iNat18/`.
  2. Run tools/convert_from_iNat.py, and use the generated `iNat18_train.json` and `iNat18_val.json` to train.

```bash
# Convert from the original format of iNaturalist 2018
# See tools/convert_from_iNat.py for more details of the args
python tools/convert_from_iNat.py --input_json_file /downloaded/iNat18/train2018.json --image_path /downloaded/iNat18/images --output_json_file ./iNat18_train.json
python tools/convert_from_iNat.py --input_json_file /downloaded/iNat18/val2018.json --image_path /downloaded/iNat18/images --output_json_file ./iNat18_val.json
```
Usage
In this repo:

- The results of CIFAR-LT (ResNet-32) and ImageNet-LT (ResNet-10), which need only one GPU to train, are obtained by DataParallel training with apex.
- The results of iNat18 (ResNet-50), which needs more than one GPU to train, are obtained by DistributedDataParallel training with apex.
- If more than one GPU is used, DistributedDataParallel training is more efficient than DataParallel training, especially when CPU computing power is limited; a minimal sketch of the two wrapping modes is shown below.
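For orientation only, here is a hedged sketch of how the two modes wrap a model in plain PyTorch; it is not the repo's launcher code, and the model and device ids are placeholders.

```python
import torch.nn as nn

model = nn.Linear(512, 10).cuda()  # placeholder model

# DataParallel: a single process replicates the model across GPUs on every
# forward pass; gradients are gathered on GPU 0, which can bottleneck the CPU.
dp_model = nn.DataParallel(model, device_ids=[0, 1])

# DistributedDataParallel: one process per GPU with all-reduced gradients,
# which avoids the single-process bottleneck. It requires an initialized
# process group, e.g.:
#   torch.distributed.init_process_group(backend="nccl")
#   ddp_model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```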
Training
Parallel training with DataParallel
1. To train:

```bash
# To train long-tailed CIFAR-10 with an imbalance factor of 50.
# `GPUs` are the GPUs you want to use, such as `0,4`.
bash data_parallel_train.sh configs/test/data_parallel.yaml GPUs
```
Distributed training with DistributedDataParallel
1. Change the NCCL_SOCKET_IFNAME in run_with_distributed_parallel.sh to your own socket name:

```bash
export NCCL_SOCKET_IFNAME=[your own socket name]
```
2. To train:

```bash
# To train long-tailed CIFAR-10 with an imbalance factor of 50.
# `GPUs` are the GPUs you want to use, such as `0,1,4`.
# `NUM_GPUs` is the number of GPUs you want to use. If you set `GPUs` to `0,1,4`, then `NUM_GPUs` should be `3`.
bash distributed_data_parallel_train.sh configs/test/distributed_data_parallel.yaml NUM_GPUs GPUs
```
Validation
You can get the validation accuracy and the corresponding confusion matrix after running the following commands. See main/valid.py for more details.

1. First, change TEST.MODEL_FILE in the yaml file to the path of your trained model.
2. To run validation:

```bash
# `GPUS` are the GPUs you want to use, such as `0,1,4`.
python main/valid.py --cfg [Your yaml] --gpus GPUS
```
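As a hedged illustration of what validation computes (not the actual code in main/valid.py), a confusion matrix can be accumulated from predictions like this:

```python
import torch

def update_confusion_matrix(cm, preds, targets):
    # cm: [num_classes, num_classes]; rows are true classes,
    # columns are predicted classes.
    for t, p in zip(targets.view(-1), preds.view(-1)):
        cm[t.long(), p.long()] += 1
    return cm

num_classes = 10
cm = torch.zeros(num_classes, num_classes, dtype=torch.long)
preds = torch.tensor([0, 1, 2, 1])    # placeholder model predictions
targets = torch.tensor([0, 1, 1, 1])  # placeholder ground-truth labels
cm = update_confusion_matrix(cm, preds, targets)
top1_error = 1.0 - cm.diag().sum().item() / cm.sum().item()
```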
The comparison between the baseline results using our code and the references [<a href="https://arxiv.org/abs/1901.05555">Cui</a>, <a href="https://arxiv.org/abs/1910.09217">Kang</a>]

- We use Top-1 error rates as our evaluation metric.
- For ImageNet-LT, we found that the color_jitter augmentation, which is adopted by other methods, was not included in our original experiments. So, in this repo, we add the color_jitter augmentation on ImageNet-LT. The old baseline without color_jitter has a Top-1 error rate of 64.89, which is 1.15 points higher than the new baseline (63.74).
- You can click the `Baseline` entries in the table below to see the experimental settings and the corresponding running commands.
Paper collection of long-tailed visual recognition

- Awesome-of-Long-Tailed-Recognition
- Long-Tailed-Classification-Leaderboard
Citation
```bibtex
@inproceedings{zhang2021tricks,
  author    = {Yongshun Zhang and Xiu{-}Shen Wei and Boyan Zhou and Jianxin Wu},
  title     = {Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks},
  booktitle = {AAAI},
  pages     = {3447--3455},
  year      = {2021},
}
```
Contacts
If you have any questions about our work, please do not hesitate to contact us via the emails provided in the paper.