FCC: Feature Clusters Compression for Long-Tailed Visual Recognition

This repository is the official PyTorch implementation of the following CVPR 2023 paper:

FCC: Feature Clusters Compression for Long-Tailed Visual Recognition<br/> Jian Li, Ziyao Meng, Daqian Shi, Rui Song, Xiaolei Diao, Jingwen Wang, Hao Xu <br/> [PDF]

<p align="center"> <img src='./resources/paper_image.jpg'> </p>

Feature Clusters Compression (FCC)

FCC is a simple and generic method for long-tailed visual recognition. It is easy to implement and can be combined with existing long-tailed methods to further boost their performance. FCC operates on the backbone features from the last layer of the backbone network. The core code of FCC is available in lib/fcc/fcc_functions.py.

<p align="center"> <img src='./resources/novelty.jpg' height="70%" width="70%"> </p>
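For intuition, here is a minimal sketch of where a feature-compression step would sit in a model, assuming FCC is applied to the last-layer backbone features before the classifier. The `FCCWrapper` class and the simple scaling by `tau` are illustrative placeholders only; the actual compression operation is implemented in lib/fcc/fcc_functions.py.

```python
import torch
import torch.nn as nn


class FCCWrapper(nn.Module):
    """Illustrative wrapper: backbone -> feature compression -> classifier.

    The real compression operation lives in lib/fcc/fcc_functions.py; the
    simple scaling by `tau` below is only a placeholder to show where FCC
    acts in the forward pass.
    """

    def __init__(self, backbone: nn.Module, classifier: nn.Module, tau: float = 0.5):
        super().__init__()
        self.backbone = backbone        # produces last-layer features
        self.classifier = classifier    # unchanged classifier head
        self.tau = tau                  # hypothetical compression factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)        # backbone features from the last layer
        feats = self.tau * feats        # placeholder for the FCC compression step
        return self.classifier(feats)   # logits computed on compressed features
```

Because FCC only touches the features between backbone and classifier, the rest of the training pipeline (losses, samplers, schedulers) stays unchanged, which is why it combines easily with existing long-tailed methods.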

Main requirements

torch >= 1.7.1 
tensorboardX >= 2.1 
tensorflow >= 1.14.0 
Python 3.6
apex

Detailed requirement

pip install -r requirements.txt

We recommend installing apex to save GPU memory:

pip install -U pip
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
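After installation, a quick smoke test along these lines can confirm that apex imports correctly (the `apex.amp` import is part of the standard apex package; everything else here is just a check and not specific to this repo):

```python
# Smoke test: confirm that apex imports and CUDA is visible to PyTorch.
import torch
from apex import amp  # raises ImportError if apex was not installed correctly

print("CUDA available:", torch.cuda.is_available())
print("apex.amp loaded:", amp is not None)
```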

Prepare datasets

This part is mainly based on https://github.com/zhangyongshun/BagofTricks-LT and https://github.com/Bazinga699/NCL

Three widely used datasets are provided in this repo: long-tailed CIFAR (CIFAR-LT), long-tailed ImageNet (ImageNet-LT) and iNaturalist 2018 (iNat18).

Detailed information about these datasets is shown below:

<table> <thead> <tr> <th align="center" rowspan="3">Datasets</th> <th align="center" colspan="2">CIFAR-10-LT</th> <th align="center" colspan="2">CIFAR-100-LT</th> <th align="center" rowspan="3">ImageNet-LT</th> <th align="center" rowspan="3">iNat18</th> </tr> <tr> <td align="center" colspan="4"><b>Imbalance factor</b></td> </tr> <tr> <td align="center" ><b>100</b></td> <td align="center" ><b>50</b></td> <td align="center" ><b>100</b></td> <td align="center" ><b>50</b></td> </tr> </thead> <tbody> <tr> <td align="center" style="font-weight:normal"> Training images</td> <td align="center" style="font-weight:normal"> 12,406 </td> <td align="center" style="font-weight:normal"> 13,996 </td> <td align="center" style="font-weight:normal"> 10,847 </td> <td align="center" style="font-weight:normal"> 12,608 </td> <td align="center" style="font-weight:normal">115,846</td> <td align="center" style="font-weight:normal">437,513</td> </tr> <tr> <td align="center" style="font-weight:normal"> Classes</td> <td align="center" style="font-weight:normal"> 10 </td> <td align="center" style="font-weight:normal"> 10 </td> <td align="center" style="font-weight:normal"> 100 </td> <td align="center" style="font-weight:normal"> 100 </td> <td align="center" style="font-weight:normal"> 1,000 </td> <td align="center" style="font-weight:normal">8,142</td> </tr> <tr> <td align="center" style="font-weight:normal">Max images</td> <td align="center" style="font-weight:normal">5,000</td> <td align="center" style="font-weight:normal">5,000</td> <td align="center" style="font-weight:normal">500</td> <td align="center" style="font-weight:normal">500</td> <td align="center" style="font-weight:normal">1,280</td> <td align="center" style="font-weight:normal">1,000</td> </tr> <tr> <td align="center" style="font-weight:normal" >Min images</td> <td align="center" style="font-weight:normal">50</td> <td align="center" style="font-weight:normal">100</td> <td align="center" style="font-weight:normal">5</td> <td align="center" style="font-weight:normal">10</td> <td align="center" style="font-weight:normal">5</td> <td align="center" style="font-weight:normal">2</td> </tr> <tr> <td align="center" style="font-weight:normal">Imbalance factor</td> <td align="center" style="font-weight:normal">100</td> <td align="center" style="font-weight:normal">50</td> <td align="center" style="font-weight:normal">100</td> <td align="center" style="font-weight:normal">50</td> <td align="center" style="font-weight:normal">256</td> <td align="center" style="font-weight:normal">500</td> </tr> </tbody> </table>

- "Max images" and "Min images" denote the number of training images in the largest and smallest classes, respectively.

- "CIFAR-10-LT-100" denotes the long-tailed CIFAR-10 dataset with imbalance factor beta = 100.

- "Imbalance factor" is defined as beta = Max images / Min images (e.g., 500 / 5 = 100 for CIFAR-100-LT-100).

The annotation of a dataset is a dict with two fields: annotations and num_classes. The annotations field is a list of dicts, each containing image_id, fpath, im_height, im_width, and category_id.

Here is an example.

{
    'annotations': [
                    {
                        'image_id': 1,
                        'fpath': '/data/iNat18/images/train_val2018/Plantae/7477/3b60c9486db1d2ee875f11a669fbde4a.jpg',
                        'im_height': 600,
                        'im_width': 800,
                        'category_id': 7477
                    },
                    ...
                   ],
    'num_classes': 8142
}
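As a quick sanity check, a small script along these lines can load an annotation file and verify that it has the expected structure (the file path below is a placeholder, not a file shipped with this repo):

```python
import json

# The path below is a placeholder; point it at your own annotation file.
with open("data/iNat18/iNat18_train.json") as f:
    anno = json.load(f)

# Top-level structure: the 'annotations' list and 'num_classes'.
assert set(anno.keys()) >= {"annotations", "num_classes"}
print("classes:", anno["num_classes"])
print("images :", len(anno["annotations"]))

# Every entry should carry the per-image fields listed above.
first = anno["annotations"][0]
for key in ("image_id", "fpath", "im_height", "im_width", "category_id"):
    assert key in first, f"missing field: {key}"
```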

Usage

First, prepare the datasets and set the relevant data paths in configs/FCC/xxx.yaml.

Parallel training with DataParallel

1. Train
# Train long-tailed CIFAR-100 with an imbalance ratio of 100.
# In run.sh, `GPUs` specifies the GPUs to use, e.g. `0` or `0,1,2,3`.
bash run.sh

2. Train other methods with FCC
# Just modify the config path (configs/xxx.yaml) in run.sh.

Citation

@InProceedings{Li_2023_CVPR,
    author    = {Li, Jian and Meng, Ziyao and Shi, Daqian and Song, Rui and Diao, Xiaolei and Wang, Jingwen and Xu, Hao},
    title     = {FCC: Feature Clusters Compression for Long-Tailed Visual Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {24080-24089}
}

Acknowledgements

This project is built on top of BagofTricks-LT (https://github.com/zhangyongshun/BagofTricks-LT).