<h1 align="center"> HeroLT </h1>

HeroLT is a comprehensive long-tailed learning benchmark that examines long-tailed learning from three pivotal angles:

  1. (A1) the characterization of data long-tailedness: long-tailed data exhibits a highly skewed data distribution and an extensive number of categories;
  2. (A2) the data complexity of various domains: a wide range of complex domains may naturally encounter long-tailed distribution, e.g., tabular data, sequential data, grid data, and relational data; and
  3. (A3) the heterogeneity of emerging tasks: it highlights the need to consider the applicability and limitations of existing methods on heterogeneous tasks.
<div align="center"> <img src="https://s2.loli.net/2023/07/19/EHwPQAYdkhquIXS.png" width = "700" height = "366" /> </div>

We provide a fair and accessible performance evaluation of 13 state-of-the-art methods on multiple benchmark datasets across multiple tasks using accuracy-based and ranking-based evaluation metrics.
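To see why both metric families matter under heavy skew, consider how plain accuracy and balanced accuracy (bACC) diverge on imbalanced labels. This is a minimal illustrative sketch, not HeroLT's evaluation code:

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of correct predictions, dominated by head classes."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so tail classes count as much as head classes."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

y_true = [0] * 9 + [1]   # long-tailed labels: 9 head samples, 1 tail sample
y_pred = [0] * 10        # a degenerate model that always predicts the head class
print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A head-only predictor looks strong under accuracy but is exposed by bACC, which is why HeroLT reports both families of metrics.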

| Code Structure | Quick Start | Algorithms | Datasets | Example | Publications |
| --- | --- | --- | --- | --- | --- |


Code Structure

```
HeroLT
├── HeroLT
│   ├── configs                 # Customizable configurations
│   ├── data                    # Datasets in HeroLT
│   │   ├── ... ..
│   ├── nn
│   │   ├── Dataloaders
│   │   ├── Datasets
│   │   ├── layers
│   │   ├── Models
│   │   ├── Modules
│   │   ├── Samplers
│   │   ├── Schedulers
│   │   ├── Wrappers            # Algorithms in HeroLT
│   │   ├── loss                # Loss functions in long-tailed learning
│   │   ├── pecos
│   │   ├── xbert
│   ├── outputs
│   │   ├── ... ..
│   ├── tools
│   │   ├── ... ..
│   ├── utils                   # Utility functions and classes
│   │   ├── ... ..
├── examples                    # Examples of running a specific method
├── figs
└── README.md
```

Quick Start

We provide the following example to help users quickly get started with HeroLT.

Step 1. Dependency

First, clone the source code and install the required packages:

```bash
git clone https://github.com/SSSKJ/HeroLT/
cd HeroLT
```

Step 2. Prepare datasets

To run an LT task, users should prepare a dataset. The DataZoo provided in HeroLT can automatically download and preprocess widely used public datasets for various LT applications, including CV, NLP, graph learning, etc. Users can directly specify `dataset = DATASET_NAME` in the configuration. For example,

```python
GraphSMOTE('wiki', './HeroLT/')
```

Step 3. Prepare models

Then, users should specify the model architecture that will be trained. HeroLT provides a ModelZoo that contains the implementation of widely adopted model architectures for various LT tasks. Users can import `MODEL_NAME` to apply a specific model architecture in LT tasks. For example,

```python
from HeroLT.nn.Wrappers import GraphSMOTE
```

Step 4. Start running

Here we demonstrate how to run a standard LT task with HeroLT: setting `dataset = 'wiki'` and importing `GraphSMOTE` runs GraphSMOTE on a node classification task on the Wiki dataset. Users can customize training configurations, such as `lr`, in `configs/GraphSMOTE/config.yaml`, and run a standard LT task as:

```python
# Run with default configurations
from HeroLT.nn.Wrappers import GraphSMOTE

model = GraphSMOTE('wiki', './HeroLT/')
model.train()
```
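The values in `configs/GraphSMOTE/config.yaml` act as defaults that user edits override. The exact merge logic is internal to HeroLT, but its effect can be sketched as a plain dictionary merge (all names and values here are illustrative, not HeroLT's actual keys):

```python
# Illustrative only: how YAML-backed defaults and a user override could combine.
defaults = {"lr": 0.01, "epochs": 300, "seed": 123}   # e.g. shipped config.yaml
user_overrides = {"lr": 0.001}                        # e.g. edited by the user
config = {**defaults, **user_overrides}               # later keys win
print(config)  # {'lr': 0.01 -> 0.001, 'epochs': 300, 'seed': 123}
```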

Then you can observe some monitored metrics during the training process as:

```
============== seed:123 ==============
[seed 123][GraphSMOTE][Epoch 0][Val] ACC: 21.6, bACC: 17.9, Precision: 16.5, Recall: 16.8, mAP: 15.7|| [Test] ACC: 18.8, bACC: 13.3, Precision: 16.5, Recall: 13.3, mAP: 11.6
  [*Best Test Result*][Epoch 0] ACC: 18.8,  bACC: 13.3, Precision: 16.5, Recall: 13.3, mAP: 11.6
[seed 123][GraphSMOTE][Epoch 100][Val] ACC: 68.1, bACC: 60.5, Precision: 60.5, Recall: 60.5, mAP: 59.8|| [Test] ACC: 68.0, bACC: 54.1, Precision: 56.7, Recall: 54.1, mAP: 53.2
  [*Best Test Result*][Epoch 100] ACC: 68.0,  bACC: 54.1, Precision: 56.7, Recall: 54.1, mAP: 53.2
[seed 123][GraphSMOTE][Epoch 200][Val] ACC: 67.7, bACC: 59.0, Precision: 57.9, Recall: 59.0, mAP: 58.4|| [Test] ACC: 67.4, bACC: 54.1, Precision: 56.7, Recall: 54.1, mAP: 52.9
  [*Best Test Result*][Epoch 102] ACC: 67.8,  bACC: 54.0, Precision: 56.4, Recall: 54.0, mAP: 53.2
[seed 123][GraphSMOTE][Epoch 300][Val] ACC: 67.2, bACC: 58.6, Precision: 58.9, Recall: 58.6, mAP: 58.0|| [Test] ACC: 67.1, bACC: 53.6, Precision: 57.7, Recall: 53.6, mAP: 52.3
  [*Best Test Result*][Epoch 102] ACC: 67.8,  bACC: 54.0, Precision: 56.4, Recall: 54.0, mAP: 53.2
... ...
```
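For tracking results programmatically, the best-result lines above follow a regular pattern and can be parsed with a small helper (a hypothetical utility, not part of HeroLT):

```python
import re

def parse_best(line):
    """Extract (epoch, metrics dict) from a '[*Best Test Result*]' log line."""
    epoch = int(re.search(r"\[Epoch (\d+)\]", line).group(1))
    metrics = {k: float(v) for k, v in re.findall(r"(\w+): ([\d.]+)", line)}
    return epoch, metrics

line = "[*Best Test Result*][Epoch 102] ACC: 67.8,  bACC: 54.0, Precision: 56.4, Recall: 54.0, mAP: 53.2"
epoch, metrics = parse_best(line)
print(epoch, metrics["ACC"], metrics["mAP"])  # 102 67.8 53.2
```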

Algorithms

HeroLT includes 13 algorithms, as shown in the following table.

| Algorithm | Venue | Long-tailedness | Task |
| --- | --- | --- | --- |
| X-Transformer | KDD 2020 | Data imbalance, extreme # of categories | Multi-label text classification |
| XR-Transformer | NeurIPS 2021 | Data imbalance, extreme # of categories | Multi-label text classification |
| XR-Linear | KDD 2022 | Data imbalance, extreme # of categories | Multi-label text classification |
| BBN | CVPR 2020 | Data imbalance | Image classification |
| BALMS | NeurIPS 2020 | Data imbalance | Image classification, instance segmentation |
| OLTR | CVPR 2019 | Data imbalance, extreme # of categories | Image classification |
| TDE | NeurIPS 2020 | Data imbalance, extreme # of categories | Image classification |
| MiSLAS | CVPR 2021 | Data imbalance | Image classification |
| Decoupling | ICLR 2020 | Data imbalance | Image classification |
| GraphSMOTE | WSDM 2021 | Data imbalance | Node classification |
| ImGAGN | KDD 2021 | Data imbalance | Node classification |
| TailGNN | KDD 2021 | Data imbalance, extreme # of categories | Node classification |
| LTE4G | CIKM 2022 | Data imbalance, extreme # of categories | Node classification |

Datasets

HeroLT includes 14 datasets, as shown in the following table. The IF, Gini, and Pareto columns measure long-tailedness; the remaining columns are data statistics.

| Dataset | Data | # of Categories | Size | # of Edges | IF | Gini | Pareto |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EURLEX-4K | Sequential | 3,956 | 15,499 | - | 1,024 | 0.342 | 3.968 |
| AMAZONCat-13K | Sequential | 13,330 | 1,186,239 | - | 355,211 | 0.327 | 20.000 |
| Wiki10-31K | Sequential | 30,938 | 14,146 | - | 11,411 | 0.312 | 4.115 |
| ImageNet-LT | Grid | 1,000 | 115,846 | - | 256 | 0.517 | 1.339 |
| Places-LT | Grid | 365 | 62,500 | - | 996 | 0.610 | 2.387 |
| iNaturalist 2018 | Grid | 8,142 | 437,513 | - | 500 | 0.647 | 1.658 |
| CIFAR 10-LT (100) | Grid | 10 | 12,406 | - | 100 | 0.617 | 1.751 |
| CIFAR 10-LT (50) | Grid | 10 | 13,996 | - | 50 | 0.593 | 1.751 |
| CIFAR 10-LT (10) | Grid | 10 | 20,431 | - | 10 | 0.520 | 0.833 |
| CIFAR 100-LT (100) | Grid | 100 | 10,847 | - | 100 | 0.498 | 1.972 |
| CIFAR 100-LT (50) | Grid | 100 | 12,608 | - | 50 | 0.488 | 1.590 |
| CIFAR 100-LT (10) | Grid | 100 | 19,573 | - | 10 | 0.447 | 0.836 |
| LVIS v0.5 | Grid | 1,231 | 693,958 | - | 26,148 | 0.381 | 6.250 |
| Cora-Full | Relational | 70 | 19,793 | 146,635 | 62 | 0.321 | 0.919 |
| Wiki | Relational | 17 | 2,405 | 25,597 | 45 | 0.414 | 1.000 |
| Email | Relational | 42 | 1,005 | 25,934 | 109 | 0.413 | 1.263 |
| Amazon-Clothing | Relational | 77 | 24,919 | 208,279 | 10 | 0.343 | 0.814 |
| Amazon-Electronics | Relational | 167 | 42,318 | 129,430 | 9 | 0.329 | 0.600 |
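As a rough sketch of what the long-tailedness columns capture (HeroLT's exact definitions may differ), the imbalance factor (IF) and Gini coefficient can be computed from per-class sample counts:

```python
def imbalance_factor(counts):
    # Ratio of the largest class size to the smallest class size.
    return max(counts) / min(counts)

def gini(counts):
    # Gini coefficient of the class-size distribution:
    # 0 = perfectly balanced, values near 1 = extremely long-tailed.
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    return sum((2 * i - n + 1) * x for i, x in enumerate(xs)) / (n * total)

counts = [5000, 500, 50, 5]   # a toy 4-class long-tailed distribution
print(imbalance_factor(counts))   # 1000.0
print(round(gini(counts), 3))     # 0.695
```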

Example

Here, we present a motivating application, the recommender system, which naturally exhibits long-tailed data distributions coupled with data complexity [2] (e.g., tabular data and relational data) and task heterogeneity (e.g., user profiling [1] and recommendation [2]).

<div align="center"> <img src="https://s2.loli.net/2023/08/22/ubi3YA6O7WLqDNe.png" width = "350" height = "300" /> </div>

[1] E. Purificato, L. Boratto, and E. W. De Luca, “Do graph neural networks build fair user models? Assessing disparate impact and mistreatment in behavioural user profiling”. CIKM 2022.

[2] F. Liu, Z. Cheng, L. Zhu, C. Liu, and L. Nie, “An attribute-aware attentive GCN model for attribute missing in recommendation”. IEEE Transactions on Knowledge and Data Engineering 2022.

Publications

<!-- If you find HeroLT useful for your research or development, please cite the following <a href="https://arxiv.org/" target="_blank">paper</a>: ``` @article{heroLT, title = {HeroLT: Benchmarking Heterogeneous Long-Tailed Learning}, author = {Wang, Haohui and Guan, Weijie and Chen, Jianpeng and Wang, Zi and Zhou, Dawei}, } ``` -->