

<div align="center"> <img src="./docs/pygdebias.png" width = "600" height = "200" alt="pygdebias" align=center /> </div>

PyGDebias: Attributed Network Datasets and Fairness-Aware Graph Mining Algorithms

Graph mining algorithms play a critical role in many areas. However, most existing algorithms do not take fairness into account, so they may deliver biased predictions toward certain demographic subgroups or individuals. To better understand existing debiasing techniques and facilitate the deployment of fairness-aware graph mining algorithms, we developed PyGDebias, a library that provides built-in datasets and implementations of popular fairness-aware graph mining algorithms for studying algorithmic fairness on graphs.

Specifically, this open-source library provides a systematic schema for loading datasets and comparing different debiasing techniques for graph learning algorithms. It collects 26 graph datasets (24 commonly used ones and two newly constructed ones, AMiner-L and AMiner-S) and implements 13 algorithms.

1. Citation

Our survey paper "Fairness in Graph Mining: A Survey" has been accepted by TKDE and released on arXiv [PDF]. If you find PyGDebias helpful, we would appreciate a citation of the following paper:

  title={Fairness in graph mining: A survey},
  author={Dong, Yushun and Ma, Jing and Wang, Song and Chen, Chen and Li, Jundong},
  journal={IEEE Transactions on Knowledge and Data Engineering},


Dong, Y., Ma, J., Wang, S., Chen, C., & Li, J. (2023). Fairness in graph mining: A survey. IEEE Transactions on Knowledge and Data Engineering.

2. API Cheatsheet

We summarize the basic API of the implemented graph mining algorithms as below.

3. Installations

Here, we provide guidelines for setting up the library. There are two ways to install it:

3.1 Manually

# Set up the environment
conda create -n pygdebias python=3.9
conda activate pygdebias

# Installation
git clone https://github.com/yushundong/PyGDebias.git
pip install torch==1.12.0+cu116 -f https://download.pytorch.org/whl/torch_stable.html
pip install PyGDebias/ -f https://data.pyg.org/whl/torch-1.12.0%2Bcu116.html -f https://download.pytorch.org/whl/torch_stable.html  -f https://data.dgl.ai/wheels/cu116/repo.html -f https://data.dgl.ai/wheels-test/repo.html

3.2 pip

# create env
conda create -n pygdebias python=3.9
conda activate pygdebias

# install torch
pip install torch==1.12.0+cu116 -f https://download.pytorch.org/whl/torch_stable.html
# You can also choose conda to install torch through the following command
# conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge

# install pygdebias
pip install pygdebias==1.1.1 -f https://data.pyg.org/whl/torch-1.12.0%2Bcu116.html -f https://download.pytorch.org/whl/torch_stable.html  -f https://data.dgl.ai/wheels/cu116/repo.html -f https://data.dgl.ai/wheels-test/repo.html
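After either installation route, it can be useful to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the Python standard library. The helper name `installed_versions` is hypothetical, and the distribution names `torch` and `pygdebias` are assumptions taken from the pip commands above.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions based on the pip commands above.
    for name, version in installed_versions(["torch", "pygdebias"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If `torch` reports a version other than the pinned 1.12.0 build, the extra wheel indexes passed with `-f` may not match, which is worth checking before debugging anything else.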

4. Usage & Examples

from pygdebias.debiasing import GUIDE
from pygdebias.datasets import Nba
# Available choices: 'Credit', 'German', 'Facebook', 'Pokec_z', 'Pokec_n', 'Nba', 'Twitter', 'Google', 'LCC', 'LCC_small', 'Cora', 'Citeseer', 'Amazon', 'Yelp', 'Epinion', 'Ciao', 'Dblp', 'Filmtrust', 'Lastfm', 'Ml-100k', 'Ml-1m', 'Ml-20m', 'Oklahoma', 'UNC', 'Bail'.

nba = Nba()
adj, features, idx_train, idx_val, idx_test, labels, sens = nba.adj(), nba.features(), nba.idx_train(), nba.idx_val(), nba.idx_test(), nba.labels(), nba.sens()

# Initiate the model (with default parameters).
model = GUIDE()

# Train the model.
model.fit(adj, features, idx_train, idx_val, idx_test, labels, sens)

# Evaluate the model.
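When working with the loaded splits, a quick sanity check can catch overlapping or out-of-range indices early. This is a sketch only: `check_splits` is a hypothetical helper, not part of PyGDebias, and it assumes that `idx_train`, `idx_val`, and `idx_test` can be iterated to yield integer-like node indices.

```python
def check_splits(idx_train, idx_val, idx_test, num_nodes):
    """Return a list of problems found in the index splits (empty if none)."""
    # Assumption: each split is an iterable of integer-like node indices.
    splits = {
        "train": {int(i) for i in idx_train},
        "val": {int(i) for i in idx_val},
        "test": {int(i) for i in idx_test},
    }
    problems = []
    names = list(splits)
    # Every pair of splits should be disjoint.
    for pos, first in enumerate(names):
        for second in names[pos + 1:]:
            shared = splits[first] & splits[second]
            if shared:
                problems.append(f"{first}/{second} share {len(shared)} node(s)")
    # Every index should refer to an existing node.
    for name, indices in splits.items():
        if any(i < 0 or i >= num_nodes for i in indices):
            problems.append(f"{name} has an index outside [0, {num_nodes})")
    return problems
```

An empty return value means the three splits are pairwise disjoint and every index falls inside the node range.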

5. Collected Datasets

This library collects 26 graph datasets: 24 commonly used ones and two newly constructed ones (AMiner-L and AMiner-S). We provide their descriptions as follows.

We provide their statistics as follows.

| Dataset | # Nodes | # Edges | # Features |
| --- | --- | --- | --- |
| Amazon | 2,549 (item) 2 (genre) | 2,549 | N/A |
| Yelp | 2,834 (item) 2 (genre) | 2,834 | N/A |
| Ciao | 7,375 (user) 106,797 (product) | 57,544 | N/A |
| DBLP | 22,166 (user) 296,277 (product) | 355,813 | N/A |
| Filmtrust | 1,508 (user) 2,071 (item) | 35,497 | N/A |
| Lastfm | 1,892 (customer) 17,632 (producer) | 92,800 | N/A |
| ML100k | 943 (user) 1,682 (item) | 100,000 | 4 |
| ML1m | 6,040 (user) 3,952 (item) | 1,000,209 | 4 |
| ML20m | 138,493 (user) 27,278 (item) | 20,000,263 | N/A |

6. Collected Algorithms

13 different methods in total are implemented in this library. We provide an overview of their characteristics as follows.

| Methods | Debiasing Technique | Fairness Notions | Paper & Code |
| --- | --- | --- | --- |
| FairGNN [2] | Adversarial Learning | Group Fairness | [Paper] [Code] |
| EDITS [3] | Edge Rewiring | Group Fairness | [Paper] [Code] |
| FairWalk [4] | Rebalancing | Group Fairness | [Paper] [Code] |
| CrossWalk [5] | Rebalancing | Group Fairness | [Paper] [Code] |
| UGE [6] | Edge Rewiring | Group Fairness | [Paper] [Code] |
| FairVGNN [7] | Adversarial Learning | Group Fairness | [Paper] [Code] |
| FairEdit [8] | Edge Rewiring | Group Fairness | [Paper] [Code] |
| NIFTY [9] | Optimization with Regularization | Group/Counterfactual Fairness | [Paper] [Code] |
| GEAR [10] | Edge Rewiring | Group/Counterfactual Fairness | [Paper] [Code] |
| InFoRM [11] | Optimization with Regularization | Individual Fairness | [Paper] [Code] |
| REDRESS [12] | Optimization with Regularization | Individual Fairness | [Paper] [Code] |
| GUIDE [13] | Optimization with Regularization | Individual Fairness | [Paper] [Code] |
| RawlsGCN [14] | Rebalancing | Degree-Related Fairness | [Paper] [Code] |

7. Performance Leaderboards

We summarize the performances of the implemented 13 graph mining algorithms/frameworks by fairness notions, including group fairness, individual fairness, counterfactual fairness, and degree-related fairness.

7.1 Group Fairness

7.1.1 GNN-based ones:

We present the evaluation results of both utility (including AUCROC, F1 score, and accuracy) and fairness (including $\Delta_{SP}$ and $\Delta_{EO}$) on Credit and Recidivism below.
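For readers unfamiliar with the two group fairness metrics, the sketch below computes them under their commonly used definitions for binary predictions and a binary sensitive attribute: $\Delta_{SP}$ as the absolute gap in positive prediction rates between the two groups, and $\Delta_{EO}$ as the same gap restricted to nodes whose true label is positive. The function names are hypothetical and this is not the library's own metric code.

```python
def positive_rate(preds, mask):
    """Fraction of positive predictions among the entries selected by mask."""
    selected = [p for p, keep in zip(preds, mask) if keep]
    if not selected:
        raise ValueError("the selected group is empty")
    return sum(selected) / len(selected)


def statistical_parity_gap(preds, sens):
    """|P(pred=1 | s=0) - P(pred=1 | s=1)| for binary preds and sens."""
    rate0 = positive_rate(preds, [s == 0 for s in sens])
    rate1 = positive_rate(preds, [s == 1 for s in sens])
    return abs(rate0 - rate1)


def equal_opportunity_gap(preds, labels, sens):
    """|P(pred=1 | y=1, s=0) - P(pred=1 | y=1, s=1)| for binary inputs."""
    rate0 = positive_rate(preds, [s == 0 and y == 1 for s, y in zip(sens, labels)])
    rate1 = positive_rate(preds, [s == 1 and y == 1 for s, y in zip(sens, labels)])
    return abs(rate0 - rate1)
```

Both values lie in [0, 1], and lower values indicate smaller disparities between the two sensitive groups.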

7.1.2 Non-GNN-based ones:

We present the evaluation results of fairness (including classification accuracy on the learned embeddings) on Pokec_z and Pokec_n below.

| | Classification Acc of Gender on Pokec_z | Classification Acc of Gender on Pokec_n |
| --- | --- | --- |
| FairWalk | 0.4962 ± 0.0031 | 0.5016 ± 0.0036 |
| CrossWalk | 0.4943 ± 0.0007 | 0.5047 ± 0.0036 |
| UGE-C | 0.4908 ± 0.00008 | 0.5002 ± 0.0001 |
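The idea behind this evaluation is that a classifier trained on the learned embeddings should not be able to recover the sensitive attribute, so accuracy near 0.5 on a balanced binary attribute suggests little leakage. The sketch below illustrates that probing idea with a nearest-centroid classifier in plain Python. The choice of classifier is a stand-in for illustration, not the one used to produce the numbers above, and the function name is hypothetical.

```python
def nearest_centroid_accuracy(train_x, train_s, test_x, test_s):
    """Probe embeddings for a sensitive attribute with a nearest-centroid rule."""
    # One centroid per sensitive group, averaged over the training embeddings.
    centroids = {}
    for group in set(train_s):
        rows = [x for x, s in zip(train_x, train_s) if s == group]
        centroids[group] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Assign the group whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda g: sum((a - b) ** 2 for a, b in zip(x, centroids[g])),
        )

    correct = sum(predict(x) == s for x, s in zip(test_x, test_s))
    return correct / len(test_s)
```

A probe accuracy well above chance would indicate that the embeddings still encode the sensitive attribute.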

7.2 Counterfactual Fairness

We present the evaluation results of both utility (including AUCROC, F1 score, and accuracy) and fairness (including $\Delta_{SP}$, $\Delta_{EO}$, $\delta_{CF}$, and $R^2$) on Credit and Recidivism below.

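Credit

| Method | AUCROC | F1 | Acc | $\Delta_{SP}$ | $\Delta_{EO}$ | $\delta_{CF}$ | $R^2$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GCN_Vanilla | 0.684 ± 0.019 | 0.794 ± 0.027 | 0.698 ± 0.028 | 0.108 ± 0.031 | 0.087 ± 0.035 | 0.042 ± 0.029 | 0.022 ± 0.014 |
| NIFTY-GCN | 0.685 ± 0.007 | 0.792 ± 0.007 | 0.697 ± 0.007 | 0.106 ± 0.021 | 0.097 ± 0.024 | 0.004 ± 0.004 | 0.017 ± 0.003 |
| GEAR-GCN | 0.740 ± 0.008 | 0.835 ± 0.008 | 0.755 ± 0.011 | 0.104 ± 0.013 | 0.086 ± 0.018 | 0.001 ± 0.001 | 0.010 ± 0.003 |

Recidivism

| Method | AUCROC | F1 | Acc | $\Delta_{SP}$ | $\Delta_{EO}$ | $\delta_{CF}$ | $R^2$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GCN_Vanilla | 0.885 ± 0.018 | 0.782 ± 0.023 | 0.838 ± 0.017 | 0.075 ± 0.014 | 0.023 ± 0.019 | 0.132 ± 0.059 | 0.075 ± 0.028 |
| NIFTY-GCN | 0.799 ± 0.051 | 0.669 ± 0.050 | 0.752 ± 0.065 | 0.036 ± 0.022 | 0.019 ± 0.015 | 0.031 ± 0.017 | 0.025 ± 0.018 |
| GEAR-GCN | 0.896 ± 0.016 | 0.800 ± 0.031 | 0.852 ± 0.026 | 0.058 ± 0.017 | 0.019 ± 0.023 | 0.003 ± 0.002 | 0.038 ± 0.012 |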
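One common way to probe counterfactual fairness is to flip each node's sensitive attribute, predict again, and measure how often the predicted label changes. The sketch below computes that flip rate from two prediction lists. It is an illustrative reading under that assumption and may differ from the exact $\delta_{CF}$ definition behind the reported numbers; the function name is hypothetical.

```python
def counterfactual_flip_rate(preds_original, preds_counterfactual):
    """Fraction of nodes whose prediction changes after flipping the sensitive attribute."""
    if len(preds_original) != len(preds_counterfactual):
        raise ValueError("both prediction lists must cover the same nodes")
    if not preds_original:
        raise ValueError("at least one prediction is required")
    flips = sum(a != b for a, b in zip(preds_original, preds_counterfactual))
    return flips / len(preds_original)
```

A rate of 0 means no prediction depends on the flipped attribute for the evaluated nodes.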

7.3 Individual Fairness

We present the evaluation results of both utility (including AUCROC) and fairness (including IF, GDIF, and Ranking-based IF) on Credit and Recidivism below.

| | AUCROC | IF (in $10^3$) | GDIF | Ranking-based IF | AUC | IF (in $10^3$) | GDIF | Ranking-Based IF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
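As background for the IF column, a widely used individual fairness measure on graphs is the similarity-weighted sum of squared output differences, $\frac{1}{2}\sum_{i,j} S_{ij}\lVert z_i - z_j\rVert^2$, which equals $\mathrm{Tr}(Z^\top L_S Z)$ for the Laplacian $L_S$ of a symmetric similarity matrix $S$. The sketch below evaluates the pairwise form in plain Python. It assumes this is the notion meant by IF here, and the function name is hypothetical.

```python
def individual_bias(outputs, similarity):
    """Compute 0.5 * sum_ij S_ij * ||z_i - z_j||^2 for row vectors z_i in outputs."""
    n = len(outputs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Squared Euclidean distance between the outputs of nodes i and j.
            dist_sq = sum((a - b) ** 2 for a, b in zip(outputs[i], outputs[j]))
            total += similarity[i][j] * dist_sq
    return total / 2
```

Smaller values mean that nodes judged similar by `similarity` receive closer outputs. The double loop is quadratic in the number of nodes, so it is meant for small examples rather than full datasets.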

7.4 Degree-Related Fairness

We present the evaluation results of both utility (including accuracy) and fairness (including bias according to Rawlsian difference principle) on Amazon-Photo dataset below.

| | Accuracy | Bias (according to Rawlsian difference principle) |
| --- | --- | --- |
| GCN | 0.8262 ± 0.0090 | 0.5033 ± 0.1552 |
| RawlsGCN | 0.8708 ± 0.0134 | 0.0782 ± 0.0071 |
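To make the bias column more concrete, one way to quantify degree-related unfairness is to group nodes by degree, average the per-node loss within each group, and take the variance of those group averages. The sketch below follows that reading. Using the population variance and grouping by exact degree are assumptions, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def degree_group_loss_variance(degrees, losses):
    """Variance of the average loss across groups of nodes with equal degree."""
    grouped = defaultdict(list)
    for degree, loss in zip(degrees, losses):
        grouped[degree].append(loss)
    # One average loss per degree group.
    group_means = [sum(values) / len(values) for values in grouped.values()]
    # Assumption: population variance over the group averages.
    return pvariance(group_means)
```

A value of 0 means every degree group incurs the same average loss, so low-degree nodes are served as well as high-degree ones.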

Folder Structure

├── README.md
├── docs
├── pygdebias
│   ├── __init__.py
│   ├── datasets
│   ├── debiasing
│   └── metrics
├── requirements.txt
├── setup.cfg
└── setup.py

8. How to Contribute

You are welcome to become part of this project. See the contribution guide for more information.

9. Authors & Acknowledgements

Yushun Dong, Song Wang, Zaiyi Zheng, Zhenyu Lei, Alex Jing Huang, Jing Ma, Chen Chen, Jundong Li

We extend our heartfelt appreciation to everyone who has contributed to and will contribute to this work.

10. Contact

Reach out to us by submitting an issue report or sending an email to yd6eb@virginia.edu.

11. References

