
<div align="center"> <img src="./docs/pygdebias.png" width = "600" height = "200" alt="pygdebias" align=center /> </div>



PyGDebias: Attributed Network Datasets and Fairness-Aware Graph Mining Algorithms

Graph mining algorithms play a critical role in a plethora of areas. However, most existing ones lack fairness considerations and may consequently deliver biased predictions toward certain demographic subgroups or individuals. To better understand existing debiasing techniques and facilitate the deployment of fairness-aware graph mining algorithms, we developed PyGDebias, a library featuring built-in datasets and implementations of popular fairness-aware graph mining algorithms for the study of algorithmic fairness on graphs.

Specifically, this open-source library aims to provide a systematic schema for loading datasets and comparing different debiasing techniques for graph learning algorithms. It collects 26 graph datasets (24 commonly used ones and two newly constructed ones, AMiner-L and AMiner-S) and implements 13 algorithms.

1. Citation

Our survey paper "Fairness in Graph Mining: A Survey" has been accepted by TKDE and is available on arXiv [PDF]. If you find PyGDebias helpful, we would appreciate citations to the following paper:

```bibtex
@article{dong2023fairness,
  title={Fairness in graph mining: A survey},
  author={Dong, Yushun and Ma, Jing and Wang, Song and Chen, Chen and Li, Jundong},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  year={2023},
  publisher={IEEE}
}
```

or:

Dong, Y., Ma, J., Wang, S., Chen, C., & Li, J. (2023). Fairness in graph mining: A survey. IEEE Transactions on Knowledge and Data Engineering.

2. API Cheatsheet

We summarize the basic API of the implemented graph mining algorithms below: each model is trained with `fit()` and evaluated with `predict()` (see Section 4 for a complete example).
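As an illustration of that contract, here is a minimal stand-in class (hypothetical, not part of PyGDebias) that mimics the `fit()`/`predict()` signature the library's models expose:

```python
import numpy as np

class DebiasModelSketch:
    """Hypothetical stand-in mirroring the fit()/predict() API
    shared by the models in this library (see Section 4)."""

    def fit(self, adj, features, idx_train, idx_val, idx_test, labels, sens):
        # A real model trains a fairness-aware GNN here; this sketch
        # just memorizes the majority label of the training split.
        labels = np.asarray(labels)
        self.majority_ = int(np.bincount(labels[idx_train]).argmax())
        self.idx_test_, self.labels_ = np.asarray(idx_test), labels
        return self

    def predict(self):
        # A real model reports utility and fairness metrics on the
        # test split; this sketch returns majority-baseline accuracy.
        preds = np.full(len(self.idx_test_), self.majority_)
        return float((preds == self.labels_[self.idx_test_]).mean())
```

The arguments match the objects returned by the dataset loaders (adjacency, features, index splits, labels, sensitive attributes).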

3. Installation

Here we provide guidelines for setting up the library. There are two ways to install it: manually from source, or via pip.

3.1 Manually

```bash
# Set up the environment
conda create -n pygdebias python=3.9
conda activate pygdebias

# Clone and install from source
git clone https://github.com/yushundong/PyGDebias.git
pip install torch==1.12.0+cu116 -f https://download.pytorch.org/whl/torch_stable.html
pip install PyGDebias/ -f https://data.pyg.org/whl/torch-1.12.0%2Bcu116.html -f https://download.pytorch.org/whl/torch_stable.html -f https://data.dgl.ai/wheels/cu116/repo.html -f https://data.dgl.ai/wheels-test/repo.html
```

3.2 pip

```bash
# Create and activate the environment
conda create -n pygdebias python=3.9
conda activate pygdebias

# Install PyTorch
pip install torch==1.12.0+cu116 -f https://download.pytorch.org/whl/torch_stable.html
# Alternatively, install PyTorch through conda:
# conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge

# Install pygdebias
pip install pygdebias==1.1.1 -f https://data.pyg.org/whl/torch-1.12.0%2Bcu116.html -f https://download.pytorch.org/whl/torch_stable.html -f https://data.dgl.ai/wheels/cu116/repo.html -f https://data.dgl.ai/wheels-test/repo.html
```

4. Usage & Examples

```python
from pygdebias.debiasing import GUIDE
from pygdebias.datasets import Nba

# Available choices: 'Credit', 'German', 'Facebook', 'Pokec_z', 'Pokec_n',
# 'Nba', 'Twitter', 'Google', 'LCC', 'LCC_small', 'Cora', 'Citeseer',
# 'Amazon', 'Yelp', 'Epinion', 'Ciao', 'Dblp', 'Filmtrust', 'Lastfm',
# 'Ml-100k', 'Ml-1m', 'Ml-20m', 'Oklahoma', 'UNC', 'Bail'.
nba = Nba()
adj, features, idx_train, idx_val, idx_test, labels, sens = (
    nba.adj(),
    nba.features(),
    nba.idx_train(),
    nba.idx_val(),
    nba.idx_test(),
    nba.labels(),
    nba.sens(),
)

# Initialize the model (with default parameters).
model = GUIDE()

# Train the model.
model.fit(adj, features, idx_train, idx_val, idx_test, labels, sens)

# Evaluate the model.
model.predict()
```

5. Collected Datasets

26 graph datasets are collected in this library, including 24 commonly used ones and two newly constructed ones (AMiner-L and AMiner-S). Their statistics are summarized as follows.

| Dataset | #Nodes | #Edges | #Features |
|---|---|---|---|
| Facebook | 1,045 | 53,498 | 573 |
| Pokec_z | 67,796 | 882,765 | 276 |
| Pokec_n | 66,569 | 729,129 | 265 |
| NBA | 403 | 16,570 | 95 |
| German | 1,000 | 24,970 | 27 |
| Credit | 30,000 | 200,526 | 13 |
| Recidivism | 18,876 | 403,977 | 18 |
| Google+ | 290,468 | 3,601 | 2,532 |
| AMiner-L | 129,726 | 591,039 | 5,694 |
| AMiner-S | 39,424 | 52,460 | 5,694 |
| Cora | 2,708 | 4,751 | 1,433 |
| Citeseer | 3,312 | 4,194 | 3,703 |
| Pubmed | 19,717 | 88,648 | 500 |
| Amazon | 2,549 (item), 2 (genre) | 2,549 | N/A |
| Yelp | 2,834 (item), 2 (genre) | 2,834 | N/A |
| Ciao | 7,375 (user), 106,797 (product) | 57,544 | N/A |
| DBLP | 22,166 (user), 296,277 (product) | 355,813 | N/A |
| Filmtrust | 1,508 (user), 2,071 (item) | 35,497 | N/A |
| Lastfm | 1,892 (customer), 17,632 (producer) | 92,800 | N/A |
| ML100k | 943 (user), 1,682 (item) | 100,000 | 4 |
| ML1m | 6,040 (user), 3,952 (item) | 1,000,209 | 4 |
| ML20m | 138,493 (user), 27,278 (item) | 20,000,263 | N/A |
| Oklahoma | 3,111 | 73,230 | N/A |
| UNC | 4,018 | 65,287 | N/A |
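Whatever the dataset, the loaders return the same kinds of objects: an adjacency matrix, a node-feature matrix, train/val/test index splits, labels, and a sensitive-attribute vector (see Section 4). A synthetic stand-in with random data, purely to illustrate the expected shapes and types:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_feats, n_edges = 100, 16, 300

# Symmetric 0/1 adjacency matrix built from random edge pairs.
edges = rng.integers(0, n_nodes, size=(n_edges, 2))
adj = np.zeros((n_nodes, n_nodes))
adj[edges[:, 0], edges[:, 1]] = 1.0
adj = np.maximum(adj, adj.T)                      # symmetrize

features = rng.normal(size=(n_nodes, n_feats))    # node features
labels = rng.integers(0, 2, size=n_nodes)         # binary task labels
sens = rng.integers(0, 2, size=n_nodes)           # sensitive attribute

# A disjoint 50/25/25 node split, analogous to idx_train/idx_val/idx_test.
perm = rng.permutation(n_nodes)
idx_train, idx_val, idx_test = perm[:50], perm[50:75], perm[75:]
```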

6. Collected Algorithms

A total of 13 methods are implemented in this library. We provide an overview of their characteristics as follows.

| Methods | Debiasing Technique | Fairness Notions | Paper & Code |
|---|---|---|---|
| FairGNN [2] | Adversarial Learning | Group Fairness | [Paper] [Code] |
| EDITS [3] | Edge Rewiring | Group Fairness | [Paper] [Code] |
| FairWalk [4] | Rebalancing | Group Fairness | [Paper] [Code] |
| CrossWalk [5] | Rebalancing | Group Fairness | [Paper] [Code] |
| UGE [6] | Edge Rewiring | Group Fairness | [Paper] [Code] |
| FairVGNN [7] | Adversarial Learning | Group Fairness | [Paper] [Code] |
| FairEdit [8] | Edge Rewiring | Group Fairness | [Paper] [Code] |
| NIFTY [9] | Optimization with Regularization | Group/Counterfactual Fairness | [Paper] [Code] |
| GEAR [10] | Edge Rewiring | Group/Counterfactual Fairness | [Paper] [Code] |
| InFoRM [11] | Optimization with Regularization | Individual Fairness | [Paper] [Code] |
| REDRESS [12] | Optimization with Regularization | Individual Fairness | [Paper] [Code] |
| GUIDE [13] | Optimization with Regularization | Individual Fairness | [Paper] [Code] |
| RawlsGCN [14] | Rebalancing | Degree-Related Fairness | [Paper] [Code] |

7. Performance Leaderboards

We summarize the performance of the 13 implemented graph mining algorithms/frameworks by fairness notion, including group fairness, individual fairness, counterfactual fairness, and degree-related fairness.

7.1 Group Fairness

7.1.1 GNN-based ones:

We present the evaluation results of both utility (including AUCROC, F1 score, and accuracy) and fairness (including $\Delta_{SP}$ and $\Delta_{EO}$) on Credit and Recidivism below.

**Credit**

| Method | AUCROC | F1 | Acc | $\Delta_{SP}$ | $\Delta_{EO}$ |
|---|---|---|---|---|---|
| GCN_Vanilla | 0.722±0.005 | 0.876±0.002 | 0.784±0.004 | 0.160±0.110 | 0.108±0.073 |
| FairGNN | 0.729±0.004 | 0.879±0.002 | 0.790±0.004 | 0.022±0.007 | 0.020±0.005 |
| NIFTY | 0.717±0.000 | 0.876±0.000 | 0.782±0.001 | 0.019±0.019 | 0.014±0.010 |
| EDITS | 0.667±0.012 | 0.876±0.000 | 0.779±0.000 | 0.005±0.002 | 0.004±0.002 |
| FairVGNN | 0.654±0.034 | 0.865±0.018 | 0.772±0.015 | 0.030±0.040 | 0.023±0.032 |
| FairEdit | 0.666±0.067 | 0.809±0.032 | 0.716±0.032 | 0.107±0.046 | 0.098±0.045 |

**Recidivism**

| Method | AUCROC | F1 | Acc | $\Delta_{SP}$ | $\Delta_{EO}$ |
|---|---|---|---|---|---|
| GCN_Vanilla | 0.865±0.020 | 0.672±0.031 | 0.804±0.014 | 0.094±0.012 | 0.130±0.023 |
| FairGNN | 0.929±0.002 | 0.820±0.004 | 0.880±0.002 | 0.073±0.006 | 0.046±0.006 |
| NIFTY | 0.841±0.006 | 0.403±0.008 | 0.716±0.002 | 0.017±0.001 | 0.006±0.003 |
| EDITS | 0.847±0.026 | 0.607±0.058 | 0.773±0.029 | 0.026±0.012 | 0.013±0.001 |
| FairVGNN | 0.826±0.009 | 0.722±0.015 | 0.796±0.020 | 0.090±0.009 | 0.089±0.014 |
| FairEdit | 0.884±0.007 | 0.798±0.009 | 0.852±0.007 | 0.070±0.002 | 0.044±0.002 |
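The $\Delta_{SP}$ and $\Delta_{EO}$ columns follow the standard definitions of the statistical parity difference, $|P(\hat{y}=1 \mid s=0) - P(\hat{y}=1 \mid s=1)|$, and the equal opportunity difference, $|P(\hat{y}=1 \mid y=1, s=0) - P(\hat{y}=1 \mid y=1, s=1)|$. A minimal NumPy sketch of these two metrics (the helper names are ours, not library API):

```python
import numpy as np

def statistical_parity(preds, sens):
    """Delta_SP: |P(yhat=1 | s=0) - P(yhat=1 | s=1)| for binary preds."""
    preds, sens = np.asarray(preds), np.asarray(sens)
    return abs(preds[sens == 0].mean() - preds[sens == 1].mean())

def equal_opportunity(preds, labels, sens):
    """Delta_EO: |P(yhat=1 | y=1, s=0) - P(yhat=1 | y=1, s=1)|."""
    preds, labels, sens = map(np.asarray, (preds, labels, sens))
    pos = labels == 1  # restrict to the positive class
    return abs(preds[pos & (sens == 0)].mean() - preds[pos & (sens == 1)].mean())
```

Both metrics are 0 for a perfectly group-fair classifier; lower is better.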
7.1.2 Non-GNN-based ones:

We present the fairness evaluation results (classification accuracy of gender on the learned embeddings) on Pokec_z and Pokec_n below.

| Method | Classification Acc of Gender on Pokec_z | Classification Acc of Gender on Pokec_n |
|---|---|---|
| FairWalk | 0.4962 ± 0.0031 | 0.5016 ± 0.0036 |
| CrossWalk | 0.4943 ± 0.0007 | 0.5047 ± 0.0036 |
| UGE-C | 0.4908 ± 0.00008 | 0.5002 ± 0.0001 |

7.2 Counterfactual Fairness

We present the evaluation results of both utility (including AUCROC, F1 score, and accuracy) and fairness (including $\Delta_{SP}$, $\Delta_{EO}$, $\delta_{CF}$, and $R^2$) on Credit and Recidivism below.

**Credit**

| Method | AUCROC | F1 | Acc | $\Delta_{SP}$ | $\Delta_{EO}$ | $\delta_{CF}$ | $R^2$ |
|---|---|---|---|---|---|---|---|
| GCN_Vanilla | 0.684±0.019 | 0.794±0.027 | 0.698±0.028 | 0.108±0.031 | 0.087±0.035 | 0.042±0.029 | 0.022±0.014 |
| NIFTY-GCN | 0.685±0.007 | 0.792±0.007 | 0.697±0.007 | 0.106±0.021 | 0.097±0.024 | 0.004±0.004 | 0.017±0.003 |
| GEAR-GCN | 0.740±0.008 | 0.835±0.008 | 0.755±0.011 | 0.104±0.013 | 0.086±0.018 | 0.001±0.001 | 0.010±0.003 |

**Recidivism**

| Method | AUCROC | F1 | Acc | $\Delta_{SP}$ | $\Delta_{EO}$ | $\delta_{CF}$ | $R^2$ |
|---|---|---|---|---|---|---|---|
| GCN_Vanilla | 0.885±0.018 | 0.782±0.023 | 0.838±0.017 | 0.075±0.014 | 0.023±0.019 | 0.132±0.059 | 0.075±0.028 |
| NIFTY-GCN | 0.799±0.051 | 0.669±0.050 | 0.752±0.065 | 0.036±0.022 | 0.019±0.015 | 0.031±0.017 | 0.025±0.018 |
| GEAR-GCN | 0.896±0.016 | 0.800±0.031 | 0.852±0.026 | 0.058±0.017 | 0.019±0.023 | 0.003±0.002 | 0.038±0.012 |
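One common reading of $\delta_{CF}$ in counterfactual-fairness evaluations such as GEAR [10] is the fraction of nodes whose prediction changes when the sensitive attribute is counterfactually flipped. A tiny sketch under that assumption (our helper, not library code):

```python
import numpy as np

def counterfactual_flip_rate(preds_factual, preds_counterfactual):
    """Fraction of nodes whose predicted label changes between the
    factual graph and its sensitive-attribute-flipped counterfactual
    (one reading of delta_CF; 0 means counterfactually fair)."""
    pf = np.asarray(preds_factual)
    pc = np.asarray(preds_counterfactual)
    return float((pf != pc).mean())
```

The two prediction vectors come from running the same trained model on the factual and counterfactual versions of the graph.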

7.3 Individual Fairness

We present the evaluation results of both utility (including AUCROC) and fairness (including IF, GDIF, and Ranking-based IF) on Credit and Recidivism below.

**Credit**

| Method | AUCROC | IF (in $10^3$) | GDIF | Ranking-based IF |
|---|---|---|---|---|
| GCN | 0.666±0.023 | 190.408±135.412 | 1.628±0.596 | 0.611±0.003 |
| InFoRM | 0.645±0.010 | 23.560±8.760 | 2.422±0.143 | 0.621±0.014 |
| REDRESS | 0.570±0.005 | 8.827e11±1.95e11 | 3.048±0.301 | 0.913±0.005 |
| GUIDE | 0.632±0.008 | 0.337±0.063 | 1.003±0.002 | 0.665±0.010 |

**Recidivism**

| Method | AUCROC | IF (in $10^3$) | GDIF | Ranking-based IF |
|---|---|---|---|---|
| GCN | 0.962±0.001 | 1.32e6±6.19e5 | 1.12±0.001 | 0.965±0.001 |
| InFoRM | 0.651±0.043 | 11.606±4.28 | 1.106±0.133 | 0.971±0.001 |
| REDRESS | 0.687±0.067 | 4.0393e12±6.84e12 | 1.816±0.594 | 0.980±0.002 |
| GUIDE | 0.797±0.024 | 9.629±0.831 | 1.006±0.003 | 0.972±0.001 |
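The IF column follows the individual-fairness bias measure from InFoRM [11]: the Laplacian quadratic form $\mathrm{Tr}(Y^{\top} L_S Y)$ of the model outputs $Y$ over a node-similarity matrix $S$, where lower means more individually fair. A small NumPy sketch, assuming a dense similarity matrix:

```python
import numpy as np

def individual_fairness_bias(Y, S):
    """Tr(Y^T L_S Y), where L_S = D - S is the Laplacian of the
    node-similarity matrix S and Y holds the model outputs.
    Similar nodes with dissimilar outputs inflate this value."""
    S = np.asarray(S, dtype=float)
    L = np.diag(S.sum(axis=1)) - S      # graph Laplacian of S
    Y = np.asarray(Y, dtype=float)
    return float(np.trace(Y.T @ L @ Y))
```

When every pair of similar nodes receives identical outputs, the measure is exactly 0.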

7.4 Degree-Related Fairness

We present the evaluation results of both utility (accuracy) and fairness (bias according to the Rawlsian difference principle) on the Amazon-Photo dataset below.

| Method | Accuracy | Bias (Rawlsian difference principle) |
|---|---|---|
| GCN | 0.8262±0.0090 | 0.5033±0.1552 |
| RawlsGCN | 0.8708±0.0134 | 0.0782±0.0071 |

Folder Structure

```
.
├── LICENSE
├── MANIFEST.in
├── README.md
├── docs
├── pygdebias
│   ├── __init__.py
│   ├── datasets
│   ├── debiasing
│   └── metrics
├── requirements.txt
├── setup.cfg
└── setup.py
```

How to Contribute

You are welcome to become part of this project. See the contributing guide for more information.

Authors & Acknowledgements

Yushun Dong, Song Wang, Zaiyi Zheng, Zhenyu Lei, Alex Jing Huang, Jing Ma, Chen Chen, Jundong Li

We extend our heartfelt appreciation to everyone who has contributed to and will contribute to this work.

10. Contact

Reach out to us by submitting an issue report or sending an email to yd6eb@virginia.edu.

11. References

[1] Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., & Su, Z. (2008, August). Arnetminer: extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 990-998).

[2] Dai, E., & Wang, S. (2021, March). Say no to the discrimination: Learning fair graph neural networks with limited sensitive attribute information. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (pp. 680-688).

[3] Dong, Y., Liu, N., Jalaian, B., & Li, J. (2022, April). Edits: Modeling and mitigating data bias for graph neural networks. In Proceedings of the ACM Web Conference 2022 (pp. 1259-1269).

[4] Rahman, T., Surma, B., Backes, M., & Zhang, Y. (2019). Fairwalk: Towards fair graph embedding. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (pp. 3289-3295).

[5] Khajehnejad, A., Khajehnejad, M., Babaei, M., Gummadi, K. P., Weller, A., & Mirzasoleiman, B. (2022, June). CrossWalk: fairness-enhanced node representation learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 11, pp. 11963-11970).

[6] Wang, N., Lin, L., Li, J., & Wang, H. (2022, April). Unbiased graph embedding with biased graph observations. In Proceedings of the ACM Web Conference 2022 (pp. 1423-1433).

[7] Wang, Y., Zhao, Y., Dong, Y., Chen, H., Li, J., & Derr, T. (2022). Improving Fairness in Graph Neural Networks via Mitigating Sensitive Attribute Leakage. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 1938–1948).

[8] Loveland, D., Pan, J., Bhathena, A. F., & Lu, Y. (2022). FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing. arXiv preprint arXiv:2201.03681.

[9] Agarwal, C., Lakkaraju, H., & Zitnik, M. (2021, December). Towards a unified framework for fair and stable graph representation learning. In Uncertainty in Artificial Intelligence (pp. 2114-2124). PMLR.

[10] Ma, J., Guo, R., Wan, M., Yang, L., Zhang, A., & Li, J. (2022, February). Learning fair node representations with graph counterfactual fairness. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (pp. 695-703).

[11] Kang, J., He, J., Maciejewski, R., & Tong, H. (2020, August). Inform: Individual fairness on graph mining. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 379-389).

[12] Dong, Y., Kang, J., Tong, H., & Li, J. (2021, August). Individual fairness for graph neural networks: A ranking based approach. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 300-310).

[13] Song, W., Dong, Y., Liu, N., & Li, J. (2022, August). GUIDE: Group Equality Informed Individual Fairness in Graph Neural Networks. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 1625-1634).

[14] Kang, J., Zhu, Y., Xia, Y., Luo, J., & Tong, H. (2022, April). Rawlsgcn: Towards rawlsian difference principle on graph convolutional network. In Proceedings of the ACM Web Conference 2022 (pp. 1214-1225).