
HtFLlib: Heterogeneous Federated Learning Library

👏 We will change the license to Apache-2.0 in the next release.

Standard federated learning, e.g., FedAvg, assumes that all participating clients build their local models with the same architecture, which limits its utility in real-world scenarios. In practice, clients may build models with different architectures suited to their specific local tasks. Heterogeneous Federated Learning (HtFL) has emerged to address the resulting challenges: data heterogeneity, model heterogeneity, communication overhead, and intellectual property (IP) protection.
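To see why FedAvg assumes a shared architecture, here is a minimal sketch (not code from this library) of its sample-weighted, element-wise parameter averaging. Models are represented as plain dictionaries mapping parameter names to flat lists of floats, an assumption made to keep the example dependency-free. The average is only defined when every client has the same parameter names and sizes, which is exactly what heterogeneous architectures violate.

```python
from typing import Dict, List

# Assumed representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], num_samples: List[int]) -> Params:
    """Sample-weighted, element-wise average of client parameters (FedAvg-style)."""
    total = sum(num_samples)
    reference = client_params[0]
    for params in client_params[1:]:
        # Element-wise averaging needs identical parameter names and sizes.
        if params.keys() != reference.keys() or any(
            len(params[k]) != len(reference[k]) for k in reference
        ):
            raise ValueError("FedAvg requires identical model architectures")
    averaged: Params = {}
    for name, values in reference.items():
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, num_samples)) / total
            for i in range(len(values))
        ]
    return averaged


# Two clients with the same architecture: averaging works.
print(fedavg([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}], [1, 3]))  # {'w': [2.5, 3.5]}

# Two clients whose parameter shapes differ: averaging is undefined.
try:
    fedavg([{"w": [1.0, 2.0]}, {"w": [1.0, 2.0, 3.0]}], [1, 1])
except ValueError as err:
    print(err)  # FedAvg requires identical model architectures
```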

9 data-free HtFL algorithms and 21 heterogeneous model architectures.
PFLlib compatible.

🎯 If you find our repository useful, please cite the corresponding paper:

@article{zhang2023pfllib,
  title={PFLlib: Personalized Federated Learning Algorithm Library},
  author={Zhang, Jianqing and Liu, Yang and Hua, Yang and Wang, Hao and Song, Tao and Xue, Zhengui and Ma, Ruhui and Cao, Jian},
  journal={arXiv preprint arXiv:2312.04992},
  year={2023}
}

Environments

Install CUDA v11.6.

Install the latest version of conda and activate it.

conda env create -f env_cuda_latest.yaml  # You may need to downgrade torch with pip to match your CUDA version

Scenarios and datasets

As an example, we only show the MNIST dataset in the label skew scenario, generated via a Dirichlet distribution. Please refer to my other repository, PFLlib, for more details.
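The Dirichlet-based label skew can be sketched as follows. This is a minimal, dependency-free illustration, not PFLlib's actual generator: for every class, it draws client proportions from a Dirichlet(alpha) distribution (built by normalizing Gamma(alpha, 1) samples) and splits that class's sample indices accordingly. The function name, alpha = 0.1, 20 clients, and the toy labels are assumptions for illustration; PFLlib may add further constraints, such as a minimum number of samples per client.

```python
import random
from collections import defaultdict
from typing import Dict, List


def dirichlet_label_skew(
    labels: List[int], num_clients: int, alpha: float, seed: int = 0
) -> Dict[int, List[int]]:
    """Hypothetical helper: split sample indices with per-class Dirichlet(alpha) proportions.

    A smaller alpha produces more skewed label distributions per client.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    client_indices: Dict[int, List[int]] = {c: [] for c in range(num_clients)}
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Dirichlet sample = independent Gamma(alpha, 1) draws, normalized to sum to 1.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(weights)
        # Turn cumulative proportions into cut points over this class's indices.
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += weights[c] / total
            end = len(idxs) if c == num_clients - 1 else int(round(cumulative * len(idxs)))
            client_indices[c].extend(idxs[start:end])
            start = end
    return client_indices


labels = [i % 10 for i in range(1000)]  # toy stand-in for MNIST's 10 classes
parts = dirichlet_label_skew(labels, num_clients=20, alpha=0.1)
for cid in range(3):
    classes = sorted({labels[i] for i in parts[cid]})
    print(f"client {cid}: {len(parts[cid])} samples, classes {classes}")
```

Every sample is assigned to exactly one client; with a small alpha, most clients end up holding only a few of the 10 classes.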

You could also modify the code in PFLlib to support model heterogeneity scenarios, but doing so requires considerable effort. In this repository, you only need to configure system/main.py to support model heterogeneity scenarios.
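The sketch below illustrates one simple way a model-heterogeneous scenario can be set up: each client receives one architecture from a pool. The pool entries, the function name, and the round-robin rule (client i gets pool[i % len(pool)]) are hypothetical; the options actually available, and how they are assigned, are configured in system/main.py.

```python
from typing import List

# Hypothetical architecture names; the real options are defined in system/main.py.
MODEL_POOL: List[str] = ["CNN-1", "CNN-2", "ResNet-18", "MobileNet-V2"]


def assign_architectures(num_clients: int, pool: List[str]) -> List[str]:
    """Hypothetical helper: give client i the architecture pool[i % len(pool)]."""
    if not pool:
        raise ValueError("the model pool must not be empty")
    return [pool[cid % len(pool)] for cid in range(num_clients)]


assignment = assign_architectures(num_clients=6, pool=MODEL_POOL)
for cid, arch in enumerate(assignment):
    print(f"client {cid}: {arch}")
```

With six clients and four architectures, clients 4 and 5 wrap around and reuse the first two entries of the pool.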

Note: if your program crashes unexpectedly, you may need to manually clean up checkpoint files in the temp/ folder by running system/clean_temp_files.py. To prevent automatic deletion, you can also specify your own checkpoint folder with the -sfn command-line argument.

Data-free algorithms with code (updating)

Here, "data-free" refers to the absence of any additional dataset beyond the clients' private data. We only consider data-free algorithms, as they have fewer restrictions and assumptions, which makes them more valuable and easy to extend to other scenarios, such as those where public data is available on the server.

Local — Each client trains its model locally without federation.
FedDistill (FD) — Communication-Efficient On-Device Machine Learning: Federated Distillation and Augmentation under Non-IID Private Data 2018
FML — Federated Mutual Learning 2020
LG-FedAvg — Think Locally, Act Globally: Federated Learning with Local and Global Representations 2020
FedGen — Data-Free Knowledge Distillation for Heterogeneous Federated Learning ICML 2021
FedProto — FedProto: Federated Prototype Learning across Heterogeneous Clients AAAI 2022
FedKD — Communication-efficient federated learning via knowledge distillation Nature Communications 2022
FedGH — FedGH: Heterogeneous Federated Learning with Generalized Global Header ACM MM 2023
FedTGP — FedTGP: Trainable Global Prototypes with Adaptive-Margin-Enhanced Contrastive Learning for Data and Model Heterogeneity in Federated Learning AAAI 2024
FedKTL — An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning CVPR 2024 (Note: FedKTL requires pre-trained generators to run; please refer to its project page for download links.)
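Several of these methods avoid sharing model weights altogether. As an illustration, here is a minimal sketch in the spirit of FedProto: each client computes per-class mean feature vectors (prototypes), and the server averages them class by class. Because only prototypes are exchanged, clients can use different architectures as long as their feature vectors have the same dimension. The sample-count weighting, the plain-list vectors, and all function names are assumptions for illustration and may differ from this library's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def local_prototypes(features: List[Vector], labels: List[int]) -> Dict[int, Tuple[Vector, int]]:
    """Client side: per-class mean feature vector, plus the class sample count."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: ([s / counts[y] for s in sums[y]], counts[y]) for y in sums}


def aggregate_prototypes(client_protos: List[Dict[int, Tuple[Vector, int]]]) -> Dict[int, Vector]:
    """Server side: sample-count-weighted mean of each class's client prototypes."""
    weighted: Dict[int, Vector] = {}
    totals: Dict[int, int] = defaultdict(int)
    for protos in client_protos:
        for y, (proto, n) in protos.items():
            if y not in weighted:
                weighted[y] = [0.0] * len(proto)
            weighted[y] = [w + n * p for w, p in zip(weighted[y], proto)]
            totals[y] += n
    return {y: [w / totals[y] for w in weighted[y]] for y in weighted}


# Two clients with different (hypothetical) backbones but 2-D feature vectors.
client_a = local_prototypes([[1.0, 0.0], [3.0, 0.0]], [0, 0])
client_b = local_prototypes([[5.0, 3.0], [0.0, 1.0]], [0, 1])
global_protos = aggregate_prototypes([client_a, client_b])
print(global_protos)  # {0: [3.0, 1.0], 1: [0.0, 1.0]}
```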

Experimental results

You can obtain some results by running total.sh with pre-tuned hyperparameters:

cd ./system
sh total.sh

Alternatively, you can find results in our accepted FL papers (i.e., FedTGP and FedKTL). Please note that this project is still under development and may not reproduce the results reported in these papers, since some basic settings may change in response to community requests.