
<p align="center"> <img src=".github/images/logo.svg" alt="Image"/> </p> <!-- <p align="center"> <a href="https://github.com/KarhouTam/FL-bench/blob/master/LICENSE"> <img alt="GitHub License" src="https://img.shields.io/github/license/KarhouTam/FL-bench?style=for-the-badge&logo=github&color=8386e0"/> </a> <a href="https://github.com/KarhouTam/FL-bench/issues?q=is%3Aissue+is%3Aclosed"> <img alt="GitHub closed issues" src="https://img.shields.io/github/issues-closed-raw/KarhouTam/FL-bench?style=for-the-badge&logo=github&color=8386e0"> </a> <a href="https://github.com/KarhouTam/FL-bench/stargazers"> <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/KarhouTam/FL-bench?style=for-the-badge&logo=github&color=8386e0"> </a> <a href="https://github.com/KarhouTam/FL-bench/forks"> <img alt="GitHub Repo forks" src="https://img.shields.io/github/forks/KarhouTam/FL-bench?style=for-the-badge&logo=github&color=8386e0"> </a> </p> --> <h4 align="center"><i>

Benchmarking Federated Learning Methods.

Realizing Your Brilliant Ideas.

Having Fun with Federated Learning.

FL-bench welcomes PRs on anything that can make this project better.

</i></h4>

<p align="center"> <a href=https://zhuanlan.zhihu.com/p/703576051>A brief introduction to FL-bench (in Chinese)</a> </p>

Methods 🧬

- <b>Traditional FL Methods</b>
- <b>Personalized FL Methods</b>
- <b>FL Domain Generalization Methods</b>

Environment Preparation 🧩

PyPI 🐍

```shell
pip install -r .env/requirements.txt
```

Poetry 🎶

For those China mainland users

```shell
poetry install --no-root -C .env
```

For others

```shell
cd .env && sed -i "10,14d" pyproject.toml && poetry lock --no-update && poetry install --no-root
```

Docker 🐳

For those China mainland users

```shell
docker pull registry.cn-hangzhou.aliyuncs.com/karhoutam/fl-bench:master
```

For others

```shell
docker pull ghcr.io/karhoutam/fl-bench:master
```

An example of creating a container:

```shell
docker run -it --name fl-bench -v path/to/FL-bench:/root/FL-bench --privileged --gpus all ghcr.io/karhoutam/fl-bench:master
```

Easy Run 🏃‍♂️

All method classes inherit from `FedAvgServer` and `FedAvgClient`. If you want to figure out the entire workflow and the details of the variable settings, check `src/server/fedavg.py` and `src/client/fedavg.py`.

Step 1. Generate FL Dataset

Partition MNIST into 100 clients according to Dir(0.1):

```shell
python generate_data.py -d mnist -a 0.1 -cn 100
```

For the methods of generating federated datasets, check `data/README.md` for full details.
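Conceptually, the `Dir(0.1)` partition samples per-class client proportions from a Dirichlet distribution: a smaller alpha yields more skewed (non-IID) clients. The stdlib-only sketch below illustrates that idea; it is not `generate_data.py`'s actual implementation, and `dirichlet_partition` and its arguments are hypothetical names.

```python
# Hedged sketch of Dirichlet label partitioning (the idea behind `-a 0.1`).
# For each class, draw a Dirichlet(alpha) proportion vector over clients and
# split that class's samples accordingly.
import random
from itertools import accumulate

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Dirichlet(alpha) via normalized Gamma(alpha, 1) draws
        gammas = [rng.gammavariate(alpha, 1) for _ in range(num_clients)]
        total = sum(gammas)
        cuts = [round(c / total * len(idx)) for c in accumulate(gammas)]
        start = 0
        for cid, end in enumerate(cuts):
            clients[cid].extend(idx[start:end])
            start = end
    return clients

labels = [0, 1] * 50          # toy 2-class "dataset" with 100 samples
parts = dirichlet_partition(labels, num_clients=4, alpha=0.1)
print(sum(len(p) for p in parts))  # 100: every sample assigned exactly once
```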

Step 2. Run Experiment

```shell
python main.py [--config-path, --config-name] [method=<METHOD_NAME> args...]
```

For example, run FedAvg with all defaults:

```shell
python main.py method=fedavg
```

Defaults are set in both `config/defaults.yaml` and `src/utils/constants.py`.

How To Customize FL method Arguments 🤖

⚠ For the same FL method argument, the priority of argument setting is CLI > Config file > Default value.

For example, the default value of `fedprox.mu` is `1.0`,

```python
# src/server/fedprox.py
class FedProxServer(FedAvgServer):

    @staticmethod
    def get_hyperparams(args_list=None) -> Namespace:
        parser = ArgumentParser()
        parser.add_argument("--mu", type=float, default=1.0)
        return parser.parse_args(args_list)
```

and your `.yaml` config file has

```yaml
# config/your_config.yaml
...
fedprox:
  mu: 0.01
```

then

```shell
python main.py method=fedprox                                  # fedprox.mu = 1.0
python main.py --config-name your_config method=fedprox        # fedprox.mu = 0.01
```
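The priority rule boils down to a layered dict merge. FL-bench's real merging is performed by its config loader, so `resolve` below is purely illustrative:

```python
# Toy illustration of the CLI > config file > default priority rule.
def resolve(defaults: dict, config_file: dict, cli: dict) -> dict:
    merged = dict(defaults)
    merged.update(config_file)  # config file overrides defaults
    merged.update(cli)          # CLI overrides everything
    return merged

print(resolve({"mu": 1.0}, {"mu": 0.01}, {}))           # {'mu': 0.01}
print(resolve({"mu": 1.0}, {"mu": 0.01}, {"mu": 0.5}))  # {'mu': 0.5}
```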

Monitor 📈

FL-bench supports visdom and tensorboard.

Activate

👀 NOTE: You need to launch the visdom / tensorboard server yourself.

```yaml
# your_config.yaml
common:
  ...
  visible: tensorboard # options: [null, visdom, tensorboard]
```

Launch visdom / tensorboard Server

**visdom**

1. Run `python -m visdom.server` in a terminal.
2. Open `localhost:8097` in your browser.

**tensorboard**

1. Run `tensorboard --logdir=<your_log_dir>` in a terminal.
2. Open `localhost:6006` in your browser.

Parallel Training via Ray 🚀

This feature can vastly improve your training efficiency, and it is easy to use.

Activate (What You ONLY Need To Do)

```yaml
# your_config.yaml
mode: parallel
parallel:
  num_workers: 2 # any integer larger than 1
  ...
...
```
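As a rough stdlib analogy of what `num_workers` buys you (FL-bench itself uses Ray, not `concurrent.futures`), each worker handles one client's local training and the results are gathered for aggregation; `local_train` is a hypothetical stand-in:

```python
# Stdlib analogy to parallel client training with 2 workers.
from concurrent.futures import ThreadPoolExecutor

def local_train(client_id: int) -> dict:
    # placeholder for one client's local training round
    return {"client": client_id, "samples": 100 + client_id}

with ThreadPoolExecutor(max_workers=2) as pool:  # num_workers: 2
    results = list(pool.map(local_train, range(4)))

print(len(results))  # 4: one result per selected client
```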

Manually Create Ray Cluster (Optional)

A Ray cluster is created implicitly every time you run an experiment in parallel mode. Alternatively, you can create one manually with the command below to avoid creating and destroying the cluster on every run.

```shell
ray start --head [OPTIONS]
```

👀 NOTE: You need to keep `num_cpus: null` and `num_gpus: null` in your config file to connect to an existing Ray cluster.

```yaml
# your_config_file.yaml
# Connect to an existing Ray cluster on localhost.
mode: parallel
parallel:
  ...
  num_gpus: null
  num_cpus: null
...
```

Common Arguments 🔧

All common arguments have default values. Check `DEFAULT_COMMON_ARGS` in `src/utils/constants.py` for the full details of the common arguments.

⚠ Common arguments cannot be set via CLI.

You can also write your own `.yaml` config file. A template is offered in `config`, and it is recommended to save your config files there as well.

One example: `python main.py --config-name template method=fedavg [cli_method_args...]`

For the default values of specific FL method arguments, check the corresponding `FL-bench/src/server/<method>.py` for full details.

| Argument | Type | Description |
| --- | --- | --- |
| `--config-path` | `str` | The directory of config files. Defaults to `config`, i.e., `./config`. |
| `--config-name` | `str` | The name of the config file (without the `.yaml` extension). Defaults to `defaults`, which points to `config/defaults.yaml`. |
| `dataset` | `str` | The name of the dataset the experiment runs on. |
| `model` | `str` | The model backbone used in the experiment. |
| `seed` | `int` | Random seed for running the experiment. |
| `join_ratio` | `float` | Ratio of (clients joined each round) / (total client number). |
| `global_epoch` | `int` | Number of global epochs, also called communication rounds. |
| `local_epoch` | `int` | Number of local epochs of client local training. |
| `finetune_epoch` | `int` | Number of epochs for clients to fine-tune their models before testing. |
| `buffers` | `str` | How to handle the parameter buffers (`model.buffers()`) of each client model. Options: [`local`, `global`, `drop`]. `local` (default): clients' buffers are isolated; `global`: buffers are aggregated like other model parameters; `drop`: clients drop their buffers after training is done. |
| `test_interval` | `int` | Round interval for performing tests on clients. |
| `eval_test` | `bool` | `true` for evaluating on joined clients' test sets before and after local training. |
| `eval_val` | `bool` | `true` for evaluating on joined clients' validation sets before and after local training. |
| `eval_train` | `bool` | `true` for evaluating on joined clients' training sets before and after local training. |
| `optimizer` | `dict` | Client-side optimizer. Arguments are the same as for the optimizers in `torch.optim`. |
| `lr_scheduler` | `dict` | Client-side learning rate scheduler. Arguments are the same as for the schedulers in `torch.optim.lr_scheduler`. |
| `verbose_gap` | `int` | Round interval for displaying clients' training performance in the terminal. |
| `batch_size` | `int` | Data batch size for client local training. |
| `use_cuda` | `bool` | `true` indicates that tensors are placed on the GPU. |
| `visible` | `str` | Options: [`null`, `visdom`, `tensorboard`]. |
| `straggler_ratio` | `float` | Ratio of stragglers (in [0, 1]). Stragglers do not perform full-epoch local training like normal clients; their local epoch is randomly selected from the range [`straggler_min_local_epoch`, `local_epoch`). |
| `straggler_min_local_epoch` | `int` | The minimum local epoch value for stragglers. |
| `external_model_params_file` | `str` | Path to a model parameters `.pt` file, relative to the root of FL-bench. ⚠ This feature is enabled only when `unique_model=False`, which is pre-defined by each FL method. |
| `save_log` | `bool` | `true` for saving the algorithm's running log in `out/<method>/<start_time>`. |
| `save_model` | `bool` | `true` for saving output model parameters to `out/<method>/<start_time>.pt`. |
| `save_fig` | `bool` | `true` for saving the accuracy curves shown on Visdom as a `.pdf` file in `out/<method>/<start_time>`. |
| `save_metrics` | `bool` | `true` for saving metric stats to a `.csv` file in `out/<method>/<start_time>`. |
| `delete_useless_run` | `bool` | `true` for deleting output files after the user presses `Ctrl + C`, which indicates the run is removable. |

Parallel Training Arguments 👯‍♂️

| Argument | Type | Description |
| --- | --- | --- |
| `num_workers` | `int` | The number of parallel workers. Must be an integer larger than 1. |
| `ray_cluster_addr` | `str` | The IP address of the Ray cluster to connect to. Defaults to `null`: if no Ray cluster exists, Ray builds a new one each time you run an experiment and destroys it at the end. More details can be found in the official docs. |
| `num_cpus`, `num_gpus` | `int` | The amount of computational resources allocated to your Ray cluster. Defaults to `null`, meaning all. |

Models 🤖

This benchmark supports a bunch of common models integrated in Torchvision (see `src/utils/models.py` for the full list):

🤗 You can define your own custom model by filling the `CustomModel` class in `src/utils/models.py` and use it by setting `model: custom` in your `.yaml` config file.
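A minimal config sketch (assuming the `CustomModel` class in `src/utils/models.py` is already filled in; `model` is placed alongside the other common arguments, mirroring the config examples above):

```yaml
# config/your_config.yaml
common:
  ...
  model: custom
```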

Datasets and Partition Strategies 🎨

Regular Image Datasets

Domain Generalization Image Datasets

Medical Image Datasets

Customization Tips 💡

Implementing FL Method

The `package()` method of the server-side class assembles all parameters the server needs to send to clients. Similarly, `package()` of the client-side class assembles the parameters clients need to send back to the server. You should always call `super().package()` in your override implementation.

```python
class YourServer(FedBNServer):
  ...

class YourClient(FedBNClient):
  ...
```

You can find all details in FedAvgClient and FedAvgServer, which are the bases of all implementations in FL-bench.
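The `package()` / `super().package()` pattern can be sketched in a self-contained way. The `FedAvgServer` below is a simplified stand-in for FL-bench's real base class, and the payload keys (`regular_model_params`, `mu`) are illustrative, not FL-bench's actual schema:

```python
# Self-contained sketch of overriding package() while keeping the base payload.
class FedAvgServer:
    def package(self, client_id: int) -> dict:
        # base payload every method sends to a client
        return {"client_id": client_id, "regular_model_params": {}}

class YourServer(FedAvgServer):
    def package(self, client_id: int) -> dict:
        server_package = super().package(client_id)  # always keep the base payload
        server_package["mu"] = 0.01                  # extra hyperparameter your method needs
        return server_package

pkg = YourServer().package(3)
print(sorted(pkg))  # ['client_id', 'mu', 'regular_model_params']
```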

Integrating Dataset

Customizing Model

Citation 🧐

```bibtex
@software{Tan_FL-bench,
  author = {Tan, Jiahao and Wang, Xinpeng},
  license = {MIT},
  title = {{FL-bench: A federated learning benchmark for solving image classification tasks}},
  url = {https://github.com/KarhouTam/FL-bench}
}
```

```bibtex
@misc{tan2023pfedsim,
  title={pFedSim: Similarity-Aware Model Aggregation Towards Personalized Federated Learning},
  author={Jiahao Tan and Yipeng Zhou and Gang Liu and Jessie Hui Wang and Shui Yu},
  year={2023},
  eprint={2305.15706},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```