This code can be used to reproduce the experimental results of Accordion.

Overview

Accordion borrows inspiration from the idea of critical regimes presented by Achille et al. to improve the communication efficiency of existing gradient compression methods. Accordion uses the rate of change of the gradient norm to detect critical regimes. It keeps compression low during critical regimes and high otherwise.
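The detection rule described above can be sketched in Python. This is a minimal illustration, not the implementation in main.py. It assumes the gradient norm is measured at two consecutive checkpoints, and it treats a relative change at or above a threshold as a critical regime. Several details are hypothetical choices made for this example:

- the helper names `in_critical_regime` and `choose_rank`;
- the threshold of 0.5;
- the mapping of low/high compression to PowerSGD ranks 2 and 1.

```python
def in_critical_regime(prev_norm: float, curr_norm: float, threshold: float = 0.5) -> bool:
    """Return True if the gradient norm changed by a relative amount >= threshold.

    Hypothetical helper; the default threshold is an illustrative assumption.
    """
    if prev_norm <= 0.0:
        # No usable reference norm yet: be conservative and treat it as critical.
        return True
    relative_change = abs(prev_norm - curr_norm) / prev_norm
    return relative_change >= threshold


def choose_rank(prev_norm: float, curr_norm: float,
                low_compression_rank: int = 2,
                high_compression_rank: int = 1,
                threshold: float = 0.5) -> int:
    """Pick a PowerSGD rank: a higher rank (less compression) in critical regimes.

    The default ranks are assumptions mirroring the Rank 1 / Rank 2 runs below.
    """
    if in_critical_regime(prev_norm, curr_norm, threshold):
        return low_compression_rank
    return high_compression_rank


if __name__ == "__main__":
    # Made-up gradient norms at successive checkpoints (illustration only).
    norms = [10.0, 4.0, 3.6, 3.5, 1.5]
    for prev, curr in zip(norms, norms[1:]):
        print(f"{prev:>5} -> {curr:>5}: rank {choose_rank(prev, curr)}")
```

The sketch compares the relative change rather than the absolute change. This keeps the rule independent of the overall scale of the gradient norm, which differs between models.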

Basic structure of the code

Running the code

To run Accordion with two processes on a single machine, start rank 0 with:

python main.py --model-type CNN --auto-switch --norm-file "cifar10_training.log" \
  --start-k --k-start 2 --distributed --master-ip "tcp://127.0.0.1:9998" \
  --num-nodes 2 --rank 0

Then run the same command in a second process, replacing --rank 0 with --rank 1. In general, launch one process per node with ranks 0 through (num-nodes - 1).
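Starting one process per rank by hand gets tedious. The sketch below assembles the command line shown above for any rank, using only flags that appear in this README, and can optionally start all ranks on the local machine. It is a hypothetical convenience wrapper, not part of this repository, and `build_command` and `launch_local` are illustrative names. Running every rank on one machine assumes main.py handles device placement for co-located processes.

```python
import shlex
import subprocess


def build_command(rank, num_nodes, master_ip, norm_file, k_start,
                  schedule_flag="--auto-switch"):
    """Assemble the main.py argv for one rank (hypothetical helper).

    Use "--auto-switch" for Accordion or "--fixed-sched" for a fixed rank.
    """
    return [
        "python", "main.py",
        "--model-type", "CNN",
        schedule_flag,
        "--norm-file", norm_file,
        "--start-k", "--k-start", str(k_start),
        "--distributed",
        "--master-ip", master_ip,
        "--num-nodes", str(num_nodes),
        "--rank", str(rank),
    ]


def launch_local(num_nodes=2, master_ip="tcp://127.0.0.1:9998",
                 norm_file="cifar10_training.log", k_start=2):
    """Start ranks 0..num_nodes-1 as local processes and return their exit codes."""
    procs = [
        subprocess.Popen(build_command(r, num_nodes, master_ip, norm_file, k_start))
        for r in range(num_nodes)
    ]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Dry run: print the command for each rank instead of launching it.
    for r in range(2):
        print(shlex.join(build_command(r, 2, "tcp://127.0.0.1:9998",
                                       "cifar10_training.log", 2)))
```

For a multi-node run you would call `build_command` on each machine with that machine's rank and the rank 0 node's address instead of using `launch_local`.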

For example, to reproduce our CIFAR-10 ResNet-18 results with PowerSGD Rank 1, run the following command on 4 nodes:

python main.py --model-type CNN --fixed-sched --norm-file "res18_psgd_k_1.log" \
  --start-k --k-start 1 --distributed --master-ip "master_ip" \
  --num-nodes 4 --rank 0

Run this command on each of the 4 nodes, replacing --rank 0 with 1, 2, and 3 on the other nodes. Similarly, to get the results for PowerSGD Rank 2, run:

python main.py --model-type CNN --fixed-sched --norm-file "res18_psgd_k_1.log" \
  --start-k --k-start 2 --distributed --master-ip "master_ip" \
  --num-nodes 4 --rank 0
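The Rank 1 and Rank 2 commands differ only in --k-start, so that flag appears to set the PowerSGD rank. The sketch below shows what that rank controls: one PowerSGD-style compression step in NumPy, where a single power iteration from a warm-start Q yields a rank-r approximation P Q^T of an n-by-m gradient matrix. It is a single-worker illustration with illustrative function names:

- it omits the all-reduce of P and Q across workers;
- it omits PowerSGD's error feedback;
- it orthonormalizes with `np.linalg.qr` in place of Gram-Schmidt.

```python
import numpy as np


def powersgd_compress(M, Q):
    """One power-iteration step of rank-r compression (single-worker sketch).

    M: (n, m) gradient matrix; Q: (m, r) warm-start factor.
    Returns P (n, r) with orthonormal columns and Q_new (m, r).
    """
    P = M @ Q
    P, _ = np.linalg.qr(P)  # orthonormalize the columns of P
    Q_new = M.T @ P
    return P, Q_new


def decompress(P, Q):
    """Rebuild the rank-r approximation of the gradient matrix."""
    return P @ Q.T


def floats_sent(n, m, r):
    """Numbers exchanged per matrix: P (n*r) plus Q (m*r) instead of n*m."""
    return r * (n + m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A rank-1 matrix is recovered exactly by a rank-1 step.
    M = np.outer(rng.standard_normal(64), rng.standard_normal(32))
    P, Q = powersgd_compress(M, rng.standard_normal((32, 1)))
    print(np.allclose(decompress(P, Q), M), floats_sent(64, 32, 1), 64 * 32)
```

With rank r, each worker sends r(n + m) numbers per matrix instead of n*m, so Rank 1 communicates half as much as Rank 2.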

To get the results for Accordion, run the following command:

python main.py --model-type CNN --auto-switch --norm-file "res18_psgd_accordion.log" \
  --start-k --k-start 2 --distributed --master-ip "master_ip" \
  --num-nodes 4 --rank 0

As before, run this command on each of the 4 nodes, replacing --rank 0 with 1, 2, and 3 on the other nodes.

To make the experiments in the paper easy to reproduce, we provide bash scripts. To reproduce Table 2, run ./get_table_2.sh master_ip rank on four different nodes. Here master_ip is the IP address of the rank 0 node, and rank ranges from 0 to 3.

To run additional experiments, either add new configuration dictionaries like the ones at the top of main.py or modify the existing ones.

Acknowledgement

The code borrows much of its structure from the PowerSGD code. We would like to thank the authors of PowerSGD for releasing their code.

Cite

@article{agarwal2020accordion,
  title={Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification},
  author={Agarwal, Saurabh and Wang, Hongyi and Lee, Kangwook and Venkataraman, Shivaram and Papailiopoulos, Dimitris},
  journal={arXiv preprint arXiv:2010.16248},
  year={2020}
}