Home

Awesome

Efficient Algorithms for Device Placement of DNN Graph Operators - code

This code package contains algorithms and input files (DNN workloads) from the paper "Efficient Algorithms for Device Placement of DNN Graph Operators" published at NeurIPS 2020. It allows one to reproduce the results in the paper, as well as run the partitioning algorithms on other workloads.

Throughout this package (code and input files), "FPGA" is used is to refer to an arbitrary accelerator. In particular, for layer-granularity graphs we actually use profiled processing time (latency) data from a GPU deployment scenario.

Input format

All our algorithms take as input a JSON file with the following format (all fields are mandatory unless indicated otherwise). This format closely follows our model (see Section "Computational Model" in the paper):

Other debug information may be present in the input files, such as names or layerIds on nodes and sizes on edges.

Throughput maximization (max-load minimization)

Here we give two algorithms: DP (dynamic programming) and IP (integer programming) based. We also give several baselines. Input files are placed in the directory throughput-inputs.

Dynamic programming solution

The solution is implemented in throughput-dp/dp.cpp. It is a single C++ file (using one header-only library for JSON parsing) and can be compiled with a recent version of gcc by running e.g. g++ -O3 throughput-dp/dp.cpp -o dp.exe.

The compiled program takes the input network in the JSON format outlined above via standard input. It can be executed as follows: ./dp.exe < throughput-inputs/OperatorGraphs/bert_l-3_inference.json. It produces an output JSON file on standard output and diagnostic messages on standard error.

Integer programming solution

This solution requires the solver Gurobi to be installed and have an active license. The solution is implemented in Python 3.7.

To run the solution, execute python throughput-ip/ip.py contig or python throughput-ip/ip.py noncontig depending on whether contiguous splits are desired. It also takes the input on standard input, so an example run could be python throughput-ip/ip.py contig < throughput-inputs/OperatorGraphs/bert_l-3_inference.json. The Gurobi solver will print information about the progress of the optimization to file gurobi.log.

Further parameters than can be adjusted include:

Scotch baseline

First compile the DP solution. Then run it with a command-line parameter -scotch, e.g.: ./dp.exe -scotch < throughput-inputs/OperatorGraphs/bert_l-3_inference.json. For this to work, Scotch needs to be installed and its bin directory should be added to the system PATH. To check if this is the case, run gpart in the console. Alternatively, you can change the constant scotchExecutable in dp.cpp.

Human-expert baseline

We provide human-expert splits in the directory human-experts. The dp program can compute the max-load (Time-Per-Sample) for a given split file using the command-line parameter -expert. The second parameter should be the split file. E.g. ./dp.exe -expert human-experts/resnet50_inference_expert.json < throughput-inputs/LayerGraphs/resnet50_inference.json. If there is no human expert split for a training workload, use the corresponding inference split - the dp program will infer the split for the backward pass from the split for the forward pass.

Local search baseline

Add an option -localSearch and a second parameter specifying the number of restarts (we used 10).

PipeDream baseline

Use option -pipeDream.

Place everything on a single FPGA

Use option -oneFpga.

Latency minimization

Here we give an IP (integer programming) based algorithm, as well as two baselines. Input files are placed in the directory latency-inputs.

Integer programming solution

This solution requires the solver Gurobi to be installed and have an active license. The solution is implemented in Python 3.7.

To run the solution, execute python latency-ip/lip.py. It also takes the input on standard input, so an example run could be python latency-ip/lip.py < latency-inputs/OperatorGraphs/bert_l-3_inference.json. The Gurobi solver will print information about the progress of the optimization to file gurobi.log.

Parameters that can be adjusted include:

Greedy baseline

The solution, described in Appendix F, is implemented in latency-greedy/greedy.cpp. It is a single C++ file (using one header-only library for JSON parsing) and can be compiled with a recent version of gcc by running e.g. g++ -O3 latency-greedy/greedy.cpp -o greedy.exe.

The compiled program takes the input network in the JSON format outlined above via standard input. It can be executed as follows: ./greedy.exe < latency-inputs/OperatorGraphs/bert_l-3_inference.json. It produces an output JSON file on standard output and diagnostic messages on standard error.

Max-load DP baseline

The same program can alternatively take any split (in the output JSON format produced by our implementations) and compute the minimum single-sample latency that can be obtained with this split. To do this, run greedy.exe with two command-line parameters: -computeLatency and the path to the JSON file containing the split. The input workload should still be given via standard input.

Our second baseline for latency minimization, described in Appendix F, consists in running the throughput-maximization (max-load minimization) DP solution and computing the latency that it obtains. To do so, run e.g.:

Human expert baseline

See "max-load DP baseline" above - the same program can compute the latency for a given split. Run e.g.:

Scotch baseline

First run dp with -scotch option (see above) to produce the split. Then use greedy to compute the latency of the split. E.g.:

Legal notices

Trademarks This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

We use the JSON for Modern C++ library, copyright (c) 2013-2020 Niels Lohmann, licensed under the MIT license.