Abacus

This repository contains the source code for a research paper that was submitted for publication at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21).

What is Abacus

Abacus is a runtime system that runs multiple DNN queries simultaneously with stable and predictable latency. It enforces latency predictability through deterministic operator overlap. Abacus comprises an overlap-aware latency predictor, a headroom-based query controller, and segmental model executors. The latency predictor precisely predicts the latencies of queries once their operator overlap is determined. The query controller chooses the appropriate operator overlap to guarantee the QoS of all the DNN services on a GPU. The model executors run the operators as needed to realize the chosen deterministic operator overlap. Our evaluation with seven popular DNNs on an Nvidia A100 GPU shows that Abacus significantly reduces QoS violations and improves throughput compared with state-of-the-art solutions.
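
For intuition, the following is a minimal sketch of the headroom-based decision the query controller makes, assuming a latency predictor is already available. `Query`, `candidate_overlaps`, and `predict_latency` are hypothetical placeholders for illustration, not the repository's actual API.

```python
# Conceptual sketch of the headroom-based scheduling decision described above.
# Query, candidate_overlaps, and predict_latency are illustrative placeholders,
# not the repository's actual API.
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

@dataclass
class Query:
    model: str            # which DNN service the query targets
    qos_target_ms: float  # latency target (QoS) for this query
    elapsed_ms: float     # time the query has already spent in the system

def choose_overlap(
    active: Query,
    incoming: Query,
    candidate_overlaps: Iterable[int],
    predict_latency: Callable[[Query, Query, int], Tuple[float, float]],
) -> Optional[int]:
    """Pick a deterministic operator overlap whose predicted latencies fit
    inside the remaining QoS headroom of both co-located queries."""
    best = None
    for overlap in candidate_overlaps:
        pred_active, pred_incoming = predict_latency(active, incoming, overlap)
        if (active.qos_target_ms - active.elapsed_ms >= pred_active
                and incoming.qos_target_ms - incoming.elapsed_ms >= pred_incoming):
            # Among QoS-safe choices, keep the largest overlap to use the GPU
            # more fully; the paper's controller may break ties differently.
            if best is None or overlap > best:
                best = overlap
    return best  # None: no safe overlap exists, so the incoming query waits

```

The key property is that the overlap is fixed before execution, so the predictor's estimate holds and the latency of each co-located query stays predictable.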

Environment Preparation

Getting Started

The following sections walk through the steps required to run Abacus.

Profiling

Abacus needs to collect profiling data for training a precise overlap-aware latency predictor.

We first profile the data with MPS enabled and MIG disabled.

We then profile the data with both MPS and MIG enabled.
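
For intuition about what a profiling sample captures, the sketch below times two networks running concurrently on one GPU with PyTorch. It is only an illustration, not the repository's profiler: Abacus profiles at operator-group granularity, and the models, batch size, and timing scheme here are assumptions.

```python
# Illustrative sketch of the kind of measurement the profiling step collects:
# the latency of two networks (in Abacus, operator groups) executing
# concurrently on one GPU. Not the repository's profiler; models, batch size,
# and timing scheme are assumptions made for illustration.
import time
import torch
import torchvision.models as models

def profile_pair(m1, m2, x1, x2, warmup=10, iters=100):
    """Average latency (ms) of one co-located inference of m1 and m2."""
    s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
    with torch.no_grad():
        for _ in range(warmup):
            with torch.cuda.stream(s1):
                m1(x1)
            with torch.cuda.stream(s2):
                m2(x2)
            torch.cuda.synchronize()
        begin = time.perf_counter()
        for _ in range(iters):
            with torch.cuda.stream(s1):
                m1(x1)
            with torch.cuda.stream(s2):
                m2(x2)
            torch.cuda.synchronize()
        return (time.perf_counter() - begin) * 1000 / iters

if __name__ == "__main__":
    dev = torch.device("cuda")
    resnet = models.resnet50().eval().to(dev)
    vgg = models.vgg16().eval().to(dev)
    x = torch.randn(1, 3, 224, 224, device=dev)
    print(f"co-located latency: {profile_pair(resnet, vgg, x, x):.2f} ms")
```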

Training Predictor

After obtaining all the profiling data, we train the latency predictor for each case.

Training MLP model
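
$ python main.py --task train --model_num 2 --mode all --modeling mlp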

Training LR/SVM model
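
As a reference for what LR/SVM latency modeling looks like, here is a minimal scikit-learn sketch on synthetic placeholder features; in the repository these predictors are trained through main.py on the profiled data.

```python
# Minimal sketch of fitting LR and SVM latency models with scikit-learn.
# The features and latencies below are synthetic placeholders; in the
# repository, these models are trained through main.py on the profiled data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((1000, 8))                                 # encoded overlap features (placeholder)
y = 5.0 + X @ rng.random(8) + rng.normal(0, 0.05, 1000)   # measured latency in ms (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
for name, model in [("LR", LinearRegression()), ("SVM", SVR(kernel="rbf"))]:
    model.fit(X_tr, y_tr)
    err = np.mean(np.abs(model.predict(X_te) - y_te) / y_te) * 100
    print(f"{name} mean prediction error: {err:.2f}%")
```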

5.5 Determining Modeling Techniques

The prediction error is printed to the terminal after training the predictor with the MLP/LR/SVM models. To obtain the cross-validation results, we only need to re-train the model, because the random seed for generating the dataset changes automatically on each run.
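
As a reference for how such an error figure can be computed, the sketch below uses the mean absolute percentage error between predicted and measured latencies; the exact metric printed by the repository's output is an assumption here.

```python
# Sketch of one way to compute the reported prediction error: the mean
# absolute percentage error between predicted and measured latencies.
# The exact metric printed by the repository is an assumption.
import numpy as np

def prediction_error(predicted_ms, measured_ms):
    """Mean absolute percentage error, in percent."""
    predicted_ms, measured_ms = np.asarray(predicted_ms), np.asarray(measured_ms)
    return float(np.mean(np.abs(predicted_ms - measured_ms) / measured_ms) * 100)

print(prediction_error([10.2, 7.9], [10.0, 8.3]))  # ~3.41
```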

We organize the error data as shown in data/modeling/2in7results.csv. We also provide a script, experiments/5.5_prediction/plot.py, for plotting the results.
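
For a quick look at the collected errors before plotting, the CSV can be inspected with pandas; only the CSV path above is taken from the text, and nothing is assumed about its columns.

```python
# Quick inspection of the organized error data before running the plotting
# script. Only the CSV path mentioned above is assumed; columns are not.
import pandas as pd

df = pd.read_csv("data/modeling/2in7results.csv")
print(df.head())
print(df.describe())
```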

The following figure depicts the prediction errors for all modeling methods.

<img src="figure/prediction_error.png" width="1000"/>
Prediction errors of all the evaluated modeling techniques: Linear Regression, SVM, and MLP. We also show the cross-validation accuracy of MLP.

Online Serving

After profiling and training, we can serve multiple DNN services with Abacus.

Evaluation

All evaluations are conducted in the root directory of Abacus. The following table presents the details of each experiment, including the corresponding figures in the paper and the shell scripts for running the experiments.

| Experiment Name / Section / Paragraph | Related Figures | Scripts Location |
| --- | --- | --- |
| 7.2 Ensuring QoS | Figures 14, 15 & 16 | experiments/7.2_qos |
| 7.3 Improving Peak Throughput | Figure 17 | experiments/7.3_throughput |
| 7.4 Beyond Pair-wise Co-location | Figures 18 & 19 | experiments/7.4_beyond_pair |
| 7.5 Integrating with MIGs | Figures 20 & 21 | experiments/7.5_mig |
| 7.6 Applying in a DNN Serving Cluster | Figure 22 | experiments/7.6_cluster |
| 7.7 Effectiveness of Multi-way Search | Figure 23 | experiments/7.7_multiway |

7.2 Ensuring QoS

7.3 Improving Peak Throughput

7.4 Beyond Pair-wise Co-location

7.5 Integrating with MIGs

<img src="figure/mig_qos.png" width="580"><img src="figure/mig_throughput.png" width="580">
Left: the 99%-ile latency of the co-located services with MIGs. Right: the peak throughputs of the co-located services with MIGs.

7.6 Applying in a DNN Serving Cluster

 # Client
 $ ./experiments/7.6_cluster/client.sh
 # Server
 $ ./experiments/7.6_cluster/server.sh

The following figure presents the throughput, 99%-ile latency, and average latency of the benchmarks with Abacus and Clockwork.

<img src="figure/large_scale.png" width="360">
The throughput, 99%-ile latency, and average latency of the benchmarks with Abacus and Clockwork.

7.7 Effectiveness of Multi-way Search

<img src="figure/bs_core_latency.png" width="360">
Duration of determining an appropriate operator group with different search ways.