FRAME: <ins>F</ins>ast <ins>R</ins>oofline <ins>A</ins>nalytical <ins>M</ins>odeling and <ins>E</ins>stimation
This is a roofline cost model for DNN accelerators. It supports CNN, MLP, and Transformer workloads.
What it does
- Given DNN accelerator system information (using the System class in src/system.py), where you can specify the PE array shape (mxu_shape), on-chip bandwidths, off-chip bandwidths, etc.
- Given a DNN workload (e.g., model='vgg16')

FRAME generates a table of layer-wise latency and memory usage information, as well as a roofline figure, as shown in the following.
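As a rough illustration of what a roofline estimate computes per layer, here is a minimal sketch. It is not FRAME's implementation: the function name `roofline_latency` and its arguments are hypothetical, and it assumes a single memory level with perfect overlap of compute and memory traffic.

```python
def roofline_latency(flops, bytes_moved, peak_flops, mem_bw):
    """Return (latency in seconds, limiting resource) for one layer.

    Hypothetical helper for illustration only, not part of FRAME.
    flops:       floating-point operations in the layer
    bytes_moved: bytes transferred to/from memory
    peak_flops:  peak compute throughput (FLOP/s)
    mem_bw:      memory bandwidth (bytes/s)
    """
    compute_time = flops / peak_flops    # time if compute were the only limit
    memory_time = bytes_moved / mem_bw   # time if memory were the only limit
    bound = "compute" if compute_time >= memory_time else "memory"
    # Assumption: compute and memory traffic overlap perfectly,
    # so the slower of the two sets the layer latency.
    return max(compute_time, memory_time), bound


# Made-up layer: 2 GFLOP of work and 4 MB of traffic
# on a 100 TFLOP/s, 1 TB/s system.
latency, bound = roofline_latency(2e9, 4e6, 100e12, 1e12)
print(f"{latency * 1e6:.1f} us, {bound}-bound")  # 20.0 us, compute-bound
```

A layer whose memory time exceeds its compute time would be reported as memory-bound instead, which is the distinction a roofline figure visualizes.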
How to use it
Interactive Design Space Exploration
You are welcome to experiment with it using notebook/dnn_accel_playground.ipynb.
We also provide a Colab version for a quick trial.
How to plug into your experiments
Use analyze_model:
model_df, _ = analyze_model()
model_df contains the layer-by-layer analysis results.
The parameters of analyze_model are described as follows.
Parameters
Algorithmic Parameters
Basic Parameters
- use_attn_model: Set to True if you want to use the pre-defined attention-based language model configuration that we provide.
- head, hidden_size, ff_hidden_size: If use_attn_model == False, these variables will be ignored.
  - head: number of heads.
  - hidden_size: hidden size of the key/query/value projection.
  - ff_hidden_size: hidden size of the feedforward layers.
- custom_model: If you don't want to use the pre-defined BERT model:
  - Set use_attn_model to False.
  - Set custom_model to custom.
  - Create a custom.csv model configuration in data/model. You can use data/model/alexnet.csv as a template.
- batch_size: Batch size
Sparsity-specific Parameters
- custom_sparsity:
  - Set to False if you do not want to explore sparsity, or if you want to explore uniform sparsity across all layers, i.e., every layer has the density defined by the three parameters density_input, density_weight, and density_output.
  - Set to True to specify customized layer-by-layer sparsity in data/sparsity/custom.csv (assuming your model is named custom).
- density_input: Density of the input tensor. Set to 1.0 for a dense tensor.
- density_weight: Density of the weight tensor. Set to 1.0 for a dense tensor.
- density_output: Density of the output tensor. Set to 1.0 for a dense tensor.
- compress_mem: Set to True to model the memory savings from storing sparse tensors in a compressed format. If set to False, the model is treated as stored in uncompressed format.
- skip_compute: Set to True to model the compute savings from skipping multiplications by zero when the model has sparsity. If set to False, all multiplications by zero are modeled as executed.
- skip_compute_on_noopt_output: Set to True to model a smarter control scheme that skips a computation when its output is known to be ignored. This is effective when the operation is sparsified by masking the output: if the output is going to be masked anyway, the computation is skipped.
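To make the interaction of these flags concrete, the sketch below shows one simple way density values could scale a layer's compute and memory cost. This is an assumption-laden illustration, not FRAME's actual formulas: the function `effective_cost` is hypothetical, zeros are assumed to be independently distributed, the three flags are treated as independent, and the metadata overhead of compressed formats is ignored.

```python
def effective_cost(macs, in_elems, w_elems, out_elems,
                   density_input=1.0, density_weight=1.0, density_output=1.0,
                   compress_mem=True, skip_compute=True,
                   skip_compute_on_noopt_output=False):
    """Return (effective MACs, effective tensor elements stored).

    Hypothetical helper for illustration only, not part of FRAME.
    """
    if skip_compute:
        # Assumption: a multiplication is useful only when both the input
        # and the weight operand are nonzero, and zeros are independent.
        macs = macs * density_input * density_weight
    if skip_compute_on_noopt_output:
        # Assumption: work for outputs that will be masked is never issued.
        macs = macs * density_output
    if compress_mem:
        # Assumption: only nonzero elements are stored (no index metadata).
        in_elems = in_elems * density_input
        w_elems = w_elems * density_weight
        out_elems = out_elems * density_output
    return macs, in_elems + w_elems + out_elems


# Made-up layer with 50% weight density and dense activations.
macs, elems = effective_cost(1000, 100, 100, 100, density_weight=0.5)
print(macs, elems)  # 500.0 250.0
```

With compress_mem and skip_compute both set to False, the same call returns the dense cost regardless of the density values.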
Attention Model-specific Parameters
- attn_method: The attention method. You can choose from vanilla, sparse (Sparse Transformer-like), lowrank (Linformer-like), and kernel (Performer-like).
- low_rank_ratio: The low-rank projection ratio, if you pick the lowrank method.
- spattn_density: Sparse attention density, if you pick the sparse method.
- m_ratio: The kernel approximation projection ratio, if you pick the kernel method.
- seq_len: Sequence length.
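The following sketch illustrates why these parameters matter for cost: it counts multiply-accumulates (MACs) for the two attention matmuls (scores QK^T and context AV), summed over all heads. It is a back-of-the-envelope estimate under stated assumptions, not FRAME's internal accounting. The function `attention_macs` is hypothetical; low_rank_ratio is assumed to scale the key/value sequence length, spattn_density is assumed to be the fraction of score entries computed, and the extra projection cost of the low-rank method is ignored. The kernel method is left out because the reference dimension of m_ratio is not specified here.

```python
def attention_macs(seq_len, hidden_size, attn_method="vanilla",
                   low_rank_ratio=1.0, spattn_density=1.0):
    """MACs for the QK^T and AV matmuls of one layer, summed over heads.

    Hypothetical helper for illustration only, not part of FRAME.
    Per head each matmul costs seq_len * keys * (hidden_size / heads) MACs,
    so the total over heads does not depend on the number of heads.
    """
    if attn_method == "vanilla":
        keys = seq_len                   # every query attends to every key
    elif attn_method == "sparse":
        # Assumption: only a fraction of the score matrix is computed.
        keys = spattn_density * seq_len
    elif attn_method == "lowrank":
        # Assumption: keys/values are projected to a shorter sequence.
        keys = low_rank_ratio * seq_len
    else:
        raise ValueError(f"unsupported attn_method in this sketch: {attn_method}")
    return 2 * seq_len * keys * hidden_size


dense = attention_macs(512, 768)
sparse = attention_macs(512, 768, "sparse", spattn_density=0.25)
print(dense, sparse / dense)  # 402653184 0.25
```

Under these assumptions the vanilla cost grows quadratically with seq_len, while the sparse and low-rank variants scale it by their respective ratios.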
System Parameters
- onchip_mem_bw: On-chip memory bandwidth (GB/s)
- offchip_mem_bw: Off-chip memory bandwidth (GB/s)
- flops: The compute capacity, i.e., the number of floating-point operations per second. (TFLOP/s)
- frequency: The frequency of the system. (MHz)
- bits: The number of bits per element. You can choose from int8, bf16, and f32.
- compute_efficiency: The efficiency of the compute unit. Default as 1.0.
- memory_efficiency: The efficiency of memory access. Defaults to 1.0.
- use_flops: Set to True if you want to use flops to indicate the compute capacity; this models the ideal case. If you want to explore the impact of the shape of the PE (processing element) array, set use_flops to False and specify the PE array shape with the following parameters.
- mxu_height: Height of the PE array.
- mxu_width: Width of the PE array.
- mxu_instance: Number of PE arrays.
  - These three parameters create mxu_instance PE arrays. Each PE array has mxu_height x mxu_width PEs.
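For intuition on how a PE array shape relates to a flops value, the sketch below derives a peak throughput from mxu_height, mxu_width, mxu_instance, and frequency. It is an illustrative estimate rather than FRAME's code: the function `peak_flops_from_mxu` is hypothetical, and it assumes each PE completes one multiply-accumulate (counted as 2 floating-point operations) per cycle. Modeling utilization loss when a layer does not fill the array is out of scope here.

```python
def peak_flops_from_mxu(mxu_height, mxu_width, mxu_instance, frequency_mhz,
                        compute_efficiency=1.0):
    """Peak throughput in FLOP/s for mxu_instance arrays of height x width PEs.

    Hypothetical helper for illustration only, not part of FRAME.
    """
    num_pes = mxu_height * mxu_width * mxu_instance
    cycles_per_second = frequency_mhz * 1e6
    # Assumption: one multiply-accumulate (2 FLOPs) per PE per cycle.
    return num_pes * 2 * cycles_per_second * compute_efficiency


# Made-up configuration: four 128x128 arrays clocked at 940 MHz.
peak = peak_flops_from_mxu(128, 128, 4, 940)
print(f"{peak / 1e12:.1f} TFLOP/s")  # 123.2 TFLOP/s
```

Two configurations with the same PE count (for example one 256x256 array versus four 128x128 arrays) give the same peak under this estimate, so any difference between them comes from how well individual layers map onto the array shape.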
Contributors
- Sheng-Chun (Felix) Kao
- Suvinay Subramanian
- Abhimanyu Bambhaniya
- Tushar Krishna
Citation
@software{frame,
author = {Kao, Sheng-Chun and Subramanian, Suvinay and Bambhaniya, Abhimanyu and Krishna, Tushar},
title = {{FRAME: Fast Roofline Analytical Modeling and Estimation}},
url = {https://github.com/maestro-project/frame},
version = {1.0.0},
year = {2022}
}