chainer-computational-cost

This is a tool to estimate the theoretical computational cost of the forward pass of a Chainer-based neural network.

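For each layer, you can analyze the theoretical number of floating-point operations (FLOPs) and the theoretical amount of memory read and written (mread and mwrite). We call these the computational costs.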

A summary of these computational costs for each layer type, as well as the total cost, can also be calculated.

All computational costs estimated by this tool are theoretical numbers, obtained by assuming the most straightforward naive implementation of each layer. Therefore, for example, the following factors are NOT considered.

In addition, the following are not taken into consideration for now either.

The following costs are also excluded from the computational cost.

However,

Requirements

Installation

% pip install chainer_computational_cost

Alternatively, install it manually:

% git clone git@github.com:belltailjp/chainer_computational_cost.git
% cd chainer_computational_cost
% python setup.py install

or

% pip install git+https://github.com/belltailjp/chainer_computational_cost.git

Quick Start

import chainer
import chainer.links as L
import numpy as np

from chainer_computational_cost import ComputationalCostHook

net = L.VGG16Layers()
x = np.random.random((1, 3, 224, 224)).astype(np.float32)
with chainer.no_backprop_mode(), chainer.using_config('train', False):
    with ComputationalCostHook(fma_1flop=True) as cch:
        y = net(x)
        cch.show_report(unit='G', mode='md')

This prints the following table to stdout in Markdown table format.

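| Layer name | GFLOPs | MemRead GiB | MemWrite GiB | MemR+W GiB |
|:---|---:|---:|---:|---:|
| Convolution2DFunction-1 | 0.087 | 0.001 | 0.012 | 0.013 |
| ReLU-1 | 0.003 | 0.012 | 0.012 | 0.024 |
| Convolution2DFunction-2 | 1.85 | 0.012 | 0.012 | 0.024 |
| ReLU-2 | 0.003 | 0.012 | 0.012 | 0.024 |
| MaxPooling2D-1 | 0.002 | 0.012 | 0.003 | 0.015 |
| Convolution2DFunction-3 | 0.925 | 0.003 | 0.006 | 0.009 |
| ReLU-3 | 0.002 | 0.006 | 0.006 | 0.012 |
| Convolution2DFunction-4 | 1.85 | 0.007 | 0.006 | 0.013 |
| ReLU-4 | 0.002 | 0.006 | 0.006 | 0.012 |
| MaxPooling2D-2 | 0.001 | 0.006 | 0.001 | 0.007 |
| Convolution2DFunction-5 | 0.925 | 0.003 | 0.003 | 0.006 |
| ReLU-5 | 0.001 | 0.003 | 0.003 | 0.006 |
| Convolution2DFunction-6 | 1.85 | 0.005 | 0.003 | 0.008 |
| ReLU-6 | 0.001 | 0.003 | 0.003 | 0.006 |
| Convolution2DFunction-7 | 1.85 | 0.005 | 0.003 | 0.008 |
| ReLU-7 | 0.001 | 0.003 | 0.003 | 0.006 |
| MaxPooling2D-3 | 0.001 | 0.003 | 0.001 | 0.004 |
| Convolution2DFunction-8 | 0.925 | 0.005 | 0.001 | 0.007 |
| ReLU-8 | 0.0 | 0.001 | 0.001 | 0.003 |
| Convolution2DFunction-9 | 1.85 | 0.01 | 0.001 | 0.012 |
| ReLU-9 | 0.0 | 0.001 | 0.001 | 0.003 |
| Convolution2DFunction-10 | 1.85 | 0.01 | 0.001 | 0.012 |
| ReLU-10 | 0.0 | 0.001 | 0.001 | 0.003 |
| MaxPooling2D-4 | 0.0 | 0.001 | 0.0 | 0.002 |
| Convolution2DFunction-11 | 0.462 | 0.009 | 0.0 | 0.01 |
| ReLU-11 | 0.0 | 0.0 | 0.0 | 0.001 |
| Convolution2DFunction-12 | 0.462 | 0.009 | 0.0 | 0.01 |
| ReLU-12 | 0.0 | 0.0 | 0.0 | 0.001 |
| Convolution2DFunction-13 | 0.462 | 0.009 | 0.0 | 0.01 |
| ReLU-13 | 0.0 | 0.0 | 0.0 | 0.001 |
| MaxPooling2D-5 | 0.0 | 0.0 | 0.0 | 0.0 |
| Reshape-1 | 0.0 | 0.0 | 0.0 | 0.0 |
| LinearFunction-1 | 0.103 | 0.383 | 0.0 | 0.383 |
| ReLU-14 | 0.0 | 0.0 | 0.0 | 0.0 |
| LinearFunction-2 | 0.017 | 0.063 | 0.0 | 0.063 |
| ReLU-15 | 0.0 | 0.0 | 0.0 | 0.0 |
| LinearFunction-3 | 0.004 | 0.015 | 0.0 | 0.015 |
| Softmax-1 | 0.0 | 0.0 | 0.0 | 0.0 |
| total | 15.488 | 0.623 | 0.107 | 0.729 |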

Calling the show_summary_report method instead shows a summary for each layer type.

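| Layer type | # Layers | GFLOPs | MemRead GiB | MemWrite GiB | MemR+W GiB |
|:---|---:|---:|---:|---:|---:|
| Convolution2DFunction | 13 | 15.347 | 0.089 | 0.05 | 0.139 |
| ReLU | 15 | 0.014 | 0.05 | 0.05 | 0.101 |
| MaxPooling2D | 5 | 0.005 | 0.023 | 0.006 | 0.029 |
| Reshape | 1 | 0.0 | 0.0 | 0.0 | 0.0 |
| LinearFunction | 3 | 0.124 | 0.461 | 0.0 | 0.461 |
| Softmax | 1 | 0.0 | 0.0 | 0.0 | 0.0 |
| total | 38 | 15.488 | 0.623 | 0.107 | 0.729 |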

The estimated values can also be accessed programmatically through attributes of the ComputationalCostHook instance.

Usage

For basic usage, please refer to the Quick Start above.

FMA mode

When fma_1flop is set to True, chainer_computational_cost treats an FMA (fused multiply-add, ax + b) as one operation. Otherwise, it counts it as two operations.

This affects the estimates for convolution and linear layers.
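As a rough worked example of the difference, the FLOPs of the first VGG16 convolution (Convolution2DFunction-1 in the Quick Start) can be counted by hand. The sketch assumes a naive convolution that performs one multiply-add per kernel element per output element and ignores the bias; conv2d_flops is a hypothetical helper, not part of chainer_computational_cost.

```python
def conv2d_flops(c_in, c_out, k_h, k_w, out_h, out_w, batch=1,
                 fma_1flop=True):
    """Naive FLOP count of a 2D convolution (hypothetical, bias ignored).

    Each output element needs c_in * k_h * k_w multiply-adds.
    """
    n_fma = batch * c_out * out_h * out_w * c_in * k_h * k_w
    # With fma_1flop, one multiply-add counts as 1 operation, else as 2.
    return n_fma if fma_1flop else 2 * n_fma


# VGG16 conv1_1: 3 -> 64 channels, 3x3 kernel, 224x224 output (pad=1)
print(conv2d_flops(3, 64, 3, 3, 224, 224))                   # 86704128
print(conv2d_flops(3, 64, 3, 3, 224, 224, fma_1flop=False))  # 173408256
```

The FMA count matches the flops value reported for Convolution2DFunction-1 with fma_1flop=True. Treat the doubled non-FMA value as this sketch's assumption; the library's exact formula, including how it handles the bias, should be checked in DETAILS.md.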

Reporting

The estimated computational cost table is printed by calling the show_report or show_summary_report method.

Both methods accept several options, explained below.

Report mode

Currently it supports the following modes.

>>> cch.show_summary_report(unit='G', mode='table')
+-----------------------+----------+--------+---------+----------+--------+
|      Layer type       | # Layers | GFLOPs | MemRead | MemWrite | MemR+W |
|                       |          |        |   GiB   |   GiB    |  GiB   |
+=======================+==========+========+=========+==========+========+
| Convolution2DFunction | 13       | 15.347 | 0.089   | 0.050    | 0.139  |
+-----------------------+----------+--------+---------+----------+--------+
| ReLU                  | 15       | 0.014  | 0.050   | 0.050    | 0.101  |
+-----------------------+----------+--------+---------+----------+--------+
| MaxPooling2D          | 5        | 0.005  | 0.023   | 0.006    | 0.029  |
+-----------------------+----------+--------+---------+----------+--------+
| Reshape               | 1        | 0      | 0       | 0        | 0      |
+-----------------------+----------+--------+---------+----------+--------+
| LinearFunction        | 3        | 0.124  | 0.461   | 0        | 0.461  |
+-----------------------+----------+--------+---------+----------+--------+
| Softmax               | 1        | 0      | 0       | 0        | 0      |
+-----------------------+----------+--------+---------+----------+--------+
| total                 | 38       | 15.488 | 0.623   | 0.107    | 0.729  |
+-----------------------+----------+--------+---------+----------+--------+

Report destination

By default, the report is written to stdout. You can pass a stream (e.g. a file object) as the ost argument of show_report and show_summary_report.

import sys

cch.show_report(ost=sys.stderr, unit='G', mode='md')

cch.show_summary_report(ost=sys.stderr, unit='G', mode='md')

Prefixed-units

The unit argument of show_report and show_summary_report supports the following unit prefixes.

For memory costs, binary units such as KiB or MiB are shown instead of KB or MB.
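The two conventions can be illustrated with the raw totals behind the Quick Start report. The sketch below assumes decimal prefixes for FLOPs (G = 1000^3) and binary prefixes for memory (Gi = 1024^3 bytes), which reproduces the 15.488 GFLOPs, 0.623 GiB and 0.729 GiB shown above. The helper to_unit and its prefix table are illustrative assumptions, not the library's API.

```python
# Assumed prefix set for illustration only
PREFIX_EXPONENT = {'': 0, 'K': 1, 'M': 2, 'G': 3, 'T': 4}


def to_unit(value, unit='G', binary=False, n_digits=3):
    """Scale a raw count by a unit prefix (hypothetical helper).

    FLOPs use decimal prefixes (1 G = 1000**3); memory uses binary
    prefixes (1 Gi = 1024**3 bytes).
    """
    base = 1024 if binary else 1000
    return round(value / base ** PREFIX_EXPONENT[unit], n_digits)


# Raw totals from cch.summary_report['total'] for the VGG16 example
print(to_unit(15488423327, 'G'))             # 15.488  (GFLOPs)
print(to_unit(668603456, 'G', binary=True))  # 0.623   (MemRead GiB)
print(to_unit(783178624, 'G', binary=True))  # 0.729   (MemR+W GiB)
```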

Number of digits

The n_digits argument of show_report and show_summary_report specifies how many digits after the decimal point are shown.

The default is 3. Valid values range from 0 (round to integer) to 10, and None is treated as 10.

Note that rounding does not introduce numerical error into the summary report, because summary values are calculated before rounding.

>>> cch.show_summary_report(unit='G', mode='table', n_digits=8)
+-----------------------+----------+-------------+------------+------------+------------+
|      Layer type       | # Layers |   GFLOPs    |  MemRead   |  MemWrite  |   MemR+W   |
|                       |          |             |    GiB     |    GiB     |    GiB     |
+=======================+==========+=============+============+============+============+
| Convolution2DFunction | 13       | 15.34663066 | 0.08864903 | 0.05046844 | 0.13911748 |
+-----------------------+----------+-------------+------------+------------+------------+
| ReLU                  | 15       | 0.01355571  | 0.05049896 | 0.05049896 | 0.10099792 |
+-----------------------+----------+-------------+------------+------------+------------+
| ...                   | ...      | ...         | ...        | ...        | ...        |
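Computing the summary before rounding matters. Take the 15 ReLU layers of VGG16 and assume ReLU costs one FLOP per output element, which is consistent with the flops value of ReLU-1 in the detailed report. Summing the per-layer values after rounding them to 3 digits gives 0.013 GFLOPs, whereas rounding the exact sum gives the 0.014 GFLOPs shown in the summary. The activation sizes below are the standard VGG16 shapes for a 224x224 input.

```python
# Output element counts of the 15 ReLU layers in VGG16 (batch size 1):
# 13 after convolutions, 2 after the first two fully connected layers.
relu_sizes = (
    [64 * 224 * 224] * 2 + [128 * 112 * 112] * 2 + [256 * 56 * 56] * 3
    + [512 * 28 * 28] * 3 + [512 * 14 * 14] * 3 + [4096] * 2
)
relu_flops = list(relu_sizes)  # assumption: 1 FLOP per element

exact_total = sum(relu_flops)
rounded_first = sum(round(f / 1e9, 3) for f in relu_flops)

print(exact_total)                  # 13555712
print(round(exact_total / 1e9, 3))  # 0.014 (summary: round after summing)
print(round(rounded_first, 3))      # 0.013 (error from rounding first)
```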

Custom columns

The columns argument of show_report and show_summary_report specifies which columns are shown in the table.

There are two ways to customize columns.

The first way is to use a predefined column set. SummaryColumns provides column definitions for show_summary_report, and ReportColumns provides them for show_report.

>>> cch.show_summary_report(unit='G', mode='table', columns=SummaryColumns.ALL)
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
|      Layer type       | # Layers | GFLOPs | MemRead | MemWrite | MemR+W |  FLOPs  | MemRead | MemWrite | MemR+W  |
|                       |          |        |   GiB   |   GiB    |  GiB   |   (%)   |   (%)   |   (%)    |   (%)   |
+=======================+==========+========+=========+==========+========+=========+=========+==========+=========+
| Convolution2DFunction | 13       | 15.347 | 0.089   | 0.05     | 0.139  | 99.085% | 14.237% | 47.297%  | 19.073% |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
| ReLU                  | 15       | 0.014  | 0.05    | 0.05     | 0.101  | 0.088%  | 8.11%   | 47.325%  | 13.847% |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
| MaxPooling2D          | 5        | 0.005  | 0.023   | 0.006    | 0.029  | 0.03%   | 3.662%  | 5.343%   | 3.908%  |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
| Reshape               | 1        | 0.0    | 0.0     | 0.0      | 0.0    | 0.0%    | 0.0%    | 0.0%     | 0.0%    |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
| LinearFunction        | 3        | 0.124  | 0.461   | 0.0      | 0.461  | 0.798%  | 73.991% | 0.032%   | 63.171% |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
| Softmax               | 1        | 0.0    | 0.0     | 0.0      | 0.0    | 0.0%    | 0.001%  | 0.003%   | 0.001%  |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+
| total                 | 38       | 15.488 | 0.623   | 0.107    | 0.729  | 100.0%  | 100.0%  | 100.0%   | 100.0%  |
+-----------------------+----------+--------+---------+----------+--------+---------+---------+----------+---------+

The other way is to manually specify the column list.

>>> cch.show_report(unit='G', mode='table', columns=[
...     'name', 'flops', 'mread', 'mwrite', 'mrw', 'output_shapes', 'params'
... ])
+--------------------------+--------+---------+----------+--------+----------------------+--------------------------------------------+
|        Layer name        | GFLOPs | MemRead | MemWrite | MemR+W |    Output shapes     |            Function parameters             |
|                          |        |   GiB   |   GiB    |  GiB   |                      |                                            |
+==========================+========+=========+==========+========+======================+============================================+
| Convolution2DFunction-1  | 0.087  | 0.001   | 0.012    | 0.013  | [(1, 64, 224, 224)]  | k=3, s=1, p=1, d=1, groups=1, nobias=False |
+--------------------------+--------+---------+----------+--------+----------------------+--------------------------------------------+
| ReLU-1                   | 0.003  | 0.012   | 0.012    | 0.024  | [(1, 64, 224, 224)]  |                                            |
+--------------------------+--------+---------+----------+--------+----------------------+--------------------------------------------+
| Convolution2DFunction-2  | 1.850  | 0.012   | 0.012    | 0.024  | [(1, 64, 224, 224)]  | k=3, s=1, p=1, d=1, groups=1, nobias=False |
+--------------------------+--------+---------+----------+--------+----------------------+--------------------------------------------+
| ...                      | ...    | ...     | ...      | ...    | ...                  | ...                                        |
+--------------------------+--------+---------+----------+--------+----------------------+--------------------------------------------+
| total                    | 15.488 | 0.623   | 0.107    | 0.729  |                      |                                            |
+--------------------------+--------+---------+----------+--------+----------------------+--------------------------------------------+

Access to the detailed report

Once cch has gathered the computational costs as explained above, you can access the report data directly.

>>> cch.layer_report

It is a large dictionary with the following structure:

{
    "Convolution2DFunction-1": {
        "name": "Convolution2DFunction-1",
        "type": "Convolution2DFunction",
        "flops": 86704128,
        "mread": 609280,
        "mwrite": 12845056,
        "mrw": 13454336,
        "traceback": "...",
        "input_shapes": [[1, 3, 224, 224], [64, 3, 3, 3], [64]],
        "output_shapes": [[1, 64, 224, 224]],
        "params": {"k": 3, "s": 1, "p": 1, "d": 1, "groups": 1, "nobias": false},
        "flops%": 0.5597995752663483,
        "mread%": 0.09112725854650683,
        "mwrite%": 11.21102960110868,
        "mrw%": 1.7179140987382209
    },
    "ReLU-1": {
        "name": "ReLU-1",
        "type": "ReLU",
        "flops": 3211264,
        "mread": 12845056,
        "mwrite": 12845056,
        "mrw": 25690112,
        "traceback": "...",
        "input_shapes": [[1, 64, 224, 224]],
        "output_shapes": [[1, 64, 224, 224]],
        "params": {},
        "flops%": 0.020733317602457342,
        "mread%": 1.9211770272392967,
        "mwrite%": 11.21102960110868,
        "mrw%": 3.2802366168768162
    },
    ...
}
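The numbers in these entries can be checked by hand. In the sketch below, mread is taken as the total size in bytes of all inputs (data, weight and bias), mwrite as the size of the output, and mrw as their sum, assuming float32 arrays (4 bytes per element). This reproduces the Convolution2DFunction-1 and ReLU-1 values above. The helper n_bytes is hypothetical, and the percentage uses the total FLOPs from cch.summary_report.

```python
from math import prod  # Python 3.8+

BYTES_PER_ELEMENT = 4  # assumption: float32 arrays


def n_bytes(shapes):
    """Total size in bytes of float32 arrays with the given shapes."""
    return sum(prod(s) for s in shapes) * BYTES_PER_ELEMENT


# Convolution2DFunction-1: input x, weight W and bias b
conv_in = [(1, 3, 224, 224), (64, 3, 3, 3), (64,)]
conv_out = [(1, 64, 224, 224)]
mread, mwrite = n_bytes(conv_in), n_bytes(conv_out)
print(mread, mwrite, mread + mwrite)  # 609280 12845056 13454336

# ReLU-1 reads and writes one tensor of the same shape
print(n_bytes([(1, 64, 224, 224)]))  # 12845056

# flops% is the share of the network total (15488423327 FLOPs)
print(86704128 / 15488423327 * 100)  # about 0.5598
```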

A summary report is also available. It contains the total costs for each layer type.

>>> cch.summary_report
{
    "Convolution2DFunction": {
        "type": "Convolution2DFunction",
        "name": "Convolution2DFunction",
        "flops": 15346630656,
        "n_layers": 13,
        "mread": 95186176,
        "mwrite": 54190080,
        "mrw": 149376256,
        "flops%": 99.08452482214363,
        "mread%": 14.236566554630553,
        "mwrite%": 47.29653112967724,
        "mrw%": 19.07307623350047
    },
    "ReLU": {
        "type": "ReLU",
        "name": "ReLU",
        "flops": 13555712,
        "n_layers": 15,
        "mread": 54222848,
        "mwrite": 54222848,
        "mrw": 108445696,
        "flops%": 0.08752157475169971,
        "mread%": 8.109866545469965,
        "mwrite%": 47.32513069498619,
        "mrw%": 13.846866178002324
    },
    ...
    "total": {
        "name": "total",
        "type": "total",
        "flops": 15488423327,
        "n_layers": 38,
        "mread": 668603456,
        "mwrite": 114575168,
        "mrw": 783178624,
        "flops%": 100.0,
        "mread%": 100.0,
        "mwrite%": 100.0,
        "mrw%": 100.0
    }
}
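Conceptually, the summary is a group-by over layer_report: costs of layers of the same type are summed, n_layers counts them, and each percentage is that type's share of the network total. The sketch below illustrates this relationship on plain dictionaries. summarize is a hypothetical helper, not a ComputationalCostHook method, and the input holds only two of the layers from the layer_report example above.

```python
from collections import defaultdict

COST_KEYS = ('flops', 'mread', 'mwrite', 'mrw')


def summarize(layer_report):
    """Aggregate per-layer costs by layer type (hypothetical helper)."""
    summary = defaultdict(lambda: dict.fromkeys(COST_KEYS, 0))
    for layer in layer_report.values():
        # Every layer contributes to its own type and to the overall total
        for key_type in (layer['type'], 'total'):
            entry = summary[key_type]
            entry['type'] = entry['name'] = key_type
            entry['n_layers'] = entry.get('n_layers', 0) + 1
            for key in COST_KEYS:
                entry[key] += layer[key]
    total = summary['total']
    for entry in summary.values():
        for key in COST_KEYS:
            # Percentage of the network total (0 if the total is 0)
            share = entry[key] / total[key] if total[key] else 0.0
            entry[key + '%'] = 100.0 * share
    return dict(summary)


layer_report = {
    'Convolution2DFunction-1': {
        'type': 'Convolution2DFunction', 'flops': 86704128,
        'mread': 609280, 'mwrite': 12845056, 'mrw': 13454336},
    'ReLU-1': {
        'type': 'ReLU', 'flops': 3211264,
        'mread': 12845056, 'mwrite': 12845056, 'mrw': 25690112},
}
summary = summarize(layer_report)
print(summary['total']['n_layers'], summary['total']['flops'])  # 2 89915392
print(round(summary['ReLU']['flops%'], 2))                       # 3.57
```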

Supported layers

Please see DETAILS.md.

Custom cost calculator for non-supported layer types

Supported layer types are listed in DETAILS.md.

If you need a layer that is not supported, or you have your own custom layer, you can register a custom cost calculator.

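import chainer
import chainer.functions as F
import numpy as np

from chainer_computational_cost import ComputationalCostHook


def custom_calculator(func, in_data, **kwargs):
    ...
    return (0, 0, 0, {})  # 4 elements; placeholder values


x = chainer.Variable(np.ones((1, 3), dtype=np.float32))
with chainer.no_backprop_mode(), chainer.using_config('train', False):
    with ComputationalCostHook(fma_1flop=True) as cch:
        cch.add_custom_cost_calculator(F.math.basic_math.Add, custom_calculator)
        y = x + x   # Call Add

        cch.layer_report['Add-1']    # you can find custom estimation result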

A custom cost calculator has to have the following signature.

Also, a calculator has to return a tuple with the following 4 elements:

For more details about how to implement a custom cost calculator, please refer to the existing implementations in chainer_computational_cost/cost_calculators/*.py.

You can also override an existing calculator with your own. This is useful when, for example, the device or environment you are targeting behaves differently from the usual assumptions, e.g. an inference engine that does not support in-place Reshape, so that its mread and mwrite are not 0.

If a layer not supported by chainer_computational_cost is used, a message like "Warning: XXXFunction is not yet supported by ComputationalCostHook, ignored." is shown.

You can also check which layers were ignored:

with ComputationalCostHook() as cch:
  ...
  print(cch.ignored_layers)

It has the following structure.

{
  'XXXFunction':
  {
    'type': 'XXXFunction',
    'traceback': '(traceback)'
  },
  ...
}

Contribution Guide

New layer support

Adding support for new layer types is one of the most important contributions.

The specification is almost the same as that of the custom cost calculators explained above.

In addition, functions have to follow the following rules.

Once your cost calculation function is properly implemented, chainer-computational-cost automatically finds and registers it. This auto-discovery mechanism is implemented in chainer_computational_cost/cost_calculators/__init__.py.
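The real discovery rules are defined in chainer_computational_cost/cost_calculators/__init__.py and are not reproduced here. As a rough illustration only, the sketch below assumes that each calculator is a function named calc_* whose func parameter is annotated with the Chainer function class it handles, as in the calc_xxxx template in the Documentation section, and builds a class-to-calculator table from those annotations. discover_calculators and the dummy classes are hypothetical.

```python
import inspect
import types


def discover_calculators(module):
    """Map annotated function classes to calc_* functions (hypothetical)."""
    table = {}
    for name, fn in inspect.getmembers(module, inspect.isfunction):
        if not name.startswith('calc_'):
            continue
        params = inspect.signature(fn).parameters
        if 'func' not in params:
            continue
        annotation = params['func'].annotation
        if annotation is not inspect.Parameter.empty:
            table[annotation] = fn
    return table


# Dummy stand-ins for Chainer function classes (hypothetical)
class ReLU:
    pass


class Softmax:
    pass


def calc_relu(func: ReLU, in_data, **kwargs):
    ...


def calc_softmax(func: Softmax, in_data, **kwargs):
    ...


def helper(x):  # skipped: no calc_ prefix
    return x


mod = types.ModuleType('fake_cost_calculators')
mod.calc_relu, mod.calc_softmax, mod.helper = calc_relu, calc_softmax, helper

table = discover_calculators(mod)
print({cls.__name__: fn.__name__ for cls, fn in table.items()})
# {'ReLU': 'calc_relu', 'Softmax': 'calc_softmax'}
```

Keying the table on the annotated class means a new calculator only has to be defined in the module to be picked up, which matches the "no manual registration" behavior described above.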

Also, please write a docstring in Markdown format for each cost calculator. It is used to generate DETAILS.md. Please refer to the Documentation section below for details.

Coding standard

Following the Chainer contribution guide, chainer-computational-cost also requires code to pass flake8 checks with the hacking plugin.

% pip install hacking flake8
% flake8 .

Testing

We use pytest for unit testing. Please write sufficient test cases when you add support for new functions or implement a new feature.

% pip install pytest
% python -m pytest

To check coverage locally, run the following commands.

% pip install pytest-cov
% python -m pytest --cov chainer_computational_cost --cov-report html:cov_html
% open cov_html/index.html

Every PR is automatically tested by Travis CI, and test coverage is monitored by Coveralls. Please make sure all checks on your PR are green.

Documentation

(TODO: consider better way)

DETAILS.md is automatically generated by make_details_md.py script.

This script collects the docstring of each cost calculation function. A cost calculator without a docstring does not appear in DETAILS.md, so please always write a docstring for every cost calculator in the following format.

def calc_xxxx(func: XXXX, inputs, **kwargs):
    r"""XXXX

    XXXX is defined as: $y=f(x)$
    ...
    |Item|Value|
    |:---|:---|
    |FLOPs| $$ 4 \| x \| $$ |
    |mread| $$ \| x \| $$ |
    |mwrite| $$ \| x \| $$ |
    """
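As a rough illustration of what make_details_md.py does (its real implementation may differ), the sketch below joins the docstrings of cost calculators into one Markdown document and skips functions without a docstring, matching the behavior described above. Using the first docstring line as a section heading is an assumption of this sketch, and build_details_md is a hypothetical name. Note the r prefix on the docstring, which keeps backslashes in LaTeX from being read as escape sequences.

```python
import inspect


def build_details_md(calculators):
    """Concatenate calculator docstrings into Markdown (hypothetical)."""
    sections = []
    for fn in calculators:
        doc = inspect.getdoc(fn)  # dedented docstring, or None
        if not doc:
            continue  # no docstring -> not listed in DETAILS.md
        title, _, body = doc.partition('\n')
        sections.append('## {}\n\n{}'.format(title.strip(), body.strip()))
    return '\n\n'.join(sections) + '\n'


def calc_relu(func, in_data, **kwargs):
    r"""ReLU

    ReLU is defined as: $y=\max(x, 0)$
    """


def calc_undocumented(func, in_data, **kwargs):
    return (0, 0, 0, {})


print(build_details_md([calc_relu, calc_undocumented]))
```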

Mathematical formulas are also supported. This is an example of an inline equation:

function $y=f(x)$ is ...

And a non-inline (display) equation:

function is defined as follows:
$$y=f(x)$$

Write both inline and non-inline formulas on a single line.

# NG pattern
$$
y=f(x)
$$

After writing docstring, DETAILS.md can be generated by:

% python make_details_md.py

Then don't forget to commit the updated DETAILS.md.

Disclaimer

There is no guarantee that the cost estimation formulas are accurate, and they may change in the future. Please verify them yourself if you use this tool for critical purposes.

Acknowledgements

The key concept of chainer-computational-cost was originally developed by t-abe.