Keras inference time optimizer (KITO)

This code takes a trained Keras model as input and optimizes its layer structure and weights so that the model becomes noticeably faster (roughly 10-30%) while producing the same outputs as the initial model. It can be extremely useful when you need to process a large number of images with a trained model. The reduce operation was tested on all models from the Keras model zoo. See the comparison table below.

Installation

pip install kito

How it works

The current version applies a single type of optimization: it folds a Conv2D + BatchNormalization pair of layers into a single Conv2D layer. Since Conv2D + BatchNormalization is a very common combination, the optimization works well on almost all modern CNNs for image processing.
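
As a rough illustration of the folding step, here is a minimal NumPy sketch (not the library's actual code; the helper name fold_batchnorm_into_conv is hypothetical) that merges the BatchNormalization parameters into the kernel and bias of the preceding Conv2D:

```python
import numpy as np

def fold_batchnorm_into_conv(W, b, gamma, beta, mean, var, eps=1e-3):
    """Merge BatchNormalization parameters into the preceding Conv2D weights.

    W:  conv kernel with Keras layout (kh, kw, c_in, c_out)
    b:  conv bias of shape (c_out,); pass zeros if the conv has no bias
    gamma, beta, mean, var: per-channel BN parameters of shape (c_out,)
    Returns (W_new, b_new) for an equivalent single Conv2D layer.
    """
    scale = gamma / np.sqrt(var + eps)       # per output channel
    W_new = W * scale.reshape(1, 1, 1, -1)   # rescale each output filter
    b_new = (b - mean) * scale + beta        # shift and rescale the bias
    return W_new, b_new
```

A Conv2D carrying (W_new, b_new) produces the same output as the original Conv2D followed by BatchNormalization, so the BN layer can be dropped.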

Also supported:

How to use

Typical code:

```python
model.fit(...)
...
model.predict(...)
```

must be replaced with the following block:

```python
from kito import reduce_keras_model
model.fit(...)
...
model_reduced = reduce_keras_model(model)
model_reduced.predict(...)
```

So you basically need to insert two lines into your code to speed up inference. Note, however, that converting the model itself takes some time (see the Conversion Time column in the table below). You can see a usage example in test_bench.py
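
For a self-contained illustration (a sketch, not taken from test_bench.py; it assumes the standalone keras package), the snippet below builds a tiny Conv2D + BatchNormalization network, reduces it and compares the outputs:

```python
import numpy as np
from keras.models import Model
from keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                          GlobalAveragePooling2D, Dense)
from kito import reduce_keras_model

# Small network containing a Conv2D + BatchNormalization pair
inp = Input(shape=(64, 64, 3))
x = Conv2D(16, (3, 3), padding='same', use_bias=False)(inp)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = GlobalAveragePooling2D()(x)
out = Dense(10, activation='softmax')(x)
model = Model(inputs=inp, outputs=out)

# Fold Conv2D + BatchNormalization into a single Conv2D
model_reduced = reduce_keras_model(model)

# Fewer layers, (almost) identical predictions
batch = np.random.uniform(size=(4, 64, 64, 3)).astype(np.float32)
diff = np.abs(model.predict(batch) - model_reduced.predict(batch)).max()
print('Layers: %d -> %d, max diff: %.2e'
      % (len(model.layers), len(model_reduced.layers), diff))
```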

Comparison table

| Neural net | Input shape | Number of layers (Init) | Number of layers (Reduced) | Number of params (Init) | Number of params (Reduced) | Time to process 10000 images (Init) | Time to process 10000 images (Reduced) | Conversion Time (sec) | Maximum diff on final layer | Average difference on final layer |
|---|---|---|---|---|---|---|---|---|---|---|
| MobileNet (1.0) | (224, 224, 3) | 102 | 75 | 4,253,864 | 4,221,032 | 32.38 | 22.13 | 12.45 | 2.80e-06 | 4.41e-09 |
| MobileNetV2 (1.4) | (224, 224, 3) | 152 | 100 | 6,156,712 | 6,084,808 | 52.53 | 37.71 | 87.00 | 3.99e-06 | 6.88e-09 |
| ResNet50 | (224, 224, 3) | 177 | 124 | 25,636,712 | 25,530,472 | 58.87 | 35.81 | 45.28 | 5.06e-07 | 1.24e-09 |
| Inception_v3 | (299, 299, 3) | 313 | 219 | 23,851,784 | 23,817,352 | 79.15 | 59.55 | 126.02 | 7.74e-07 | 1.26e-09 |
| Inception_Resnet_v2 | (299, 299, 3) | 782 | 578 | 55,873,736 | 55,813,192 | 131.16 | 102.38 | 766.14 | 8.04e-07 | 9.26e-10 |
| Xception | (299, 299, 3) | 134 | 94 | 22,910,480 | 22,828,688 | 115.56 | 76.17 | 28.15 | 3.65e-07 | 9.69e-10 |
| DenseNet121 | (224, 224, 3) | 428 | 369 | 8,062,504 | 8,040,040 | 68.25 | 57.57 | 392.24 | 4.61e-07 | 8.69e-09 |
| DenseNet169 | (224, 224, 3) | 596 | 513 | 14,307,880 | 14,276,200 | 80.56 | 68.74 | 772.54 | 2.14e-06 | 1.79e-09 |
| DenseNet201 | (224, 224, 3) | 708 | 609 | 20,242,984 | 20,205,160 | 98.99 | 87.04 | 1120.88 | 7.00e-07 | 1.27e-09 |
| NasNetMobile | (224, 224, 3) | 751 | 563 | 5,326,716 | 5,272,599 | 46.05 | 31.76 | 728.96 | 1.10e-06 | 1.60e-09 |
| NasNetLarge | (331, 331, 3) | 1021 | 761 | 88,949,818 | 88,658,596 | 445.58 | 328.16 | 1402.61 | 1.43e-07 | 5.88e-10 |
| ZF_UNET_224 | (224, 224, 3) | 85 | 63 | 31,466,753 | 31,442,689 | 96.76 | 69.17 | 9.93 | 4.72e-05 | 7.54e-09 |
| DeepLabV3+ (mobile) | (512, 512, 3) | 162 | 108 | 2,146,645 | 2,097,013 | 583.63 | 432.71 | 48.00 | 4.72e-05 | 1.00e-05 |
| DeepLabV3+ (xception) | (512, 512, 3) | 409 | 263 | 41,258,213 | 40,954,013 | 1000.36 | 699.24 | 333.1 | 8.63e-05 | 5.22e-06 |
| ResNet152 | (224, 224, 3) | 566 | 411 | 60,344,232 | 60,117,096 | 107.92 | 68.53 | 357.65 | 8.94e-07 | 1.27e-09 |

Config: single NVIDIA GTX 1080 (8 GB). Timings were obtained with the TensorFlow 1.4 (+ CUDA 8.0) backend.

Notes

Requirements

Formulas

Base formulas
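
A standard write-up of the folding math, using the usual BatchNormalization notation (per-channel scale gamma, offset beta, moving mean mu, moving variance sigma^2, epsilon eps):

```latex
% Conv2D followed by BatchNormalization:
%   y = W * x + b,    BN(y) = gamma * (y - mu) / sqrt(sigma^2 + eps) + beta
% Folding BN into the convolution gives an equivalent single Conv2D:
W' = \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \, W, \qquad
b' = \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \, (b - \mu) + \beta
% so that W' * x + b' = BN(W * x + b) for every input x.
```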

Other implementations

PyTorch BN Fusion - with support for VGG, ResNet and SENet.