convnet-burden

Estimates of memory consumption and FLOP counts for various convolutional neural networks.

Image Classification Architectures

The numbers below are given for single-element batches.

| model | input size | param mem | feat. mem | flops | src | performance |
|---|---|---|---|---|---|---|
| alexnet | 227 x 227 | 233 MB | 3 MB | 727 MFLOPs | MCN | 41.80 / 19.20 |
| caffenet | 224 x 224 | 233 MB | 3 MB | 724 MFLOPs | MCN | 42.60 / 19.70 |
| squeezenet1-0 | 224 x 224 | 5 MB | 30 MB | 837 MFLOPs | PT | 41.90 / 19.58 |
| squeezenet1-1 | 224 x 224 | 5 MB | 17 MB | 360 MFLOPs | PT | 41.81 / 19.38 |
| vgg-f | 224 x 224 | 232 MB | 4 MB | 727 MFLOPs | MCN | 41.40 / 19.10 |
| vgg-m | 224 x 224 | 393 MB | 12 MB | 2 GFLOPs | MCN | 36.90 / 15.50 |
| vgg-s | 224 x 224 | 393 MB | 12 MB | 3 GFLOPs | MCN | 37.00 / 15.80 |
| vgg-m-2048 | 224 x 224 | 353 MB | 12 MB | 2 GFLOPs | MCN | 37.10 / 15.80 |
| vgg-m-1024 | 224 x 224 | 333 MB | 12 MB | 2 GFLOPs | MCN | 37.80 / 16.10 |
| vgg-m-128 | 224 x 224 | 315 MB | 12 MB | 2 GFLOPs | MCN | 40.80 / 18.40 |
| vgg-vd-16-atrous | 224 x 224 | 82 MB | 58 MB | 16 GFLOPs | N/A | - / - |
| vgg-vd-16 | 224 x 224 | 528 MB | 58 MB | 16 GFLOPs | MCN | 28.50 / 9.90 |
| vgg-vd-19 | 224 x 224 | 548 MB | 63 MB | 20 GFLOPs | MCN | 28.70 / 9.90 |
| googlenet | 224 x 224 | 51 MB | 26 MB | 2 GFLOPs | MCN | 34.20 / 12.90 |
| resnet18 | 224 x 224 | 45 MB | 23 MB | 2 GFLOPs | PT | 30.24 / 10.92 |
| resnet34 | 224 x 224 | 83 MB | 35 MB | 4 GFLOPs | PT | 26.70 / 8.58 |
| resnet-50 | 224 x 224 | 98 MB | 103 MB | 4 GFLOPs | MCN | 24.60 / 7.70 |
| resnet-101 | 224 x 224 | 170 MB | 155 MB | 8 GFLOPs | MCN | 23.40 / 7.00 |
| resnet-152 | 224 x 224 | 230 MB | 219 MB | 11 GFLOPs | MCN | 23.00 / 6.70 |
| resnext-50-32x4d | 224 x 224 | 96 MB | 132 MB | 4 GFLOPs | L1 | 22.60 / 6.49 |
| resnext-101-32x4d | 224 x 224 | 169 MB | 197 MB | 8 GFLOPs | L1 | 21.55 / 5.93 |
| resnext-101-64x4d | 224 x 224 | 319 MB | 273 MB | 16 GFLOPs | PT | 20.81 / 5.66 |
| inception-v3 | 299 x 299 | 91 MB | 89 MB | 6 GFLOPs | PT | 22.55 / 6.44 |
| SE-ResNet-50 | 224 x 224 | 107 MB | 103 MB | 4 GFLOPs | SE | 22.37 / 6.36 |
| SE-ResNet-101 | 224 x 224 | 189 MB | 155 MB | 8 GFLOPs | SE | 21.75 / 5.72 |
| SE-ResNet-152 | 224 x 224 | 255 MB | 220 MB | 11 GFLOPs | SE | 21.34 / 5.54 |
| SE-ResNeXt-50-32x4d | 224 x 224 | 105 MB | 132 MB | 4 GFLOPs | SE | 20.97 / 5.54 |
| SE-ResNeXt-101-32x4d | 224 x 224 | 187 MB | 197 MB | 8 GFLOPs | SE | 19.81 / 4.96 |
| SENet | 224 x 224 | 440 MB | 347 MB | 21 GFLOPs | SE | 18.68 / 4.47 |
| SE-BN-Inception | 224 x 224 | 46 MB | 43 MB | 2 GFLOPs | SE | 23.62 / 7.04 |
| densenet121 | 224 x 224 | 31 MB | 126 MB | 3 GFLOPs | PT | 25.35 / 7.83 |
| densenet161 | 224 x 224 | 110 MB | 235 MB | 8 GFLOPs | PT | 22.35 / 6.20 |
| densenet169 | 224 x 224 | 55 MB | 152 MB | 3 GFLOPs | PT | 24.00 / 7.00 |
| densenet201 | 224 x 224 | 77 MB | 196 MB | 4 GFLOPs | PT | 22.80 / 6.43 |
| mcn-mobilenet | 224 x 224 | 16 MB | 38 MB | 579 MFLOPs | AU | 29.40 / - |

Click on a model name for a more detailed breakdown of feature extraction costs at different input image/batch sizes. The performance numbers are reported as top-1 error / top-5 error on the 2012 ILSVRC validation data. The src column indicates the source of each benchmark score, abbreviated (e.g. MCN for MatConvNet, PT for PyTorch).

These numbers provide an estimate of performance, but note that there may be small differences between the evaluation scripts from different sources.
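As a sanity check on the param mem column, the weight storage of a model can be estimated directly from its parameter count, under the 4-bytes-per-float assumption used throughout this page (a rough sketch; the ~61 million figure for AlexNet is approximate):

```python
def param_memory_mb(num_params, bytes_per_param=4):
    """Estimate parameter memory in MiB, assuming 32-bit floats."""
    return num_params * bytes_per_param / 2**20

# AlexNet has roughly 61 million parameters, which reproduces the
# ~233 MB figure in the table above.
print(round(param_memory_mb(61e6)))  # -> 233
```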


Object Detection Architectures

| model | input size | param memory | feature memory | flops |
|---|---|---|---|---|
| rfcn-res50-pascal | 600 x 850 | 122 MB | 1 GB | 79 GFLOPs |
| rfcn-res101-pascal | 600 x 850 | 194 MB | 2 GB | 117 GFLOPs |
| ssd-pascal-vggvd-300 | 300 x 300 | 100 MB | 116 MB | 31 GFLOPs |
| ssd-pascal-vggvd-512 | 512 x 512 | 104 MB | 337 MB | 91 GFLOPs |
| ssd-pascal-mobilenet-ft | 300 x 300 | 22 MB | 37 MB | 1 GFLOPs |
| faster-rcnn-vggvd-pascal | 600 x 850 | 523 MB | 600 MB | 172 GFLOPs |

The input sizes used are "typical" for each of the architectures listed, but they can be varied. Anchor/priorbox generation and roi/psroi-pooling are not included in the FLOP estimates. The ssd-pascal-mobilenet-ft detector uses the MobileNet feature extractor (the model used here was imported from the architecture made available by chuanqi305).


Semantic Segmentation Architectures

| model | input size | param memory | feature memory | flops |
|---|---|---|---|---|
| pascal-fcn32s | 384 x 384 | 519 MB | 423 MB | 125 GFLOPs |
| pascal-fcn16s | 384 x 384 | 514 MB | 424 MB | 125 GFLOPs |
| pascal-fcn8s | 384 x 384 | 513 MB | 426 MB | 125 GFLOPs |
| deeplab-vggvd-v2 | 513 x 513 | 144 MB | 755 MB | 202 GFLOPs |
| deeplab-res101-v2 | 513 x 513 | 505 MB | 4 GB | 346 GFLOPs |

In this case, the input sizes are those which are typically taken as input crops during training. The deeplab-res101-v2 model uses multi-scale input, with scales x1, x0.75, x0.5 (computed relative to the given input size).
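Since convolutional FLOP counts grow roughly linearly with the spatial area of the input, the cost of such a multi-scale forward pass can be approximated from the single-scale cost (a back-of-the-envelope sketch with a hypothetical base cost; real totals deviate because feature map sizes are rounded and some layers do not scale with area):

```python
def multiscale_flops(single_scale_flops, scales=(1.0, 0.75, 0.5)):
    """Approximate total FLOPs for multi-scale inference: conv cost
    grows with input area, i.e. with the square of each scale factor."""
    return single_scale_flops * sum(s**2 for s in scales)

# e.g. a hypothetical 100 GFLOP single-scale pass run at x1, x0.75, x0.5:
print(multiscale_flops(100))  # -> 181.25
```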


Keypoint Detection Architectures

| model | input size | param memory | feature memory | flops |
|---|---|---|---|---|
| multipose-mpi | 368 x 368 | 196 MB | 245 MB | 134 GFLOPs |
| multipose-coco | 368 x 368 | 200 MB | 246 MB | 136 GFLOPs |


Notes and Assumptions

The numbers for each architecture should be reasonably framework-agnostic. It is assumed that all weights and activations are stored as 32-bit floats (4 bytes per datum) and that all ReLUs are performed in-place. Feature memory therefore represents an estimate of the total memory consumed by the features computed during a forward pass of the network for a given input, assuming that no memory is re-used (the exception, as noted above, being that in-place ReLUs add nothing to the feature memory total). In practice, many frameworks clear features from memory once they are no longer required by the execution path and will therefore need less memory than is listed here. The feature memory statistic is simply a rough guide to "how big" the activations of the network look.
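Under those assumptions, the feature memory estimate reduces to summing the activation sizes of all non-in-place layers at 4 bytes per value. A minimal sketch (the layer shapes below are illustrative, not taken from any model in the tables):

```python
def feature_memory_mb(activation_shapes, bytes_per_float=4):
    """Sum activation memory over a forward pass in MiB, assuming every
    feature map is kept (no memory reuse). In-place ReLUs contribute no
    extra storage, so they are simply omitted from the shape list."""
    total_bytes = 0
    for channels, height, width in activation_shapes:
        total_bytes += channels * height * width * bytes_per_float
    return total_bytes / 2**20

# Illustrative (channels, height, width) shapes for a few conv outputs:
shapes = [(64, 112, 112), (128, 56, 56), (256, 28, 28), (512, 14, 14)]
print(round(feature_memory_mb(shapes), 2))  # -> 5.74
```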

Fused multiply-adds are counted as single operations. The numbers should be treated as rough approximations: modern hardware makes it very difficult to count operations precisely (and even if you could, pipelining and similar effects mean that an operation count is not necessarily a good predictor of inference time).
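With that convention, the FLOP count of a single convolutional layer reduces to one multiply-add per weight per output position. A rough sketch (bias terms and padding effects ignored; the layer dimensions are illustrative):

```python
def conv_flops(c_in, c_out, kernel, out_h, out_w):
    """FLOPs for one conv layer, counting each fused multiply-add as a
    single operation: weights (c_in * c_out * k * k) times output
    positions (out_h * out_w). Bias additions are ignored."""
    return c_in * c_out * kernel * kernel * out_h * out_w

# e.g. a 3x3 conv mapping 64 -> 64 channels on a 56x56 output map:
print(conv_flops(64, 64, 3, 56, 56) / 1e6)  # ~115.6 MFLOPs
```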

The tool for computing the estimates is implemented as a module for the autonn wrapper of MatConvNet and is included in this repo, so feel free to take a look for extra details. The module can be installed with the vl_contrib package manager (it has two dependencies, autonn and mcnExtraLayers, which can be installed in the same manner). MatConvNet versions of all of the models are available for download.

For further reading on the topic, the 2017 ICLR submission "An analysis of deep neural network models for practical applications" is interesting. If you find any issues, or would like to add additional models, please open an issue or PR.