cnn-benchmarks

Benchmarks for popular convolutional neural network models on CPU and different GPUs, with and without cuDNN.

Some general conclusions from this benchmarking:

All benchmarks were run in Torch. The GTX 1080 and Maxwell Titan X benchmarks were run on a machine with dual Intel Xeon E5-2630 v3 processors (8 cores each plus hyperthreading means 32 threads) and 64GB RAM running Ubuntu 14.04 with the CUDA 8.0 Release Candidate. The Pascal Titan X benchmarks were run on a machine with an Intel Core i5-6500 CPU and 16GB RAM running Ubuntu 16.04 with the CUDA 8.0 Release Candidate. The GTX 1080 Ti benchmarks were run on a machine with an Intel Core i7-7700 CPU and 64GB RAM running Ubuntu 16.04 with the CUDA 8.0 release.

We benchmark all models with a minibatch size of 16 and an image size of 224 x 224; this allows direct comparisons between models, and allows all but the ResNet-200 model to run on the GTX 1080, which has only 8GB of memory.
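As a rough sanity check on the input size, the sketch below computes the element count and storage footprint of one 16 x 3 x 224 x 224 minibatch. The 4 bytes per element assumes 32-bit floats, which is an assumption rather than something stated above, and the helper name is hypothetical.

```python
from math import prod

# Minibatch shape used throughout: (batch, channels, height, width).
INPUT_SHAPE = (16, 3, 224, 224)
BYTES_PER_FLOAT32 = 4  # assumption: inputs are stored as 32-bit floats


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the storage size in bytes of a dense tensor with this shape."""
    return prod(shape) * bytes_per_element


num_elements = prod(INPUT_SHAPE)
size_mib = tensor_bytes(INPUT_SHAPE) / 2**20
print(f"{num_elements} elements, {size_mib:.2f} MiB per input minibatch")
```

The input minibatch itself is small. Intermediate activations and weights, which this sketch does not estimate, are what push the deepest models toward the memory limit.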

The following models are benchmarked:

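| Network | Layers | Top-1 error | Top-5 error | Speed (ms) | Citation |
|---|---|---|---|---|---|
| AlexNet | 8 | 42.90 | 19.80 | 14.56 | [1] |
| Inception-V1 | 22 | - | 10.07 | 39.14 | [2] |
| VGG-16 | 16 | 27.00 | 8.80 | 128.62 | [3] |
| VGG-19 | 19 | 27.30 | 9.00 | 147.32 | [3] |
| ResNet-18 | 18 | 30.43 | 10.76 | 31.54 | [4] |
| ResNet-34 | 34 | 26.73 | 8.74 | 51.59 | [4] |
| ResNet-50 | 50 | 24.01 | 7.02 | 103.58 | [4] |
| ResNet-101 | 101 | 22.44 | 6.21 | 156.44 | [4] |
| ResNet-152 | 152 | 22.16 | 6.16 | 217.91 | [4] |
| ResNet-200 | 200 | 21.66 | 5.79 | 296.51 | [5] |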

Top-1 and Top-5 error are single-crop error rates on the ILSVRC 2012 validation set, except for VGG-16 and VGG-19, which instead use dense prediction on a 256x256 image. This gives the VGG models a slight advantage, but I was unable to find single-crop error rates for these models. All models perform better when using more than one crop at test time.
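For readers unfamiliar with the metric, here is a minimal sketch of how a top-k error rate can be computed from per-class scores. The function name and the tiny score matrix are made up for illustration; this is not the evaluation code behind the numbers above.

```python
def top_k_error(scores, labels, k):
    """Percentage of examples whose true label is not among the k highest scores.

    `scores` is a list of per-class score lists; `labels` holds the true
    class index for each example.
    """
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return 100.0 * misses / len(labels)


# Made-up example: 3 images, 4 classes.
scores = [
    [0.10, 0.60, 0.20, 0.10],  # highest score: class 1
    [0.50, 0.10, 0.30, 0.10],  # highest: class 0, second: class 2
    [0.30, 0.15, 0.05, 0.50],  # highest: class 3, second: class 0
]
labels = [1, 2, 0]
print(top_k_error(scores, labels, k=1), top_k_error(scores, labels, k=2))
```

In this toy example two of the three images are wrong at k=1, but every true label falls within the top two scores, so the top-2 error is zero.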

Speed is the total time for a forward and backward pass on a Pascal Titan X with cuDNN 5.1.
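The sketch below shows one way such a forward and backward timing could be collected, and how a per-minibatch time converts to throughput. It is a generic Python illustration, not the benchmarking script used here: `forward`, `backward`, and `sync` are hypothetical callables, and the warmup and iteration counts are arbitrary choices.

```python
import time


def time_forward_backward(forward, backward, sync=lambda: None, warmup=2, iters=5):
    """Return mean (forward_ms, backward_ms, total_ms) over `iters` runs.

    `forward` and `backward` are zero-argument callables standing in for a
    model's passes. `sync` should block until queued GPU work has finished;
    the default no-op is only adequate for CPU code.
    """
    for _ in range(warmup):  # untimed runs to absorb one-time setup costs
        forward()
        backward()
    sync()

    fwd_s = bwd_s = 0.0
    for _ in range(iters):
        start = time.perf_counter()
        forward()
        sync()
        mid = time.perf_counter()
        backward()
        sync()
        end = time.perf_counter()
        fwd_s += mid - start
        bwd_s += end - mid

    fwd_ms = 1000.0 * fwd_s / iters
    bwd_ms = 1000.0 * bwd_s / iters
    return fwd_ms, bwd_ms, fwd_ms + bwd_ms


def images_per_second(total_ms, batch_size=16):
    """Convert a per-minibatch time in milliseconds to training throughput."""
    return batch_size / (total_ms / 1000.0)
```

For example, `images_per_second(103.58)` gives about 154 images per second for ResNet-50 at the minibatch size of 16 used here. Synchronizing before reading the clock matters on GPUs because kernel launches return before the work completes.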

You can download the model files used for benchmarking here (2.1 GB); these were converted from Caffe or Torch checkpoints using the convert_model.lua script.

We use the following GPUs for benchmarking:

| GPU | Memory | Architecture | CUDA Cores | FP32 TFLOPS | Release Date |
|---|---|---|---|---|---|
| Pascal Titan X | 12GB GDDR5X | Pascal | 3584 | 10.16 | August 2016 |
| GTX 1080 | 8GB GDDR5X | Pascal | 2560 | 8.87 | May 2016 |
| GTX 1080 Ti | 11GB GDDR5X | Pascal | 3584 | 10.6 | March 2017 |
| Maxwell Titan X | 12GB GDDR5 | Maxwell | 3072 | 6.14 | March 2015 |
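The FP32 TFLOPS column can be related to the CUDA core count with a little arithmetic. The sketch below backs out the clock rate implied by each row, assuming peak throughput is counted as one fused multiply-add (2 floating-point operations) per core per cycle. That counting convention is an assumption on my part, and the function name is hypothetical.

```python
# (CUDA cores, FP32 TFLOPS) copied from the table above.
GPUS = {
    "Pascal Titan X": (3584, 10.16),
    "GTX 1080": (2560, 8.87),
    "GTX 1080 Ti": (3584, 10.6),
    "Maxwell Titan X": (3072, 6.14),
}

FLOPS_PER_CORE_PER_CYCLE = 2  # assumption: one fused multiply-add per cycle


def implied_clock_ghz(cuda_cores, fp32_tflops):
    """Clock rate in GHz that would yield the quoted peak FP32 throughput."""
    flops = fp32_tflops * 1e12
    return flops / (FLOPS_PER_CORE_PER_CYCLE * cuda_cores) / 1e9


for name, (cores, tflops) in GPUS.items():
    print(f"{name}: {implied_clock_ghz(cores, tflops):.3f} GHz")
```

These are peak figures. The measured times in the sections below depend on memory bandwidth and kernel efficiency as well, so they need not scale with TFLOPS.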

AlexNet

(input 16 x 3 x 224 x 224)

We use the BVLC AlexNet from Caffe.

AlexNet uses grouped convolutions; this was a strategy to allow model parallelism over two GTX 580 GPUs, which had only 3GB of memory each. Grouped convolutions are no longer commonly used, and are not even implemented by the torch/nn backend; therefore we can only benchmark AlexNet using cuDNN.
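To make the effect of grouping concrete, the sketch below counts convolution weights with and without groups. The layer dimensions (96 input channels, 256 output channels, 5 x 5 kernels, 2 groups) are my recollection of AlexNet's second convolution and are not taken from this page, so treat them as illustrative.

```python
def conv_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights (excluding biases) in a 2D convolution layer.

    With `groups` > 1, each output channel only sees in_channels / groups
    input channels, so the weight count shrinks by a factor of `groups`.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (in_channels // groups) * out_channels * kernel_size * kernel_size


dense = conv_weight_count(96, 256, 5)              # ordinary convolution
grouped = conv_weight_count(96, 256, 5, groups=2)  # split across two groups
print(dense, grouped)
```

With two groups the layer needs half the weights of the ungrouped version, and each group can live on a separate device without exchanging activations inside the layer.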

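| GPU | cuDNN | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|---|
| GTX 1080 Ti | 5.1.10 | 4.31 | 9.58 | 13.89 |
| Pascal Titan X | 5.1.05 | 5.04 | 9.52 | 14.56 |
| Pascal Titan X | 5.0.05 | 5.32 | 10.90 | 16.23 |
| GTX 1080 | 5.1.05 | 7.00 | 13.74 | 20.74 |
| Maxwell Titan X | 5.1.05 | 7.09 | 14.76 | 21.85 |
| GTX 1080 | 5.0.05 | 7.35 | 15.73 | 23.08 |
| Maxwell Titan X | 5.0.05 | 7.55 | 17.78 | 25.33 |
| Maxwell Titan X | 4.0.07 | 8.03 | 17.91 | 25.94 |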

Inception-V1

(input 16 x 3 x 224 x 224)

We use the Torch implementation of Inception-V1 from soumith/inception.torch.

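| GPU | cuDNN | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|---|
| GTX 1080 Ti | 5.1.10 | 11.50 | 25.37 | 36.87 |
| Pascal Titan X | 5.1.05 | 12.06 | 27.08 | 39.14 |
| Pascal Titan X | 5.0.05 | 11.94 | 28.39 | 40.33 |
| GTX 1080 | 5.0.05 | 16.08 | 40.08 | 56.16 |
| Maxwell Titan X | 5.1.05 | 19.29 | 42.69 | 61.98 |
| Maxwell Titan X | 5.0.05 | 19.27 | 46.41 | 65.68 |
| Maxwell Titan X | 4.0.07 | 21.04 | 49.41 | 70.45 |
| GTX 1080 Ti | None | 56.34 | 85.30 | 141.64 |
| Pascal Titan X | None | 57.46 | 85.90 | 143.36 |
| GTX 1080 | None | 63.03 | 102.31 | 165.34 |
| Maxwell Titan X | None | 91.31 | 140.81 | 232.12 |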

VGG-16

(input 16 x 3 x 224 x 224)

This is Model D in [3] used in the ILSVRC-2014 competition, available here.

| GPU | cuDNN | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|---|
| GTX 1080 Ti | 5.1.10 | 41.23 | 86.91 | 128.14 |
| Pascal Titan X | 5.1.05 | 41.59 | 87.03 | 128.62 |
| Pascal Titan X | 5.0.05 | 46.16 | 111.23 | 157.39 |
| GTX 1080 | 5.1.05 | 59.37 | 123.42 | 182.79 |
| Maxwell Titan X | 5.1.05 | 62.30 | 130.48 | 192.78 |
| GTX 1080 | 5.0.05 | 67.27 | 166.17 | 233.43 |
| Maxwell Titan X | 5.0.05 | 75.80 | 186.47 | 262.27 |
| Maxwell Titan X | 4.0.07 | 111.99 | 226.69 | 338.69 |
| Pascal Titan X | None | 98.15 | 260.38 | 358.53 |
| GTX 1080 | None | 143.73 | 379.09 | 522.82 |
| Maxwell Titan X | None | 172.61 | 415.87 | 588.47 |
| CPU: Dual Xeon E5-2630 v3 | None | 3101.76 | 5393.72 | 8495.48 |
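The VGG-16 numbers make it easy to quantify how much cuDNN and the GPU each contribute. The sketch below computes two ratios from the totals in the table above. The dictionary layout and the `speedup` helper are just an illustrative way to organize the arithmetic.

```python
# Total forward + backward times in ms for VGG-16, copied from the table above.
VGG16_TOTAL_MS = {
    ("Pascal Titan X", "5.1.05"): 128.62,
    ("Pascal Titan X", None): 358.53,
    ("CPU: Dual Xeon E5-2630 v3", None): 8495.48,
}


def speedup(slow_ms, fast_ms):
    """How many times faster the `fast_ms` configuration is."""
    return slow_ms / fast_ms


best = VGG16_TOTAL_MS[("Pascal Titan X", "5.1.05")]
cudnn_gain = speedup(VGG16_TOTAL_MS[("Pascal Titan X", None)], best)
cpu_gain = speedup(VGG16_TOTAL_MS[("CPU: Dual Xeon E5-2630 v3", None)], best)
print(f"cuDNN 5.1.05 vs. no cuDNN: {cudnn_gain:.2f}x")
print(f"Pascal Titan X vs. CPU: {cpu_gain:.2f}x")
```

On these figures, cuDNN 5.1.05 makes the Pascal Titan X roughly 2.8x faster than the same card without cuDNN, and that configuration is roughly 66x faster than the dual Xeon CPU. The same calculation can be repeated for any other pair of rows.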

VGG-19

(input 16 x 3 x 224 x 224)

This is Model E in [3] used in the ILSVRC-2014 competition, available here.

| GPU | cuDNN | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|---|
| Pascal Titan X | 5.1.05 | 48.09 | 99.23 | 147.32 |
| GTX 1080 Ti | 5.1.10 | 48.15 | 100.04 | 148.19 |
| Pascal Titan X | 5.0.05 | 55.75 | 134.98 | 190.73 |
| GTX 1080 | 5.1.05 | 68.95 | 141.44 | 210.39 |
| Maxwell Titan X | 5.1.05 | 73.66 | 151.48 | 225.14 |
| GTX 1080 | 5.0.05 | 79.79 | 202.02 | 281.81 |
| Maxwell Titan X | 5.0.05 | 93.47 | 229.34 | 322.81 |
| Maxwell Titan X | 4.0.07 | 139.01 | 279.21 | 418.22 |
| Pascal Titan X | None | 121.69 | 318.39 | 440.08 |
| GTX 1080 | None | 176.36 | 453.22 | 629.57 |
| Maxwell Titan X | None | 215.92 | 491.21 | 707.13 |
| CPU: Dual Xeon E5-2630 v3 | None | 3609.78 | 6239.45 | 9849.23 |

ResNet-18

(input 16 x 3 x 224 x 224)

This is the 18-layer model described in [4] and implemented in fb.resnet.torch.

| GPU | cuDNN | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|---|
| Pascal Titan X | 5.1.05 | 10.14 | 21.40 | 31.54 |
| GTX 1080 Ti | 5.1.10 | 10.45 | 22.34 | 32.78 |
| Pascal Titan X | 5.0.05 | 10.06 | 23.08 | 33.13 |
| GTX 1080 | 5.1.05 | 14.62 | 29.32 | 43.94 |
| GTX 1080 | 5.0.05 | 14.84 | 32.68 | 47.52 |
| Maxwell Titan X | 5.1.05 | 16.87 | 34.55 | 51.42 |
| Maxwell Titan X | 5.0.05 | 17.08 | 37.79 | 54.87 |
| Maxwell Titan X | 4.0.07 | 21.54 | 42.26 | 63.80 |
| Pascal Titan X | None | 34.76 | 61.64 | 96.40 |
| GTX 1080 Ti | None | 50.04 | 65.99 | 116.03 |
| GTX 1080 | None | 42.94 | 79.17 | 122.10 |
| Maxwell Titan X | None | 55.82 | 96.01 | 151.82 |
| CPU: Dual Xeon E5-2630 v3 | None | 847.46 | 1348.33 | 2195.78 |

ResNet-34

(input 16 x 3 x 224 x 224)

This is the 34-layer model described in [4] and implemented in fb.resnet.torch.

| GPU | cuDNN | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|---|
| GTX 1080 Ti | 5.1.10 | 16.71 | 34.60 | 51.31 |
| Pascal Titan X | 5.1.05 | 17.01 | 34.58 | 51.59 |
| Pascal Titan X | 5.0.05 | 16.91 | 38.67 | 55.58 |
| GTX 1080 | 5.1.05 | 24.50 | 47.59 | 72.09 |
| GTX 1080 | 5.0.05 | 24.76 | 55.00 | 79.76 |
| Maxwell Titan X | 5.1.05 | 27.33 | 52.90 | 80.23 |
| Maxwell Titan X | 5.0.05 | 28.79 | 63.19 | 91.98 |
| Maxwell Titan X | 4.0.07 | 40.12 | 76.00 | 116.11 |
| Pascal Titan X | None | 66.56 | 106.42 | 172.98 |
| GTX 1080 Ti | None | 86.30 | 109.43 | 195.73 |
| GTX 1080 | None | 82.71 | 137.42 | 220.13 |
| Maxwell Titan X | None | 108.95 | 166.19 | 275.13 |
| CPU: Dual Xeon E5-2630 v3 | None | 1530.01 | 2435.20 | 3965.21 |

ResNet-50

(input 16 x 3 x 224 x 224)

This is the 50-layer model described in [4] and implemented in fb.resnet.torch.

| GPU | cuDNN | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|---|
| GTX 1080 Ti | 5.1.10 | 34.14 | 67.06 | 101.21 |
| Pascal Titan X | 5.1.05 | 35.03 | 68.54 | 103.58 |
| Pascal Titan X | 5.0.05 | 35.03 | 70.76 | 105.78 |
| GTX 1080 | 5.1.05 | 50.64 | 99.18 | 149.82 |
| GTX 1080 | 5.0.05 | 50.76 | 103.35 | 154.11 |
| Maxwell Titan X | 5.1.05 | 55.75 | 103.87 | 159.62 |
| Maxwell Titan X | 5.0.05 | 56.30 | 109.75 | 166.05 |
| Maxwell Titan X | 4.0.07 | 62.03 | 116.81 | 178.84 |
| Pascal Titan X | None | 87.62 | 158.96 | 246.58 |
| GTX 1080 Ti | None | 99.90 | 177.58 | 277.47 |
| GTX 1080 | None | 109.79 | 201.40 | 311.18 |
| Maxwell Titan X | None | 137.14 | 247.65 | 384.79 |
| CPU: Dual Xeon E5-2630 v3 | None | 2477.61 | 4149.64 | 6627.25 |

ResNet-101

(input 16 x 3 x 224 x 224)

This is the 101-layer model described in [4] and implemented in fb.resnet.torch.

| GPU | cuDNN | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|---|
| GTX 1080 Ti | 5.1.10 | 52.18 | 102.08 | 154.26 |
| Pascal Titan X | 5.1.05 | 53.38 | 103.06 | 156.44 |
| Pascal Titan X | 5.0.05 | 53.28 | 108.20 | 161.48 |
| GTX 1080 | 5.1.05 | 77.59 | 148.21 | 225.80 |
| GTX 1080 | 5.0.05 | 77.39 | 158.19 | 235.58 |
| Maxwell Titan X | 5.1.05 | 87.76 | 159.73 | 247.49 |
| Maxwell Titan X | 5.0.05 | 88.45 | 172.12 | 260.57 |
| Maxwell Titan X | 4.0.07 | 108.96 | 189.93 | 298.90 |
| Pascal Titan X | None | 161.55 | 257.57 | 419.11 |
| GTX 1080 Ti | None | 162.03 | 266.77 | 428.81 |
| GTX 1080 | None | 203.19 | 322.48 | 525.67 |
| Maxwell Titan X | None | 260.48 | 453.45 | 713.93 |
| CPU: Dual Xeon E5-2630 v3 | None | 4414.91 | 6891.33 | 11306.24 |

ResNet-152

(input 16 x 3 x 224 x 224)

This is the 152-layer model described in [4] and implemented in fb.resnet.torch.

| GPU | cuDNN | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|---|
| GTX 1080 Ti | 5.1.10 | 73.52 | 142.02 | 215.54 |
| Pascal Titan X | 5.1.05 | 75.45 | 142.47 | 217.91 |
| Pascal Titan X | 5.0.05 | 75.12 | 150.08 | 225.20 |
| GTX 1080 | 5.1.05 | 109.32 | 204.98 | 314.30 |
| GTX 1080 | 5.0.05 | 109.64 | 218.62 | 328.26 |
| Maxwell Titan X | 5.1.05 | 124.04 | 221.41 | 345.45 |
| Maxwell Titan X | 5.0.05 | 124.88 | 240.16 | 365.03 |
| Maxwell Titan X | 4.0.07 | 150.90 | 268.64 | 419.54 |
| Pascal Titan X | None | 238.04 | 371.40 | 609.43 |
| GTX 1080 Ti | None | 225.36 | 368.42 | 593.79 |
| GTX 1080 | None | 299.05 | 461.67 | 760.72 |
| Maxwell Titan X | None | 382.39 | 583.83 | 966.22 |
| CPU: Dual Xeon E5-2630 v3 | None | 6572.17 | 10300.61 | 16872.78 |

ResNet-200

(input 16 x 3 x 224 x 224)

This is the 200-layer model described in [5] and implemented in fb.resnet.torch.

Even with a batch size of 16, the 8GB GTX 1080 did not have enough memory to run the model.

| GPU | cuDNN | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|---|
| Pascal Titan X | 5.1.05 | 104.74 | 191.77 | 296.51 |
| Pascal Titan X | 5.0.05 | 104.36 | 201.92 | 306.27 |
| Maxwell Titan X | 5.0.05 | 170.03 | 320.80 | 490.83 |
| Maxwell Titan X | 5.1.05 | 169.62 | 383.80 | 553.42 |
| Maxwell Titan X | 4.0.07 | 203.52 | 356.35 | 559.87 |
| Pascal Titan X | None | 314.77 | 519.72 | 834.48 |
| Maxwell Titan X | None | 497.57 | 953.94 | 1451.51 |
| CPU: Dual Xeon E5-2630 v3 | None | 8666.43 | 13758.73 | 22425.16 |

Citations

<a id='alexnet-paper'></a>
[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." NIPS 2012.
<br>
<a id='inception-v1-paper'></a>
[2] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, and Andrew Rabinovich. "Going Deeper with Convolutions." CVPR 2015.
<br>
<a id='vgg-paper'></a>
[3] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition." ICLR 2015.
<br>
<a id='resnet-cvpr'></a>
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." CVPR 2016.
<br>
<a id='resnet-eccv'></a>
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Identity Mappings in Deep Residual Networks." ECCV 2016.