Awesome Computer Vision Models

A curated list of popular classification, segmentation and detection models with corresponding evaluation metrics from papers.

Classification models

| Model | Number of parameters | FLOPs | Top-1 Error (%) | Top-5 Error (%) | Year |
|---|---|---|---|---|---|
| AlexNet ('One weird trick for parallelizing convolutional neural networks') | 62.3M | 1,132.33M | 40.96 | 18.24 | 2014 |
| VGG-16 ('Very Deep Convolutional Networks for Large-Scale Image Recognition') | 138.3M | ? | 26.78 | 8.69 | 2014 |
| ResNet-10 ('Deep Residual Learning for Image Recognition') | 5.5M | 894.04M | 34.69 | 14.36 | 2015 |
| ResNet-18 ('Deep Residual Learning for Image Recognition') | 11.7M | 1,820.41M | 28.53 | 9.82 | 2015 |
| ResNet-34 ('Deep Residual Learning for Image Recognition') | 21.8M | 3,672.68M | 24.84 | 7.80 | 2015 |
| ResNet-50 ('Deep Residual Learning for Image Recognition') | 25.5M | 3,877.95M | 22.28 | 6.33 | 2015 |
| InceptionV3 ('Rethinking the Inception Architecture for Computer Vision') | 23.8M | ? | 21.2 | 5.6 | 2015 |
| PreResNet-18 ('Identity Mappings in Deep Residual Networks') | 11.7M | 1,820.56M | 28.43 | 9.72 | 2016 |
| PreResNet-34 ('Identity Mappings in Deep Residual Networks') | 21.8M | 3,672.83M | 24.89 | 7.74 | 2016 |
| PreResNet-50 ('Identity Mappings in Deep Residual Networks') | 25.6M | 3,875.44M | 22.40 | 6.47 | 2016 |
| DenseNet-121 ('Densely Connected Convolutional Networks') | 8.0M | 2,872.13M | 23.48 | 7.04 | 2016 |
| DenseNet-161 ('Densely Connected Convolutional Networks') | 28.7M | 7,793.16M | 22.86 | 6.44 | 2016 |
| PyramidNet-101 ('Deep Pyramidal Residual Networks') | 42.5M | 8,743.54M | 21.98 | 6.20 | 2016 |
| ResNeXt-14(32x4d) ('Aggregated Residual Transformations for Deep Neural Networks') | 9.5M | 1,603.46M | 30.32 | 11.46 | 2016 |
| ResNeXt-26(32x4d) ('Aggregated Residual Transformations for Deep Neural Networks') | 15.4M | 2,488.07M | 24.14 | 7.46 | 2016 |
| WRN-50-2 ('Wide Residual Networks') | 68.9M | 11,405.42M | 22.53 | 6.41 | 2016 |
| Xception ('Xception: Deep Learning with Depthwise Separable Convolutions') | 22,855,952 | 8,403.63M | 20.97 | 5.49 | 2016 |
| InceptionV4 ('Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning') | 42,679,816 | 12,304.93M | 20.64 | 5.29 | 2016 |
| InceptionResNetV2 ('Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning') | 55,843,464 | 13,188.64M | 19.93 | 4.90 | 2016 |
| PolyNet ('PolyNet: A Pursuit of Structural Diversity in Very Deep Networks') | 95,366,600 | 34,821.34M | 19.10 | 4.52 | 2016 |
| DarkNet Ref ('Darknet: Open source neural networks in C') | 7,319,416 | 367.59M | 38.58 | 17.18 | 2016 |
| DarkNet Tiny ('Darknet: Open source neural networks in C') | 1,042,104 | 500.85M | 40.74 | 17.84 | 2016 |
| DarkNet 53 ('Darknet: Open source neural networks in C') | 41,609,928 | 7,133.86M | 21.75 | 5.64 | 2016 |
| SqueezeResNet1.1 ('SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size') | 1,235,496 | 352.02M | 40.09 | 18.21 | 2016 |
| SqueezeNet1.1 ('SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size') | 1,235,496 | 352.02M | 39.31 | 17.72 | 2016 |
| ResAttNet-92 ('Residual Attention Network for Image Classification') | 51.3M | ? | 19.5 | 4.8 | 2017 |
| CondenseNet (G=C=8) ('CondenseNet: An Efficient DenseNet using Learned Group Convolutions') | 4.8M | ? | 26.2 | 8.3 | 2017 |
| DPN-68 ('Dual Path Networks') | 12,611,602 | 2,351.84M | 23.24 | 6.79 | 2017 |
| ShuffleNet x1.0 (g=1) ('ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices') | 1,531,936 | 148.13M | 34.93 | 13.89 | 2017 |
| DiracNetV2-18 ('DiracNets: Training Very Deep Neural Networks Without Skip-Connections') | 11,511,784 | 1,796.62M | 31.47 | 11.70 | 2017 |
| DiracNetV2-34 ('DiracNets: Training Very Deep Neural Networks Without Skip-Connections') | 21,616,232 | 3,646.93M | 28.75 | 9.93 | 2017 |
| SENet-16 ('Squeeze-and-Excitation Networks') | 31,366,168 | 5,081.30M | 25.65 | 8.20 | 2017 |
| SENet-154 ('Squeeze-and-Excitation Networks') | 115,088,984 | 20,745.78M | 18.62 | 4.61 | 2017 |
| MobileNet ('MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications') | 4,231,976 | 579.80M | 26.61 | 8.95 | 2017 |
| NASNet-A 4@1056 ('Learning Transferable Architectures for Scalable Image Recognition') | 5,289,978 | 584.90M | 25.68 | 8.16 | 2017 |
| NASNet-A 6@4032 ('Learning Transferable Architectures for Scalable Image Recognition') | 88,753,150 | 23,976.44M | 18.14 | 4.21 | 2017 |
| DLA-34 ('Deep Layer Aggregation') | 15,742,104 | 3,071.37M | 25.36 | 7.94 | 2017 |
| AirNet50-1x64d (r=2) ('Attention Inspiring Receptive-Fields Network for Learning Invariant Representations') | 27.43M | ? | 22.48 | 6.21 | 2018 |
| BAM-ResNet-50 ('BAM: Bottleneck Attention Module') | 25.92M | ? | 23.68 | 6.96 | 2018 |
| CBAM-ResNet-50 ('CBAM: Convolutional Block Attention Module') | 28.1M | ? | 23.02 | 6.38 | 2018 |
| 1.0-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design') | 921,816 | 285.82M | 40.77 | 17.85 | 2018 |
| 1.5-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design') | 1,953,616 | 550.97M | 33.81 | 13.01 | 2018 |
| 2.0-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design') | 3,366,344 | 897.60M | 29.63 | 10.66 | 2018 |
| ShuffleNetV2 ('ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design') | 2,278,604 | 149.72M | 31.44 | 11.63 | 2018 |
| 456-MENet-24×1(g=3) ('Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications') | 5.3M | ? | 28.4 | 9.8 | 2018 |
| FD-MobileNet ('FD-MobileNet: Improved MobileNet with A Fast Downsampling Strategy') | 2,901,288 | 147.46M | 34.23 | 13.38 | 2018 |
| MobileNetV2 ('MobileNetV2: Inverted Residuals and Linear Bottlenecks') | 3,504,960 | 329.36M | 26.97 | 8.87 | 2018 |
| IGCV3 ('IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks') | 3.5M | ? | 28.22 | 9.54 | 2018 |
| DARTS ('DARTS: Differentiable Architecture Search') | 4.9M | ? | 26.9 | 9.0 | 2018 |
| PNASNet-5 ('Progressive Neural Architecture Search') | 5.1M | ? | 25.8 | 8.1 | 2018 |
| AmoebaNet-C ('Regularized Evolution for Image Classifier Architecture Search') | 5.1M | ? | 24.3 | 7.6 | 2018 |
| MnasNet ('MnasNet: Platform-Aware Neural Architecture Search for Mobile') | 4,308,816 | 317.67M | 31.58 | 11.74 | 2018 |
| IBN-Net50-a ('Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net') | ? | ? | 22.54 | 6.32 | 2018 |
| MarginNet ('Large Margin Deep Networks for Classification') | ? | ? | 22.0 | ? | 2018 |
| A^2 Net ('A^2-Nets: Double Attention Networks') | ? | ? | 23.0 | 6.5 | 2018 |
| FishNeXt-150 ('FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction') | 26.2M | ? | 21.5 | ? | 2018 |
| Shape-ResNet ('IMAGENET-TRAINED CNNS ARE BIASED TOWARDS TEXTURE; INCREASING SHAPE BIAS IMPROVES ACCURACY AND ROBUSTNESS') | 25.5M | ? | 23.28 | 6.72 | 2019 |
| SimCNN(k=3 train) ('Greedy Layerwise Learning Can Scale to ImageNet') | ? | ? | 28.4 | 10.2 | 2019 |
| SKNet-50 ('Selective Kernel Networks') | 27.5M | ? | 20.79 | ? | 2019 |
| SRM-ResNet-50 ('SRM : A Style-based Recalibration Module for Convolutional Neural Networks') | 25.62M | ? | 22.87 | 6.49 | 2019 |
| EfficientNet-B0 ('EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks') | 5,288,548 | 414.31M | 24.77 | 7.52 | 2019 |
| EfficientNet-B7b ('EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks') | 66,347,960 | 39,010.98M | 15.94 | 3.22 | 2019 |
| ProxylessNAS ('PROXYLESSNAS: DIRECT NEURAL ARCHITECTURE SEARCH ON TARGET TASK AND HARDWARE') | ? | ? | 24.9 | 7.5 | 2019 |
| MixNet-L ('MixNet: Mixed Depthwise Convolutional Kernels') | 7.3M | ? | 21.1 | 5.8 | 2019 |
| ECA-Net50 ('ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks') | 24.37M | 3.86G | 22.52 | 6.32 | 2019 |
| ECA-Net101 ('ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks') | 7.3M | 7.35G | 21.35 | 5.66 | 2019 |
| ACNet-Densenet121 ('ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks') | ? | ? | 24.18 | 7.23 | 2019 |
| LIP-ResNet-50 ('LIP: Local Importance-based Pooling') | 23.9M | 5.33G | 21.81 | 6.04 | 2019 |
| LIP-ResNet-101 ('LIP: Local Importance-based Pooling') | 42.9M | 9.06G | 20.67 | 5.40 | 2019 |
| LIP-DenseNet-BC-121 ('LIP: Local Importance-based Pooling') | 8.7M | 4.13G | 23.36 | 6.84 | 2019 |
| MuffNet_1.0 ('MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning') | 2.3M | 146M | 30.1 | ? | 2019 |
| MuffNet_1.5 ('MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning') | 3.4M | 300M | 26.9 | ? | 2019 |
| ResNet-34-Bin-5 ('Making Convolutional Networks Shift-Invariant Again') | 21.8M | 3,672.68M | 25.80 | ? | 2019 |
| ResNet-50-Bin-5 ('Making Convolutional Networks Shift-Invariant Again') | 25.5M | 3,877.95M | 22.96 | ? | 2019 |
| MobileNetV2-Bin-5 ('Making Convolutional Networks Shift-Invariant Again') | 3,504,960 | 329.36M | 27.50 | ? | 2019 |
| FixRes ResNeXt101 WSL ('Fixing the train-test resolution discrepancy') | 829M | ? | 13.6 | 2.0 | 2019 |
| Noisy Student*(L2) ('Self-training with Noisy Student improves ImageNet classification') | 480M | ? | 12.6 | 1.8 | 2019 |
| TResNet-M ('TResNet: High Performance GPU-Dedicated Architecture') | 29.4M | 5.5G | 19.3 | ? | 2020 |
| DA-NAS-C ('DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search') | ? | 467M | 23.8 | ? | 2020 |
| ResNeSt-50 ('ResNeSt: Split-Attention Networks') | 27.5M | 5.39G | 18.87 | ? | 2020 |
| ResNeSt-101 ('ResNeSt: Split-Attention Networks') | 48.3M | 10.2G | 17.73 | ? | 2020 |
| ResNet-50-FReLU ('Funnel Activation for Visual Recognition') | 25.5M | 3.87G | 22.40 | ? | 2020 |
| ResNet-101-FReLU ('Funnel Activation for Visual Recognition') | 44.5M | 7.6G | 22.10 | ? | 2020 |
| ResNet-50-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks') | 25.6M | ? | 19.33 | 4.91 | 2020 |
| ResNet-50-MEALv2 + CutMix ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks') | 25.6M | ? | 19.02 | 4.65 | 2020 |
| MobileNet V3-Large-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks') | 5.48M | ? | 23.08 | 6.68 | 2020 |
| EfficientNet-B0-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks') | 5.29M | ? | 21.71 | 6.05 | 2020 |
| T2T-ViT-7 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet') | 4.2M | 0.6G | 28.8 | ? | 2021 |
| T2T-ViT-14 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet') | 19.4M | 4.8G | 19.4 | ? | 2021 |
| T2T-ViT-19 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet') | 39.0M | 8.0G | 18.8 | ? | 2021 |
| NFNet-F0 ('High-Performance Large-Scale Image Recognition Without Normalization') | 71.5M | 12.38G | 16.4 | 3.2 | 2021 |
| NFNet-F1 ('High-Performance Large-Scale Image Recognition Without Normalization') | 132.6M | 35.54G | 15.4 | 2.9 | 2021 |
| NFNet-F6+SAM ('High-Performance Large-Scale Image Recognition Without Normalization') | 438.4M | 377.28G | 13.5 | 2.1 | 2021 |
| EfficientNetV2-S ('EfficientNetV2: Smaller Models and Faster Training') | 24M | 8.8G | 16.1 | ? | 2021 |
| EfficientNetV2-M ('EfficientNetV2: Smaller Models and Faster Training') | 55M | 24G | 14.9 | ? | 2021 |
| EfficientNetV2-L ('EfficientNetV2: Smaller Models and Faster Training') | 121M | 53G | 14.3 | ? | 2021 |
| EfficientNetV2-S (21k) ('EfficientNetV2: Smaller Models and Faster Training') | 24M | 8.8G | 15.0 | ? | 2021 |
| EfficientNetV2-M (21k) ('EfficientNetV2: Smaller Models and Faster Training') | 55M | 24G | 13.9 | ? | 2021 |
| EfficientNetV2-L (21k) ('EfficientNetV2: Smaller Models and Faster Training') | 121M | 53G | 13.2 | ? | 2021 |
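
For readers comparing the Top-1 and Top-5 error columns, the sketch below shows one common way such a number is computed: the percentage of samples whose true label is not among the k highest-scoring classes. The function name `top_k_error` and the toy scores are illustrative assumptions, not taken from any listed paper, and individual papers may differ in cropping and test-time settings.

```python
def top_k_error(scores, labels, k=1):
    """Return the percentage of samples whose true label is NOT in the top-k scores.

    scores: list of per-class score lists, one list per sample (hypothetical input format)
    labels: list of integer class indices, one per sample
    """
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the k best.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return 100.0 * misses / len(labels)


# Toy example with 4 samples and 3 classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 1, 0, 0]

print(top_k_error(scores, labels, k=1))  # 2 of 4 samples missed
print(top_k_error(scores, labels, k=2))  # 1 of 4 samples missed
```

Since error is simply 100 minus accuracy, a paper reporting 80% top-1 accuracy corresponds to a Top-1 Error of 20 in the table.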

Segmentation models

| Model | Year | PASCAL-Context | Cityscapes (mIoU) | PASCAL VOC 2012 (mIoU) | COCO Stuff | ADE20K VAL (mIoU) |
|---|---|---|---|---|---|---|
| U-Net ('U-Net: Convolutional Networks for Biomedical Image Segmentation') | 2015 | ? | ? | ? | ? | ? |
| DeconvNet ('Learning Deconvolution Network for Semantic Segmentation') | 2015 | ? | ? | 72.5 | ? | ? |
| ParseNet ('ParseNet: Looking Wider to See Better') | 2015 | 40.4 | ? | 69.8 | ? | ? |
| Piecewise ('Efficient piecewise training of deep structured models for semantic segmentation') | 2015 | 43.3 | 71.6 | 78.0 | ? | ? |
| SegNet ('SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation') | 2016 | ? | 56.1 | ? | ? | ? |
| FCN ('Fully Convolutional Networks for Semantic Segmentation') | 2016 | 37.8 | 65.3 | 62.2 | 22.7 | 29.39 |
| ENet ('ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation') | 2016 | ? | 58.3 | ? | ? | ? |
| DilatedNet ('MULTI-SCALE CONTEXT AGGREGATION BY DILATED CONVOLUTIONS') | 2016 | ? | ? | 67.6 | ? | 32.31 |
| PixelNet ('PixelNet: Towards a General Pixel-Level Architecture') | 2016 | ? | ? | 69.8 | ? | ? |
| RefineNet ('RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation') | 2016 | 47.3 | 73.6 | 83.4 | 33.6 | 40.70 |
| LRR ('Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation') | 2016 | ? | 71.8 | 79.3 | ? | ? |
| FRRN ('Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes') | 2016 | ? | 71.8 | ? | ? | ? |
| MultiNet ('MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving') | 2016 | ? | ? | ? | ? | ? |
| DeepLab ('DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs') | 2017 | 45.7 | 64.8 | 79.7 | ? | ? |
| LinkNet ('LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation') | 2017 | ? | ? | ? | ? | ? |
| Tiramisu ('The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation') | 2017 | ? | ? | ? | ? | ? |
| ICNet ('ICNet for Real-Time Semantic Segmentation on High-Resolution Images') | 2017 | ? | 70.6 | ? | ? | ? |
| ERFNet ('Efficient ConvNet for Real-time Semantic Segmentation') | 2017 | ? | 68.0 | ? | ? | ? |
| PSPNet ('Pyramid Scene Parsing Network') | 2017 | 47.8 | 80.2 | 85.4 | ? | 44.94 |
| GCN ('Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network') | 2017 | ? | 76.9 | 82.2 | ? | ? |
| Segaware ('Segmentation-Aware Convolutional Networks Using Local Attention Masks') | 2017 | ? | ? | 69.0 | ? | ? |
| PixelDCN ('PIXEL DECONVOLUTIONAL NETWORKS') | 2017 | ? | ? | 73.0 | ? | ? |
| DeepLabv3 ('Rethinking Atrous Convolution for Semantic Image Segmentation') | 2017 | ? | ? | 85.7 | ? | ? |
| DUC, HDC ('Understanding Convolution for Semantic Segmentation') | 2018 | ? | 77.1 | ? | ? | ? |
| ShuffleSeg ('SHUFFLESEG: REAL-TIME SEMANTIC SEGMENTATION NETWORK') | 2018 | ? | 59.3 | ? | ? | ? |
| AdaptSegNet ('Learning to Adapt Structured Output Space for Semantic Segmentation') | 2018 | ? | 46.7 | ? | ? | ? |
| TuSimple-DUC ('Understanding Convolution for Semantic Segmentation') | 2018 | 80.1 | ? | 83.1 | ? | ? |
| R2U-Net ('Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation') | 2018 | ? | ? | ? | ? | ? |
| Attention U-Net ('Attention U-Net: Learning Where to Look for the Pancreas') | 2018 | ? | ? | ? | ? | ? |
| DANet ('Dual Attention Network for Scene Segmentation') | 2018 | 52.6 | 81.5 | ? | 39.7 | ? |
| ENCNet ('Context Encoding for Semantic Segmentation') | 2018 | 51.7 | 75.8 | 85.9 | ? | 44.65 |
| ShelfNet ('ShelfNet for Real-time Semantic Segmentation') | 2018 | 48.4 | 75.8 | 84.2 | ? | ? |
| LadderNet ('LADDERNET: MULTI-PATH NETWORKS BASED ON U-NET FOR MEDICAL IMAGE SEGMENTATION') | 2018 | ? | ? | ? | ? | ? |
| CCC-ERFnet ('Concentrated-Comprehensive Convolutions for lightweight semantic segmentation') | 2018 | ? | 69.01 | ? | ? | ? |
| DifNet-101 ('DifNet: Semantic Segmentation by Diffusion Networks') | 2018 | 45.1 | ? | 73.2 | ? | ? |
| BiSeNet(Res18) ('BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation') | 2018 | ? | ? | 74.7 | 28.1 | ? |
| ESPNet ('ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation') | 2018 | ? | ? | 63.01 | ? | ? |
| SPADE ('Semantic Image Synthesis with Spatially-Adaptive Normalization') | 2019 | ? | 62.3 | ? | 37.4 | 38.5 |
| SeamlessSeg ('Seamless Scene Segmentation') | 2019 | ? | 77.5 | ? | ? | ? |
| EMANet ('Expectation-Maximization Attention Networks for Semantic Segmentation') | 2019 | ? | ? | 88.2 | 39.9 | ? |
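
As a rough illustration of the mIoU metric used in the segmentation columns, the sketch below computes per-class intersection over union on flat label lists and averages over the classes that occur. The function name `mean_iou` and the toy labels are assumptions for illustration; benchmark toolkits typically accumulate a confusion matrix over the whole dataset and may handle ignored labels differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union (in percent) over classes present in pred or target.

    pred, target: flat lists of integer class labels, one entry per pixel
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return 100.0 * sum(ious) / len(ious)


# Toy example: 6 pixels, 3 classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))  # per-class IoU: 1/3, 2/3, 1/2
```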

Detection models

| Model | Year | VOC07 (mAP@IoU=0.5) | VOC12 (mAP@IoU=0.5) | COCO (mAP) |
|---|---|---|---|---|
| R-CNN ('Rich feature hierarchies for accurate object detection and semantic segmentation') | 2014 | 58.5 | ? | ? |
| OverFeat ('OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks') | 2014 | ? | ? | ? |
| MultiBox ('Scalable Object Detection using Deep Neural Networks') | 2014 | 29.0 | ? | ? |
| SPP-Net ('Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition') | 2014 | 59.2 | ? | ? |
| MR-CNN ('Object detection via a multi-region & semantic segmentation-aware CNN model') | 2015 | 78.2 | 73.9 | ? |
| AttentionNet ('AttentionNet: Aggregating Weak Directions for Accurate Object Detection') | 2015 | ? | ? | ? |
| Fast R-CNN ('Fast R-CNN') | 2015 | 70.0 | 68.4 | ? |
| Faster R-CNN ('Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks') | 2015 | 73.2 | 70.4 | 36.8 |
| YOLO v1 ('You Only Look Once: Unified, Real-Time Object Detection') | 2016 | 66.4 | 57.9 | ? |
| G-CNN ('G-CNN: an Iterative Grid Based Object Detector') | 2016 | 66.8 | 66.4 | ? |
| AZNet ('Adaptive Object Detection Using Adjacency and Zoom Prediction') | 2016 | 70.4 | ? | 22.3 |
| ION ('Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks') | 2016 | 80.1 | 77.9 | 33.1 |
| HyperNet ('HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection') | 2016 | 76.3 | 71.4 | ? |
| OHEM ('Training Region-based Object Detectors with Online Hard Example Mining') | 2016 | 78.9 | 76.3 | 22.4 |
| MPN ('A MultiPath Network for Object Detection') | 2016 | ? | ? | 33.2 |
| SSD ('SSD: Single Shot MultiBox Detector') | 2016 | 76.8 | 74.9 | 31.2 |
| GBDNet ('Crafting GBD-Net for Object Detection') | 2016 | 77.2 | ? | 27.0 |
| CPF ('Contextual Priming and Feedback for Faster R-CNN') | 2016 | 76.4 | 72.6 | ? |
| MS-CNN ('A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection') | 2016 | ? | ? | ? |
| R-FCN ('R-FCN: Object Detection via Region-based Fully Convolutional Networks') | 2016 | 79.5 | 77.6 | 29.9 |
| PVANET ('PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection') | 2016 | ? | ? | ? |
| DeepID-Net ('DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection') | 2016 | 69.0 | ? | ? |
| NoC ('Object Detection Networks on Convolutional Feature Maps') | 2016 | 71.6 | 68.8 | 27.2 |
| DSSD ('DSSD : Deconvolutional Single Shot Detector') | 2017 | 81.5 | 80.0 | ? |
| TDM ('Beyond Skip Connections: Top-Down Modulation for Object Detection') | 2017 | ? | ? | 37.3 |
| FPN ('Feature Pyramid Networks for Object Detection') | 2017 | ? | ? | 36.2 |
| YOLO v2 ('YOLO9000: Better, Faster, Stronger') | 2017 | 78.6 | 73.4 | 21.6 |
| RON ('RON: Reverse Connection with Objectness Prior Networks for Object Detection') | 2017 | 77.6 | 75.4 | ? |
| DCN ('Deformable Convolutional Networks') | 2017 | ? | ? | ? |
| DeNet ('DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling') | 2017 | 77.1 | 73.9 | 33.8 |
| CoupleNet ('CoupleNet: Coupling Global Structure with Local Parts for Object Detection') | 2017 | 82.7 | 80.4 | 34.4 |
| RetinaNet ('Focal Loss for Dense Object Detection') | 2017 | ? | ? | 39.1 |
| Mask R-CNN ('Mask R-CNN') | 2017 | ? | ? | 39.8 |
| DSOD ('DSOD: Learning Deeply Supervised Object Detectors from Scratch') | 2017 | 77.7 | 76.3 | ? |
| SMN ('Spatial Memory for Context Reasoning in Object Detection') | 2017 | 70.0 | ? | ? |
| YOLO v3 ('YOLOv3: An Incremental Improvement') | 2018 | ? | ? | 33.0 |
| SIN ('Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships') | 2018 | 76.0 | 73.1 | 23.2 |
| STDN ('Scale-Transferrable Object Detection') | 2018 | 80.9 | ? | ? |
| RefineDet ('Single-Shot Refinement Neural Network for Object Detection') | 2018 | 83.8 | 83.5 | 41.8 |
| MegDet ('MegDet: A Large Mini-Batch Object Detector') | 2018 | ? | ? | ? |
| RFBNet ('Receptive Field Block Net for Accurate and Fast Object Detection') | 2018 | 82.2 | ? | ? |
| CornerNet ('CornerNet: Detecting Objects as Paired Keypoints') | 2018 | ? | ? | 42.1 |
| LibraRetinaNet ('Libra R-CNN: Towards Balanced Learning for Object Detection') | 2019 | ? | ? | 43.0 |
| YOLACT-700 ('YOLACT Real-time Instance Segmentation') | 2019 | ? | ? | 31.2 |
| DetNASNet(3.8) ('DetNAS: Backbone Search for Object Detection') | 2019 | ? | ? | 42.0 |
| YOLOv4 ('YOLOv4: Optimal Speed and Accuracy of Object Detection') | 2020 | ? | ? | 46.7 |
| SOLO ('SOLO: Segmenting Objects by Locations') | 2020 | ? | ? | 37.8 |
| D-SOLO ('SOLO: Segmenting Objects by Locations') | 2020 | ? | ? | 40.5 |
| SNIPER ('Scale Normalized Image Pyramids with AutoFocus for Object Detection') | 2021 | 86.6 | ? | 47.9 |
| AutoFocus ('Scale Normalized Image Pyramids with AutoFocus for Object Detection') | 2021 | 85.8 | ? | 47.9 |
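
The "mAP@IoU=0.5" columns count a detection as correct only when its box overlaps a ground-truth box with an intersection over union of at least 0.5. The sketch below shows that overlap test for axis-aligned boxes; `box_iou`, `is_match`, and the `(x1, y1, x2, y2)` corner format are illustrative assumptions, and the full mAP computation (ranking detections by confidence, precision-recall integration, averaging over classes) is not shown. COCO-style mAP is usually averaged over several IoU thresholds rather than a single one.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(detection, ground_truth, threshold=0.5):
    """True if the detection overlaps the ground-truth box at or above the threshold."""
    return box_iou(detection, ground_truth) >= threshold


gt = (0, 0, 10, 10)
print(is_match((2, 0, 12, 10), gt))  # IoU = 80 / 120, counts as a match at 0.5
print(is_match((5, 0, 15, 10), gt))  # IoU = 50 / 150, does not count at 0.5
```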