Awesome
Computer Vision Pretrained Models
What is a pre-trained model?
A pre-trained model is a model created by someone else to solve a similar problem. Instead of building a model from scratch, you can use a model trained on another problem as a starting point. A pre-trained model may not be 100% accurate in your application.
For example, suppose you want to build a self-driving car. You can spend years building a decent image recognition algorithm from scratch, or you can take the Inception model (a pre-trained model) from Google, which was trained on ImageNet data, to identify objects in images.
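The "starting point" idea can be sketched without any framework. The snippet below is a dependency-free toy, not a real model: `PRETRAINED_WEIGHTS`, `SAMPLES`, and `LABELS` are made-up values standing in for a real backbone (such as Inception) and a real dataset. It keeps the "pre-trained" feature extractor frozen and trains only a small new head for the new task.

```python
import math

# Hypothetical stand-in for a pre-trained backbone: these weights are made up
# for illustration and are never updated during training.
PRETRAINED_WEIGHTS = (
    (0.9, -0.4),
    (0.2, 0.8),
    (-0.5, 0.3),
)


def extract_features(x):
    """Frozen feature extractor: one tanh unit per pre-trained weight row."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_WEIGHTS]


def predict(head, x):
    """Probability of class 1 from the trainable logistic head."""
    features = extract_features(x) + [1.0]  # constant 1.0 acts as the bias input
    z = sum(w * f for w, f in zip(head, features))
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(head, samples, labels):
    """Average binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in zip(samples, labels):
        p = predict(head, x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(samples)


def train_head(samples, labels, epochs=500, lr=0.5):
    """Fit only the new head with full-batch gradient descent."""
    head = [0.0] * (len(PRETRAINED_WEIGHTS) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(head)
        for x, y in zip(samples, labels):
            features = extract_features(x) + [1.0]
            error = predict(head, x) - y
            for i, f in enumerate(features):
                grad[i] += error * f / len(samples)
        head = [w - lr * g for w, g in zip(head, grad)]
    return head


# Tiny made-up dataset for the new task: label is 1 when the first input dominates.
SAMPLES = [(1.0, 0.0), (0.8, 0.1), (0.9, 0.3), (0.0, 1.0), (0.2, 0.9), (0.1, 0.7)]
LABELS = [1, 1, 1, 0, 0, 0]

head = train_head(SAMPLES, LABELS)
print("loss before:", round(mean_loss([0.0] * 4, SAMPLES, LABELS), 3))
print("loss after: ", round(mean_loss(head, SAMPLES, LABELS), 3))
```

With a real framework the pattern is the same: load the published weights, freeze the backbone, and train only a replacement output layer on your own data.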
Other Pre-trained Models
Model Deployment library
Framework
Model visualization
You can visualize each model's network architecture with Netron.
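As a rough sketch, Netron can also be launched from Python. This assumes the third-party `netron` package is installed (`pip install netron`) and that its `start` function accepts a file path and a `browse` flag. The helper name `open_in_netron` and the file name in the comment are hypothetical.

```python
from pathlib import Path


def open_in_netron(model_path, browse=True):
    """Serve a saved model file in Netron's local viewer."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"No model file at {path}")
    # Imported lazily so the path check works even without the package installed.
    import netron  # third-party; assumes `pip install netron`
    netron.start(str(path), browse=browse)


# Example call (hypothetical file name):
# open_in_netron("mobilenet_v2.onnx")
```

Checking the path first gives a clear error for a mistyped file name before any viewer is started.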
Tensorflow <a name="tensorflow"/>
Model Name | Description | Framework | License |
---|---|---|---|
ObjectDetection | Localizing and identifying multiple objects in a single image. | Tensorflow | Apache License |
Mask R-CNN | The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone. | Tensorflow | The MIT License (MIT) |
Faster-RCNN | This is an experimental Tensorflow implementation of Faster RCNN - a convnet for object detection with a region proposal network. | Tensorflow | MIT License |
YOLO TensorFlow | A TensorFlow implementation of YOLO: Real-Time Object Detection. | Tensorflow | Custom |
YOLO TensorFlow ++ | TensorFlow implementation of 'YOLO: Real-Time Object Detection', with training and actual support for real-time running on mobile devices. | Tensorflow | GNU GENERAL PUBLIC LICENSE |
MobileNet | MobileNets trade off between latency, size and accuracy while comparing favorably with popular models from the literature. | Tensorflow | The MIT License (MIT) |
DeepLab | Deep labeling for semantic image segmentation. | Tensorflow | Apache License |
Colornet | Neural Network to colorize grayscale images. | Tensorflow | Not Found |
SRGAN | Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. | Tensorflow | Not Found |
DeepOSM | Train TensorFlow neural nets with OpenStreetMap features and satellite imagery. | Tensorflow | The MIT License (MIT) |
Domain Transfer Network | Implementation of Unsupervised Cross-Domain Image Generation. | Tensorflow | MIT License |
Show, Attend and Tell | Attention Based Image Caption Generator. | Tensorflow | MIT License |
android-yolo | Real-time object detection on Android using the YOLO network, powered by TensorFlow. | Tensorflow | Apache License |
DCSCN Super Resolution | This is a tensorflow implementation of "Fast and Accurate Image Super Resolution by Deep CNN with Skip Connection and Network in Network", a deep learning based Single-Image Super-Resolution (SISR) model. | Tensorflow | Not Found |
GAN-CLS | This is an experimental tensorflow implementation of synthesizing images. | Tensorflow | Not Found |
U-Net | For Brain Tumor Segmentation. | Tensorflow | Not Found |
Improved CycleGAN | Unpaired Image to Image Translation. | Tensorflow | MIT License |
Im2txt | Image-to-text neural network for image captioning. | Tensorflow | Apache License |
SLIM | Image classification models in TF-Slim. | Tensorflow | Apache License |
DELF | Deep local features for image matching and retrieval. | Tensorflow | Apache License |
Compression | Compressing and decompressing images using a pre-trained Residual GRU network. | Tensorflow | Apache License |
AttentionOCR | A model for real-world image text extraction. | Tensorflow | Apache License |
Keras <a name="keras"/>
Model Name | Description | Framework | License |
---|---|---|---|
Mask R-CNN | The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone. | Keras | The MIT License (MIT) |
VGG16 | Very Deep Convolutional Networks for Large-Scale Image Recognition. | Keras | The MIT License (MIT) |
VGG19 | Very Deep Convolutional Networks for Large-Scale Image Recognition. | Keras | The MIT License (MIT) |
ResNet | Deep Residual Learning for Image Recognition. | Keras | The MIT License (MIT) |
ResNet50 | Deep Residual Learning for Image Recognition. | Keras | The MIT License (MIT) |
Nasnet | NASNet refers to Neural Architecture Search Network, a family of models that were designed automatically by learning the model architectures directly on the dataset of interest. | Keras | The MIT License (MIT) |
MobileNet | MobileNet v1 models for Keras. | Keras | The MIT License (MIT) |
MobileNet V2 | MobileNet v2 models for Keras. | Keras | The MIT License (MIT) |
MobileNet V3 | MobileNet v3 models for Keras. | Keras | The MIT License (MIT) |
efficientnet | Rethinking Model Scaling for Convolutional Neural Networks. | Keras | The MIT License (MIT) |
Image analogies | Generate image analogies using neural matching and blending. | Keras | The MIT License (MIT) |
Popular Image Segmentation Models | Implementation of Segnet, FCN, UNet and other models in Keras. | Keras | MIT License |
Ultrasound nerve segmentation | This tutorial shows how to use Keras library to build deep neural network for ultrasound image nerve segmentation. | Keras | MIT License |
DeepMask object segmentation | This is a Keras-based Python implementation of DeepMask- a complex deep neural network for learning object segmentation masks. | Keras | Not Found |
Monolingual and Multilingual Image Captioning | This is the source code that accompanies Multilingual Image Description with Neural Sequence Models. | Keras | BSD-3-Clause License |
pix2pix | Keras implementation of Image-to-Image Translation with Conditional Adversarial Networks by Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. | Keras | Not Found |
Colorful Image colorization | B&W to color. | Keras | Not Found |
CycleGAN | Implementation of Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. | Keras | MIT License |
DualGAN | Implementation of DualGAN: Unsupervised Dual Learning for Image-to-Image Translation. | Keras | MIT License |
Super-Resolution GAN | Implementation of Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. | Keras | MIT License |
PyTorch <a name="pytorch"/>
Model Name | Description | Framework | License |
---|---|---|---|
detectron2 | Detectron2 is Facebook AI Research's next generation software system that implements state-of-the-art object detection algorithms | PyTorch | Apache License 2.0 |
FastPhotoStyle | A Closed-form Solution to Photorealistic Image Stylization. | PyTorch | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License |
pytorch-CycleGAN-and-pix2pix | Image-to-image translation in PyTorch (CycleGAN and pix2pix). | PyTorch | BSD License |
maskrcnn-benchmark | Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch. | PyTorch | MIT License |
deep-image-prior | Image restoration with neural networks but without learning. | PyTorch | Apache License 2.0 |
StarGAN | StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. | PyTorch | MIT License |
faster-rcnn.pytorch | This project is a faster Faster R-CNN implementation, aimed at accelerating the training of Faster R-CNN object detection models. | PyTorch | MIT License |
pix2pixHD | Synthesizing and manipulating 2048x1024 images with conditional GANs. | PyTorch | BSD License |
Augmentor | Image augmentation library in Python for machine learning. | PyTorch | MIT License |
albumentations | Fast image augmentation library. | PyTorch | MIT License |
Deep Video Analytics | Deep Video Analytics is a platform for indexing and extracting information from videos and images | PyTorch | Custom |
semantic-segmentation-pytorch | Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset. | PyTorch | BSD 3-Clause License |
An End-to-End Trainable Neural Network for Image-based Sequence Recognition | This software implements the Convolutional Recurrent Neural Network (CRNN), a combination of CNN, RNN and CTC loss for image-based sequence recognition tasks, such as scene text recognition and OCR. | PyTorch | The MIT License (MIT) |
UNIT | PyTorch Implementation of our Coupled VAE-GAN algorithm for Unsupervised Image-to-Image Translation. | PyTorch | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License |
Neural Sequence labeling model | Sequence labeling models are quite popular in many NLP tasks, such as Named Entity Recognition (NER), part-of-speech (POS) tagging and word segmentation. | PyTorch | Apache License |
faster rcnn | This is a PyTorch implementation of Faster RCNN. This project is mainly based on py-faster-rcnn and TFFRCNN. For details about R-CNN please refer to the paper Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks by Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. | PyTorch | MIT License |
pytorch-semantic-segmentation | PyTorch for Semantic Segmentation. | PyTorch | MIT License |
EDSR-PyTorch | PyTorch version of the paper 'Enhanced Deep Residual Networks for Single Image Super-Resolution'. | PyTorch | MIT License |
image-classification-mobile | Collection of classification models pretrained on the ImageNet-1K. | PyTorch | MIT License |
FaderNetworks | Fader Networks: Manipulating Images by Sliding Attributes - NIPS 2017. | PyTorch | Creative Commons Attribution-NonCommercial 4.0 International Public License |
neuraltalk2-pytorch | Image captioning model in pytorch (finetunable cnn in branch with_finetune). | PyTorch | MIT License |
RandWireNN | Implementation of: "Exploring Randomly Wired Neural Networks for Image Recognition". | PyTorch | Not Found |
stackGAN-v2 | Pytorch implementation for reproducing StackGAN_v2 results in the paper StackGAN++. | PyTorch | MIT License |
Detectron models for Object Detection | This code allows the use of some of the Detectron models for object detection from Facebook AI Research with PyTorch. | PyTorch | Apache License |
DEXTR-PyTorch | This paper explores the use of extreme points in an object (left-most, right-most, top, bottom pixels) as input to obtain precise object segmentation for images and videos. | PyTorch | GNU GENERAL PUBLIC LICENSE |
pointnet.pytorch | Pytorch implementation for "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation". | PyTorch | MIT License |
self-critical.pytorch | This repository includes the unofficial implementation Self-critical Sequence Training for Image Captioning and Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. | PyTorch | MIT License |
vnet.pytorch | A Pytorch implementation for V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. | PyTorch | BSD 3-Clause License |
piwise | Pixel-wise segmentation on VOC2012 dataset using pytorch. | PyTorch | BSD 3-Clause License |
pspnet-pytorch | PyTorch implementation of PSPNet segmentation network. | PyTorch | Not Found |
pytorch-SRResNet | Pytorch implementation for Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. | PyTorch | The MIT License (MIT) |
PNASNet.pytorch | PyTorch implementation of PNASNet-5 on ImageNet. | PyTorch | Apache License |
img_classification_pk_pytorch | Quickly comparing your image classification models with the state-of-the-art models. | PyTorch | Not Found |
Deep Neural Networks are Easily Fooled | High Confidence Predictions for Unrecognizable Images. | PyTorch | MIT License |
pix2pix-pytorch | PyTorch implementation of "Image-to-Image Translation Using Conditional Adversarial Networks". | PyTorch | Not Found |
NVIDIA/semantic-segmentation | A PyTorch Implementation of Improving Semantic Segmentation via Video Propagation and Label Relaxation, In CVPR2019. | PyTorch | CC BY-NC-SA 4.0 license |
Neural-IMage-Assessment | A PyTorch Implementation of Neural IMage Assessment. | PyTorch | Not Found |
torchxrayvision | Pretrained models for chest X-ray (CXR) pathology predictions. Medical, Healthcare, Radiology | PyTorch | Apache License |
pytorch-image-models | PyTorch image models, scripts, pretrained weights -- (SE)ResNet/ResNeXT, DPN, EfficientNet, MixNet, MobileNet-V3/V2, MNASNet, Single-Path NAS, FBNet, and more | PyTorch | Apache License 2.0 |
Caffe <a name="caffe"/>
Model Name | Description | Framework | License |
---|---|---|---|
OpenPose | OpenPose represents the first real-time multi-person system to jointly detect human body, hand, and facial keypoints (in total 130 keypoints) on single images. | Caffe | Custom |
Fully Convolutional Networks for Semantic Segmentation | Fully Convolutional Models for Semantic Segmentation. | Caffe | Not Found |
Colorful Image Colorization | Colorful Image Colorization. | Caffe | BSD-2-Clause License |
R-FCN | R-FCN: Object Detection via Region-based Fully Convolutional Networks. | Caffe | MIT License |
cnn-vis | Inspired by Google's recent Inceptionism blog post, cnn-vis is an open-source tool that lets you use convolutional neural networks to generate images. | Caffe | The MIT License (MIT) |
DeconvNet | Learning Deconvolution Network for Semantic Segmentation. | Caffe | Custom |
MXNet <a name="mxnet"/>
Model Name | Description | Framework | License |
---|---|---|---|
Faster RCNN | Region Proposal Network solves object detection as a regression problem. | MXNet | Apache License, Version 2.0 |
SSD | SSD is a unified framework for object detection with a single network. | MXNet | MIT License |
Faster RCNN+Focal Loss | An unofficial implementation of Focal Loss for Dense Object Detection. | MXNet | Not Found |
CNN-LSTM-CTC | Three different models for text recognition, each using a CTC loss layer so that text images do not need to be segmented. | MXNet | Not Found |
Faster_RCNN_for_DOTA | This is the official repo of paper DOTA: A Large-scale Dataset for Object Detection in Aerial Images. | MXNet | Apache License |
RetinaNet | Focal loss for Dense Object Detection. | MXNet | Not Found |
MobileNetV2 | This is a MXNet implementation of MobileNetV2 architecture as described in the paper Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation. | MXNet | Apache License |
neuron-selectivity-transfer | This code is a re-implementation of the imagenet classification experiments in the paper Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. | MXNet | Apache License |
MobileNetV2 | This is a Gluon implementation of MobileNetV2 architecture as described in the paper Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation. | MXNet | Apache License |
sparse-structure-selection | This code is a re-implementation of the imagenet classification experiments in the paper Data-Driven Sparse Structure Selection for Deep Neural Networks. | MXNet | Apache License |
FastPhotoStyle | A Closed-form Solution to Photorealistic Image Stylization. | MXNet | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License |
Contributions
Your contributions are always welcome!
Please have a look at contributing.md.