TensorRTx

TensorRTx aims to implement popular deep learning networks with the TensorRT network definition API.

Why don't we use a parser (ONNX parser, UFF parser, Caffe parser, etc.), but instead build the networks from scratch with TensorRT's layer APIs? In short, defining the network by hand gives full control over the model, which makes it more flexible and easier to debug than going through a parser.

The basic workflow of TensorRTx is:

  1. Get the trained models from PyTorch, MXNet, TensorFlow, etc. Some PyTorch models can be found in my repo pytorchx; the rest are from popular open-source repos.
  2. Export the weights to a plain-text file -- a .wts file.
  3. Load the weights in TensorRT, define the network, and build a TensorRT engine (a sketch of the weight loading follows this list).
  4. Load the TensorRT engine and run inference.
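
For step 3, loading the weights amounts to parsing the .wts file into a map of named TensorRT `Weights`. Below is a minimal sketch, not the exact helper shipped with each model (which may differ in details), assuming the plain-text layout used throughout this project: the first line holds the number of weight blobs, and each following line holds a blob name, its value count, and the float32 values encoded as hex.

```cpp
#include <NvInfer.h>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <map>
#include <string>

using namespace nvinfer1;

// Parse a .wts file into a map of named TensorRT Weights.
// Assumed layout: "<blob count>\n" followed by one line per blob:
// "<name> <count> <hex32> <hex32> ...", each hex32 being the IEEE-754
// bit pattern of one float32 value.
std::map<std::string, Weights> loadWeights(const std::string& file) {
    std::map<std::string, Weights> weightMap;
    std::ifstream input(file);
    assert(input.is_open() && "Unable to open .wts file");

    int32_t count = 0;
    input >> count;
    assert(count > 0 && "Invalid .wts file");

    while (count--) {
        std::string name;
        uint32_t size = 0;
        input >> name >> std::dec >> size;

        // The buffer must stay alive until the engine has been built.
        uint32_t* values = new uint32_t[size];
        for (uint32_t i = 0; i < size; ++i) {
            input >> std::hex >> values[i];
        }
        weightMap[name] = Weights{DataType::kFLOAT, values, static_cast<int64_t>(size)};
    }
    return weightMap;
}
```

On the export side, most model folders provide a small Python script (typically named gen_wts.py) that writes this format from the framework's state dict.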

News

Tutorials

Test Environment

  1. TensorRT 7.x
  2. TensorRT 8.x (some of the models support 8.x)

How to run

Each model folder contains a README that explains how to build and run the models inside.

Models

The following models are implemented.

| Name | Description |
|------|-------------|
| mlp | the very basic model for starters, properly documented |
| lenet | the simplest one, a "hello world" of this project |
| alexnet | easy to implement, all layers are supported in TensorRT |
| googlenet | GoogLeNet (Inception v1) |
| inception | Inception v3, v4 |
| mnasnet | MNASNet with depth multiplier of 0.5 from the paper |
| mobilenet | MobileNet v2, v3-small, v3-large |
| resnet | resnet-18, resnet-50 and resnext50-32x4d are implemented |
| senet | se-resnet50 |
| shufflenet | ShuffleNet v2 with 0.5x output channels |
| squeezenet | SqueezeNet 1.1 model |
| vgg | VGG 11-layer model |
| yolov3-tiny | weights and PyTorch implementation from ultralytics/yolov3 |
| yolov3 | darknet-53, weights and PyTorch implementation from ultralytics/yolov3 |
| yolov3-spp | darknet-53, weights and PyTorch implementation from ultralytics/yolov3 |
| yolov4 | CSPDarknet53, weights from AlexeyAB/darknet, PyTorch implementation from ultralytics/yolov3 |
| yolov5 | yolov5 v1.0-v7.0 of ultralytics/yolov5, detection, classification and instance segmentation |
| yolov7 | yolov7 v0.1, PyTorch implementation from WongKinYiu/yolov7 |
| yolov8 | yolov8, PyTorch implementation from ultralytics/ultralytics |
| yolov9 | yolov9, PyTorch implementation from WongKinYiu/yolov9 |
| yolop | YOLOP, PyTorch implementation from hustvl/YOLOP |
| retinaface | resnet50 and mobilenet0.25, weights from biubug6/Pytorch_Retinaface |
| arcface | LResNet50E-IR, LResNet100E-IR and MobileFaceNet, weights from deepinsight/insightface |
| retinafaceAntiCov | mobilenet0.25, weights from deepinsight/insightface, RetinaFace anti-COVID-19, detects face and mask attribute |
| dbnet | scene text detection, weights from BaofengZan/DBNet.pytorch |
| crnn | PyTorch implementation from meijieru/crnn.pytorch |
| ufld | PyTorch implementation from Ultra-Fast-Lane-Detection, ECCV2020 |
| hrnet | hrnet-image-classification and hrnet-semantic-segmentation, PyTorch implementations from HRNet-Image-Classification and HRNet-Semantic-Segmentation |
| psenet | PSENet text detection, TensorFlow implementation from liuheng92/tensorflow_PSENet |
| ibnnet | IBN-Net, PyTorch implementation from XingangPan/IBN-Net, ECCV2018 |
| unet | U-Net, PyTorch implementation from milesial/Pytorch-UNet |
| repvgg | RepVGG, PyTorch implementation from DingXiaoH/RepVGG |
| lprnet | LPRNet, PyTorch implementation from xuexingyu24/License_Plate_Detection_Pytorch |
| refinedet | RefineDet, PyTorch implementation from luuuyi/RefineDet.PyTorch |
| densenet | DenseNet-121, from torchvision.models |
| rcnn | FasterRCNN and MaskRCNN, model from detectron2 |
| tsm | TSM: Temporal Shift Module for Efficient Video Understanding, ICCV2019 |
| scaled-yolov4 | yolov4-csp, PyTorch implementation from WongKinYiu/ScaledYOLOv4 |
| centernet | CenterNet DLA-34, PyTorch implementation from xingyizhou/CenterNet |
| efficientnet | EfficientNet b0-b8 and l2, PyTorch implementation from lukemelas/EfficientNet-PyTorch |
| detr | DE⫶TR, PyTorch implementation from facebookresearch/detr |
| swin-transformer | Swin Transformer for semantic segmentation, only Swin-T supported, PyTorch implementation from microsoft/Swin-Transformer |
| real-esrgan | Real-ESRGAN, PyTorch implementation from real-esrgan |
| superpoint | SuperPoint, PyTorch model from magicleap/SuperPointPretrainedNetwork |
| csrnet | CSRNet, PyTorch implementation from leeyeehoo/CSRNet-pytorch |
| EfficientAd | EfficientAd: Accurate Visual Anomaly Detection at Millisecond-Level Latencies, from anomalib |

Model Zoo

The .wts files can be downloaded from the model zoo for quick evaluation, but it is recommended to convert the .wts from your own PyTorch/MXNet/TensorFlow model so that you can retrain it.

GoogleDrive | BaiduPan (pwd: uvv2)

Tricky Operations

Some tricky operations encountered in these models have already been solved, though there might be better solutions.

| Name | Description |
|------|-------------|
| BatchNorm | implemented with a scale layer (see the sketch after this table); used in resnet, googlenet, mobilenet, etc. |
| MaxPool2d(ceil_mode=True) | use a padding layer before maxpool to handle ceil_mode=True; see googlenet |
| average pool with padding | use setAverageCountExcludesPadding() when necessary; see inception |
| relu6 | use Relu6(x) = Relu(x) - Relu(x - 6); see mobilenet |
| torch.chunk() | implement chunk(2, dim=C) with a TensorRT plugin; see shufflenet |
| channel shuffle | use two shuffle layers to implement channel_shuffle; see shufflenet |
| adaptive pool | use a fixed input dimension and regular average pooling; see shufflenet |
| leaky relu | I wrote a leaky relu plugin, but PRelu in NvInferPlugin.h can be used instead; see yolov3 in branch trt4 |
| yolo layer v1 | yolo layer implemented as a plugin; see yolov3 in branch trt4 |
| yolo layer v2 | three yolo layers implemented in one plugin; see yolov3-spp |
| upsample | replaced by a deconvolution layer; see yolov3 |
| hsigmoid | hard sigmoid implemented as a plugin; hsigmoid and hswish are used in mobilenetv3 |
| retinaface output decode | a plugin to decode bbox, confidence and landmarks; see retinaface |
| mish | mish activation implemented as a plugin; used in yolov4 |
| prelu | mxnet's prelu activation with trainable gamma implemented as a plugin; used in arcface |
| HardSwish | hard_swish = x * hard_sigmoid; used in yolov5 v3.0 |
| LSTM | PyTorch nn.LSTM() implemented with the TensorRT API |
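
As an example of the first trick: at inference time BatchNorm reduces to a per-channel affine transform, y = gamma * (x - mean) / sqrt(var + eps) + beta = scale * x + shift, so it can be expressed with an IScaleLayer. Below is a minimal sketch of that folding, assuming the weights were loaded into a map keyed with the usual PyTorch suffixes (.weight, .bias, .running_mean, .running_var); see the resnet and googlenet sources for the actual helpers.

```cpp
#include <NvInfer.h>
#include <cmath>
#include <map>
#include <string>

using namespace nvinfer1;

// Fold BatchNorm2d into a per-channel scale layer:
//   y = gamma * (x - mean) / sqrt(var + eps) + beta
//     = scale * x + shift, with scale = gamma / sqrt(var + eps)
//                               shift = beta - mean * scale
IScaleLayer* addBatchNorm2d(INetworkDefinition* network,
                            std::map<std::string, Weights>& weightMap,
                            ITensor& input, const std::string& name, float eps) {
    const float* gamma = static_cast<const float*>(weightMap[name + ".weight"].values);
    const float* beta  = static_cast<const float*>(weightMap[name + ".bias"].values);
    const float* mean  = static_cast<const float*>(weightMap[name + ".running_mean"].values);
    const float* var   = static_cast<const float*>(weightMap[name + ".running_var"].values);
    const int64_t len  = weightMap[name + ".running_var"].count;

    // Per-channel constants; they must outlive the engine build.
    float* scval = new float[len];
    float* shval = new float[len];
    float* pval  = new float[len];
    for (int64_t i = 0; i < len; ++i) {
        scval[i] = gamma[i] / std::sqrt(var[i] + eps);
        shval[i] = beta[i] - mean[i] * scval[i];
        pval[i]  = 1.0f;  // power term left at identity
    }
    Weights scale{DataType::kFLOAT, scval, len};
    Weights shift{DataType::kFLOAT, shval, len};
    Weights power{DataType::kFLOAT, pval, len};

    // kCHANNEL applies one (shift, scale, power) triple per channel.
    return network->addScale(input, ScaleMode::kCHANNEL, shift, scale, power);
}
```

The same pattern, folding the math into an existing TensorRT layer and falling back to a plugin only when necessary, is behind most of the entries above.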

Speed Benchmark

| Models | Device | BatchSize | Mode | Input Shape (HxW) | FPS |
|--------|--------|-----------|------|-------------------|-----|
| YOLOv3-tiny | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 333 |
| YOLOv3 (darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 39.2 |
| YOLOv3 (darknet53) | Xeon E5-2620/GTX1080 | 1 | INT8 | 608x608 | 71.4 |
| YOLOv3-spp (darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 38.5 |
| YOLOv4 (CSPDarknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 35.7 |
| YOLOv4 (CSPDarknet53) | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 40.9 |
| YOLOv4 (CSPDarknet53) | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 41.3 |
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142 |
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 173 |
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 190 |
| YOLOv5-m v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71 |
| YOLOv5-l v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 43 |
| YOLOv5-x v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 29 |
| YOLOv5-s v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142 |
| YOLOv5-m v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71 |
| YOLOv5-l v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 40 |
| YOLOv5-x v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 27 |
| RetinaFace (resnet50) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 90 |
| RetinaFace (resnet50) | Xeon E5-2620/GTX1080 | 1 | INT8 | 480x640 | 204 |
| RetinaFace (mobilenet0.25) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 417 |
| ArcFace (LResNet50E-IR) | Xeon E5-2620/GTX1080 | 1 | FP32 | 112x112 | 333 |
| CRNN | Xeon E5-2620/GTX1080 | 1 | FP32 | 32x100 | 1000 |

Help wanted: if you have speed results, please open an issue or PR.

Acknowledgments & Contact

Any contributions, questions and discussions are welcome. Contact me via the info below.

E-mail: wangxinyu_es@163.com

WeChat ID: wangxinyu0375 (you can add me on WeChat to join the tensorrtx discussion group; please include the note "tensorrtx" in your request)