# MTCNN_TensorRT

A C++ implementation of the MTCNN face detection algorithm, accelerated with the NVIDIA TensorRT inference SDK.

This repository is based on https://github.com/AlphaQi/MTCNN-light.git

Notes

2018/11/14: I have ported most of the computation to the GPU using the OpenCV CUDA wrapper and CUDA kernels I wrote myself. See the all_gpu branch for details. Note that you need OpenCV 3.0+ built with CUDA support to run that branch. On my GTX 1080 GPU it is about 5-10 times faster than the master branch.

2018/10/2: Good news! Now you can run the whole MTCNN using TensorRT 3.0 or 4.0!

I adopt the original models from the official project https://github.com/kpzhang93/MTCNN_face_detection_alignment and make the following modification.

TensorRT doesn't support the PReLU layer, which is widely used in MTCNN. One solution is to add a plugin layer (custom layer), but experiments show that this breaks the CBR (convolution, bias, ReLU) fusion process in TensorRT and is very slow. Instead, I use a ReLU layer, a Scale layer and an ElementWise addition layer to replace PReLU (as illustrated below). This adds only a little computation and doesn't affect the CBR process. The weights of the Scale layers are derived from the original PReLU layers.

[Figure: modification]
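The replacement described above relies on the identity PReLU(x) = ReLU(x) + (-a) * ReLU(-x), where a is the learned PReLU slope. The sketch below checks that identity numerically on scalar values. It is only one arrangement of ReLU, Scale and element-wise addition that matches the description; the exact layer wiring in this repository's models may differ, and the function names here are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Reference PReLU: y = x for x > 0, y = a * x otherwise.
float prelu(float x, float a) { return x > 0.0f ? x : a * x; }

// Plain ReLU.
float relu(float x) { return std::max(x, 0.0f); }

// Scale layer with a single weight and no bias (assumed form).
float scale(float x, float w) { return w * x; }

// PReLU rebuilt from ReLU, Scale and element-wise addition:
//   y = ReLU(x) + Scale(-a)( ReLU( Scale(-1)(x) ) )
// The second Scale weight (-a) is taken from the PReLU slope.
float prelu_decomposed(float x, float a) {
    float positive = relu(x);                           // keeps x where x > 0
    float negative = scale(relu(scale(x, -1.0f)), -a);  // gives a * x where x < 0
    return positive + negative;                         // element-wise addition
}

// Largest absolute difference between the two versions over sample inputs.
float max_abs_diff(const std::vector<float>& xs, float a) {
    float worst = 0.0f;
    for (float x : xs) {
        worst = std::max(worst, std::fabs(prelu(x, a) - prelu_decomposed(x, a)));
    }
    return worst;
}
```

Calling `max_abs_diff` over a range of positive and negative inputs should return a value at or near zero, which is why the replacement can reuse the original PReLU weights without retraining.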

Required environments

OpenCV (on Ubuntu, just run sudo apt-get install libopencv-dev to install it)
CUDA 9.0
TensorRT 3.04 or TensorRT 4.16 (I have only tested these two versions)
CMake >= 3.5
A digital camera, if you want to run the camera test.

Build

Replace the TensorRT and CUDA paths in CMakeLists.txt
Configure the detection parameters in mtcnn.cpp (min face size, the NMS thresholds, etc.)
Choose the running mode (camera test or single image test)
cmake .
make -j
./main

Results

The result will be like this in single image test mode:

[Figure: single]

Speed

On my computer with an NVIDIA GT 730 graphics card (a very low-end GPU) and an Intel i5-6500 CPU, with the min face size set to 60 pixels, the above image takes 20 to 30 ms to process.

TODO

Implement the whole processing pipeline using GPU computing.