GPU accelerated TensorFlow Lite / TensorRT applications.

TFLite-2.7

This repository contains several applications which invoke DNN inference with TensorFlow Lite GPU Delegate or TensorRT.

Target platforms: Linux PC / NVIDIA Jetson / Raspberry Pi.

1. Applications

Blazeface

DBFace

Age Gender Estimation

Image Classification

Object Detection

Facemesh

Hair Segmentation

3D Handpose

Iris Detection

3D Object Detection

Blazepose

Posenet

3D Human Pose Estimation

Depth Estimation (DenseDepth)

Semantic Segmentation

Face Segmentation

Selfie to Anime

Anime GAN

U^2-Net portrait drawing

Artistic Style Transfer

MIRNet

Boundless

Text Detection

2. How to Build & Run

<a name="build_for_x86_64">2.1. Build for x86_64 Linux</a>

2.1.1. setup environment
$ sudo apt install libgles2-mesa-dev 
$ mkdir ~/work
$ mkdir ~/lib
$
$ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh
$ chmod 755 bazel-3.1.0-installer-linux-x86_64.sh
$ sudo ./bazel-3.1.0-installer-linux-x86_64.sh
2.1.2. build TensorFlow Lite library.
$ cd ~/work 
$ git clone https://github.com/terryky/tflite_gles_app.git
$ ./tflite_gles_app/tools/scripts/tf2.4/build_libtflite_r2.4.sh

(The TensorFlow configure script starts after a while. Answer its prompts according to your environment.)

$
$ ln -s tensorflow_r2.4 ./tensorflow
$
$ cp ./tensorflow/bazel-bin/tensorflow/lite/libtensorflowlite.so ~/lib
$ cp ./tensorflow/bazel-bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so ~/lib
2.1.3. build an application.
$ cd ~/work/tflite_gles_app/gl2handpose
$ make -j4
2.1.4. run an application.
$ export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
$ cd ~/work/tflite_gles_app/gl2handpose
$ ./gl2handpose
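
If the application fails to start because a shared library cannot be found, it can help to confirm that both libraries from step 2.1.2 actually reached `~/lib`. The following is a minimal sketch; `missing_tflite_libs` is a hypothetical helper name and is not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the name of each
# TensorFlow Lite library from step 2.1.2 that is missing from a directory.
missing_tflite_libs() {
    dir="$1"
    for lib in libtensorflowlite.so libtensorflowlite_gpu_delegate.so; do
        # Print the file name only when it is absent.
        [ -f "$dir/$lib" ] || echo "$lib"
    done
}
```

Calling `missing_tflite_libs ~/lib` prints nothing when both files are present.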

<a name="build_for_aarch64">2.2. Build for aarch64 Linux (Jetson Nano, Raspberry Pi)</a>

2.2.1. build TensorFlow Lite library on Host PC.
(HostPC)$ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ chmod 755 bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ sudo ./bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$
(HostPC)$ mkdir ~/work
(HostPC)$ cd ~/work 
(HostPC)$ git clone https://github.com/terryky/tflite_gles_app.git
(HostPC)$ ./tflite_gles_app/tools/scripts/tf2.4/build_libtflite_r2.4_aarch64.sh

# If you want to build XNNPACK-enabled TensorFlow Lite, use the following script.
(HostPC)$ ./tflite_gles_app/tools/scripts/tf2.4/build_libtflite_r2.4_with_xnnpack_aarch64.sh

(The TensorFlow configure script starts after a while. Answer its prompts according to your environment.)
2.2.2. copy TensorFlow Lite libraries to target Jetson / Raspi.
(HostPC)$ scp ~/work/tensorflow_r2.4/bazel-bin/tensorflow/lite/libtensorflowlite.so jetson@192.168.11.11:/home/jetson/lib
(HostPC)$ scp ~/work/tensorflow_r2.4/bazel-bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so jetson@192.168.11.11:/home/jetson/lib
2.2.3. clone TensorFlow repository on target Jetson / Raspi.
(Jetson/Raspi)$ cd ~/work
(Jetson/Raspi)$ git clone -b r2.4 https://github.com/tensorflow/tensorflow.git
(Jetson/Raspi)$ cd tensorflow
(Jetson/Raspi)$ ./tensorflow/lite/tools/make/download_dependencies.sh
2.2.4. build an application.
(Jetson/Raspi)$ sudo apt install libgles2-mesa-dev libdrm-dev
(Jetson/Raspi)$ cd ~/work 
(Jetson/Raspi)$ git clone https://github.com/terryky/tflite_gles_app.git
(Jetson/Raspi)$ cd ~/work/tflite_gles_app/gl2handpose

# on Jetson
(Jetson)$ make -j4 TARGET_ENV=jetson_nano TFLITE_DELEGATE=GPU_DELEGATEV2

# on Raspberry Pi without GPUDelegate (recommended)
(Raspi )$ make -j4 TARGET_ENV=raspi4

# on Raspberry Pi with GPUDelegate (low performance)
(Raspi )$ make -j4 TARGET_ENV=raspi4 TFLITE_DELEGATE=GPU_DELEGATEV2

# on Raspberry Pi with XNNPACK
(Raspi )$ make -j4 TARGET_ENV=raspi4 TFLITE_DELEGATE=XNNPACK
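
The choice of make arguments above can be scripted. The sketch below maps a board model string to the Jetson setting and to the recommended Raspberry Pi setting. `make_args_for_model` is a hypothetical helper, and the model-string patterns are assumptions, so check what your board reports in `/proc/device-tree/model` before relying on them.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): map a board model string
# to the make arguments listed in step 2.2.4.
# The patterns below are assumptions about what the boards report.
make_args_for_model() {
    case "$1" in
        *"Jetson Nano"*)
            echo "TARGET_ENV=jetson_nano TFLITE_DELEGATE=GPU_DELEGATEV2" ;;
        *"Raspberry Pi 4"*)
            # CPU build, which the steps above mark as recommended.
            echo "TARGET_ENV=raspi4" ;;
        *)
            echo "unsupported board: $1" >&2
            return 1 ;;
    esac
}
```

A possible use on the target is `make -j4 $(make_args_for_model "$(tr -d '\0' < /proc/device-tree/model)")`.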
2.2.5. run an application.
(Jetson/Raspi)$ export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
(Jetson/Raspi)$ cd ~/work/tflite_gles_app/gl2handpose
(Jetson/Raspi)$ ./gl2handpose
About VSYNC

On Jetson Nano, display sync to vblank (VSYNC) is enabled by default to avoid tearing. To enable or disable VSYNC, run the app with one of the following commands.

# enable VSYNC (default).
(Jetson)$ export __GL_SYNC_TO_VBLANK=1; ./gl2handpose

# disable VSYNC. framerate improves, but tearing occurs.
(Jetson)$ export __GL_SYNC_TO_VBLANK=0; ./gl2handpose
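
The `export` form keeps the setting for the rest of the shell session. As an alternative, the sketch below scopes it to a single run. `run_with_vsync` is a hypothetical wrapper, not something this repository provides.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of this repository): run one command with
# VSYNC on or off without changing the calling shell's environment.
run_with_vsync() {
    mode="$1"; shift
    case "$mode" in
        on)  value=1 ;;
        off) value=0 ;;
        *)   echo "usage: run_with_vsync on|off command [args...]" >&2
             return 2 ;;
    esac
    # `env VAR=value command` sets the variable for that one process only.
    env __GL_SYNC_TO_VBLANK="$value" "$@"
}
```

For example, `run_with_vsync off ./gl2handpose` starts the app with VSYNC disabled.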

<a name="build_for_armv7l">2.3. Build for armv7l Linux (Raspberry Pi)</a>

2.3.1. build TensorFlow Lite library on Host PC.
(HostPC)$ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ chmod 755 bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ sudo ./bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$
(HostPC)$ mkdir ~/work
(HostPC)$ cd ~/work 
(HostPC)$ git clone https://github.com/terryky/tflite_gles_app.git
(HostPC)$ ./tflite_gles_app/tools/scripts/tf2.3/build_libtflite_r2.3_armv7l.sh

(The TensorFlow configure script starts after a while. Answer its prompts according to your environment.)
2.3.2. copy TensorFlow Lite libraries to target Raspberry Pi.
(HostPC)$ scp ~/work/tensorflow_r2.3/bazel-bin/tensorflow/lite/libtensorflowlite.so pi@192.168.11.11:/home/pi/lib
(HostPC)$ scp ~/work/tensorflow_r2.3/bazel-bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so pi@192.168.11.11:/home/pi/lib
2.3.3. setup environment on Raspberry Pi.
(Raspi)$ sudo apt update
(Raspi)$ sudo apt upgrade
(Raspi)$ sudo apt install libgles2-mesa-dev libegl1-mesa-dev xorg-dev
2.3.4. clone TensorFlow repository on target Raspi.
(Raspi)$ cd ~/work
(Raspi)$ git clone -b r2.3 https://github.com/tensorflow/tensorflow.git
(Raspi)$ cd tensorflow
(Raspi)$ ./tensorflow/lite/tools/make/download_dependencies.sh
2.3.5. build an application on target Raspi.
(Raspi)$ cd ~/work 
(Raspi)$ git clone https://github.com/terryky/tflite_gles_app.git
(Raspi)$ cd ~/work/tflite_gles_app/gl2handpose
(Raspi)$ make -j4 TARGET_ENV=raspi4  # GPUDelegate disabled (recommended)

# GPUDelegate enabled; this causes low performance on Raspi4.
(Raspi)$ make -j4 TARGET_ENV=raspi4 TFLITE_DELEGATE=GPU_DELEGATEV2
2.3.6. run an application on target Raspi.
(Raspi)$ export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
(Raspi)$ cd ~/work/tflite_gles_app/gl2handpose
(Raspi)$ ./gl2handpose

For more detailed information, please refer to this article.

3. About Input video stream

Both a live camera and a video file are supported as input.

<a name="uvc_camera">3.1. Live UVC Camera (default)</a>

<img src="gl2handpose/gl2handpose_mov.gif" width="500">
(Target)$ sudo apt-get install v4l-utils

# confirm current resolution settings
(Target)$ v4l2-ctl --all

# query available resolutions
(Target)$ v4l2-ctl --list-formats-ext

# set capture resolution (160x120)
(Target)$ v4l2-ctl --set-fmt-video=width=160,height=120

# set capture resolution (640x480)
(Target)$ v4l2-ctl --set-fmt-video=width=640,height=480
-------------------------------
 capture_devie  : /dev/video0
 capture_devtype: V4L2_CAP_VIDEO_CAPTURE
 capture_buftype: V4L2_BUF_TYPE_VIDEO_CAPTURE
 capture_memtype: V4L2_MEMORY_MMAP
 WH(640, 480), 4CC(MJPG), bpl(0), size(341333)
-------------------------------
ERR: camera_capture.c(87): pixformat(MJPG) is not supported.
ERR: camera_capture.c(87): pixformat(MJPG) is not supported.
...

If the application prints errors like the ones above ("pixformat(MJPG) is not supported"), change your camera settings to use the YUYV pixel format with the following commands:

$ sudo apt-get install v4l-utils
$ v4l2-ctl --set-fmt-video=width=640,height=480,pixelformat=YUYV --set-parm=30
$ ./gl2handpose -x
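
Before forcing YUYV, it may be worth checking that the camera offers it at all. The sketch below scans the output of `v4l2-ctl --list-formats-ext` for a quoted FourCC code. `supports_pixfmt` is a hypothetical helper, and the assumption that format names appear in single quotes (for example `'YUYV'`) should be checked against your version of v4l-utils.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): read the output of
# `v4l2-ctl --list-formats-ext` on stdin and succeed if the given FourCC
# appears as a single-quoted format name (an assumption about the output).
supports_pixfmt() {
    grep -q "'$1'"
}
```

A possible use is `v4l2-ctl --list-formats-ext | supports_pixfmt YUYV && echo "YUYV is available"`.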

<a name="video_file">3.2. Recorded Video file</a>

# install dependent libraries.
(Target)$ sudo apt install libavcodec-dev libavdevice-dev libavfilter-dev libavformat-dev libavresample-dev libavutil-dev

# build an app with the ENABLE_VDEC option
(Target)$ cd ~/work/tflite_gles_app/gl2facemesh
(Target)$ make -j4 ENABLE_VDEC=true

# run an app with a video file name as an argument.
(Target)$ ./gl2facemesh -v assets/sample_video.mp4

4. Tested platforms

You can select the platform by editing Makefile.env.

5. Performance of inference [ms]

Blazeface

| Framework | Precision | Raspberry Pi 4 [ms] | Jetson Nano [ms] |
| --- | --- | --- | --- |
| TensorFlow Lite | CPU fp32 | 10 | 10 |
| TensorFlow Lite | CPU int8 | 7 | 7 |
| TensorFlow Lite GPU Delegate | GPU fp16 | 70 | 10 |
| TensorRT | GPU fp16 | -- | ? |

Classification (mobilenet_v1_1.0_224)

| Framework | Precision | Raspberry Pi 4 [ms] | Jetson Nano [ms] |
| --- | --- | --- | --- |
| TensorFlow Lite | CPU fp32 | 69 | 50 |
| TensorFlow Lite | CPU int8 | 28 | 29 |
| TensorFlow Lite GPU Delegate | GPU fp16 | 360 | 37 |
| TensorRT | GPU fp16 | -- | 19 |

Object Detection (ssd_mobilenet_v1_coco)

| Framework | Precision | Raspberry Pi 4 [ms] | Jetson Nano [ms] |
| --- | --- | --- | --- |
| TensorFlow Lite | CPU fp32 | 150 | 113 |
| TensorFlow Lite | CPU int8 | 62 | 64 |
| TensorFlow Lite GPU Delegate | GPU fp16 | 980 | 90 |
| TensorRT | GPU fp16 | -- | 32 |

Facemesh

| Framework | Precision | Raspberry Pi 4 [ms] | Jetson Nano [ms] |
| --- | --- | --- | --- |
| TensorFlow Lite | CPU fp32 | 29 | 30 |
| TensorFlow Lite | CPU int8 | 24 | 27 |
| TensorFlow Lite GPU Delegate | GPU fp16 | 150 | 20 |
| TensorRT | GPU fp16 | -- | ? |

Hair Segmentation

| Framework | Precision | Raspberry Pi 4 [ms] | Jetson Nano [ms] |
| --- | --- | --- | --- |
| TensorFlow Lite | CPU fp32 | 410 | 400 |
| TensorFlow Lite | CPU int8 | ? | ? |
| TensorFlow Lite GPU Delegate | GPU fp16 | 270 | 30 |
| TensorRT | GPU fp16 | -- | ? |

3D Handpose

| Framework | Precision | Raspberry Pi 4 [ms] | Jetson Nano [ms] |
| --- | --- | --- | --- |
| TensorFlow Lite | CPU fp32 | 116 | 85 |
| TensorFlow Lite | CPU int8 | 80 | 87 |
| TensorFlow Lite GPU Delegate | GPU fp16 | 880 | 90 |
| TensorRT | GPU fp16 | -- | ? |

3D Object Detection

| Framework | Precision | Raspberry Pi 4 [ms] | Jetson Nano [ms] |
| --- | --- | --- | --- |
| TensorFlow Lite | CPU fp32 | 470 | 302 |
| TensorFlow Lite | CPU int8 | 248 | 249 |
| TensorFlow Lite GPU Delegate | GPU fp16 | 1990 | 235 |
| TensorRT | GPU fp16 | -- | 108 |

Posenet (posenet_mobilenet_v1_100_257x257)

| Framework | Precision | Raspberry Pi 4 [ms] | Jetson Nano [ms] |
| --- | --- | --- | --- |
| TensorFlow Lite | CPU fp32 | 92 | 70 |
| TensorFlow Lite | CPU int8 | 53 | 55 |
| TensorFlow Lite GPU Delegate | GPU fp16 | 803 | 80 |
| TensorRT | GPU fp16 | -- | 18 |

Semantic Segmentation (deeplabv3_257)

| Framework | Precision | Raspberry Pi 4 [ms] | Jetson Nano [ms] |
| --- | --- | --- | --- |
| TensorFlow Lite | CPU fp32 | 108 | 80 |
| TensorFlow Lite | CPU int8 | ? | ? |
| TensorFlow Lite GPU Delegate | GPU fp16 | 790 | 90 |
| TensorRT | GPU fp16 | -- | ? |

Selfie to Anime

| Framework | Precision | Raspberry Pi 4 [ms] | Jetson Nano [ms] |
| --- | --- | --- | --- |
| TensorFlow Lite | CPU fp32 | ? | 7700 |
| TensorFlow Lite | CPU int8 | ? | ? |
| TensorFlow Lite GPU Delegate | GPU fp16 | ? | ? |
| TensorRT | GPU fp16 | -- | ? |

Artistic Style Transfer

| Framework | Precision | Raspberry Pi 4 [ms] | Jetson Nano [ms] |
| --- | --- | --- | --- |
| TensorFlow Lite | CPU fp32 | 1830 | 950 |
| TensorFlow Lite | CPU int8 | ? | ? |
| TensorFlow Lite GPU Delegate | GPU fp16 | 2440 | 215 |
| TensorRT | GPU fp16 | -- | ? |

Text Detection (east_text_detection_320x320)

| Framework | Precision | Raspberry Pi 4 [ms] | Jetson Nano [ms] |
| --- | --- | --- | --- |
| TensorFlow Lite | CPU fp32 | 1020 | 680 |
| TensorFlow Lite | CPU int8 | 378 | 368 |
| TensorFlow Lite GPU Delegate | GPU fp16 | 4665 | 388 |
| TensorRT | GPU fp16 | -- | ? |
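
The figures in this section are per-inference latencies in milliseconds. To read one as a frame rate, divide 1000 by the latency. The sketch below does that with `awk`. `ms_to_fps` is a hypothetical helper, and the result is only an upper bound, since it ignores camera capture and rendering time.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): convert an inference
# latency in milliseconds to an upper bound on frames per second.
ms_to_fps() {
    awk -v ms="$1" 'BEGIN { printf "%.1f\n", 1000 / ms }'
}
```

For example, `ms_to_fps 19` prints `52.6` for the 19 ms TensorRT classification result on Jetson Nano.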

6. Related Articles

7. Acknowledgements