
TensorFlow realtime_object_detection on Jetson Xavier/TX2/TX1 and PC

About this repository

Forked from GustavZ/realtime_object_detection: https://github.com/GustavZ/realtime_object_detection
This fork focuses on the model-split technique for ssd_mobilenet_v1.

Download model from here: tf1_detection_model_zoo

wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz

and here: TensorFlow DeepLab Model Zoo

wget http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz

Supported models

| Model | model_type | split_shape |
|---|---|---|
| ssd_mobilenet_v1_coco_11_06_2017 | nms_v0 | 1917 |
| ssd_mobilenet_v1_coco_2017_11_17 | nms_v1 | 1917 |
| ssd_inception_v2_coco_2017_11_17 | nms_v1 | 1917 |
| ssd_mobilenet_v1_coco_2018_01_28 | nms_v2 | 1917 |
| ssdlite_mobilenet_v2_coco_2018_05_09 | nms_v2 | 1917 |
| ssd_inception_v2_coco_2018_01_28 | nms_v2 | 1917 |
| ssd_mobilenet_v1_quantized_300x300_coco14_sync_2018_07_03 | nms_v2 | 1917 |
| ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync_2018_07_03 | nms_v2 | 1917 |
| ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03 | nms_v2 | 51150 |
| ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03 | nms_v2 | 51150 |
| ssd_mobilenet_v1_ppn_shared_box_predictor_300x300_coco14_sync_2018_07_03 | nms_v2 | 3000 |
| faster_rcnn_inception_v2_coco_2018_01_28 | faster_v2 | |
| faster_rcnn_resnet50_coco_2018_01_28 | faster_v2 | |
| faster_rcnn_resnet101_coco_2018_01_28 | faster_v2 | |
| faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28 | faster_v2 | |
| mask_rcnn_inception_resnet_v2_atrous_coco_2018_01_28 | mask_v1 | |
| mask_rcnn_inception_v2_coco_2018_01_28 | mask_v1 | |
| mask_rcnn_resnet101_atrous_coco_2018_01_28 | mask_v1 | |
| mask_rcnn_resnet50_atrous_coco_2018_01_28 | mask_v1 | |
| deeplabv3_mnv2_pascal_train_aug_2018_01_29 | deeplab_v3 | |
| deeplabv3_mnv2_pascal_trainval_2018_01_29 | deeplab_v3 | |
| deeplabv3_pascal_train_aug_2018_01_04 | deeplab_v3 | |
| deeplabv3_pascal_trainval_2018_01_04 | deeplab_v3 | |


Getting Started:

Requirements:

pip install --upgrade pyyaml

You also need OpenCV >= 3.1 and TensorFlow >= 1.4 (1.6 works well).
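All settings live in config.yml (next section). A minimal sketch of loading it with PyYAML; the key names are taken from the examples on this page, and the inline YAML fragment here only stands in for the real file:

```python
import yaml

# A fragment in the same shape as this repo's config.yml
# (stand-in for the real file; keys are shown elsewhere on this page).
text = """
image_input: 'images'
visualize: True
max_vis_fps: 30
"""
cfg = yaml.safe_load(text)
print(cfg["image_input"], cfg["visualize"], cfg["max_vis_fps"])
```

In the repository you would call `yaml.safe_load()` on the opened config.yml instead of an inline string.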

config.yml

Image

Run with run_image.py.
Create an 'images' directory and put image files (jpeg, jpg, png) in it.
Subdirectories can also be used.

image_input: 'images'       # input image dir
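The input directory is scanned recursively for image files. A sketch of that scan; `collect_images` is a hypothetical helper, and the real matching logic in run_image.py may differ:

```python
import os

def collect_images(root="images", exts=(".jpeg", ".jpg", ".png")):
    """Walk the input directory (subdirectories included) and return
    the image files to process. Hypothetical helper; run_image.py's
    actual matching logic may differ."""
    found = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(exts):
                found.append(os.path.join(dirpath, name))
    return found
```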

Movie

Run with run_video.py.

movie_input: 'input.mp4'    # mp4 or avi. Movie file.

Camera

Run with run_stream.py.
The camera_input value is passed to OpenCV's VideoCapture as-is.

camera_input: 0
camera_input: 1
camera_input: "nvarguscamerasrc ! video/x-raw(memory:NVMM), width=1280, height=720,format=NV12, framerate=120/1 ! nvvidconv ! video/x-raw,format=I420 ! videoflip method=rotate-180 ! appsink"
camera_input: "nvcamerasrc ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720,format=(string)I420, framerate=(fraction)30/1 ! nvvidconv flip-method=0 ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink"
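The long GStreamer strings above are easier to keep correct when assembled from parameters. A sketch that rebuilds the nvarguscamerasrc example; `nvargus_pipeline` is a hypothetical helper, with defaults mirroring the values shown above:

```python
def nvargus_pipeline(width=1280, height=720, fps=120, flip=True):
    """Build a camera_input GStreamer string for the Jetson onboard
    camera (hypothetical helper; defaults mirror the example above)."""
    parts = [
        "nvarguscamerasrc",
        f"video/x-raw(memory:NVMM), width={width}, height={height},"
        f"format=NV12, framerate={fps}/1",
        "nvvidconv",
        "video/x-raw,format=I420",
    ]
    if flip:
        parts.append("videoflip method=rotate-180")
    parts.append("appsink")
    # GStreamer elements are chained with " ! "
    return " ! ".join(parts)
```

The resulting string can be used as camera_input, or opened directly with `cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)` when OpenCV is built with GStreamer support.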

Save to file

save_to_file: True

Without Visualization

I do not know why, but on TX2 it is faster with force_gpu_compatible: True.

force_gpu_compatible: True
visualize: False
force_gpu_compatible: False
visualize: False

With Visualization

Visualization is heavy, so the visualization FPS can be limited.<br> The FPS value shown on the display is the detection FPS.<br>

# No FPS limit
visualize: True
vis_worker: False
max_vis_fps: 0
vis_text: True

# 30 FPS limit in the main thread
visualize: True
vis_worker: False
max_vis_fps: 30
vis_text: True

# 30 FPS limit in a visualization worker
visualize: True
vis_worker: True
max_vis_fps: 30
vis_text: True
model_type: 'nms_v2'

The difference between 'nms_v1' and 'nms_v2' lies in the inputs to BatchMultiClassNonMaxSuppression.<br> model_type: 'trt_v1' is somewhat special; see config.yml.<br>

# ssd_mobilenet_v1_coco_2018_01_28
model_type: 'nms_v2'
model_path: 'models/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb'
label_path: 'models/labels/mscoco_label_map.pbtxt'
num_classes: 90
| learned size | split_shape |
|---|---|
| 300x300 | 1917 |
| 400x400 | 3309 |
| 500x500 | 5118 |
| 600x600 | 7326 |
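The split_shape values are the SSD anchor counts for each input size, so they can be reproduced from the feature-map geometry. A sketch, assuming the standard ssd_mobilenet_v1 anchor layout (the strides and boxes-per-cell below are my assumption, not taken from this repo's code):

```python
import math

def split_shape(size):
    # Anchor count for ssd_mobilenet_v1: six feature maps, each cell
    # predicting 3 boxes on the first map and 6 on the others.
    strides = [16, 32, 64, 128, 256, 512]
    boxes_per_cell = [3, 6, 6, 6, 6, 6]
    return sum(math.ceil(size / s) ** 2 * b
               for s, b in zip(strides, boxes_per_cell))

print([split_shape(s) for s in (300, 400, 500, 600)])
# → [1917, 3309, 5118, 7326], matching the table above
```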

See also: Learn Split Model

model_type: 'trt_v1'
precision_model: 'FP32'     # 'FP32', 'FP16', 'INT8'
model: 'ssd_inception_v2_coco_2018_01_28'
label_path: 'models/labels/mscoco_label_map.pbtxt'
num_classes: 90

Console Log

FPS:25.8  Frames:130 Seconds: 5.04248   | 1FRAME total: 0.11910   cap: 0.00013   gpu: 0.03837   cpu: 0.02768   lost: 0.05293   send: 0.03834   | VFPS:25.4  VFrames:128 VDrops: 1 

FPS: detection FPS, averaged over fps_interval (5 sec). <br> Frames: number of detection frames within fps_interval. <br> Seconds: actual running time of fps_interval. <br>
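These fields are consistent with each other: FPS ≈ Frames / Seconds. A quick check with a hypothetical parser over the sample line above:

```python
import re

# Parse the console log line shown above and check FPS = Frames / Seconds
# (hypothetical parser; field names follow the log format).
line = ("FPS:25.8  Frames:130 Seconds: 5.04248   | 1FRAME total: 0.11910   "
        "cap: 0.00013   gpu: 0.03837   cpu: 0.02768   lost: 0.05293   "
        "send: 0.03834   | VFPS:25.4  VFrames:128 VDrops: 1")
m = re.search(r"FPS:([\d.]+)\s+Frames:(\d+)\s+Seconds:\s*([\d.]+)", line)
fps, frames, seconds = float(m.group(1)), int(m.group(2)), float(m.group(3))
print(frames / seconds)  # ≈ 25.78, which rounds to the reported FPS:25.8
```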

<hr>

1FRAME<br> total: one frame's processing time. With single-threading (split_model: False), 0.1 means a 0.1 s delay and 10 FPS. With multi-threading (split_model: True), this value only means delay. <br> cap: time to capture the camera image and transform it for model input. <br> gpu: sess.run() time of the GPU part. <br> cpu: sess.run() time of the CPU part. <br> lost: overhead time, such as sleeps. <br> send: time spent in multi-processing queues, blocking and pipes. <br>

<hr>

VFPS: visualization FPS. <br> VFrames: visualization frames within fps_interval. <br> VDrops: frames dropped when multi-processing visualization becomes the bottleneck. <br>

Updates:

My Setup:

NVPMODEL

| Mode | Mode Name | Denver 2 | Frequency | ARM A57 | Frequency | GPU Frequency |
|---|---|---|---|---|---|---|
| 0 | Max-N | 2 | 2.0 GHz | 4 | 2.0 GHz | 1.30 GHz |
| 1 | Max-Q | 0 | - | 4 | 1.2 GHz | 0.85 GHz |
| 2 | Max-P Core-All | 2 | 1.4 GHz | 4 | 1.4 GHz | 1.12 GHz |
| 3 | Max-P ARM | 0 | - | 4 | 2.0 GHz | 1.12 GHz |
| 4 | Max-P Denver | 2 | 2.0 GHz | 0 | - | 1.12 GHz |

Max-N

sudo nvpmodel -m 0
sudo ./jetson_clocks.sh

Max-P ARM(Default)

sudo nvpmodel -m 3
sudo ./jetson_clocks.sh

Show current mode

sudo nvpmodel -q --verbose

Current Max Performance of ssd_mobilenet_v1_coco_2018_01_28

| FPS | Machine | Size | Split Model | Visualize | Mode | CPU | Watt | Ampere | Volt-Ampere | Model | classes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 227 | PC | 160x120 | True | False | - | 27-33% | 182W | 1.82A | 183VA | frozen_inference_graph.pb | 90 |
| 223 | PC | 160x120 | True | True, Worker 30 FPS Limit | - | 28-36% | 178W | 1.77A | 180VA | frozen_inference_graph.pb | 90 |
| 213 | PC | 544x288 | True | False | - | 49-52% | 178W | 1.79A | 180VA | frozen_inference_graph.pb | 90 |
| 212 | PC | 160x120 | True | True | - | 30-34% | 179W | 1.82A | 183VA | frozen_inference_graph.pb | 90 |
| 207 | PC | 544x288 | True | True, Worker 30 FPS Limit | - | 48-53% | 178W | 1.76A | 178VA | frozen_inference_graph.pb | 90 |
| 190 | PC | 544x288 | True | True | - | 52-58% | 176W | 1.80A | 177VA | frozen_inference_graph.pb | 90 |
| 174 | PC | 1280x720 | True | False | - | 42-49% | 172W | 1.72A | 174VA | frozen_inference_graph.pb | 90 |
| 163 | PC | 1280x720 | True | True, Worker 30 FPS Limit | - | 47-53% | 170W | 1.69A | 170VA | frozen_inference_graph.pb | 90 |
| 153 | PC | 1280x720 | True | True, Worker 60 FPS Limit | - | 51-56% | 174W | 1.73A | 173VA | frozen_inference_graph.pb | 90 |
| 146 | PC | 1280x720 | True | True, Worker No Limit (VFPS:67) | - | 57-61% | 173W | 1.70A | 174VA | frozen_inference_graph.pb | 90 |
| 77 | PC | 1280x720 | True | True | - | 29-35% | 142W | 1.43A | 144VA | frozen_inference_graph.pb | 90 |
| 60 | Xavier | 160x120 | True | False | Max-N | 34-42% | 31.7W | 0.53A | 54.5VA | frozen_inference_graph.pb | 90 |
| 59 | Xavier | 544x288 | True | False | Max-N | 39-45% | 31.8W | 0.53A | 54.4VA | frozen_inference_graph.pb | 90 |
| 58 | Xavier | 1280x720 | True | False | Max-N | 38-48% | 31.6W | 0.53A | 55.1VA | frozen_inference_graph.pb | 90 |
| 54 | Xavier | 160x120 | True | True | Max-N | 39-44% | 31.4W | 0.52A | 54.4VA | frozen_inference_graph.pb | 90 |
| 52 | Xavier | 544x288 | True | True | Max-N | 39-50% | 31.4W | 0.55A | 56.0VA | frozen_inference_graph.pb | 90 |
| 48 | Xavier | 1280x720 | True | True | Max-N | 44-76% | 32.5W | 0.54A | 55.6VA | frozen_inference_graph.pb | 90 |
| 43 | TX2 | 160x120 | True | False | Max-N | 65-76% | 18.6W | 0.28A | 29.9VA | frozen_inference_graph.pb | 90 |
| 40 | TX2 | 544x288 | True | False | Max-N | 60-77% | 18.0W | 0.28A | 29.8VA | frozen_inference_graph.pb | 90 |
| 38 | TX2 | 1280x720 | True | False | Max-N | 62-75% | 17.7W | 0.27A | 29.2VA | frozen_inference_graph.pb | 90 |
| 37 | TX2 | 160x120 | True | True | Max-N | 5-68% | 17.7W | 0.27A | 28.0VA | frozen_inference_graph.pb | 90 |
| 37 | TX2 | 160x120 | True | False | Max-P ARM | 80-86% | 13.8W | 0.22A | 23.0VA | frozen_inference_graph.pb | 90 |
| 37 | TX2 | 160x120 | True | True | Max-P ARM | 77-80% | 14.0W | 0.22A | 23.1VA | frozen_inference_graph.pb | 90 |
| 35 | TX2 | 544x288 | True | True | Max-N | 20-71% | 17.0W | 0.27A | 27.7VA | frozen_inference_graph.pb | 90 |
| 35 | TX2 | 544x288 | True | False | Max-P ARM | 82-86% | 13.6W | 0.22A | 22.8VA | frozen_inference_graph.pb | 90 |
| 34 | TX2 | 1280x720 | True | False | Max-P ARM | 82-87% | 13.6W | 0.21A | 22.2VA | frozen_inference_graph.pb | 90 |
| 32 | TX2 | 544x288 | True | True | Max-P ARM | 79-85% | 13.4W | 0.21A | 22.3VA | frozen_inference_graph.pb | 90 |
| 31 | TX2 | 1280x720 | True | True | Max-N | 46-75% | 16.9W | 0.26A | 28.1VA | frozen_inference_graph.pb | 90 |
| 27 | TX1 | 160x120 | True | False | - | 71-80% | 17.3W | 0.27A | 28.2VA | frozen_inference_graph.pb | 90 |
| 26 | TX2 | 1280x720 | True | True | Max-P ARM | 78-86% | 12.6W | 0.20A | 21.2VA | frozen_inference_graph.pb | 90 |
| 26 | TX1 | 544x288 | True | False | - | 74-82% | 17.2W | 0.27A | 29.0VA | frozen_inference_graph.pb | 90 |
| 26 | TX1 | 160x120 | True | True | - | 69-81% | 17.1W | 0.27A | 28.7VA | frozen_inference_graph.pb | 90 |
| 24 | TX1 | 1280x720 | True | False | - | 73-80% | 17.6W | 0.27A | 29.3VA | frozen_inference_graph.pb | 90 |
| 23 | TX1 | 544x288 | True | True | - | 77-82% | 16.7W | 0.27A | 28.2VA | frozen_inference_graph.pb | 90 |
| 19 | TX1 | 1280x720 | True | True | - | 78-86% | 15.8W | 0.26A | 26.7VA | frozen_inference_graph.pb | 90 |

Performance graphs: on Xavier 544x288, on PC 544x288, on TX2 544x288.<br>

Youtube

Robot Car and Realtime Object Detection

TX2

Object Detection vs Semantic Segmentation on TX2

TX2

Realtime Object Detection on TX2

TX2

Realtime Object Detection on TX1

TX1

The movie's FPS is a little low because ssd_mobilenet_v1 runs together with desktop capture.<br> Capture command:<br>

gst-launch-1.0 -v ximagesrc use-damage=0 ! nvvidconv ! 'video/x-raw(memory:NVMM),alignment=(string)au,format=(string)I420,framerate=(fraction)25/1,pixel-aspect-ratio=(fraction)1/1' ! omxh264enc !  'video/x-h264,stream-format=(string)byte-stream' ! h264parse ! avimux ! filesink location=capture.avi

Training ssd_mobilenet with own data

https://github.com/naisy/train_ssd_mobilenet

Multi-Threading for Realtime Object Detection

Multi-Threading for Realtime Object Detection

Learn Split Model

Learn Split Model