<img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/deep-vision-header.jpg" width="100%">

# Deploying Deep Learning

Welcome to our instructional guide for the jetson-inference realtime vision DNN library for NVIDIA Jetson devices. This project uses TensorRT to run optimized networks on the GPU from C++ or Python, and PyTorch for training models.

Supported DNN vision primitives include imageNet for image classification, detectNet for object detection, segNet for semantic segmentation, poseNet for pose estimation, and actionNet for action recognition. Examples are provided for streaming from live camera feeds, building webapps with WebRTC, and using ROS/ROS2.
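As a taste of the Python bindings, below is a minimal classification sketch. It assumes jetson-inference and jetson-utils are installed (e.g. on a Jetson after following System Setup); the image path is a placeholder.

```python
# Minimal image-classification sketch using the Python bindings.
# Assumes jetson-inference/jetson-utils are installed (e.g. on a Jetson);
# the image path used below is a placeholder.

def classify(image_path, network="googlenet"):
    # imports are deferred so the sketch can be loaded off-device
    from jetson_inference import imageNet
    from jetson_utils import loadImage

    net = imageNet(network)             # model is downloaded/TensorRT-optimized on first use
    img = loadImage(image_path)         # loads the image into shared CPU/GPU memory
    class_id, confidence = net.Classify(img)
    return net.GetClassDesc(class_id), confidence

# on a Jetson:  label, conf = classify("images/orange_0.jpg")
```

The same flow applies to the other primitives (detectNet, segNet, etc.), each with its own inference call in place of `Classify()`.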

<img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/deep-vision-primitives.jpg">

Follow the Hello AI World tutorial for running inference and transfer learning onboard your Jetson, including collecting your own datasets, training your own models with PyTorch, and deploying them with TensorRT.

## Table of Contents

> JetPack 6 is now supported on Orin devices ([developer.nvidia.com/jetpack](https://developer.nvidia.com/jetpack))<br/>
> Check out the Generative AI and LLM tutorials on [Jetson AI Lab](https://www.jetson-ai-lab.com)!<br/>
> See the Change Log for the latest updates and new features.

## Hello AI World

Hello AI World can be run completely onboard your Jetson, including live inferencing with TensorRT and transfer learning with PyTorch. For installation instructions, see System Setup. It's then recommended to start with the Inference section to familiarize yourself with the concepts, before diving into Training your own models.

* System Setup
* Inference
* Training
* WebApp Frameworks
* Appendix

## Jetson AI Lab

<a href="https://www.jetson-ai-lab.com"><img align="right" width="200" height="200" src="https://nvidia-ai-iot.github.io/jetson-generative-ai-playground/images/JON_Gen-AI-panels.png"></a>

The Jetson AI Lab has additional tutorials on LLMs, Vision Transformers (ViT), and Vision Language Models (VLM) that run on Orin (and in some cases Xavier). Check out some of these:

<a href="https://www.jetson-ai-lab.com/tutorial_nanoowl.html"><img src="https://github.com/NVIDIA-AI-IOT/nanoowl/raw/main/assets/jetson_person_2x.gif"></a>

NanoOWL - Open Vocabulary Object Detection ViT (container: nanoowl)

<a href="https://youtu.be/X-OXxPiUTuU"><img width="600px" src="https://raw.githubusercontent.com/dusty-nv/jetson-containers/docs/docs/images/live_llava.gif"></a>

Live Llava on Jetson AGX Orin (container: local_llm)

<a href="https://youtu.be/dRmAGGuupuE"><img width="600px" src="https://raw.githubusercontent.com/dusty-nv/jetson-containers/docs/docs/images/live_llava_bear.jpg"></a>

Live Llava 2.0 - VILA + Multimodal NanoDB on Jetson Orin (container: local_llm)

<a href="https://youtu.be/ayqKpQNd1Jw"><img src="https://raw.githubusercontent.com/dusty-nv/jetson-containers/docs/docs/images/nanodb_horse.gif"></a>

Realtime Multimodal VectorDB on NVIDIA Jetson (container: nanodb)

## Video Walkthroughs

Below are screencasts of Hello AI World that were recorded for the Jetson AI Certification course:

| Description | Video |
|:------------|:------|
| <a href="https://www.youtube.com/watch?v=QXIwdsyK7Rw&list=PL5B692fm6--uQRRDTPsJDp4o0xbzkoyf8&index=9" target="_blank">Hello AI World Setup</a><br/>Download and run the Hello AI World container on Jetson Nano, test your camera feed, and see how to stream it over the network via RTP. | <a href="https://www.youtube.com/watch?v=QXIwdsyK7Rw&list=PL5B692fm6--uQRRDTPsJDp4o0xbzkoyf8&index=9" target="_blank"><img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/thumbnail_setup.jpg" width="750"></a> |
| <a href="https://www.youtube.com/watch?v=QatH8iF0Efk&list=PL5B692fm6--uQRRDTPsJDp4o0xbzkoyf8&index=10" target="_blank">Image Classification Inference</a><br/>Code your own Python program for image classification using Jetson Nano and deep learning, then experiment with realtime classification on a live camera stream. | <a href="https://www.youtube.com/watch?v=QatH8iF0Efk&list=PL5B692fm6--uQRRDTPsJDp4o0xbzkoyf8&index=10" target="_blank"><img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/thumbnail_imagenet.jpg" width="750"></a> |
| <a href="https://www.youtube.com/watch?v=sN6aT9TpltU&list=PL5B692fm6--uQRRDTPsJDp4o0xbzkoyf8&index=11" target="_blank">Training Image Classification Models</a><br/>Learn how to train image classification models with PyTorch onboard Jetson Nano, and collect your own classification datasets to create custom models. | <a href="https://www.youtube.com/watch?v=sN6aT9TpltU&list=PL5B692fm6--uQRRDTPsJDp4o0xbzkoyf8&index=11" target="_blank"><img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/thumbnail_imagenet_training.jpg" width="750"></a> |
| <a href="https://www.youtube.com/watch?v=obt60r8ZeB0&list=PL5B692fm6--uQRRDTPsJDp4o0xbzkoyf8&index=12" target="_blank">Object Detection Inference</a><br/>Code your own Python program for object detection using Jetson Nano and deep learning, then experiment with realtime detection on a live camera stream. | <a href="https://www.youtube.com/watch?v=obt60r8ZeB0&list=PL5B692fm6--uQRRDTPsJDp4o0xbzkoyf8&index=12" target="_blank"><img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/thumbnail_detectnet.jpg" width="750"></a> |
| <a href="https://www.youtube.com/watch?v=2XMkPW_sIGg&list=PL5B692fm6--uQRRDTPsJDp4o0xbzkoyf8&index=13" target="_blank">Training Object Detection Models</a><br/>Learn how to train object detection models with PyTorch onboard Jetson Nano, and collect your own detection datasets to create custom models. | <a href="https://www.youtube.com/watch?v=2XMkPW_sIGg&list=PL5B692fm6--uQRRDTPsJDp4o0xbzkoyf8&index=13" target="_blank"><img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/thumbnail_detectnet_training.jpg" width="750"></a> |
| <a href="https://www.youtube.com/watch?v=AQhkMLaB_fY&list=PL5B692fm6--uQRRDTPsJDp4o0xbzkoyf8&index=14" target="_blank">Semantic Segmentation</a><br/>Experiment with fully-convolutional semantic segmentation networks on Jetson Nano, and run realtime segmentation on a live camera stream. | <a href="https://www.youtube.com/watch?v=AQhkMLaB_fY&list=PL5B692fm6--uQRRDTPsJDp4o0xbzkoyf8&index=14" target="_blank"><img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/thumbnail_segnet.jpg" width="750"></a> |

## API Reference

Below are links to reference documentation for the C++ and Python libraries from the repo:

### jetson-inference

|                    | C++ | Python |
|--------------------|-----|--------|
| Image Recognition  | imageNet | imageNet |
| Object Detection   | detectNet | detectNet |
| Segmentation       | segNet | segNet |
| Pose Estimation    | poseNet | poseNet |
| Action Recognition | actionNet | actionNet |
| Background Removal | backgroundNet | backgroundNet |
| Monocular Depth    | depthNet | depthNet |

### jetson-utils

These libraries can be used in external projects by linking to libjetson-inference and libjetson-utils.

## Code Examples

Introductory code walkthroughs of using the library are covered in the steps of the Hello AI World tutorial.

Additional C++ and Python samples for running the networks on images and live camera streams can be found here:

|                    | C++ | Python |
|--------------------|-----|--------|
| Image Recognition  | imagenet.cpp | imagenet.py |
| Object Detection   | detectnet.cpp | detectnet.py |
| Segmentation       | segnet.cpp | segnet.py |
| Pose Estimation    | posenet.cpp | posenet.py |
| Action Recognition | actionnet.cpp | actionnet.py |
| Background Removal | backgroundnet.cpp | backgroundnet.py |
| Monocular Depth    | depthnet.cpp | depthnet.py |

note: see the Array Interfaces section for sharing memory with other Python libraries (like NumPy, PyTorch, etc.)

These examples are compiled automatically while Building the Project from Source, and can run the pre-trained models listed below in addition to custom models provided by the user. Launch each example with --help for usage info.
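The camera-streaming samples above boil down to a short capture/infer/render loop. Here is a condensed sketch of a detection loop (assuming jetson-inference and jetson-utils are installed; the camera/display URIs and model name are placeholders):

```python
# Condensed object-detection loop over a live camera stream.
# Assumes jetson-inference/jetson-utils are installed on a Jetson;
# the camera/display URIs and model name below are placeholders.

def run_detection(camera="csi://0", display="display://0",
                  model="ssd-mobilenet-v2", threshold=0.5):
    from jetson_inference import detectNet
    from jetson_utils import videoSource, videoOutput

    net = detectNet(model, threshold=threshold)
    source = videoSource(camera)        # e.g. csi://0, /dev/video0, file://video.mp4
    output = videoOutput(display)       # e.g. display://0, rtp://<host>:<port>

    while output.IsStreaming():
        img = source.Capture()
        if img is None:                 # capture timeout
            continue
        detections = net.Detect(img)    # overlays boxes/labels on img by default
        output.Render(img)

# on a Jetson:  run_detection()
```

The same loop structure works for the other primitives by swapping `detectNet`/`Detect()` for the corresponding network and inference call.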

## Pre-Trained Models

The project comes with a number of pre-trained models that are downloaded automatically and ready to use:
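The "CLI argument" column in the tables below is also the string the network constructors accept. A small sketch to illustrate the mapping (it assumes jetson-inference is installed; the example model names are taken from the tables below):

```python
# The "CLI argument" strings in the model tables are the same names the network
# constructors accept; the model is downloaded and TensorRT-optimized on first use.
# (Sketch; assumes jetson-inference is installed on a Jetson.)

def load_network(kind, model):
    import jetson_inference as ji
    constructors = {
        "imagenet":  ji.imageNet,    # e.g. load_network("imagenet", "resnet-18")
        "detectnet": ji.detectNet,   # e.g. load_network("detectnet", "ssd-mobilenet-v2")
        "segnet":    ji.segNet,      # e.g. load_network("segnet", "fcn-resnet18-voc-320x320")
        "posenet":   ji.poseNet,     # e.g. load_network("posenet", "resnet18-body")
        "actionnet": ji.actionNet,   # e.g. load_network("actionnet", "resnet18")
    }
    return constructors[kind](model)
```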

### Image Recognition

| Network | CLI argument | NetworkType enum |
|---------|--------------|------------------|
| AlexNet | alexnet | ALEXNET |
| GoogleNet | googlenet | GOOGLENET |
| GoogleNet-12 | googlenet-12 | GOOGLENET_12 |
| ResNet-18 | resnet-18 | RESNET_18 |
| ResNet-50 | resnet-50 | RESNET_50 |
| ResNet-101 | resnet-101 | RESNET_101 |
| ResNet-152 | resnet-152 | RESNET_152 |
| VGG-16 | vgg-16 | VGG_16 |
| VGG-19 | vgg-19 | VGG_19 |
| Inception-v4 | inception-v4 | INCEPTION_V4 |

### Object Detection

| Model | CLI argument | NetworkType enum | Object classes |
|-------|--------------|------------------|----------------|
| SSD-Mobilenet-v1 | ssd-mobilenet-v1 | SSD_MOBILENET_V1 | 91 (COCO classes) |
| SSD-Mobilenet-v2 | ssd-mobilenet-v2 | SSD_MOBILENET_V2 | 91 (COCO classes) |
| SSD-Inception-v2 | ssd-inception-v2 | SSD_INCEPTION_V2 | 91 (COCO classes) |
| TAO PeopleNet | peoplenet | PEOPLENET | person, bag, face |
| TAO PeopleNet (pruned) | peoplenet-pruned | PEOPLENET_PRUNED | person, bag, face |
| TAO DashCamNet | dashcamnet | DASHCAMNET | person, car, bike, sign |
| TAO TrafficCamNet | trafficcamnet | TRAFFICCAMNET | person, car, bike, sign |
| TAO FaceDetect | facedetect | FACEDETECT | face |
<details>
<summary>Legacy Detection Models</summary>

| Model | CLI argument | NetworkType enum | Object classes |
|-------|--------------|------------------|----------------|
| DetectNet-COCO-Dog | coco-dog | COCO_DOG | dogs |
| DetectNet-COCO-Bottle | coco-bottle | COCO_BOTTLE | bottles |
| DetectNet-COCO-Chair | coco-chair | COCO_CHAIR | chairs |
| DetectNet-COCO-Airplane | coco-airplane | COCO_AIRPLANE | airplanes |
| ped-100 | pednet | PEDNET | pedestrians |
| multiped-500 | multiped | PEDNET_MULTI | pedestrians, luggage |
| facenet-120 | facenet | FACENET | faces |

</details>

### Semantic Segmentation

| Dataset | Resolution | CLI Argument | Accuracy | Jetson Nano | Jetson Xavier |
|---------|------------|--------------|----------|-------------|---------------|
| Cityscapes | 512x256 | fcn-resnet18-cityscapes-512x256 | 83.3% | 48 FPS | 480 FPS |
| Cityscapes | 1024x512 | fcn-resnet18-cityscapes-1024x512 | 87.3% | 12 FPS | 175 FPS |
| Cityscapes | 2048x1024 | fcn-resnet18-cityscapes-2048x1024 | 89.6% | 3 FPS | 47 FPS |
| DeepScene | 576x320 | fcn-resnet18-deepscene-576x320 | 96.4% | 26 FPS | 360 FPS |
| DeepScene | 864x480 | fcn-resnet18-deepscene-864x480 | 96.9% | 14 FPS | 190 FPS |
| Multi-Human | 512x320 | fcn-resnet18-mhp-512x320 | 86.5% | 34 FPS | 370 FPS |
| Multi-Human | 640x360 | fcn-resnet18-mhp-640x360 | 87.1% | 23 FPS | 325 FPS |
| Pascal VOC | 320x320 | fcn-resnet18-voc-320x320 | 85.9% | 45 FPS | 508 FPS |
| Pascal VOC | 512x320 | fcn-resnet18-voc-512x320 | 88.5% | 34 FPS | 375 FPS |
| SUN RGB-D | 512x400 | fcn-resnet18-sun-512x400 | 64.3% | 28 FPS | 340 FPS |
| SUN RGB-D | 640x512 | fcn-resnet18-sun-640x512 | 65.1% | 17 FPS | 224 FPS |
<details>
<summary>Legacy Segmentation Models</summary>

| Network | CLI Argument | NetworkType enum | Classes |
|---------|--------------|------------------|---------|
| Cityscapes (2048x2048) | fcn-alexnet-cityscapes-hd | FCN_ALEXNET_CITYSCAPES_HD | 21 |
| Cityscapes (1024x1024) | fcn-alexnet-cityscapes-sd | FCN_ALEXNET_CITYSCAPES_SD | 21 |
| Pascal VOC (500x356) | fcn-alexnet-pascal-voc | FCN_ALEXNET_PASCAL_VOC | 21 |
| Synthia (CVPR16) | fcn-alexnet-synthia-cvpr | FCN_ALEXNET_SYNTHIA_CVPR | 14 |
| Synthia (Summer-HD) | fcn-alexnet-synthia-summer-hd | FCN_ALEXNET_SYNTHIA_SUMMER_HD | 14 |
| Synthia (Summer-SD) | fcn-alexnet-synthia-summer-sd | FCN_ALEXNET_SYNTHIA_SUMMER_SD | 14 |
| Aerial-FPV (1280x720) | fcn-alexnet-aerial-fpv-720p | FCN_ALEXNET_AERIAL_FPV_720p | 2 |

</details>

### Pose Estimation

| Model | CLI argument | NetworkType enum | Keypoints |
|-------|--------------|------------------|-----------|
| Pose-ResNet18-Body | resnet18-body | RESNET18_BODY | 18 |
| Pose-ResNet18-Hand | resnet18-hand | RESNET18_HAND | 21 |
| Pose-DenseNet121-Body | densenet121-body | DENSENET121_BODY | 18 |

### Action Recognition

| Model | CLI argument | Classes |
|-------|--------------|---------|
| Action-ResNet18-Kinetics | resnet18 | 1040 |
| Action-ResNet34-Kinetics | resnet34 | 1040 |

## Recommended System Requirements

The Transfer Learning with PyTorch section of the tutorial covers training DNNs with PyTorch onboard your Jetson; however, the same PyTorch code can be run on a PC, server, or cloud instance with an NVIDIA discrete GPU for faster training.

## Extra Resources

Below are additional links and resources on deep learning:

### Two Days to a Demo (DIGITS)

note: the DIGITS/Caffe tutorial below is deprecated. It's recommended to follow the Transfer Learning with PyTorch tutorial from Hello AI World instead.

<details>
<summary>Expand this section to see the original DIGITS tutorial (deprecated)</summary>
<br/>
The DIGITS tutorial covers training DNNs in the cloud or on a PC, and inference on the Jetson with TensorRT. It can take roughly two days or more depending on system setup, dataset downloads, and the training speed of your GPU.
</details>

<p align="center"><sup>© 2016-2019 NVIDIA | </sup><a href="#deploying-deep-learning"><sup>Table of Contents</sup></a></p>