

DNN Inference Nodes for ROS/ROS2

This package contains DNN inference nodes and camera/video streaming nodes for ROS/ROS2 with support for NVIDIA Jetson Nano / TX1 / TX2 / Xavier / Orin devices and TensorRT.

The nodes use the image recognition, object detection, and semantic segmentation DNNs from the jetson-inference library and the NVIDIA Hello AI World tutorial. These come with several built-in pretrained networks for classification, detection, and segmentation, and can also load customized user-trained models.

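The camera and video streaming nodes support several input/output interfaces, including MIPI CSI cameras, V4L2 cameras, video files, and RTP/RTSP network streams (see Camera Streaming & Multimedia for the valid input/output streams).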

Various distributions of ROS are supported, either from source or through containers (including Melodic, Noetic, Foxy, Galactic, Humble, and Iron). The same branch supports both ROS1 and ROS2.


The easiest way to get up and running is to clone jetson-inference (of which ros_deep_learning is a submodule) and run the pre-built container, which automatically mounts the required model directories and devices:

$ git clone --recursive --depth=1 https://github.com/dusty-nv/jetson-inference
$ cd jetson-inference
$ docker/run.sh --ros=humble  # noetic, foxy, galactic, humble, iron

note: the ros_deep_learning nodes rely on data from the jetson-inference tree for storing models, so clone and mount jetson-inference/data if you're using your own container or source installation method.

The --ros argument to the docker/run.sh script selects the ROS distro to use. The script in turn uses the ros:$ROS_DISTRO-pytorch container images from jetson-containers, which include jetson-inference and this package.

For the older instructions on building the ros_deep_learning package against an uncontainerized ROS installation, expand the section below. The parts about installing ROS may need adapting to the particular version of ROS/ROS2 that you want to install.

<details> <summary>Legacy Install Instructions</summary>


These ROS nodes use the DNN objects from the jetson-inference project (aka Hello AI World). To build and install jetson-inference, see this page or run the commands below:

$ cd ~
$ sudo apt-get install git cmake
$ git clone --recursive --depth=1 https://github.com/dusty-nv/jetson-inference
$ cd jetson-inference
$ mkdir build
$ cd build
$ cmake ../
$ make -j$(nproc)
$ sudo make install
$ sudo ldconfig

Before proceeding, it's worthwhile to test that jetson-inference is working properly on your system by following this step of the Hello AI World tutorial:


Install the ros-melodic-ros-base or ros-eloquent-ros-base package on your Jetson following these directions:

Depending on which version of ROS you're using, install some additional dependencies and create a workspace:

ROS Melodic

$ sudo apt-get install ros-melodic-image-transport ros-melodic-vision-msgs

For ROS Melodic, create a Catkin workspace (~/ros_workspace) using these steps:

ROS Eloquent

$ sudo apt-get install ros-eloquent-vision-msgs \
                       ros-eloquent-launch-xml \
                       ros-eloquent-launch-yaml

For ROS Eloquent, create a workspace (~/ros_workspace) to use:

$ mkdir -p ~/ros_workspace/src


Next, navigate into your ROS workspace's src directory and clone ros_deep_learning:

$ cd ~/ros_workspace/src
$ git clone https://github.com/dusty-nv/ros_deep_learning

Then build it. Use catkin_make if you are using ROS Melodic, or colcon build if you are using ROS2 Eloquent:

$ cd ~/ros_workspace/

# ROS Melodic
$ catkin_make
$ source devel/setup.bash 

# ROS2 Eloquent
$ colcon build
$ source install/local_setup.bash 

The nodes should now be built and ready to use. Remember to source the overlay as shown above so that ROS can find the nodes.

</details>



Before proceeding, if you're using ROS Melodic make sure that roscore is running first:

$ roscore

If you're using ROS2, running the core service is no longer required.

Video Viewer

First, it's recommended to test that you can stream a video feed using the video_source and video_output nodes. See Camera Streaming & Multimedia for valid input/output streams, and substitute your desired input and output arguments below. For example, you can use video files for the input or output, or use V4L2 cameras instead of MIPI CSI cameras. You can also use RTP/RTSP streams over the network.

# ROS1
$ roslaunch ros_deep_learning video_viewer.ros1.launch input:=csi://0 output:=display://0

# ROS2
$ ros2 launch ros_deep_learning video_viewer.ros2.launch input:=csi://0 output:=display://0
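
The ROS1 and ROS2 commands above differ only in the launcher and in the ros1/ros2 suffix of the launch file. As a rough sketch, a wrapper script could assemble the right command line from the ROS version. The helper name launch_command is hypothetical and not part of this package; it only builds the argument list and does not start anything.

```python
import shlex


def launch_command(demo, ros_version, input_uri="csi://0", output_uri="display://0"):
    """Build the launch command line for a ros_deep_learning demo.

    `demo` is the launch-file prefix used in this README, e.g. "video_viewer".
    Hypothetical helper: it only assembles arguments, it does not run them.
    """
    if ros_version == 1:
        launcher = ["roslaunch"]
    elif ros_version == 2:
        launcher = ["ros2", "launch"]
    else:
        raise ValueError(f"unsupported ROS version: {ros_version!r}")

    # Launch files follow the <demo>.ros1.launch / <demo>.ros2.launch naming.
    launch_file = f"{demo}.ros{ros_version}.launch"
    return launcher + [
        "ros_deep_learning",
        launch_file,
        f"input:={input_uri}",
        f"output:={output_uri}",
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(launch_command("video_viewer", 2)))
```

The returned list can be passed to subprocess.run as-is, which avoids shell quoting issues when the input is a file path containing spaces.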

imagenet Node

To launch a classification demo, run the commands below, substituting your desired camera or video path for the input argument (see Camera Streaming & Multimedia for valid input/output streams).

Note that the imagenet node also publishes classification metadata on the imagenet/classification topic in a vision_msgs/Classification2D message. See the Topics & Parameters section below for more info.

# ROS1
$ roslaunch ros_deep_learning imagenet.ros1.launch input:=csi://0 output:=display://0

# ROS2
$ ros2 launch ros_deep_learning imagenet.ros2.launch input:=csi://0 output:=display://0

detectnet Node

To launch an object detection demo, run the commands below, substituting your desired camera or video path for the input argument (see Camera Streaming & Multimedia for valid input/output streams). Note that the detectnet node also publishes detection metadata in a vision_msgs/Detection2DArray message. See the Topics & Parameters section below for more info.

# ROS1
$ roslaunch ros_deep_learning detectnet.ros1.launch input:=csi://0 output:=display://0

# ROS2
$ ros2 launch ros_deep_learning detectnet.ros2.launch input:=csi://0 output:=display://0

segnet Node

To launch a semantic segmentation demo, run the commands below, substituting your desired camera or video path for the input argument (see Camera Streaming & Multimedia for valid input/output streams). Note that the segnet node also publishes raw segmentation results to the segnet/class_mask topic. See the Topics & Parameters section below for more info.

# ROS1
$ roslaunch ros_deep_learning segnet.ros1.launch input:=csi://0 output:=display://0

# ROS2
$ ros2 launch ros_deep_learning segnet.ros2.launch input:=csi://0 output:=display://0

Topics & Parameters

Below are the message topics and parameters that each node implements.

imagenet Node

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| image_in | Input | sensor_msgs/Image | Raw input image |
| classification | Output | vision_msgs/Classification2D | Classification results (class ID + confidence) |
| vision_info | Output | vision_msgs/VisionInfo | Vision metadata (class labels parameter list name) |
| overlay | Output | sensor_msgs/Image | Input image overlaid with the classification results |

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| model_name | string | "googlenet" | Built-in model name (see here for valid values) |
| model_path | string | "" | Path to custom caffe or ONNX model |
| prototxt_path | string | "" | Path to custom caffe prototxt file |
| input_blob | string | "data" | Name of DNN input layer |
| output_blob | string | "prob" | Name of DNN output layer |
| class_labels_path | string | "" | Path to custom class labels file |
| class_labels_HASH | vector<string> | class names | List of class labels, where HASH is model-specific (actual name of parameter is found via the vision_info topic) |
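
The classification topic carries class IDs and confidences, while the human-readable labels are published separately as the class_labels_HASH parameter list. The sketch below joins the two using plain Python stand-ins. The Hypothesis tuple and the best_label helper are hypothetical and are not the vision_msgs types; a real subscriber would read the equivalent fields from the message and fetch the label list from the parameter named by vision_info.

```python
from typing import List, NamedTuple, Optional, Tuple


class Hypothesis(NamedTuple):
    """Hypothetical stand-in for one (class ID, confidence) result."""
    class_id: int
    confidence: float


def best_label(results: List[Hypothesis], labels: List[str]) -> Optional[Tuple[str, float]]:
    """Return (label, confidence) for the most confident result, or None if empty."""
    if not results:
        return None
    top = max(results, key=lambda r: r.confidence)
    # Fall back to the numeric ID if the label list does not cover it.
    if 0 <= top.class_id < len(labels):
        name = labels[top.class_id]
    else:
        name = f"class #{top.class_id}"
    return name, top.confidence


if __name__ == "__main__":
    labels = ["apple", "banana", "cherry"]  # made-up labels for illustration
    results = [Hypothesis(0, 0.10), Hypothesis(2, 0.85)]
    print(best_label(results, labels))
```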

detectnet Node

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| image_in | Input | sensor_msgs/Image | Raw input image |
| detections | Output | vision_msgs/Detection2DArray | Detection results (bounding boxes, class IDs, confidences) |
| vision_info | Output | vision_msgs/VisionInfo | Vision metadata (class labels parameter list name) |
| overlay | Output | sensor_msgs/Image | Input image overlaid with the detection results |

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| model_name | string | "ssd-mobilenet-v2" | Built-in model name (see here for valid values) |
| model_path | string | "" | Path to custom caffe or ONNX model |
| prototxt_path | string | "" | Path to custom caffe prototxt file |
| input_blob | string | "data" | Name of DNN input layer |
| output_cvg | string | "coverage" | Name of DNN output layer (coverage/scores) |
| output_bbox | string | "bboxes" | Name of DNN output layer (bounding boxes) |
| class_labels_path | string | "" | Path to custom class labels file |
| class_labels_HASH | vector<string> | class names | List of class labels, where HASH is model-specific (actual name of parameter is found via the vision_info topic) |
| overlay_flags | string | "box,labels,conf" | Flags used to generate the overlay (some combination of none,box,labels,conf) |
| mean_pixel_value | float | 0.0 | Mean pixel subtraction value to be applied to input (normally 0) |
| threshold | float | 0.5 | Minimum confidence value for positive detections (0.0 - 1.0) |
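
To make the overlay_flags and threshold parameters concrete, here is a minimal sketch of how a launch script might validate them before starting the node. Both helpers are hypothetical and not part of the package. Treating the threshold as inclusive (confidence >= threshold) is an assumption based on the word "minimum" in the table, and detections are modeled as simple (class_id, confidence) pairs rather than ROS messages.

```python
VALID_OVERLAY_FLAGS = {"none", "box", "labels", "conf"}


def parse_overlay_flags(flags):
    """Split a comma-separated overlay_flags string and reject unknown names."""
    parsed = {f.strip() for f in flags.split(",") if f.strip()}
    unknown = parsed - VALID_OVERLAY_FLAGS
    if unknown:
        raise ValueError(f"unknown overlay flags: {sorted(unknown)}")
    return parsed


def keep_detections(detections, threshold=0.5):
    """Keep (class_id, confidence) pairs whose confidence meets the threshold.

    Assumption: the comparison is inclusive (confidence >= threshold).
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within 0.0 - 1.0")
    return [d for d in detections if d[1] >= threshold]


if __name__ == "__main__":
    print(sorted(parse_overlay_flags("box,labels,conf")))
    print(keep_detections([(1, 0.92), (3, 0.41), (1, 0.50)]))
```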

segnet Node

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| image_in | Input | sensor_msgs/Image | Raw input image |
| vision_info | Output | vision_msgs/VisionInfo | Vision metadata (class labels parameter list name) |
| overlay | Output | sensor_msgs/Image | Input image overlaid with the segmentation results |
| color_mask | Output | sensor_msgs/Image | Colorized segmentation class mask out |
| class_mask | Output | sensor_msgs/Image | 8-bit single-channel image where each pixel is a classID |

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| model_name | string | "fcn-resnet18-cityscapes-1024x512" | Built-in model name (see here for valid values) |
| model_path | string | "" | Path to custom caffe or ONNX model |
| prototxt_path | string | "" | Path to custom caffe prototxt file |
| input_blob | string | "data" | Name of DNN input layer |
| output_blob | string | "score_fr_21classes" | Name of DNN output layer |
| class_colors_path | string | "" | Path to custom class colors file |
| class_labels_path | string | "" | Path to custom class labels file |
| class_labels_HASH | vector<string> | class names | List of class labels, where HASH is model-specific (actual name of parameter is found via the vision_info topic) |
| mask_filter | string | "linear" | Filtering to apply to color_mask topic (linear or point) |
| overlay_filter | string | "linear" | Filtering to apply to overlay topic (linear or point) |
| overlay_alpha | float | 180.0 | Alpha blending value used by overlay topic (0.0 - 255.0) |
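
The sketch below illustrates two of the ideas in the table: how an overlay_alpha value on a 0.0 - 255.0 scale weights the mask color against the input image, and how the 8-bit class_mask can be summarized per class. It uses standard alpha compositing, which is an assumption about the general technique and not necessarily the node's exact arithmetic. The function names are hypothetical, and the mask is taken as a raw bytes buffer rather than a sensor_msgs/Image.

```python
from collections import Counter


def blend_pixel(image_rgb, mask_rgb, overlay_alpha=180.0):
    """Alpha-blend one mask pixel over one image pixel (alpha in 0.0 - 255.0)."""
    if not 0.0 <= overlay_alpha <= 255.0:
        raise ValueError("overlay_alpha must be within 0.0 - 255.0")
    a = overlay_alpha / 255.0  # normalize to a 0.0 - 1.0 weight
    return tuple(round(a * m + (1.0 - a) * i) for i, m in zip(image_rgb, mask_rgb))


def class_histogram(class_mask):
    """Count pixels per class ID in an 8-bit single-channel mask buffer."""
    # Iterating over bytes yields one integer class ID per pixel.
    return Counter(class_mask)


if __name__ == "__main__":
    print(blend_pixel((0, 0, 0), (255, 255, 255)))
    print(class_histogram(bytes([0, 0, 1, 2, 2, 2])).most_common(1))
```

With the default of 180.0, the mask color contributes roughly 70% of each output pixel, so the underlying image remains faintly visible.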

video_source Node

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| raw | Output | sensor_msgs/Image | Raw output image (BGR8) |

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| resource | string | "csi://0" | Input stream URI (see here for valid protocols) |
| codec | string | "" | Manually specify codec for compressed streams (see here for valid values) |
| width | int | 0 | Manually specify desired width of stream (0 = stream default) |
| height | int | 0 | Manually specify desired height of stream (0 = stream default) |
| framerate | int | 0 | Manually specify desired framerate of stream (0 = stream default) |
| loop | int | 0 | For video files: 0 = don't loop, >0 = # of loops, -1 = loop forever |
| flip | string | "" | Set the flip method for MIPI CSI cameras (see here for valid values) |
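
Several of these parameters use sentinel values (0 meaning "stream default", -1 meaning "loop forever"). As an illustration only, the hypothetical helpers below spell out those conventions and extract the protocol prefix from a resource URI using the standard library. None of them are part of the package.

```python
from urllib.parse import urlsplit


def effective_setting(requested, stream_default):
    """Return the requested width/height/framerate, or the stream default if 0."""
    return stream_default if requested == 0 else requested


def describe_loop(loop):
    """Translate the loop parameter into words, following the table above."""
    if loop == 0:
        return "no looping"
    if loop == -1:
        return "loop forever"
    if loop > 0:
        return f"{loop} loop(s)"
    raise ValueError(f"invalid loop value: {loop}")


def uri_protocol(resource):
    """Return the protocol prefix of a stream URI such as "csi://0"."""
    return urlsplit(resource).scheme


if __name__ == "__main__":
    print(effective_setting(0, 1280), describe_loop(-1), uri_protocol("csi://0"))
```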

video_output Node

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| image_in | Input | sensor_msgs/Image | Raw input image |

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| resource | string | "display://0" | Output stream URI (see here for valid protocols) |
| codec | string | "h264" | Codec used for compressed streams (see here for valid values) |
| bitrate | int | 4000000 | Target VBR bitrate of encoded streams (in bits per second) |
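
Since bitrate is given in bits per second, a quick calculation shows roughly how much data an encoded stream produces. The helper below is a hypothetical back-of-the-envelope sketch. Because the bitrate is a VBR target and not a fixed rate, the actual size will vary with scene content.

```python
def estimated_size_bytes(bitrate_bps, duration_s):
    """Approximate encoded size: bits per second times seconds, at 8 bits per byte."""
    if bitrate_bps < 0 or duration_s < 0:
        raise ValueError("bitrate and duration must be non-negative")
    return bitrate_bps * duration_s / 8


if __name__ == "__main__":
    # One minute at the default 4000000 bits per second, in decimal megabytes.
    print(f"{estimated_size_bytes(4_000_000, 60) / 1e6:.1f} MB")  # prints "30.0 MB"
```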