PointPillars Inference with TensorRT

This repository contains the source code and model for PointPillars inference using TensorRT.

Inference runs in three phases:

Voxelize the point cloud into 10-channel features
Run the TensorRT engine to get the detection features
Parse the detection features and apply NMS
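
As a rough illustration of the first phase, the sketch below groups points into pillars and builds a 10-channel feature per point. The channel layout (raw x, y, z, intensity; offsets from the pillar's point mean; offsets from the pillar's geometric center), the grid constants, and the function name `voxelize` are assumptions modeled on a common OpenPCDet-style PointPillars setup. They are not taken from this repository's CUDA kernels, so check the model config for the real values.

```python
from collections import defaultdict

# Assumed KITTI-style grid settings (hypothetical; not read from this repo).
VOXEL_SIZE = (0.16, 0.16, 4.0)
PC_RANGE_MIN = (0.0, -39.68, -3.0)
PC_RANGE_MAX = (69.12, 39.68, 1.0)
MAX_POINTS_PER_VOXEL = 32


def voxelize(points):
    """Group (x, y, z, intensity) points into pillars.

    Returns {pillar_index: [10-channel feature row per kept point]}.
    """
    pillars = defaultdict(list)
    for p in points:
        xyz = p[:3]
        # Drop points outside the detection range.
        if not all(lo <= v < hi
                   for v, lo, hi in zip(xyz, PC_RANGE_MIN, PC_RANGE_MAX)):
            continue
        idx = tuple(int((v - lo) // s)
                    for v, lo, s in zip(xyz, PC_RANGE_MIN, VOXEL_SIZE))
        # Sequential CPU version: keep the first points that arrive.
        if len(pillars[idx]) < MAX_POINTS_PER_VOXEL:
            pillars[idx].append(p)

    features = {}
    for idx, pts in pillars.items():
        n = len(pts)
        mean = [sum(p[k] for p in pts) / n for k in range(3)]
        center = [lo + (i + 0.5) * s
                  for i, lo, s in zip(idx, PC_RANGE_MIN, VOXEL_SIZE)]
        features[idx] = [
            [x, y, z, r,
             x - mean[0], y - mean[1], z - mean[2],        # offset from point mean
             x - center[0], y - center[1], z - center[2]]  # offset from pillar center
            for x, y, z, r in pts
        ]
    return features
```

This CPU loop visits points in a fixed order, so its output is repeatable. A parallel GPU implementation does not have that property.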

Prerequisites

Prepare Model && Data

We provide a Dockerfile to ease environment setup. After installing nvidia-docker, build the Docker image with the following command:

cd docker && docker build . -t pointpillar

Then start a container with the following command:

nvidia-docker run --rm -ti -v /home/$USER/:/home/$USER/ --net=host pointpillar:latest

To export the model, run the following commands to clone the OpenPCDet repo and install its custom CUDA extensions:

git clone https://github.com/open-mmlab/OpenPCDet.git
cd OpenPCDet && git checkout 846cf3e && python3 setup.py develop

Download the pretrained model (PTM) to ckpts/, then use the command below to export the ONNX model:

python3 tool/export_onnx.py --ckpt ckpts/pointpillar_7728.pth --out_dir model

Use the command below to evaluate on the KITTI dataset. See Evaluation on Kitti for details on dataset preparation.

sh tool/evaluate_kitti_val.sh

Setup Runtime Environment

NVIDIA Jetson Orin + CUDA 11.4 + cuDNN 8.9.0 + TensorRT 8.6.11

Compile && Run

sudo apt-get install git-lfs && git lfs install
git clone https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars.git
cd CUDA-PointPillars && . tool/environment.sh
mkdir build && cd build
cmake .. && make -j$(nproc)
cd ../ && sh tool/build_trt_engine.sh
cd build && ./pointpillar ../data/ ../data/ --timer
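
To inspect an input frame before running the binary, a point cloud file can be loaded in a few lines. This is a minimal sketch that assumes the files under data/ use the KITTI Velodyne layout (consecutive little-endian float32 values, four per point: x, y, z, intensity). The helper name `read_kitti_bin` is hypothetical and not part of this repository.

```python
import struct


def read_kitti_bin(path):
    """Return a list of (x, y, z, intensity) tuples from a raw float32 file.

    Assumes KITTI Velodyne layout: 4 little-endian float32 values per point.
    """
    with open(path, "rb") as f:
        raw = f.read()
    if len(raw) % 16 != 0:
        raise ValueError("file size is not a multiple of 16 bytes (4 x float32)")
    return list(struct.iter_unpack("<4f", raw))
```

For example, `len(read_kitti_bin(path))` gives the number of points in the frame stored at `path`.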

FP16 Performance && Metrics

Average FP16 latency on the training set (7481 instances) of the KITTI dataset.

| Function(unit:ms) | Orin   |
| ----------------- | ------ |
| Voxelization      | 0.18   |
| Backbone & Head   | 4.87   |
| Decoder & NMS     | 1.79   |
| Overall           | 6.84   |
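
The Overall row is the sum of the three phases. The short check below reproduces it and converts it to an approximate frame rate, assuming the phases run back to back with no overlap and no other per-frame cost (an assumption, not a measured figure).

```python
# Per-phase FP16 latencies on Orin in milliseconds, copied from the table above.
phases_ms = {
    "Voxelization": 0.18,
    "Backbone & Head": 4.87,
    "Decoder & NMS": 1.79,
}


def overall_ms(phases):
    """Sum the per-phase latencies, rounded to the table's precision."""
    return round(sum(phases.values()), 2)


def frames_per_second(total_ms):
    """Convert a per-frame latency in ms to frames per second."""
    return 1000.0 / total_ms


total = overall_ms(phases_ms)
print(total)                                # 6.84
print(round(frames_per_second(total), 1))   # 146.2
```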

3D moderate-difficulty metrics on the validation set (3769 instances) of the KITTI dataset.

|                   | Car@R11 | Pedestrian@R11 | Cyclist@R11  | 
| ----------------- | --------| -------------- | ------------ |
| CUDA-PointPillars | 77.00   | 52.50          | 62.26        |
| OpenPCDet         | 77.28   | 52.29          | 62.68        |

Note

Voxelization output is not deterministic: the GPU processes all points in parallel, so which points are selected for a given voxel varies from run to run.
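
The effect can be imitated on the CPU. The sketch below assumes a voxel keeps only the first few points that reach it, and uses a hypothetical cap of 2. The cap value, the point labels, and the helper `keep_first` are illustrative only and do not come from this repository.

```python
CAP = 2  # hypothetical per-voxel point limit, kept small for illustration


def keep_first(arrival_order, cap=CAP):
    """Keep the first `cap` points that reach the voxel."""
    return arrival_order[:cap]


# The same three points land in one voxel, but in a different arrival order,
# as can happen when GPU threads are scheduled differently between runs.
run_a = ["p0", "p1", "p2"]
run_b = ["p2", "p0", "p1"]

print(keep_first(run_a))  # ['p0', 'p1']
print(keep_first(run_b))  # ['p2', 'p0']
```

Voxels that hold fewer points than the cap keep every point, so only the order of points within such a voxel can change, not which points are kept.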