
3D Object Tracking with Transformer

Overview

Introduction
Performance
Setup
QuickStart
Acknowledgment
Citation

Introduction

This is the official code release of "3D Object Tracking with Transformer" (accepted as a contributed paper at BMVC 2021). Paper

Abstract

Feature fusion and similarity computation are two core problems in 3D object tracking, especially for object tracking using sparse and disordered point clouds. Feature fusion could make similarity computing more efficient by including target object information. However, most existing LiDAR-based approaches directly use the extracted point cloud feature to compute similarity while ignoring the attention changes of object regions during tracking. In this paper, we propose a feature fusion network based on transformer architecture. Benefiting from the self-attention mechanism, the transformer encoder captures the inter- and intra- relations among different regions of the point cloud. By using cross-attention, the transformer decoder fuses features and includes more target cues into the current point cloud feature to compute the region attentions, which makes the similarity computing more efficient. Based on this feature fusion network, we propose an end-to-end point cloud object tracking framework, a simple yet effective method for 3D object tracking using point clouds.
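
To make the fusion step in the abstract concrete, here is a minimal sketch of scaled dot-product cross-attention in plain Python. It is an illustration under stated assumptions, not the LTTR implementation: there are no learned projections or multiple heads, the names `cross_attention`, `search_feats`, and `template_feats` are hypothetical, and the queries are assumed to come from the current point cloud while keys and values come from the target.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """For each query, return the attention-weighted sum of the values.

    queries: list of d-dimensional feature vectors (assumed: current point cloud regions)
    keys, values: lists of feature vectors (assumed: target regions)
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output channel at a time.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Toy features: three regions of the current point cloud, two target regions.
search_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
template_feats = [[1.0, 0.0], [0.0, 1.0]]

fused = cross_attention(search_feats, template_feats, template_feats)
for row in fused:
    print([round(x, 3) for x in row])
```

Calling the same function with `queries`, `keys`, and `values` all set to one feature set gives self-attention, which is the mechanism the abstract attributes to the encoder.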

Performance

Here we report the latest performance of LTTR. To make the open-source release cleaner, we refactored the code and tuned some parameters relative to the version in the paper. The resulting performance is as follows:

KITTI Dataset

	Car	Ped	Van	Cyclist	Mean
Success	68.6	45.5	39.5	70.7	56.1
Precision	79.2	70.6	45.2	90.6	72.7
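
The Mean column is not the plain average of the four classes (the unweighted Precision average would be 71.4). The sketch below assumes the mean is weighted by the number of test frames per category, using the frame counts commonly reported for this KITTI split in related work such as P2B. Both the weighting scheme and the counts are assumptions and are not stated in this README, but they reproduce both Mean values.

```python
# Assumed per-category test frame counts (not given in this README).
FRAME_COUNTS = {"Car": 6424, "Ped": 6088, "Van": 1248, "Cyclist": 308}

# Per-category scores copied from the table above.
SUCCESS = {"Car": 68.6, "Ped": 45.5, "Van": 39.5, "Cyclist": 70.7}
PRECISION = {"Car": 79.2, "Ped": 70.6, "Van": 45.2, "Cyclist": 90.6}


def weighted_mean(scores, weights):
    """Average the scores, weighting each category by its frame count."""
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


print(round(weighted_mean(SUCCESS, FRAME_COUNTS), 1))    # 56.1
print(round(weighted_mean(PRECISION, FRAME_COUNTS), 1))  # 72.7
```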

The pretrained model can be downloaded at this Link

Setup

Installation

conda create -n lttr python=3.8 -y
conda activate lttr

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

# For spconv installation details, see https://github.com/traveller59/spconv
pip install spconv-cu111

git clone https://github.com/3bobo/lttr
cd lttr/
pip install -r requirements.txt

python setup.py develop
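
After the steps above, a quick check can confirm that the main dependencies are visible to the interpreter. This is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the import names (`torch`, `torchvision`, `torchaudio`, `spconv`) are assumed to match the packages installed above. It only checks that the packages can be found, not that their versions or CUDA builds are correct.

```python
import importlib.util


def missing_packages(names):
    """Return the names in `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed to correspond to the pip packages installed above.
required = ["torch", "torchvision", "torchaudio", "spconv"]
missing = missing_packages(required)
print("missing packages:", missing if missing else "none")
```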

Dataset preparation

Download the dataset from KITTI Tracking and organize the downloaded files as follows:

lttr
├── data
│   ├── kitti
│   │   └── training
│   │       ├── calib
│   │       ├── label_02
│   │       └── velodyne
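
Before training, it can help to verify that the folders are where the layout above expects them. A minimal sketch follows; the helper name `missing_kitti_dirs` is hypothetical, and it only checks that the three subdirectories exist, not their contents.

```python
from pathlib import Path

# Subdirectories expected under data/kitti/training, per the layout above.
REQUIRED_SUBDIRS = ("calib", "label_02", "velodyne")


def missing_kitti_dirs(repo_root):
    """Return the required subdirectories that are absent under repo_root."""
    training = Path(repo_root) / "data" / "kitti" / "training"
    return [name for name in REQUIRED_SUBDIRS if not (training / name).is_dir()]


# Run from the repository root (the top-level lttr directory).
missing = missing_kitti_dirs(".")
print("missing directories:", missing if missing else "none")
```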

QuickStart

Train

To customize training, modify the parameters in the YAML file of the corresponding model, such as 'CLASS_NAMES'.

After configuring the YAML file, run the following command, passing the path to the config file and, optionally, the training tag.

cd lttr/tools
# e.g. python train.py --cfg_file cfgs/kitti_models/car.yaml --extra_tag car
python train.py --cfg_file $model_config_path
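
To launch several training runs from a script, the command can be assembled programmatically. The sketch below is an illustration only: the helper `build_train_command` is hypothetical, and it uses just the two flags shown above (`--cfg_file` and `--extra_tag`).

```python
import shlex


def build_train_command(cfg_file, extra_tag=None):
    """Assemble the train.py invocation as an argument list."""
    cmd = ["python", "train.py", "--cfg_file", cfg_file]
    if extra_tag:
        cmd += ["--extra_tag", extra_tag]
    return cmd


cmd = build_train_command("cfgs/kitti_models/car.yaml", extra_tag="car")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, cwd=...)` with the working directory set to the tools folder.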

For distributed training with DDP, run the following command (make sure you start from the root directory):

cd lttr/tools
bash scripts/dist_train.sh $NUM_GPUs --cfg_file $model_config_path

Eval

cd lttr/tools
# evaluate a single checkpoint
python test.py --cfg_file $model_config_path --ckpt $your_saved_ckpt
# evaluate all saved checkpoints
python test.py --cfg_file $model_config_path --ckpt $your_saved_ckpt --eval_all

The evaluation results are saved in the same directory as the model, for example 'output/kitti_models/car'.

Acknowledgment

This repo is built upon P2B and OpenPCDet.
Thanks to lucidrains for the implementation of TNT.
Thanks to traveller59 for the implementation of Spconv.
Thanks to tianweiy for the implementation of CenterPoint.

Citation

If you find this project useful for your research, please consider citing:

@inproceedings{lttr,
    author    = {Yubo Cui and Zheng Fang and Jiayao Shan and Zuoxu Gu and Sifan Zhou},
    title     = {3D Object Tracking with Transformer},
    booktitle = {32nd British Machine Vision Conference 2021, {BMVC} 2021, Online,
                November 22-25, 2021},
    pages     = {317},
    publisher = {{BMVA} Press},
    year      = {2021},
}