Deeper and Wider Siamese Networks for Real-Time Visual Tracking

We are hiring research interns for visual tracking and neural architecture search projects: houwen.peng@microsoft.com

News

Introduction

Siamese networks have drawn great attention in visual tracking because of their balanced accuracy and speed. However, the backbone network utilized in these trackers is still the classical AlexNet, which does not fully take advantage of the capability of modern deep neural networks.

Our proposals improve the performance of fully convolutional Siamese trackers by:

  1. introducing CIR and CIR-D units to unveil the power of deeper and wider networks such as ResNet and Inception;
  2. designing backbone networks based on an analysis of the internal network factors (e.g. receptive field, stride, output feature size) that affect tracking performance.
<div align="center"> <img src="demo/vis.gif" width="800px" /> </div>
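
The internal factors named above (receptive field, stride, output feature size) follow directly from a backbone's layer configuration. The sketch below computes them for a stack of convolution and pooling layers. The layer list is an illustrative assumption modelled on an unpadded AlexNet-style backbone, and the 127/255-pixel input sizes are likewise assumed; neither is read from this repository's code, and the names `Layer` and `analyze` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Layer(NamedTuple):
    """One conv or pooling layer (hypothetical helper, not from this repo)."""
    kernel: int
    stride: int
    padding: int = 0


def analyze(layers: List[Layer], input_size: int) -> Tuple[int, int, int]:
    """Return (receptive_field, total_stride, output_size) for a layer stack."""
    rf, jump, size = 1, 1, input_size
    for layer in layers:
        # Each layer widens the receptive field by (kernel - 1) steps,
        # measured at the current spacing between neighbouring outputs.
        rf += (layer.kernel - 1) * jump
        # The spacing between neighbouring outputs grows by the layer stride.
        jump *= layer.stride
        # Standard output-size formula with floor division.
        size = (size + 2 * layer.padding - layer.kernel) // layer.stride + 1
    return rf, jump, size


# Assumed AlexNet-like stack without padding (illustrative only).
ALEXNET_LIKE = [
    Layer(11, 2),  # conv1
    Layer(3, 2),   # pool1
    Layer(5, 1),   # conv2
    Layer(3, 2),   # pool2
    Layer(3, 1),   # conv3
    Layer(3, 1),   # conv4
    Layer(3, 1),   # conv5
]

if __name__ == "__main__":
    for name, size in [("exemplar", 127), ("search", 255)]:
        rf, stride, out = analyze(ALEXNET_LIKE, size)
        print(f"{name}: input={size} rf={rf} stride={stride} output={out}")
```

Swapping in a different layer list shows how a deeper or padded stack changes all three factors at once.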

Main Results

Main results on VOT and OTB

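| Models | OTB13 | OTB15 | VOT15 | VOT16 | VOT17 |
| :-- | :--: | :--: | :--: | :--: | :--: |
| Alex-FC | 0.608 | 0.579 | 0.289 | 0.235 | 0.188 |
| Alex-RPN | - | 0.637 | 0.349 | 0.344 | 0.244 |
| CIResNet22-FC | 0.663 | 0.644 | 0.318 | 0.303 | 0.234 |
| CIResIncep22-FC | 0.662 | 0.642 | 0.310 | 0.295 | 0.236 |
| CIResNext23-FC | 0.659 | 0.633 | 0.297 | 0.278 | 0.229 |
| CIResNet22-RPN | 0.674 | 0.666 | 0.381 | 0.376 | 0.294 |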
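
As a quick reading aid for the results above, the snippet below computes the relative gain of CIResNet22-FC over Alex-FC on each benchmark. The scores are copied from the table; the dictionary layout and the `relative_gain` helper are illustrative assumptions, not part of this repository.

```python
BENCHMARKS = ["OTB13", "OTB15", "VOT15", "VOT16", "VOT17"]

# Scores copied from the results table above.
SCORES = {
    "Alex-FC": [0.608, 0.579, 0.289, 0.235, 0.188],
    "CIResNet22-FC": [0.663, 0.644, 0.318, 0.303, 0.234],
}


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# Per-benchmark relative gain of the deeper backbone over the AlexNet baseline.
GAINS = {
    bench: relative_gain(base, new)
    for bench, base, new in zip(
        BENCHMARKS, SCORES["Alex-FC"], SCORES["CIResNet22-FC"]
    )
}

if __name__ == "__main__":
    for bench, gain in GAINS.items():
        print(f"{bench}: {gain:+.1%}")
```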

Main results trained with GOT-10k (SiamFC)

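| Models | OTB13 | OTB15 | VOT15 | VOT16 | VOT17 |
| :-- | :--: | :--: | :--: | :--: | :--: |
| Alex-FC | - | - | - | - | 0.188 |
| CIResNet22-FC | 0.664 | 0.654 | 0.361 | 0.335 | 0.266 |
| CIResNet22W-FC | 0.689 | 0.674 | 0.368 | 0.352 | 0.269 |
| CIResIncep22-FC | 0.673 | 0.650 | 0.332 | 0.305 | 0.251 |
| CIResNext22-FC | 0.668 | 0.651 | 0.336 | 0.304 | 0.246 |
| Raw Results | :paperclip: OTB2013 | :paperclip: OTB2015 | :paperclip: VOT15 | :paperclip: VOT16 | :paperclip: VOT17 |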

Newly added results

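| Benchmark | VOT18 | VOT19 | GOT10K | VISDRONE19 | LaSOT |
| :-- | :--: | :--: | :--: | :--: | :--: |
| Performance | 0.270 | 0.242 | 0.416 | 0.383 | 0.384 |
| Raw Results | :paperclip: VOT18 | :paperclip: VOT19 | :paperclip: GOT10K | :paperclip: VISDRONE | :paperclip: LaSOT |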

Environment

The code was developed on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz with an NVIDIA GTX 1080 GPU.

Quick Start

Test

See details in test.md

Train

See details in train.md


Citation

If any part of our paper or code is helpful to your work, please cite:

@InProceedings{SiamDW_2019_CVPR,
author = {Zhang, Zhipeng and Peng, Houwen},
title = {Deeper and Wider Siamese Networks for Real-Time Visual Tracking},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
} 

License

Licensed under an MIT license.