We introduce the Deep Motion Modeling Network (DMM-Net), which performs implicit detection and association of objects in an end-to-end manner. DMM-Net models comprehensive object features over multiple frames and simultaneously infers object motion parameters, categories, and visibilities. These outputs are readily used to update the tracklets for efficient MOT. DMM-Net achieves a PR-MOTA score of 12.80 at 120+ fps while jointly performing detection and tracking on the popular UA-DETRAC challenge. This makes it orders of magnitude faster than existing methods, with better performance.


| Date | Progress |
|---------|----------|
| 2019.11 | Finish the papers :-) |
| 2019.10 | Prepare papers |
| 2019.08 | Get results on the Omni-MOT dataset |
| 2019.08 | Can train on the Omni-MOT dataset |
| 2019.07 | Can train on the MOT17 dataset |
| 2019.06 | Can train on the "CVPR 2019 Tracking Challenge" |
| 2019.05 | Can train on the whole UA-DETRAC dataset |
| 2019.05 | Design the tracker |
| 2019.04 | Record the Five Cities training dataset |
| 2019.03 | Start a plan to create a new dataset |
| 2019.02 | Optimize the network |
| 2018.12 | Can do basic detection |
| 2018.11 | Design the loss function |
| 2018.10 | Try the UA-DETRAC dataset |
| 2018.09 | Re-design the input and output |
| 2018.08 | Design the whole network |
| 2018.07 | Start this idea |



Schematics of the end-to-end trainable DMM-Net: <img src="https://latex.codecogs.com/svg.latex?N_F"/> frames and their time stamps <img src="https://latex.codecogs.com/svg.latex?t_1:t_2"/> are input to the network. The frame sequence is first processed by a Feature Extractor comprising 3D ResNet-like convolutional groups. Outputs of selected groups are processed by the Motion Subnet, Classifier Subnet, and Visibility Subnet. Each sub-network uses 3D convolutions to learn features that are concatenated and used to predict motion parameters (<img src="https://latex.codecogs.com/svg.latex?O_M\in\mathbb{R}^{N_T\times&space;N_P\times&space;4}"/>), object categories (<img src="https://latex.codecogs.com/svg.latex?O_C\in\mathbb{R}^{N_T\times&space;N_C}"/>), and visibility (<img src="https://latex.codecogs.com/svg.latex?O_V\in\mathbb{R}^{N_F\times&space;N_T\times&space;2}"/>). Here <img src="https://latex.codecogs.com/svg.latex?N_T"/>, <img src="https://latex.codecogs.com/svg.latex?N_P"/>, and <img src="https://latex.codecogs.com/svg.latex?N_C"/> denote the numbers of anchor tunnels, motion parameters, and object categories, respectively.
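The output shapes above can be made concrete with a small decoding sketch. It is written in plain Python so it runs without PyTorch. It is **not** the repository's code, and it rests on several hypothetical choices:

- Each of the 4 box coordinates of a tunnel is modeled as a polynomial in time, with the <img src="https://latex.codecogs.com/svg.latex?N_P"/> motion parameters as its coefficients.
- The 2 visibility channels are logits ordered as (invisible, visible).
- The category is simply the arg-max of the score row.

The function names `eval_motion` and `decode_tunnels` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, ...]  # 4 box coordinates


def eval_motion(params: Sequence[Sequence[float]], t: float) -> Box:
    """Evaluate one tunnel's motion at time t.

    params has shape (N_P, 4). ASSUMPTION: row k holds the coefficient of t**k
    for each of the 4 box coordinates (a polynomial motion model).
    """
    return tuple(
        sum(params[k][c] * t ** k for k in range(len(params)))
        for c in range(4)
    )


def visible_prob(logits: Sequence[float]) -> float:
    """Softmax over the 2 visibility logits; ASSUMES index 1 means 'visible'."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for stability
    return exps[1] / sum(exps)


def decode_tunnels(o_m, o_c, o_v, timestamps):
    """Hypothetical decoder for DMM-Net-style outputs.

    o_m: N_T x N_P x 4 motion parameters
    o_c: N_T x N_C category scores
    o_v: N_F x N_T x 2 visibility logits
    timestamps: N_F frame times
    Returns per-frame boxes (N_F x N_T), per-tunnel categories, and
    per-frame, per-tunnel visibility probabilities (N_F x N_T).
    """
    boxes = [[eval_motion(tunnel, t) for tunnel in o_m] for t in timestamps]
    categories = [max(range(len(row)), key=row.__getitem__) for row in o_c]
    visible = [[visible_prob(lg) for lg in frame] for frame in o_v]
    return boxes, categories, visible
```

With a quadratic model (`N_P = 3`), a tunnel whose linear coefficients are `[1, 0, 1, 0]` shifts its x-coordinates by one unit per time step. A continuous motion model of this kind is what allows a single network pass to describe boxes for every frame in the clip.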


We directly deploy the trained network in the DMM Tracker (DMMT), as shown in the following figure. The tracker processes <img src="https://latex.codecogs.com/svg.latex?2N_F"/> frames. The trained DMM-Net selects <img src="https://latex.codecogs.com/svg.latex?N_F"/> of these frames as its input and outputs predicted tunnels for all possible objects, consisting of the motion parameter matrix <img src="https://latex.codecogs.com/svg.latex?(O_M)"/>, the category matrix <img src="https://latex.codecogs.com/svg.latex?(O_C)"/>, and the visibility matrix <img src="https://latex.codecogs.com/svg.latex?(O_V)"/>. These tunnels are then filtered by the Tunnel Filter. After that, the track set <img src="https://latex.codecogs.com/svg.latex?\mathcal{T}_{t_i}"/> is updated by associating the filtered tunnels with the previous track set <img src="https://latex.codecogs.com/svg.latex?\mathcal{T}_{t_{i-1}}"/> according to their IoU.
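The filter-then-associate step can be sketched as follows. This is a simplified illustration, not DMMT's actual implementation, and it makes several assumptions:

- The Tunnel Filter is reduced to a per-tunnel confidence threshold. The score might be derived from <img src="https://latex.codecogs.com/svg.latex?O_C"/>, but how it is computed is an assumption.
- IoU association is done greedily, from the highest IoU downwards. The real tracker may use a different matching scheme.
- Each tunnel's first box is compared with a track's last box, and the tunnel is assumed to cover the frames that follow.

`Track`, `filter_tunnels`, `associate`, `min_score`, and `iou_thresh` are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    boxes: List[Box] = field(default_factory=list)


def filter_tunnels(tunnels: Sequence[List[Box]], scores: Sequence[float],
                   min_score: float = 0.5) -> List[List[Box]]:
    """Hypothetical Tunnel Filter: keep tunnels whose confidence >= min_score."""
    return [list(tun) for tun, s in zip(tunnels, scores) if s >= min_score]


def associate(tracks: List[Track], tunnels: List[List[Box]],
              iou_thresh: float = 0.3) -> List[Track]:
    """Greedy IoU association (an assumed stand-in for DMMT's matching).

    Each track's last box is compared with each tunnel's first box; the best
    pairs above iou_thresh extend existing tracks, and unmatched tunnels
    start new tracks. Unmatched tracks are left unchanged.
    """
    pairs = sorted(
        ((iou(tr.boxes[-1], tun[0]), i, j)
         for i, tr in enumerate(tracks) for j, tun in enumerate(tunnels)),
        reverse=True,
    )
    used_tracks, used_tunnels = set(), set()
    for score, i, j in pairs:
        if score < iou_thresh:
            break  # remaining pairs overlap even less
        if i in used_tracks or j in used_tunnels:
            continue
        tracks[i].boxes.extend(tunnels[j])
        used_tracks.add(i)
        used_tunnels.add(j)
    next_id = max((t.track_id for t in tracks), default=-1) + 1
    for j, tun in enumerate(tunnels):
        if j not in used_tunnels:
            tracks.append(Track(next_id, list(tun)))
            next_id += 1
    return tracks
```

Because every tunnel already spans several frames, the association step only has to link whole tunnels to tracks rather than individual detections. This is one reason such a pipeline can stay lightweight.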


This tracker can achieve 120+ fps for jointly performing detection and tracking.



This work is based on PyTorch and 3D ResNet. It is also inspired by SSD and DAN.


The methods provided on this page are published under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. This means that you must attribute the work in the manner specified by the authors and you may not use this work for commercial purposes. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same license. If you are interested in commercial usage, you can contact us for further options.

