A Hierarchical Deep Temporal Model for Group Activity Recognition. Mostafa S. Ibrahim, Srikanth Muralidharan, Zhiwei Deng, Arash Vahdat, Greg Mori. IEEE Computer Vision and Pattern Recognition 2016

Contents

  1. History
  2. Abstract
  3. Model
  4. Dataset
  5. Experiments
  6. Installation
  7. License and Citation
  8. Poster and Powerpoint

History

Abstract

In group activity recognition, the temporal dynamics of the whole activity can be inferred from the dynamics of the individual people representing the activity. We build a deep model based on LSTMs to capture these dynamics. To make use of these observations, we present a 2-stage deep temporal model for the group activity recognition problem. In our model, one LSTM is designed to represent the action dynamics of individual people in a sequence, and another LSTM is designed to aggregate person-level information for whole-activity understanding. We evaluate our model on two datasets: the Collective Activity Dataset and a new volleyball dataset.

Model

<img src="https://github.com/mostafa-saad/deep-activity-rec/blob/master/img/fig1.png" alt="Figure 1" height="400" >

Figure 1: High-level figure for group activity recognition via a hierarchical model. Each person in a scene is modeled using a temporal model that captures his/her dynamics; these models are integrated into a higher-level model that captures scene-level activity.

<img src="https://github.com/mostafa-saad/deep-activity-rec/blob/master/img/fig2-b.png" alt="Figure 2" height="400" >

Figure 2: Detailed figure for the model. Given tracklets of K players, we feed each tracklet into a CNN, followed by a person LSTM layer to represent each player's action. We then pool over all people's temporal features in the scene. The output of the pooling layer is fed to the second LSTM network to identify the whole team's activity.

<img src="https://github.com/mostafa-saad/deep-activity-rec/blob/master/img/fig3.jpg" alt="Figure 3" height="400" >

Figure 3: The previous basic model drops spatial information. In the updated model, 2-group pooling is used to capture the spatial arrangement of players.
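The pooling step in Figures 2 and 3 can be illustrated with a minimal pure-Python sketch. It is not the released implementation: the function names, the toy feature vectors, and the rule of splitting players into two groups by the horizontal center of their bounding boxes are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def max_pool(features: Sequence[Vector]) -> Vector:
    """Element-wise max over a non-empty collection of equal-length vectors."""
    if not features:
        raise ValueError("need at least one feature vector")
    return [max(column) for column in zip(*features)]


def two_group_pool(features: Sequence[Vector], x_centers: Sequence[float]) -> Vector:
    """Pool the left and right halves of the players separately, then concatenate.

    Assumption: players are ordered by the x coordinate of their box center
    and split into two equally sized groups.
    """
    if len(features) != len(x_centers):
        raise ValueError("one x position is required per player")
    if len(features) < 2:
        raise ValueError("need at least two players to form two groups")
    order = sorted(range(len(features)), key=lambda i: x_centers[i])
    half = len(order) // 2
    left = [features[i] for i in order[:half]]
    right = [features[i] for i in order[half:]]
    return max_pool(left) + max_pool(right)


# Hypothetical per-player features at one time step (4 players, 2 dimensions).
features = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.4], [0.7, 0.6]]
x_centers = [120.0, 900.0, 300.0, 1100.0]

scene = max_pool(features)                      # single pooled vector
grouped = two_group_pool(features, x_centers)   # left group, then right group
print(scene)
print(grouped)
```

Pooling everyone into one vector discards which side of the court a player is on; the two-group variant keeps that coarse spatial cue at the cost of a vector twice as long.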

Dataset

NEW Download Link (all of the below combined, on Google Drive).

Old Download Link.

If links don't work at some point, please email me (mostafa.saad.fci@gmail.com)

Download Error: Got quota issue? Google 'How To Fix Google Drive Download Quota Exceeded'

UPDATE 1: Many people asked for extracted trajectories. In our code, we generate them on the fly using the Dlib tracker. I extracted them and saved them to disk (I did a few verifications). Hopefully this helps. Download.

UPDATE 2: My colleague, Jiawei (Eric) He, recently trained 2 Faster-RCNN detectors using the training detections. One detector detects only the person; the other detects the action of the person. Each row has the format: [Image name # of detections x y w h confidence category (for each detection)]. Such data can be useful in multiple scenarios and save you time. I did a few verifications over them. Note that these data are not used in our models; they are provided to help :). Download.

UPDATE 3 - NEW: Special thanks to Norimichi Ukita (a professor at Toyota Technological Institute) for providing manual annotations for the trajectories on all video sequences. Download. Kindly check the README file for the data format, and cite their paper if you use the annotations (Heatmapping of People Involved in Group Activities, Kohei Sendo and Norimichi Ukita, MVA 2019).

UPDATE 4 - NEW: Special thanks to Mauricio Perez. In their recent paper, Skeleton-based relational reasoning for group activity analysis, they manually annotated the ball locations in the frames. Kindly cite their paper if you use their dataset extension.
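For the detection files mentioned in UPDATE 2, a row could be read roughly as follows. This is a sketch under assumptions, not a tested loader for the released files: it assumes fields are separated by whitespace, that the image name contains no spaces, and that the category is kept as a raw string. The `Detection` class, the function name, and the sample row are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

FIELDS_PER_DETECTION = 6  # x y w h confidence category


@dataclass
class Detection:
    x: float
    y: float
    w: float
    h: float
    confidence: float
    category: str  # kept as text; its encoding is not specified here


def parse_detection_row(row: str) -> Tuple[str, List[Detection]]:
    """Parse '<image name> <# of detections> (x y w h confidence category)*'."""
    tokens = row.split()
    if len(tokens) < 2:
        raise ValueError("row must contain an image name and a detection count")
    image_name, count = tokens[0], int(tokens[1])
    values = tokens[2:]
    if len(values) != count * FIELDS_PER_DETECTION:
        raise ValueError(
            f"expected {count * FIELDS_PER_DETECTION} values, got {len(values)}"
        )
    detections = []
    for i in range(count):
        start = i * FIELDS_PER_DETECTION
        x, y, w, h, conf, category = values[start:start + FIELDS_PER_DETECTION]
        detections.append(
            Detection(float(x), float(y), float(w), float(h), float(conf), category)
        )
    return image_name, detections


# Hypothetical row with two detections (values are made up for illustration).
sample_row = "13456.jpg 2 100 200 50 120 0.98 1 400 210 48 118 0.87 1"
name, dets = parse_detection_row(sample_row)
print(name, len(dets), dets[0])
```

Checking the token count against the declared number of detections catches truncated or misaligned rows early instead of silently shifting fields.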

We collected a new dataset using publicly available YouTube volleyball videos. We annotated 4830 frames that were handpicked from 55 videos with 9 player action labels and 8 team activity labels.

<img src="https://github.com/mostafa-saad/deep-activity-rec/blob/master/img/dataset1.jpg" alt="Figure 3" height="400" >

Figure 3: A frame labeled as Left Spike. Bounding boxes around each team's players are annotated in the dataset.

<img src="https://github.com/mostafa-saad/deep-activity-rec/blob/master/img/dataset2.jpg" alt="Figure 4" height="400" >

Figure 4: For each visible player, an action label is annotated.

We used 3493 frames for training and the remaining 1337 frames for testing. The train-test split is performed at the video level rather than at the frame level, which makes the evaluation of models more convincing. The list of action and activity labels and related statistics are tabulated in the following tables:

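| Group Activity Class | No. of Instances |
|---|---|
| Right set | 644 |
| Right spike | 623 |
| Right pass | 801 |
| Right winpoint | 295 |
| Left winpoint | 367 |
| Left pass | 826 |
| Left spike | 642 |
| Left set | 633 |

| Action Class | No. of Instances |
|---|---|
| Waiting | 3601 |
| Setting | 1332 |
| Digging | 2333 |
| Falling | 1241 |
| Spiking | 1216 |
| Blocking | 2458 |
| Jumping | 341 |
| Moving | 5121 |
| Standing | 38696 |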
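The video-level train-test split described above can be sketched as follows. The video IDs, frame names, and the choice of test videos here are hypothetical placeholders, not the split used in our experiments; the point is only that every frame from a given video lands on the same side of the split.

```python
from typing import Iterable, List, Set, Tuple

Frame = Tuple[int, str]  # (video id, frame name); both are placeholders


def split_by_video(
    frames: Iterable[Frame], test_videos: Set[int]
) -> Tuple[List[Frame], List[Frame]]:
    """Assign each annotated frame to train or test based on its video id."""
    train: List[Frame] = []
    test: List[Frame] = []
    for frame in frames:
        video_id = frame[0]
        if video_id in test_videos:
            test.append(frame)
        else:
            train.append(frame)
    return train, test


# Hypothetical annotated frames from three videos.
frames = [
    (0, "frame_a"), (0, "frame_b"),
    (1, "frame_c"),
    (2, "frame_d"), (2, "frame_e"),
]
train, test = split_by_video(frames, test_videos={2})

# No video contributes frames to both sets.
train_videos = {video_id for video_id, _ in train}
held_out_videos = {video_id for video_id, _ in test}
print(len(train), len(test), train_videos & held_out_videos)
```

Splitting at the frame level instead would let near-identical frames from the same rally appear in both sets, which tends to inflate test accuracy.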

Further information:

Experiments

<img src="https://github.com/mostafa-saad/deep-activity-rec/blob/master/img/table-ac.png" alt="Figure 5" height="300" >

Table 1: Comparison of the team activity recognition performance of baselines against our model, evaluated on the Volleyball Dataset. Experiments use the 2-group style with the max-pool strategy. The last 3 entries are a comparison against the Improved Dense Trajectories approach.

Installation

License and Citation

Source code is released under the BSD 2-Clause license.

If you use our extended dataset, please cite the following 2 publications. Otherwise, cite a suitable subset of them:

@inproceedings{msibrahiCVPR16deepactivity,
  author    = {Mostafa S. Ibrahim and Srikanth Muralidharan and Zhiwei Deng and Arash Vahdat and Greg Mori},
  title     = {A Hierarchical Deep Temporal Model for Group Activity Recognition.},
  booktitle = {2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2016}
}

@inproceedings{msibrahiPAMI16deepactivity,
  author    = {Mostafa S. Ibrahim and Srikanth Muralidharan and Zhiwei Deng and Arash Vahdat and Greg Mori},
  title     = {Hierarchical Deep Temporal Models for Group Activity Recognition.},
  journal   = {arXiv preprint arXiv:1607.02643},
  year      = {2016}
}

Poster and Powerpoint

<img src="https://github.com/mostafa-saad/deep-activity-rec/blob/master/extra/poster.jpg" alt="Poster" height="400" >

Mostafa on left and Srikanth on right while presenting the poster.