Activity Anticipation

Human activity anticipation from videos using an LSTM-based model.

Requirements

Model

The model extracts features from each frame with a CNN and passes the resulting feature sequence to a separate LSTM network.
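
A minimal sketch of this two-stage design, assuming 2048-dimensional Inception V3 pool features and a fixed clip length (the sequence length, LSTM width, and dropout rate below are illustrative assumptions, not the repository's defaults):

```python
import tensorflow as tf

NUM_CLASSES = 4     # hand shake, high five, hug, kiss
SEQ_LEN = 20        # frames per clip (assumed)
FEATURE_DIM = 2048  # Inception V3 average-pool feature size

# LSTM classifier over pre-extracted per-frame CNN features.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(256, input_shape=(SEQ_LEN, FEATURE_DIM)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Keeping the CNN and the LSTM separate means the expensive per-frame features are computed once during preprocessing, and only the lightweight sequence model has to be trained.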

Data

I use the TV Human Interactions (TVHI) dataset. The dataset contains videos of people performing four interactions (hand shake, high five, hug, and kiss), 200 videos in total; the negative clips that contain none of these interactions are excluded.

Please extract the dataset files and store the videos in the ./videos directory and the annotations in the ./annotations directory.

For the CNN, I use Inception V3, pre-trained on ImageNet.

Extract the compressed file and put inception_v3.ckpt into the ./inception_checkpoint directory.

Usage

First, extract features from the frames before the annotated action begins in each video:

$ python preprocessing.py
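
preprocessing.py presumably restores inception_v3.ckpt from ./inception_checkpoint; the sketch below shows the same per-frame feature extraction using the Keras ImageNet weights instead, just to stay self-contained (the frame size and average pooling are assumptions):

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# ImageNet-pretrained Inception V3 without the classifier head;
# global average pooling maps each 299x299 frame to a 2048-d vector.
cnn = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def extract_features(frames):
    """frames: array of shape (num_frames, 299, 299, 3), RGB values in [0, 255]."""
    return cnn.predict(preprocess_input(np.asarray(frames, dtype=np.float32)))
```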

Then, generate train_file.csv, which records the ground-truth label for each video:

$ python generate_video_mapping.py
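
The mapping itself can be as simple as the sketch below; the (video name, label index) column layout and the TVHI `<action>_<id>.avi` file naming are assumptions about what the script actually does:

```python
import csv
import os

# Label indices for the four TVHI interaction classes (ordering assumed).
LABELS = {"handShake": 0, "highFive": 1, "hug": 2, "kiss": 3}

with open("train_file.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for name in sorted(os.listdir("./videos")):
        action = name.split("_")[0]  # e.g. "handShake_0001.avi" -> "handShake"
        if action in LABELS:
            writer.writerow([name, LABELS[action]])
```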

To train the model with default parameters:

$ python train.py
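
Put together, training loads the per-video feature sequences with their labels and fits the LSTM classifier from the Model section. A condensed, self-contained sketch (the ./features layout, array shapes, batch size, and optimizer are assumptions):

```python
import csv

import numpy as np
import tensorflow as tf

SEQ_LEN, FEATURE_DIM, NUM_CLASSES = 20, 2048, 4  # assumed shapes

# Load the (video, label) pairs written by generate_video_mapping.py;
# one .npy feature array per video under ./features is an assumption
# about what preprocessing.py produces.
with open("train_file.csv") as f:
    rows = list(csv.reader(f))
x = np.stack([np.load(f"./features/{name}.npy") for name, _ in rows])
y = np.array([int(label) for _, label in rows])

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(256, input_shape=(SEQ_LEN, FEATURE_DIM)),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=15, batch_size=32)
```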

Performance

| Activity   | # of training data | # of augmented training data | # of validation data |
|------------|--------------------|------------------------------|----------------------|
| hand shake | 27                 | 315                          | 20                   |
| high five  | 30                 | 320                          | 20                   |
| hug        | 30                 | 362                          | 20                   |
| kiss       | 28                 | 293                          | 20                   |

The model was trained for 15 epochs.

(Plots: training loss and accuracy.)

Related work