Video Classification

This repository provides quick, simple code for video classification (or action recognition) on UCF101 with PyTorch. A video is viewed either as a 3D image or as a sequence of consecutive 2D images (Fig. 1). Two simple neural network models are covered below: a 3D CNN and a CNN + RNN (CRNN).

Dataset

UCF101 contains 13,320 videos covering 101 action classes. The videos vary in length (number of frames) and in 2D frame size; the shortest has 28 frames.
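Because the videos differ in length, a model that expects a fixed-size input needs the same number of frames from every video. Below is a minimal sketch of one way to do that, by picking a fixed range of frame indices. The helper `select_frames`, its parameters `begin`, `end`, and `skip`, and the 1-based frame numbering are all assumptions made for illustration, not necessarily what the scripts in this repository do.

```python
def select_frames(n_available, begin=1, end=29, skip=1):
    """Return 1-based frame indices begin, begin+skip, ... (end is exclusive).

    Hypothetical helper: the parameter names and defaults are illustrative.
    """
    if skip < 1 or begin < 1 or end <= begin:
        raise ValueError("need begin >= 1, end > begin and skip >= 1")
    indices = list(range(begin, end, skip))
    if indices[-1] > n_available:
        raise ValueError(
            f"video has only {n_available} frames, need frame {indices[-1]}"
        )
    return indices


# The shortest video has 28 frames, so frames 1..28 exist in every video.
clip = select_frames(n_available=28)
print(len(clip), clip[0], clip[-1])   # 28 1 28

# A longer video can be subsampled over the same range to get fewer frames.
sparse = select_frames(n_available=120, skip=4)
print(sparse)                         # [1, 5, 9, 13, 17, 21, 25]
```

Keeping `end - 1` at or below 28 guarantees that the same indices are valid for every video in the dataset.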

To avoid painful video preprocessing, such as frame extraction and conversion with OpenCV or FFmpeg, I use a preprocessed dataset from feichtenhofer directly. If you want to convert or extract video frames from scratch, here are some nice tutorials:

Models

1. 3D CNN (train from scratch)

Use several 3D kernels of size (a, b, c) with n channels, e.g., (a, b, c, n) = (3, 3, 3, 16), to convolve the video input, where each video is viewed as a 3D image. Batch normalization and dropout are also used.
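To make the idea concrete, here is a minimal sketch of a single 3D convolution block with (a, b, c, n) = (3, 3, 3, 16), followed by batch normalization and dropout. It is not the network from this repository: the class name `Tiny3DCNN`, the 3-channel RGB input, the single layer, the global average pooling, and the dropout rate are all assumptions chosen to keep the example short.

```python
import torch
import torch.nn as nn


class Tiny3DCNN(nn.Module):
    """Hypothetical one-block 3D CNN, for illustration only."""

    def __init__(self, n_classes=101, drop_p=0.2):
        super().__init__()
        self.features = nn.Sequential(
            # 16 kernels of size 3x3x3 slide over (frames, height, width)
            nn.Conv3d(in_channels=3, out_channels=16, kernel_size=(3, 3, 3)),
            nn.BatchNorm3d(16),
            nn.ReLU(inplace=True),
            nn.Dropout3d(drop_p),
            nn.AdaptiveAvgPool3d(1),  # collapse time and space to one value per channel
        )
        self.classifier = nn.Linear(16, n_classes)

    def forward(self, x):
        # x: (batch, channels, frames, height, width)
        x = self.features(x)                          # (batch, 16, 1, 1, 1)
        return self.classifier(torch.flatten(x, 1))   # (batch, n_classes)


model = Tiny3DCNN().eval()
video_batch = torch.randn(2, 3, 28, 64, 64)  # 2 random clips of 28 frames, 64x64 pixels
with torch.no_grad():
    logits = model(video_batch)
print(logits.shape)  # torch.Size([2, 101])
```

Note that `nn.Conv3d` expects the channel dimension before the frame dimension, so frames loaded as (frames, channels, height, width) need to be permuted first.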

2. CNN + RNN (CRNN)

The CRNN model pairs a CNN encoder with an RNN decoder (see figure below):

<img src="./fig/CRNN.png" width="650">
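The encoder/decoder split can be sketched as follows: a 2D CNN encodes every frame into a feature vector, and an LSTM reads the resulting sequence and classifies from its last time step. This is a simplified illustration under my own assumptions, not the repository's model: the class names `EncoderCNN` and `DecoderRNN`, the layer sizes, and the use of only the last LSTM output are all hypothetical choices.

```python
import torch
import torch.nn as nn


class EncoderCNN(nn.Module):
    """Hypothetical encoder: maps each frame to an embed_dim feature vector."""

    def __init__(self, embed_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2),
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(16, embed_dim)

    def forward(self, x):
        # x: (batch, frames, channels, height, width)
        b, t = x.shape[:2]
        feats = self.conv(x.flatten(0, 1))            # fold time into batch: (b*t, 16, 1, 1)
        return self.fc(feats.flatten(1)).reshape(b, t, -1)  # (b, t, embed_dim)


class DecoderRNN(nn.Module):
    """Hypothetical decoder: LSTM over frame features, then a linear classifier."""

    def __init__(self, embed_dim=64, hidden=32, n_classes=101):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, seq):
        out, _ = self.lstm(seq)       # (b, t, hidden)
        return self.fc(out[:, -1])    # classify from the last time step


encoder, decoder = EncoderCNN().eval(), DecoderRNN().eval()
frames = torch.randn(2, 8, 3, 64, 64)  # 2 random clips, 8 frames each
with torch.no_grad():
    frame_features = encoder(frames)
    logits = decoder(frame_features)
print(frame_features.shape, logits.shape)  # torch.Size([2, 8, 64]) torch.Size([2, 101])
```

Folding the time dimension into the batch lets the same 2D CNN weights process every frame in one call, which is usually simpler than looping over frames.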

Training & testing

Usage

For tutorial purposes, I tried to keep the code as simple as possible. Essentially, only 3 files are needed for each model, e.g., for the 3D-CNN model:

0. Prerequisites

1. Download preprocessed UCF101 dataset

For convenience, we use a preprocessed UCF101 dataset that is already sliced into RGB images, from feichtenhofer/twostreamfusion:

Put the 3 parts in the same folder and unzip them. The extracted folder is named jpegs_256 by default.

2. Set parameters & path

In UCF101_CRNN.py, for example, set:

data_path = "./UCF101/jpegs_256/"         # UCF101 video path
action_name_path = "./UCF101actions.pkl"
save_model_path = "./model_ckpt/"
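Before a long training run it can help to verify these three paths up front. The sketch below creates the checkpoint folder if needed and reports which inputs are missing. The helper `prepare_paths` is hypothetical and not part of the repository; the demo runs inside a temporary directory so that it does not touch real data.

```python
import os
import tempfile


def prepare_paths(data_path, action_name_path, save_model_path):
    """Create the checkpoint folder and return a list of missing inputs.

    Hypothetical helper, for illustration only.
    """
    os.makedirs(save_model_path, exist_ok=True)
    missing = []
    if not os.path.isdir(data_path):
        missing.append(data_path)
    if not os.path.isfile(action_name_path):
        missing.append(action_name_path)
    return missing


# Demo in an empty temporary directory: both inputs are reported missing,
# while the checkpoint folder is created.
with tempfile.TemporaryDirectory() as root:
    save_dir = os.path.join(root, "model_ckpt")
    missing = prepare_paths(
        data_path=os.path.join(root, "UCF101", "jpegs_256"),
        action_name_path=os.path.join(root, "UCF101actions.pkl"),
        save_model_path=save_dir,
    )
    ckpt_created = os.path.isdir(save_dir)

print(len(missing), ckpt_created)  # 2 True
```

In a real run you would pass the `data_path`, `action_name_path`, and `save_model_path` values set above and stop early if the returned list is not empty.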

3. Train & test model

$ python UCF101_CRNN.py    # or UCF101_3DCNN.py / UCF101_ResNetCRNN.py

4. Model outputs

By default, the model outputs:

To check model prediction:

<img src="./fig/wrong_pred.png" width="600">

Version Warning!

As of May 31, 2019, flatten_parameters() in PyTorch 1.1.0 does not work under torch.no_grad together with DataParallel (for multiple GPUs). Earlier versions, before PyTorch 1.0.1, still run OK. See Issues.

Thanks to raghavgarg97 for the report.
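If you want the scripts to flag this automatically, a small version check is one option. The sketch below only parses a version string and compares it with 1.1.0, the single version reported as failing here; it says nothing about other releases. The function names are hypothetical, and the version strings in the demo are examples rather than test results.

```python
import re
import warnings

REPORTED_AFFECTED = (1, 1, 0)  # the only version this note reports as failing


def release_tuple(version):
    """Parse 'major.minor.patch' from a version string such as '1.1.0+cu100'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(group) for group in match.groups())


def is_reported_affected(version):
    """True only for the release named in the warning above (assumption: exact match)."""
    return release_tuple(version) == REPORTED_AFFECTED


def warn_if_affected(version):
    """Emit a warning for the reported version; return whether it was emitted."""
    if is_reported_affected(version):
        warnings.warn(
            "flatten_parameters() is reported to fail under torch.no_grad "
            "with DataParallel in this PyTorch version"
        )
        return True
    return False


checks = {v: is_reported_affected(v) for v in ("1.0.0", "1.1.0", "1.1.0+cu100")}
print(checks)  # {'1.0.0': False, '1.1.0': True, '1.1.0+cu100': True}
```

In a training script you would call `warn_if_affected(torch.__version__)` once, right after importing torch.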

Device & performance

| network | best epoch | testing accuracy |
| --- | --- | --- |
| 3D CNN | 4 | 50.84 % |
| 2D CNN + LSTM | 25 | 54.62 % |
| 2D ResNet152-CNN + LSTM | 53 | 85.68 % |

<img src="./fig/loss_3DCNN.png" width="650"> <img src="./fig/loss_CRNN.png" width="650"> <img src="./fig/loss_ResNetCRNN.png" width="650"> <br>