Awesome
Video Classification Using 3D ResNet
This is a torch code for video (action) classification using 3D ResNet trained by this code.
The 3D ResNet is trained on the Kinetics dataset, which includes 400 action classes.
This code uses videos as inputs and outputs class names and predicted class scores for each 16 frames.
PyTorch (Python) version of this code is available here.
Requirements
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh
luarocks install json
- FFmpeg, FFprobe
wget http://johnvansickle.com/ffmpeg/releases/ffmpeg-release-64bit-static.tar.xz
tar xvf ffmpeg-release-64bit-static.tar.xz
cd ./ffmpeg-3.3.3-64bit-static/; sudo cp ffmpeg ffprobe /usr/local/bin;
- Python 3
Preparation
- Download this code.
- Download the pretrained model.
- We recommend ResNet-34.
Usage
th main.lua --input ./input --output ./output.json --model ./resnet-34-kinetics.t7
To visualize the classification results, use generate_result_video/generate_result_video.py
.
Citation
If you use this code, please cite the following:
@article{hara3dresnets
author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh}
title={Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition}
journal={arXiv preprint}
volume={arXiv:1708.07632}
year={2017}
}