End-to-end Learning of Action Detection from Frame Glimpses in Videos

By Serena Yeung, Olga Russakovsky, Greg Mori, Li Fei-Fei

This code is a Torch implementation of an end-to-end approach for action detection in videos that learns to directly predict the temporal bounds of actions. Details of the work can be found in the [arXiv paper](https://arxiv.org/abs/1511.06984).
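
The model is a recurrent agent that observes a small number of frame "glimpses" and directly emits temporal bounds, rather than scoring every frame of the video. The following Lua sketch is only a rough illustration of that idea under stated assumptions; the tensor sizes, module structure, and names (`frameFeatures`, `rnnStep`, etc.) are hypothetical and are not this repository's actual code:

```lua
-- Rough illustration only: the names and sizes below are hypothetical and
-- do not correspond to this repository's code.
require 'torch'
require 'nn'

local T, D, H = 600, 4096, 256            -- assumed video length, feature dim, state size
local frameFeatures = torch.randn(T, D)   -- stand-in for per-frame CNN features

-- A recurrent agent observes one frame per step, updates its hidden state,
-- and from that state decides (a) a candidate detection given by start and
-- end frames plus a confidence, (b) whether to emit that detection, and
-- (c) which frame to observe next, so only a handful of frames are seen.
local rnnStep = nn.Sequential()
  :add(nn.JoinTable(1))
  :add(nn.Linear(D + H, H))
  :add(nn.Tanh())

local hidden = torch.zeros(H)
local t = 1                               -- frame currently being observed
for step = 1, 8 do                        -- small, fixed number of glimpses
  local obs = frameFeatures[t]
  hidden = rnnStep:forward({obs, hidden}):clone()
  -- Placeholders: in the actual model these are all predicted from `hidden`,
  -- and the non-differentiable decisions are trained with REINFORCE.
  local candidate = {startFrame = math.max(1, t - 15),
                     endFrame   = math.min(T, t + 15),
                     confidence = 0.5}
  local emit  = false
  local nextT = math.min(T, t + 75)       -- jump ahead instead of scanning
  t = nextT
end
```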

Citation

@article{yeung2015end,
  title={End-to-end Learning of Action Detection from Frame Glimpses in Videos},
  author={Yeung, Serena and Russakovsky, Olga and Mori, Greg and Fei-Fei, Li},
  journal={arXiv preprint arXiv:1511.06984},
  year={2015}
}

Usage

Run `th train.lua` to train an action detection model for an action class from a video dataset. A number of required command line arguments must be specified.

For more details, see `util/DataHandler.lua`. Run `th train.lua --help` to see additional command line options that may be specified.
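
For reference, Torch training scripts of this kind (including char-rnn, which this code draws on) typically declare and parse their options with `torch.CmdLine`. The sketch below only illustrates that general pattern; the option names are placeholders, not the actual flags expected by `train.lua`:

```lua
-- Illustration of the standard torch.CmdLine pattern only; the option
-- names below are placeholders, not the flags actually used by train.lua.
require 'torch'

local cmd = torch.CmdLine()
cmd:text('Example options (placeholder names)')
cmd:option('-data_dir', 'data/', 'directory containing the preprocessed video data')
cmd:option('-class_index', 1, 'index of the action class to train a detector for')
cmd:option('-seed', 123, 'random seed')
cmd:text()

-- `arg` is the table of command line arguments supplied to the script.
local opt = cmd:parse(arg or {})
print(opt.data_dir, opt.class_index, opt.seed)
```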

Acknowledgments

This code is based partly on code from Wojciech Zaremba's learning to execute, Andrej Karpathy's char-rnn, and Element Research's dpnn (deep extensions to nn).