Awesome Action Recognition:

A curated list of resources on action recognition and related areas (e.g. object recognition, pose estimation), inspired by awesome-computer-vision.

Action Recognition and Video Understanding
Object Recognition
Pose Estimation
Competitions

Action Recognition and Video Understanding

Summary posts

Deep Learning for Videos: A 2018 Guide to Action Recognition - Summary of major landmark action recognition research papers till 2018
Literature Survey: Human Action Recognition - Brief human action recognition literature survey of work published between 2014 and 2019.

Video Representation

Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition - J. Choi et al., NeurIPS2019. [project web] [code] [arXiv]
SlowFast Networks for Video Recognition - C. Feichtenhofer et al., ICCV2019. [code]
Large-scale weakly-supervised pre-training for video action recognition - D. Ghadiyaram et al., arXiv2019.
Video Classification with Channel-Separated Convolutional Networks - D. Tran et al., arXiv2019.
DistInit: Learning Video Representations without a Single Labeled Video - R. Girdhar et al., arXiv2019.
SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition - B. Korbar et al., arXiv2019.
Video Action Transformer Network - R. Girdhar et al., CVPR2019. [project web]
Learning Correspondence from the Cycle-consistency of Time - X. Wang et al., CVPR2019. [code] [project web]
Representation Flow for Action Recognition - AJ. Piergiovanni and M. S. Ryoo, CVPR2019.
Collaborative Spatiotemporal Feature Learning for Video Action Recognition - C. Li et al., CVPR2019.
Learning Video Representations from Correspondence Proposals - X. Liu et al., CVPR2019.
Timeception for Complex Action Recognition - N. Hussein et al., CVPR2019.
The Visual Centrifuge: Model-Free Layered Video Representations - J.-B. Alayrac et al., CVPR2019.
Long-Term Feature Banks for Detailed Video Understanding - C.-Y. Wu et al., CVPR2019. [code]
Temporal Relational Reasoning in Videos - B. Zhou et al., ECCV2018. [code] [project web]
Action Recognition Zoo - Code for popular action recognition models, written in PyTorch and verified on the Something-Something dataset.
Videos as Space-Time Region Graphs - X. Wang and A. Gupta, ECCV2018.
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? - K. Hara et al., CVPR2018. [code]
A Closer Look at Spatiotemporal Convolutions for Action Recognition - D. Tran et al., CVPR2018. [code] [PyTorch]
Attend and Interact: Higher-Order Object Interactions for Video Understanding - CY. Ma et al., CVPR 2018.
Non-Local Neural Networks - X. Wang et al., CVPR2018. [code]
Rethinking Spatiotemporal Feature Learning For Video Understanding - S. Xie et al., arXiv2017.
ConvNet Architecture Search for Spatiotemporal Feature Learning - D. Tran et al, arXiv2017. Note: Aka Res3D. [code]: In the repository, C3D-v1.1 is the Res3D implementation.
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks - Z. Qiu et al, ICCV2017. [code]
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset - J. Carreira et al, CVPR2017. [code], [PyTorch code], [another PyTorch code]
Learning Spatiotemporal Features with 3D Convolutional Networks - D. Tran et al, ICCV2015. [the official Caffe code] [project web] Note: Aka C3D. [Python Wrapper] Note that the official Caffe code does not support a Python wrapper. [TensorFlow], [TensorFlow + Keras], [Another TensorFlow Implementation], [Keras C3D Project web]: [Keras code], [Pretrained weights].
Deep Temporal Linear Encoding Networks - A. Diba et al, CVPR2017.
Temporal Convolutional Networks: A Unified Approach to Action Segmentation and Detection - C. Lea et al, CVPR 2017. [code]
Long-term Temporal Convolutions - G. Varol et al, TPAMI2017. [project web] [code]
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition - L. Wang et al, arXiv 2016. [code]
Convolutional Two-Stream Network Fusion for Video Action Recognition - C. Feichtenhofer et al, CVPR2016. [code]
Two-Stream Convolutional Networks for Action Recognition in Videos - K. Simonyan and A. Zisserman, NIPS2014.
Temporal Recurrent Networks for Online Action Detection - M. Xu et al, ICCV2019. [code]
Long Short-Term Transformer for Online Action Detection - M. Xu et al, NeurIPS2021. [code]

Useful Code Repos on Video Representation Learning

Action Recognition Datasets

Video Dataset Overview from Antoine Miech
HACS
Moments in Time, paper
AVA, paper, [INRIA web] for missing videos
Kinetics, paper, download toolkit
OOPS - A dataset of unintentional actions, paper
COIN - a large-scale dataset for comprehensive instructional video analysis, paper
YouTube-8M, technical report
YouTube-BB, technical report
DALY - Daily Action Localization in YouTube videos. Note: Weakly supervised action detection dataset. Annotations consist of the start and end time of each action, plus one bounding box per action per video.
20BN-JESTER, 20BN-SOMETHING-SOMETHING
ActivityNet Note: They provide a download script and evaluation code here.
Charades
Charades-Ego, paper - First person and third person video aligned dataset
EPIC-Kitchens, paper - First person videos recorded in kitchens. Note: They provide download scripts and a Python library here.
Sports-1M - Large scale action recognition dataset.
THUMOS14 Note: It overlaps with UCF-101 dataset.
THUMOS15 Note: It overlaps with UCF-101 dataset.
HOLLYWOOD2: Spatio-Temporal annotations
UCF-101, annotation provided by THUMOS-14, and corrupted annotation list, UCF-101 corrected annotations and different version annotations. There are also some pre-computed spatiotemporal action detection results.
UCF-50.
UCF-Sports. Note: The train/test split link on the official website is broken. Instead, you can download it from here.
HMDB
J-HMDB
LIRIS-HARL
KTH
MSR Action Note: It overlaps with KTH dataset.
Sports Videos in the Wild
NTU RGB+D
Mixamo Mocap Dataset
UWA3D Multiview Activity II Dataset
Northwestern-UCLA Dataset
SYSU 3D Human-Object Interaction Dataset
MEVA (Multiview Extended Video with Activities) Dataset

Video Annotation

Efficiently scaling up crowdsourced video annotation - C. Vondrick et. al, IJCV2013. [code]
The Design and Implementation of ViPER - D. Mihalcik and D. Doermann, Technical report.
VoTT: Visual Object Tagging Tool. Modern app to annotate objects in videos and images. It facilitates the development of an end-to-end machine learning pipeline encompassing the annotation, export, and import of assets. It can run as a native app or in the browser.
VIA: VGG Image Annotator. Simple and standalone manual annotation web-app for image, audio and video. It runs in the web browser and does not require any installation or setup.

Object Recognition

Object Detection

Deformable Convolutional Networks - J. Dai et al., ICCV2017. [official code]
Detectron - Open Source Object Detection Framework from Facebook AI Research. Includes Mask R-CNN, FPN, etc. Caffe2 implementation.
Mask R-CNN - K. He et al, [Detectron], [TensorFlow + Keras], [MXNet], [TensorFlow], [PyTorch] - State-of-the-art object detection/instance segmentation algorithm.
Faster R-CNN - S. Ren et al, NIPS2015. [official MatCaffe code], [PyCaffe], [TensorFlow], [Another TF implementation] [Keras] - State-of-the-art object detector.
YOLO - J. Redmon et al, CVPR2016. [official code], [TensorFlow] - Fast object detector.
YOLO9000 - J. Redmon and A. Farhadi, CVPR2017. [official code] - State-of-the-art object detector which can detect 9000 objects in realtime.
SSD - W. Liu et al, ECCV2016. [official PyCaffe code], [TensorFlow], [Keras] - State-of-the-art object detector with realtime processing speed.
RetinaNet - Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár (Facebook AI Research, FAIR), ICCV2017. [Keras] - State-of-the-art object detector with realtime processing speed.

Video Object Detection

Detect to Track and Track to Detect - C. Feichtenhofer et al., ICCV2017. [code], [project web]
Flow-Guided Feature Aggregation for Video Object Detection - X. Zhu et al., ICCV2017. [code], aka FGFA

Video Object Detection Datasets

Pose Estimation

AlphaPose - PyTorch based realtime and accurate pose estimation and tracking tool from SJTU.
Detect-and-Track: Efficient Pose Estimation in Videos - R. Girdhar et al., arXiv2017.
OpenPose Library - Caffe based realtime pose estimation library from CMU.
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields - Z. Cao et al, CVPR2017. [code] depends on the [caffe RT pose] - Earlier version of OpenPose from CMU.
DensePose [code] - Dense human pose estimation in the wild, implemented in the Detectron framework.
MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network - M. Kocabas et al, ECCV2018. [code]
DeepLabCut: markerless pose estimation of user-defined body parts with deep learning - A. Mathis et al, Nature Neuroscience 2018. [code]

Competitions

ActEV (Activities in Extended Video) - Activity detection in security camera videos. Runs through 2021. Hosted by NIST.

Licenses

To the extent possible under law, Jinwoo Choi has waived all copyright and related or neighboring rights to this work.

Contributing

Please read the contribution guidelines. Then please feel free to send me pull requests or email (jinchoi@vt.edu) to add links.
