Enabling Detailed Action Recognition Evaluation Through Video Dataset Augmentation
Jihoon Chung, Yu Wu, Olga Russakovsky
Princeton University
This is the official implementation of the HAT toolkit, built on SeMask and E2FGVI. Although we demo the toolkit with MMAction2, it can easily be integrated with other human action recognition models.
Methods
The toolkit makes use of three modified datasets generated from the original video dataset.
- Background Only Videos: The human segmentation is removed from the video frames and the region is inpainted, making the person 'invisible'.
- Human Only Videos: Only the human region remains intact; the rest of the pixels are replaced with the dataset average color.
- Action Swap Videos: The human is composited onto the background of a different video (see the sketch after this list).
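As a rough illustration of the three variants, here is a minimal NumPy sketch assuming per-frame RGB arrays and binary human masks. The function names, mask format, and average-color value are illustrative assumptions, not the toolkit's actual API.

```python
import numpy as np

def human_only(frame, mask, avg_color):
    """Keep the human region; fill everything else with the dataset average color."""
    out = np.empty_like(frame)
    out[:] = avg_color                      # e.g. np.array([114, 118, 125], dtype=np.uint8)
    out[mask > 0] = frame[mask > 0]
    return out

def background_only(inpainted_frame):
    """The Background Only variant is simply the inpainted frame (human removed)."""
    return inpainted_frame

def action_swap(frame, mask, other_inpainted_frame):
    """Paste the human from one video onto the inpainted background of another."""
    out = other_inpainted_frame.copy()
    out[mask > 0] = frame[mask > 0]
    return out
```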
Implementations
For efficiency, we only keep the original frames, human segmentations, and Background Only Videos on disk. We advise generating Human Only Videos and Action Swap Videos online within the dataloader.
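Below is a sketch of what online generation inside a PyTorch-style dataset could look like, assuming frames, masks, and inpainted backgrounds are stored as per-frame images under ori/, mask/, and inpaint/ subfolders. The class name, directory layout, and clip indexing are assumptions for illustration, not the toolkit's actual dataloader.

```python
import random
from pathlib import Path

import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class HATVariantDataset(Dataset):
    """Builds Human Only / Action Swap frames on the fly from stored assets."""

    def __init__(self, root, clips, variant="human_only", avg_color=(114, 118, 125)):
        self.root = Path(root)                 # e.g. data/kinetics400
        self.clips = clips                     # list of (clip_id, frame_name) pairs
        self.variant = variant
        self.avg_color = np.array(avg_color, dtype=np.uint8)

    def __len__(self):
        return len(self.clips)

    def _load(self, subdir, clip_id, frame_name):
        return np.array(Image.open(self.root / subdir / clip_id / frame_name))

    def __getitem__(self, idx):
        clip_id, frame_name = self.clips[idx]
        frame = self._load("ori", clip_id, frame_name)
        mask = self._load("mask", clip_id, frame_name)
        if mask.ndim == 3:                     # collapse RGB masks to one channel
            mask = mask[..., 0]
        mask = mask > 0

        if self.variant == "human_only":
            out = np.empty_like(frame)
            out[:] = self.avg_color
            out[mask] = frame[mask]
        elif self.variant == "action_swap":
            # Paste the human onto the inpainted background of a random other clip;
            # a real implementation should match frame sizes before compositing.
            other_id, other_name = random.choice(self.clips)
            out = self._load("inpaint", other_id, other_name).copy()
            out[mask] = frame[mask]
        else:                                  # "background_only" is precomputed
            out = self._load("inpaint", clip_id, frame_name)
        return out
```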
Data Preparation
Copy (or softlink) your video dataset into the data/{dataset_name}/ori folder. We have included two example videos from Kinetics-400, which are sufficient to demo the toolkit.
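If you prefer softlinking, a small helper like the following works; the dataset name and source path are placeholders for your setup.

```python
import os
from pathlib import Path

def link_dataset(src_dir, dataset_name):
    """Softlink an existing video directory into data/{dataset_name}/ori."""
    dst = Path("data") / dataset_name / "ori"
    dst.parent.mkdir(parents=True, exist_ok=True)
    if not dst.exists():
        os.symlink(Path(src_dir).resolve(), dst, target_is_directory=True)

link_dataset("/path/to/kinetics400/videos", "kinetics400")
```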
Generate Human Segmentation
Please check the instructions.
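The toolkit itself uses SeMask for this step; purely as a stand-in illustration of what the output should look like (a binary person mask per frame), here is a sketch using torchvision's Mask R-CNN. The score threshold and file handling are assumptions, not the toolkit's defaults.

```python
import numpy as np
import torch
from PIL import Image
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

model = maskrcnn_resnet50_fpn(pretrained=True).eval()

@torch.no_grad()
def person_mask(frame_path, score_thresh=0.5):
    """Return a binary (0/255) mask covering all detected people in one frame."""
    img = to_tensor(Image.open(frame_path).convert("RGB"))
    out = model([img])[0]
    mask = torch.zeros(img.shape[1:], dtype=torch.bool)
    for label, score, m in zip(out["labels"], out["scores"], out["masks"]):
        if label.item() == 1 and score.item() >= score_thresh:   # COCO class 1 = person
            mask |= m[0] > 0.5
    return mask.numpy().astype(np.uint8) * 255
```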
Generate Inpainted Frames
Please check the instructions.
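The toolkit uses E2FGVI for video inpainting; as a much simpler per-frame stand-in (with no temporal consistency), the sketch below removes the masked human region with OpenCV's Telea inpainting. The dilation size and paths are assumptions.

```python
import cv2
import numpy as np

def inpaint_frame(frame_path, mask_path, out_path, dilate_px=7):
    """Fill the masked human region of a single frame and save the result."""
    frame = cv2.imread(frame_path)
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    # Slightly dilate the mask so inpainting fully covers the human silhouette.
    kernel = np.ones((dilate_px, dilate_px), np.uint8)
    mask = cv2.dilate(mask, kernel)
    result = cv2.inpaint(frame, mask, 3, cv2.INPAINT_TELEA)
    cv2.imwrite(out_path, result)
```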
Testing HAT using MMAction2
In our paper, we use MMAction2 as the basis for the human action recognizers. If your pipeline does not use MMAction2, you can implement your own dataloader instead.
Please check the instructions.
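Once you have per-video predictions for each variant, a comparison like the following illustrates the kind of analysis the toolkit enables: how much accuracy changes when the human, the background, or the pairing between them is altered. This is a simple sketch, not the paper's exact evaluation protocol; the variant names and input format are assumptions.

```python
def accuracy(preds, labels):
    """Top-1 accuracy over matched prediction/label lists."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def report(preds_by_variant, labels):
    """Print accuracy per variant and its difference from the original videos."""
    base = accuracy(preds_by_variant["original"], labels)
    print(f"original       : {base:.3f}")
    for name in ("human_only", "background_only", "action_swap"):
        acc = accuracy(preds_by_variant[name], labels)
        print(f"{name:<15}: {acc:.3f} (diff vs. original: {acc - base:+.3f})")
```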
Downloads
We offer pre-generated files (human segmentations, inpainted frames, and original frames) for Kinetics-400 and UCF101.
Acknowledgements
We are grateful for the support from the National Science Foundation under Grant No. 2112562, Microsoft, Princeton SEAS Project X Innovation Fund, and Princeton First Year Ph.D. Fellowship to JC.