Enabling Detailed Action Recognition Evaluation Through Video Dataset Augmentation
Jihoon Chung, Yu Wu, Olga Russakovsky
Princeton University
This is the official implementation of the HAT toolkit, built on SeMask and E2FGVI. Although we demo the toolkit with MMAction2, it can easily be integrated with other human action recognition models.
Methods
The toolkit makes use of three modified datasets generated from the original video dataset.
- Background Only Videos: The human segmentation is removed from the video frames and the region is inpainted, making the person 'invisible'.
- Human Only Videos: Only the human region remains intact; the rest of the pixels are replaced with the dataset average color.
- Action Swap Videos: The human is composited onto the background of a different video (see the sketch after this list).
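As a rough illustration of the three variants, here is a minimal NumPy sketch assuming per-frame RGB arrays and binary human masks. The function names, mask format, and average-color value are illustrative assumptions, not the toolkit's actual API.

```python
import numpy as np

def human_only(frame, mask, avg_color):
    """Keep the human region; fill everything else with the dataset average color."""
    out = np.empty_like(frame)
    out[:] = avg_color                      # e.g. np.array([114, 118, 125], dtype=np.uint8)
    out[mask > 0] = frame[mask > 0]
    return out

def background_only(inpainted_frame):
    """The Background Only variant is simply the inpainted frame (human removed)."""
    return inpainted_frame

def action_swap(frame, mask, other_inpainted_frame):
    """Paste the human from one video onto the inpainted background of another."""
    out = other_inpainted_frame.copy()
    out[mask > 0] = frame[mask > 0]
    return out
```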
Implementations
For efficiency, we only keep the original frames, human segmentations, and Background Only Videos on disk. We advise generating Human Only Videos and Action Swap Videos online within the dataloader.
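Below is a sketch of what online generation inside a PyTorch-style dataset could look like, assuming frames, masks, and inpainted backgrounds are stored as per-frame images under ori/, mask/, and inpaint/ subfolders. The class name, directory layout, and clip indexing are assumptions for illustration, not the toolkit's actual dataloader.

```python
import random
from pathlib import Path

import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class HATVariantDataset(Dataset):
    """Builds Human Only / Action Swap frames on the fly from stored assets."""

    def __init__(self, root, clips, variant="human_only", avg_color=(114, 118, 125)):
        self.root = Path(root)                 # e.g. data/kinetics400
        self.clips = clips                     # list of (clip_id, frame_name) pairs
        self.variant = variant
        self.avg_color = np.array(avg_color, dtype=np.uint8)

    def __len__(self):
        return len(self.clips)

    def _load(self, subdir, clip_id, frame_name):
        return np.array(Image.open(self.root / subdir / clip_id / frame_name))

    def __getitem__(self, idx):
        clip_id, frame_name = self.clips[idx]
        frame = self._load("ori", clip_id, frame_name)
        mask = self._load("mask", clip_id, frame_name)
        if mask.ndim == 3:                     # collapse RGB masks to one channel
            mask = mask[..., 0]
        mask = mask > 0

        if self.variant == "human_only":
            out = np.empty_like(frame)
            out[:] = self.avg_color
            out[mask] = frame[mask]
        elif self.variant == "action_swap":
            # Paste the human onto the inpainted background of a random other clip;
            # a real implementation should match frame sizes before compositing.
            other_id, other_name = random.choice(self.clips)
            out = self._load("inpaint", other_id, other_name).copy()
            out[mask] = frame[mask]
        else:                                  # "background_only" is precomputed
            out = self._load("inpaint", clip_id, frame_name)
        return out
```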
Data Preparation
Copy (or softlink) your video dataset into the data/{dataset_name}/ori folder. We have included two example videos from Kinetics-400, which are sufficient to demo the toolkit.
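If you prefer softlinking, a small helper like the following works; the dataset name and source path are placeholders for your setup.

```python
import os
from pathlib import Path

def link_dataset(src_dir, dataset_name):
    """Softlink an existing video directory into data/{dataset_name}/ori."""
    dst = Path("data") / dataset_name / "ori"
    dst.parent.mkdir(parents=True, exist_ok=True)
    if not dst.exists():
        os.symlink(Path(src_dir).resolve(), dst, target_is_directory=True)

link_dataset("/path/to/kinetics400/videos", "kinetics400")
```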
Generate Human Segmentation
Please check the instructions.
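The toolkit itself uses SeMask for this step; purely as a stand-in illustration of what the output should look like (a binary person mask per frame), here is a sketch using torchvision's Mask R-CNN. The score threshold and file handling are assumptions, not the toolkit's defaults.

```python
import numpy as np
import torch
from PIL import Image
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

model = maskrcnn_resnet50_fpn(pretrained=True).eval()

@torch.no_grad()
def person_mask(frame_path, score_thresh=0.5):
    """Return a binary (0/255) mask covering all detected people in one frame."""
    img = to_tensor(Image.open(frame_path).convert("RGB"))
    out = model([img])[0]
    mask = torch.zeros(img.shape[1:], dtype=torch.bool)
    for label, score, m in zip(out["labels"], out["scores"], out["masks"]):
        if label.item() == 1 and score.item() >= score_thresh:   # COCO class 1 = person
            mask |= m[0] > 0.5
    return mask.numpy().astype(np.uint8) * 255
```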
Generate Inpainted Frames
Please check the instructions.
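The toolkit uses E2FGVI for video inpainting; as a much simpler per-frame stand-in (with no temporal consistency), the sketch below removes the masked human region with OpenCV's Telea inpainting. The dilation size and paths are assumptions.

```python
import cv2
import numpy as np

def inpaint_frame(frame_path, mask_path, out_path, dilate_px=7):
    """Fill the masked human region of a single frame and save the result."""
    frame = cv2.imread(frame_path)
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    # Slightly dilate the mask so inpainting fully covers the human silhouette.
    kernel = np.ones((dilate_px, dilate_px), np.uint8)
    mask = cv2.dilate(mask, kernel)
    result = cv2.inpaint(frame, mask, 3, cv2.INPAINT_TELEA)
    cv2.imwrite(out_path, result)
```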
Testing HAT using MMAction2
In our paper, we use MMAction2 as the basis for the human action recognizers. If your pipeline does not use MMAction2, you can implement your own dataloader instead.
Please check the instructions.
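Once you have per-video predictions for each variant, a comparison like the following illustrates the kind of analysis the toolkit enables: how much accuracy changes when the human, the background, or the pairing between them is altered. This is a simple sketch, not the paper's exact evaluation protocol; the variant names and input format are assumptions.

```python
def accuracy(preds, labels):
    """Top-1 accuracy over matched prediction/label lists."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def report(preds_by_variant, labels):
    """Print accuracy per variant and its difference from the original videos."""
    base = accuracy(preds_by_variant["original"], labels)
    print(f"original       : {base:.3f}")
    for name in ("human_only", "background_only", "action_swap"):
        acc = accuracy(preds_by_variant[name], labels)
        print(f"{name:<15}: {acc:.3f} (diff vs. original: {acc - base:+.3f})")
```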
Downloads
We offer pre-generated files (human segmentations, inpainted frames, and original frames) for Kinetics-400 and UCF101.
Acknowledgements
We are grateful for the support from the National Science Foundation under Grant No. 2112562, Microsoft, Princeton SEAS Project X Innovation Fund, and Princeton First Year Ph.D. Fellowship to JC.