# THU<sup>E-ACT</sup>-50: A Real-World Event-Based Action Recognition Benchmark
📢 Update: We are excited to announce the release of a larger and more comprehensive dataset, THU<sup>MV-EACT</sup>-50, which extends the THU<sup>E-ACT</sup>-50 to include multi-view action recognition. For more details, please visit THU-MV-EACT-50.
Introduced in the paper "Action Recognition and Benchmark Using Event Cameras" (TPAMI 2023), THU<sup>E-ACT</sup>-50 is a large-scale, real-world, event-specific action recognition dataset, more than 4 times the size of the previously largest event-based action recognition dataset. It contains 50 action categories and is primarily designed for whole-body motions and indoor healthcare applications. This repository provides access to the dataset, along with detailed information about its contents and structure.
<img src="figures/sample-sequences.jpg" alt="Sample-sequences" style="zoom: 33%;" />Dataset Overview
THU<sup>E-ACT</sup>-50 is designed to address the limitations of existing event-based action recognition datasets, which are often small and cover a limited range of actions. The dataset consists of two parts: the standard THU<sup>E-ACT</sup>-50 and a more challenging variant, THU<sup>E-ACT</sup>-50 CHL, designed to test the robustness of algorithms under challenging conditions.
The dataset comprises a diverse set of action categories, including whole-body motions, indoor healthcare applications, detail-oriented actions, confusing actions, human-object interactions, and two-player interactive movements. With a total of 10,500 video recordings for the standard THU<sup>E-ACT</sup>-50 and 2,330 recordings for the challenging THU<sup>E-ACT</sup>-50 CHL, this dataset provides an extensive and varied collection of action sequences for researchers to explore and evaluate their models.
## Dataset Description
### Standard THU<sup>E-ACT</sup>-50
- 50 event-specific action categories
- 105 socially recruited subjects
- 10,500 video recordings
- CeleX-V event camera with a spatial resolution of 1280x800
- Two oblique front views of the actor
### Challenging THU<sup>E-ACT</sup>-50 CHL
- Challenging scenarios with different illumination conditions and action magnitudes
- 50 event-specific action categories
- 18 on-campus students as subjects
- 2,330 video recordings
- DAVIS346 event camera with a spatial resolution of 346x260
- Front, left, right, and back views
- Two different scenarios: long corridor and open hall
- Challenging conditions, including different illumination (illustrated below)

<img src="figures/different-light.jpg" alt="Different-light" style="zoom:18%;" />
## List of Actions
ID | Action | ID | Action | ID | Action | ID | Action | ID | Action |
---|---|---|---|---|---|---|---|---|---|
A0 | Walking | A10 | Cross arms | A20 | Calling with phone | A30 | Fan | A40 | Check time |
A1 | Running | A11 | Salute | A21 | Reading | A31 | Open umbrella | A41 | Drink water |
A2 | Jump up | A12 | Squat down | A22 | Tai chi | A32 | Close umbrella | A42 | Wipe face |
A3 | Running in circles | A13 | Sit down | A23 | Swing objects | A33 | Put on glasses | A43 | Long jump |
A4 | Falling down | A14 | Stand up | A24 | Throw | A34 | Take off glasses | A44 | Push up |
A5 | Waving one hand | A15 | Sit and stand | A25 | Staggering | A35 | Pick up | A45 | Sit up |
A6 | Waving two hands | A16 | Knead face | A26 | Headache | A36 | Put on bag | A46 | Shake hands (two-players) |
A7 | Clap | A17 | Nod head | A27 | Stomachache | A37 | Take off bag | A47 | Fighting (two-players) |
A8 | Rub hands | A18 | Shake head | A28 | Back pain | A38 | Put object into bag | A48 | Handing objects (two-players) |
A9 | Punch | A19 | Thumb up | A29 | Vomit | A39 | Take object out of bag | A49 | Lifting chairs (two-players) |
## Evaluation Criteria
To evaluate event-based action recognition methods on the THU<sup>E-ACT</sup>-50 and THU<sup>E-ACT</sup>-50 CHL datasets, we split the subjects 8:2 into disjoint identity sets for training and testing. The training and test sets of THU<sup>E-ACT</sup>-50 contain 85 and 20 subjects, respectively, while those of THU<sup>E-ACT</sup>-50 CHL contain 14 and 4 subjects, respectively.
We report the following evaluation metrics for each dataset (a short computation sketch follows the list):
- Top-1 Accuracy: The percentage of test videos for which the model correctly predicts the action category with the highest confidence.
- Top-N Accuracy: The percentage of test videos for which the correct action category is within the top N predictions made by the model.
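For clarity, here is a minimal sketch of how Top-N accuracy can be computed from per-class confidence scores. It is illustrative only: `top_n_accuracy` is a hypothetical helper, not part of this repository, and the random inputs are placeholders.

```python
import numpy as np

def top_n_accuracy(scores, labels, n=1):
    """Fraction of samples whose true label is among the n highest-scoring classes.

    scores: (num_samples, num_classes) per-class confidences.
    labels: (num_samples,) ground-truth class indices.
    """
    top_n = np.argsort(scores, axis=1)[:, -n:]     # indices of the n best-scoring classes
    hits = (top_n == labels[:, None]).any(axis=1)  # is the true label among them?
    return hits.mean()

# Toy usage with random scores over the 50 action classes (placeholder data).
rng = np.random.default_rng(0)
scores = rng.random((100, 50))
labels = rng.integers(0, 50, size=100)
print(top_n_accuracy(scores, labels, n=1), top_n_accuracy(scores, labels, n=5))
```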
## Dataset Download
We're pleased to announce the release of the THU<sup>E-ACT</sup>-50 and THU<sup>E-ACT</sup>-50 CHL datasets.
### THU<sup>E-ACT</sup>-50
- OneDrive: Download Here
- BaiduYun: Download Here (Access Code: 4csp)
Note: After decompression, the dataset will require about 332GB of storage space.
### THU<sup>E-ACT</sup>-50 CHL
- Google Drive: Download Here
- BaiduYun: Download Here (Access Code: fdnd)
Note: After decompression, the dataset will occupy approximately 4.6GB of storage space.
## Dataset Format
In both datasets, the training/test split is given in the `train.txt` and `test.txt` files, respectively; each line consists of a file name and an action ID. The preprocessing operations for both datasets can be found in `dataset.py`.
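As a reference, a minimal sketch for parsing the split files is shown below. It assumes each line holds a file name and an action ID separated by whitespace; the exact separator and ID format should be verified against the released files and `dataset.py`.

```python
# Minimal sketch: parse train.txt / test.txt, assuming each line is
# "<file name> <action id>" separated by whitespace (an assumption;
# verify against the released files and dataset.py).
def read_split(path):
    samples = []
    with open(path) as f:
        for line in f:
            name, action_id = line.split()
            samples.append((name, action_id))
    return samples

train_samples = read_split("train.txt")
test_samples = read_split("test.txt")
```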
### THU<sup>E-ACT</sup>-50
The THU<sup>E-ACT</sup>-50 dataset is provided in `.csv` format, with the data structured in 5 columns as follows (a minimal loading sketch follows the list):
- y: Represents the y-coordinate of the event.
- x: Represents the x-coordinate of the event.
- b: This is an additional brightness value provided by the CeleX-V camera. It's worth noting that for our method, this value is not utilized.
- p: The polarity value. It contains three categories: 1, -1, and 0. In our experiments, we ignore the 0 values and consider 1 as positive polarity and -1 as negative polarity.
- t: Represents the timestamp of the event.
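As an illustration, here is a minimal loading sketch using pandas. The file name is a placeholder, and the assumption that the CSV has no header row should be checked against the released data; the polarity filtering mirrors the preprocessing described above.

```python
import pandas as pd

# Placeholder file name; assumes the CSV has no header row (verify against the data).
events = pd.read_csv("A0_example.csv", header=None, names=["y", "x", "b", "p", "t"])

# Drop p == 0 events and keep 1 / -1 as positive / negative polarity,
# as described above; the brightness column b is left unused.
events = events[events["p"] != 0].reset_index(drop=True)
```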
### THU<sup>E-ACT</sup>-50 CHL
The THU<sup>E-ACT</sup>-50 CHL dataset is provided in `.npy` format; each row contains 4 elements (a loading sketch follows the list):
- x: Represents the x-coordinate of the event.
- y: Represents the y-coordinate of the event.
- t: Represents the timestamp of the event.
- p: The polarity value. In this dataset, the polarity only includes standard values of 1 and 0. Here, 1 represents positive polarity, and 0 represents negative polarity.
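A corresponding loading sketch is below. The file name is a placeholder, and it assumes `np.load` returns a plain (N, 4) array in the [x, y, t, p] order listed above.

```python
import numpy as np

# Placeholder file name; assumes a plain (N, 4) array ordered [x, y, t, p].
events = np.load("A0_example.npy")
x, y, t, p = events[:, 0], events[:, 1], events[:, 2], events[:, 3]

# Map polarity {1, 0} to {+1, -1} if a signed convention is needed downstream.
signed_p = np.where(p == 1, 1, -1)
```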
## Acknowledgements
We would like to express our sincere gratitude to Tsinghua University, partner companies, and organizations for their invaluable support and collaboration in making this dataset possible. Additionally, we extend our thanks to all the volunteers who participated in the data collection process. Their contributions have been instrumental in the development and evaluation of this benchmark.
## License
This dataset is licensed under the MIT License.
## Citing Our Work
If you find this dataset beneficial for your research, please cite our works:
    @article{gao2023action,
      title={Action Recognition and Benchmark Using Event Cameras},
      author={Gao, Yue and Lu, Jiaxuan and Li, Siqi and Ma, Nan and Du, Shaoyi and Li, Yipeng and Dai, Qionghai},
      journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
      year={2023},
      volume={45},
      number={12},
      pages={14081-14097},
      publisher={IEEE}
    }

    @article{gao2024hypergraph,
      title={Hypergraph-Based Multi-View Action Recognition Using Event Cameras},
      author={Gao, Yue and Lu, Jiaxuan and Li, Siqi and Li, Yipeng and Du, Shaoyi},
      journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
      year={2024},
      publisher={IEEE}
    }