# THU<sup>E-ACT</sup>-50: A Real-World Event-Based Action Recognition Benchmark
📢 Update: We are excited to announce the release of a larger and more comprehensive dataset, THU<sup>MV-EACT</sup>-50, which extends the THU<sup>E-ACT</sup>-50 to include multi-view action recognition. For more details, please visit THU-MV-EACT-50.
Introduced in the paper "Action Recognition and Benchmark Using Event Cameras" (TPAMI 2023), THU<sup>E-ACT</sup>-50 is a large-scale, real-world, event-specific action recognition dataset, more than 4 times the size of the previously largest event-based action recognition dataset. It contains 50 action categories and is primarily designed for whole-body motions and indoor healthcare applications. This repository provides access to the dataset, along with detailed information about its contents and structure.
<img src="figures/sample-sequences.jpg" alt="Sample-sequences" style="zoom: 33%;" />Dataset Overview
THU<sup>E-ACT</sup>-50 is designed to address the limitations of existing event-based action recognition datasets, which are often small and cover a limited range of actions. The dataset consists of two parts: the standard THU<sup>E-ACT</sup>-50 and a more challenging variant, THU<sup>E-ACT</sup>-50 CHL, designed to test the robustness of algorithms under challenging conditions.
The dataset comprises a diverse set of action categories, including whole-body motions, indoor healthcare applications, detail-oriented actions, confusing actions, human-object interactions, and two-player interactive movements. With a total of 10,500 video recordings for the standard THU<sup>E-ACT</sup>-50 and 2,330 recordings for the challenging THU<sup>E-ACT</sup>-50 CHL, this dataset provides an extensive and varied collection of action sequences for researchers to explore and evaluate their models.
## Dataset Description
### Standard THU<sup>E-ACT</sup>-50
- 50 event-specific action categories
- 105 socially recruited subjects
- 10,500 video recordings
- CeleX-V event camera with a spatial resolution of 1280x800
- Two oblique front views of the actor
### Challenging THU<sup>E-ACT</sup>-50 CHL
- Challenging scenarios with different illumination conditions and action magnitudes
- 50 event-specific action categories
- 18 on-campus students as subjects
- 2,330 video recordings
- DAVIS346 event camera with a spatial resolution of 346x260
- Front, left, right, and back views
- Two different scenarios: long corridor and open hall
- Challenging conditions, including different illumination (illustrated below)

<img src="figures/different-light.jpg" alt="Different-light" style="zoom:18%;" />
## List of Actions
ID | Action | ID | Action | ID | Action | ID | Action | ID | Action |
---|---|---|---|---|---|---|---|---|---|
A0 | Walking | A10 | Cross arms | A20 | Calling with phone | A30 | Fan | A40 | Check time |
A1 | Running | A11 | Salute | A21 | Reading | A31 | Open umbrella | A41 | Drink water |
A2 | Jump up | A12 | Squat down | A22 | Tai chi | A32 | Close umbrella | A42 | Wipe face |
A3 | Running in circles | A13 | Sit down | A23 | Swing objects | A33 | Put on glasses | A43 | Long jump |
A4 | Falling down | A14 | Stand up | A24 | Throw | A34 | Take off glasses | A44 | Push up |
A5 | Waving one hand | A15 | Sit and stand | A25 | Staggering | A35 | Pick up | A45 | Sit up |
A6 | Waving two hands | A16 | Knead face | A26 | Headache | A36 | Put on bag | A46 | Shake hands (two-players) |
A7 | Clap | A17 | Nod head | A27 | Stomachache | A37 | Take off bag | A47 | Fighting (two-players) |
A8 | Rub hands | A18 | Shake head | A28 | Back pain | A38 | Put object into bag | A48 | Handing objects (two-players) |
A9 | Punch | A19 | Thumb up | A29 | Vomit | A39 | Take object out of bag | A49 | Lifting chairs (two-players) |
## Evaluation Criteria
To evaluate event-based action recognition methods on the THU<sup>E-ACT</sup>-50 and THU<sup>E-ACT</sup>-50 CHL datasets, we split the subjects 8:2 into disjoint identity sets for training and testing. The training and test sets of THU<sup>E-ACT</sup>-50 contain 85 and 20 subjects, respectively, while those of THU<sup>E-ACT</sup>-50 CHL contain 14 and 4 subjects, respectively.
We report the following evaluation metrics for each dataset (a short computation sketch follows the list):
- Top-1 Accuracy: The percentage of test videos for which the model correctly predicts the action category with the highest confidence.
- Top-N Accuracy: The percentage of test videos for which the correct action category is within the top N predictions made by the model.
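For clarity, here is a minimal sketch of how Top-N accuracy can be computed from per-class confidence scores. It is illustrative only: `top_n_accuracy` is a hypothetical helper, not part of this repository, and the random inputs are placeholders.

```python
import numpy as np

def top_n_accuracy(scores, labels, n=1):
    """Fraction of samples whose true label is among the n highest-scoring classes.

    scores: (num_samples, num_classes) per-class confidences.
    labels: (num_samples,) ground-truth class indices.
    """
    top_n = np.argsort(scores, axis=1)[:, -n:]     # indices of the n best-scoring classes
    hits = (top_n == labels[:, None]).any(axis=1)  # is the true label among them?
    return hits.mean()

# Toy usage with random scores over the 50 action classes (placeholder data).
rng = np.random.default_rng(0)
scores = rng.random((100, 50))
labels = rng.integers(0, 50, size=100)
print(top_n_accuracy(scores, labels, n=1), top_n_accuracy(scores, labels, n=5))
```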
## Dataset Download
We're pleased to announce the release of the THU<sup>E-ACT</sup>-50 and THU<sup>E-ACT</sup>-50 CHL datasets.
### THU<sup>E-ACT</sup>-50
- OneDrive: Download Here
- BaiduYun: Download Here (Access Code: 4csp)
Note: After decompression, the dataset will require about 332GB of storage space.
### THU<sup>E-ACT</sup>-50 CHL
- Google Drive: Download Here
- BaiduYun: Download Here (Access Code: fdnd)
Note: After decompression, the dataset will occupy approximately 4.6GB of storage space.
## Dataset Format
In both datasets, the training/test split is given in the `train.txt` and `test.txt` files, respectively; each line consists of a file name and an action ID. The preprocessing operations for both datasets can be found in `dataset.py`.
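As a reference, a minimal sketch for parsing the split files is shown below. It assumes each line holds a file name and an action ID separated by whitespace; the exact separator and ID format should be verified against the released files and `dataset.py`.

```python
# Minimal sketch: parse train.txt / test.txt, assuming each line is
# "<file name> <action id>" separated by whitespace (an assumption;
# verify against the released files and dataset.py).
def read_split(path):
    samples = []
    with open(path) as f:
        for line in f:
            name, action_id = line.split()
            samples.append((name, action_id))
    return samples

train_samples = read_split("train.txt")
test_samples = read_split("test.txt")
```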
### THU<sup>E-ACT</sup>-50
The THU<sup>E-ACT</sup>-50 dataset is provided in `.csv` format, with the data structured in 5 columns as follows (a minimal loading sketch follows the list):
- y: Represents the y-coordinate of the event.
- x: Represents the x-coordinate of the event.
- b: This is an additional brightness value provided by the CeleX-V camera. It's worth noting that for our method, this value is not utilized.
- p: The polarity value. It contains three categories: 1, -1, and 0. In our experiments, we ignore the 0 values and consider 1 as positive polarity and -1 as negative polarity.
- t: Represents the timestamp of the event.
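As an illustration, here is a minimal loading sketch using pandas. The file name is a placeholder, and the assumption that the CSV has no header row should be checked against the released data; the polarity filtering mirrors the preprocessing described above.

```python
import pandas as pd

# Placeholder file name; assumes the CSV has no header row (verify against the data).
events = pd.read_csv("A0_example.csv", header=None, names=["y", "x", "b", "p", "t"])

# Drop p == 0 events and keep 1 / -1 as positive / negative polarity,
# as described above; the brightness column b is left unused.
events = events[events["p"] != 0].reset_index(drop=True)
```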
### THU<sup>E-ACT</sup>-50 CHL
The THU<sup>E-ACT</sup>-50 CHL dataset is provided in `.npy` format; each row contains 4 elements (a loading sketch follows the list):
- x: Represents the x-coordinate of the event.
- y: Represents the y-coordinate of the event.
- t: Represents the timestamp of the event.
- p: The polarity value. In this dataset, the polarity only includes standard values of 1 and 0. Here, 1 represents positive polarity, and 0 represents negative polarity.
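A corresponding loading sketch is below. The file name is a placeholder, and it assumes `np.load` returns a plain (N, 4) array in the [x, y, t, p] order listed above.

```python
import numpy as np

# Placeholder file name; assumes a plain (N, 4) array ordered [x, y, t, p].
events = np.load("A0_example.npy")
x, y, t, p = events[:, 0], events[:, 1], events[:, 2], events[:, 3]

# Map polarity {1, 0} to {+1, -1} if a signed convention is needed downstream.
signed_p = np.where(p == 1, 1, -1)
```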
## Acknowledgements
We would like to express our sincere gratitude to Tsinghua University, partner companies, and organizations for their invaluable support and collaboration in making this dataset possible. Additionally, we extend our thanks to all the volunteers who participated in the data collection process. Their contributions have been instrumental in the development and evaluation of this benchmark.
## License
This dataset is licensed under the MIT License.
## Citing Our Work
If you find this dataset beneficial for your research, please cite our works:
    @article{gao2023action,
      title={Action Recognition and Benchmark Using Event Cameras},
      author={Gao, Yue and Lu, Jiaxuan and Li, Siqi and Ma, Nan and Du, Shaoyi and Li, Yipeng and Dai, Qionghai},
      journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
      year={2023},
      volume={45},
      number={12},
      pages={14081-14097},
      publisher={IEEE}
    }

    @article{gao2024hypergraph,
      title={Hypergraph-Based Multi-View Action Recognition Using Event Cameras},
      author={Gao, Yue and Lu, Jiaxuan and Li, Siqi and Li, Yipeng and Du, Shaoyi},
      journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
      year={2024},
      publisher={IEEE}
    }