FR-AGCN

Forward-reverse Adaptive Graph Convolutional Network for Skeleton-Based Action Recognition

Abstract

In this work, we propose the novel Forward-Reverse Adaptive Graph Convolutional Networks (FR-AGCN) for skeleton-based action recognition. The sequences of joints and bones, as well as their reverse counterparts, are modeled simultaneously in multi-stream networks. By extracting features from both the forward and reverse deep information and performing multi-stream fusion, this strategy can significantly improve recognition accuracy.

This paper has been published in Neurocomputing. The work was carefully revised based on the professional comments of the editors and reviewers.

Article available online: https://doi.org/10.1016/j.neucom.2021.12.054

| Date | State |
| ---- | ----- |
| Sep 03, 2021 | manuscript submitted to journal |
| Oct 06, 2021 | revised and reconsidered |
| Oct 25, 2021 | revision submitted to journal |
| Dec 07, 2021 | accepted with minor revision |
| Dec 09, 2021 | revision submitted to journal |
| Dec 13, 2021 | accepted |

Environment

PyTorch version >=0.4

Notes

We run experiments on three datasets separately: NTU RGB+D, NTU RGB+D 120, and UAV-Human.

Here are some important notes:

  1. Please perform data preprocessing before training.

We set two parameters for the interframe interpolation strategy used for data augmentation, i.e., fu and S. The frame numbers of all samples are first unified to fu. We then define the data segmentation factor S, which means that down-sampling is performed every S frames in the temporal dimension (i.e., one frame out of every S is kept).
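As a rough illustration, the following sketch unifies one sample to fu frames by linear interpolation along the temporal axis and then down-samples every S frames. The function name, the (C, T, V, M) tensor layout, and the implementation details are our assumptions for illustration, not the repository's actual preprocessing code.

```python
import torch
import torch.nn.functional as F

def unify_and_segment(data, fu=600, S=2):
    """Interframe interpolation + temporal down-sampling (illustrative sketch).

    data: float tensor of shape (C, T, V, M) -- channels, frames, joints, bodies.
    fu:   unified frame number after interpolation.
    S:    data segmentation factor; one frame out of every S is kept.
    """
    C, T, V, M = data.shape
    # Flatten all non-temporal axes so F.interpolate can resample the T axis.
    x = data.permute(0, 2, 3, 1).reshape(1, C * V * M, T)
    x = F.interpolate(x, size=fu, mode='linear', align_corners=False)
    x = x.reshape(C, V, M, fu).permute(0, 3, 1, 2)  # back to (C, fu, V, M)
    # Down-sample every S frames in the temporal dimension.
    return x[:, ::S]  # shape (C, ceil(fu / S), V, M)
```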

You can change these parameters to increase or decrease the size of the input data, which has a certain impact on the final model performance.

If memory is insufficient, it is recommended to preprocess each benchmark separately. It is also recommended to allocate enough virtual memory.

We conducted detailed experiments on the cross-subject (CS) benchmark of NTU RGB+D 60. The results are as follows:

| fu | S | FJ (%) | FB (%) | FJB (%) |
| -- | - | ------ | ------ | ------- |
| 600 | 1 | 86.85 | 86.81 | 88.88 |
| 600 | 2 | 87.44 | 87.68 | 89.29 |
| 600 | 3 | 86.02 | 87.06 | 88.77 |
| 600 | 4 | 85.93 | 86.69 | 88.34 |
| 300 | 1 | 86.42 | 86.79 | 88.86 |
| 300 | 2 | 85.73 | 85.98 | 88.23 |

Therefore, it is recommended to set fu = 600 and S = 2 when processing all three datasets. If GPU memory is insufficient, reduce the batch size as appropriate during training.

  2. Before performing the multi-stream fusion operation, please test each stream with the parameters saved during training.

You need to select the weight file according to the training results and modify the corresponding test file in the config.
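For example, with a 2s-AGCN-style entry point (which this repository builds on), testing a trained stream might look like the following; the config path and weight file name are placeholders, not verbatim repository files.

```shell
# Hypothetical paths following the 2s-AGCN layout; substitute your own files.
python main.py --config ./config/nturgbd-cross-subject/test_joint.yaml \
               --weights ./runs/<best-epoch-weights>.pt
```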

  3. For ease of description, we define single-stream and multi-stream networks according to the input data. Specifically, for single-stream input, FJ-AGCN, RJ-AGCN, FB-AGCN, and RB-AGCN indicate that the inputs to the AGCN are forward joints data, reverse joints data, forward bones data, and reverse bones data, respectively. For multi-stream input, FR-AGCN represents the network that integrates all four single streams. Moreover, FJB-AGCN means that FJ-AGCN and FB-AGCN are fused at the end, and FRJ-AGCN indicates that FJ-AGCN and RJ-AGCN are finally fused; RJB-AGCN and FRB-AGCN can be deduced by analogy.
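As a sketch of how the four single-stream inputs relate to each other, the snippet below derives reverse joints, forward bones, and reverse bones from forward joint data. The (N, C, T, V, M) layout and the NTU bone-pair list follow 2s-AGCN's data generation scripts; the function name is ours and the details are assumptions, not the repository's exact code.

```python
import numpy as np

# Bone pairs (child, parent) for the 25-joint NTU skeleton, as in 2s-AGCN's
# gen_bone_data script (1-based joint indices).
NTU_PAIRS = ((1, 2), (2, 21), (3, 21), (4, 3), (5, 21), (6, 5), (7, 6), (8, 7),
             (9, 21), (10, 9), (11, 10), (12, 11), (13, 1), (14, 13), (15, 14),
             (16, 15), (17, 1), (18, 17), (19, 18), (20, 19), (21, 21), (22, 23),
             (23, 8), (24, 25), (25, 12))

def make_streams(fj):
    """fj: forward joints array of shape (N, C, T, V, M)."""
    rj = fj[:, :, ::-1].copy()              # reverse joints: flip the time axis
    fb = np.zeros_like(fj)                  # forward bones: child minus parent joint
    for child, parent in NTU_PAIRS:
        fb[:, :, :, child - 1] = fj[:, :, :, child - 1] - fj[:, :, :, parent - 1]
    rb = fb[:, :, ::-1].copy()              # reverse bones: flip the bone time axis
    return fj, rj, fb, rb
```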

Here, we compare the performance of each type of input data used separately and then perform score fusion to obtain the final prediction (a minimal fusion sketch follows the table). The results based on AGCN are shown below, where CS/CV are the NTU RGB+D benchmarks, X-Sub/X-Set the NTU RGB+D 120 benchmarks, and CSv1/CSv2 the UAV-Human benchmarks:

| Methods | CS (%) | CV (%) | X-Sub (%) | X-Set (%) | CSv1 (%) | CSv2 (%) |
| ------- | ------ | ------ | --------- | --------- | -------- | -------- |
| FJ-AGCN | 87.44 | 94.08 | 81.23 | 81.57 | 40.08 | 65.66 |
| RJ-AGCN | 87.78 | 94.00 | 81.23 | 82.14 | 39.23 | 63.40 |
| FB-AGCN | 87.68 | 93.98 | 83.52 | 83.64 | 38.43 | 63.15 |
| RB-AGCN | 88.03 | 93.66 | 83.44 | 83.66 | 38.86 | 63.75 |
| FRJ-AGCN | 88.74 | 95.17 | 83.25 | 83.86 | 41.97 | 67.68 |
| FRB-AGCN | 89.55 | 94.99 | 85.62 | 85.50 | 41.13 | 66.51 |
| FJB-AGCN | 89.29 | 95.34 | 85.58 | 85.77 | 42.78 | 68.75 |
| RJB-AGCN | 89.85 | 95.20 | 85.47 | 86.05 | 42.22 | 67.92 |
| FR-AGCN | 90.46 | 95.83 | 86.60 | 86.99 | 43.98 | 69.50 |
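The fusion itself is a score-level ensemble: the class scores saved by each stream at test time are summed and the argmax gives the final label. A minimal sketch with placeholder scores standing in for the saved outputs (weighted sums are also possible):

```python
import numpy as np

# Placeholder scores for the four streams (FJ, RJ, FB, RB), each of shape
# (num_samples, num_classes); in practice these are the saved test outputs.
rng = np.random.default_rng(0)
scores = [rng.random((8, 60)) for _ in range(4)]

fused = np.sum(scores, axis=0)      # element-wise sum over the streams
prediction = fused.argmax(axis=1)   # final multi-stream class prediction
```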

Data Preparation

For NTU RGB+D:

For NTU RGB+D 120:

For UAV-Human:
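Since this work builds on 2s-AGCN, preprocessing typically follows its layout. The script names and paths below are illustrative assumptions, not verbatim repository commands; please refer to the repository for the exact steps.

```shell
# Hypothetical 2s-AGCN-style preprocessing; adjust script names and paths.
python data_gen/ntu_gendata.py --data_path <path-to-raw-skeleton-files>
python data_gen/gen_bone_data.py
```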

Training & Testing

For NTU RGB+D:

For NTU RGB+D 120:

For UAV-Human:
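Training and testing likewise typically follow the 2s-AGCN entry point, with one config per stream and benchmark; the config names below are illustrative assumptions.

```shell
# Hypothetical 2s-AGCN-style commands; repeat per stream (joint/bone, forward/reverse).
python main.py --config ./config/nturgbd-cross-subject/train_joint.yaml
python main.py --config ./config/nturgbd-cross-subject/test_joint.yaml
```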

Acknowledgements

This work is based on

2s-AGCN (https://github.com/lshiwjx/2s-AGCN)

Thanks to the original authors for their work! Our work mainly improves the data preprocessing part on top of it. Nevertheless, we hope that this research on forward and reverse sequences can be inspiring.

Meanwhile, we are very grateful to the creators of the three datasets, i.e., NTU RGB+D 60, NTU RGB+D 120, and UAV-Human. Your selfless work has made a great contribution to the computer vision community!

Last but not least, the authors are very grateful for the selfless and constructive suggestions of the reviewers.

Citation

@article{HU2022624,
  title   = {Forward-reverse adaptive graph convolutional networks for skeleton-based action recognition},
  journal = {Neurocomputing},
  volume  = {492},
  pages   = {624--636},
  year    = {2022},
  author  = {Zesheng Hu and Zihao Pan and Qiang Wang and Lei Yu and Shumin Fei},
}

Contact

If you find the above description unclear, or you run into other issues when conducting the experiments, please leave a message on GitHub. Besides, we look forward to discussions about skeleton-based action recognition.

I am currently a Ph.D. student at Nanjing Normal University. Feel free to contact me via email:

 `zeshenghu@njnu.edu.cn`