MultiModal-Transformers-for-Nurse-Activity-Recognition [CVPR2022]

This repo is the official implementation of the paper "Multimodal Transformer for Nurse Activity Recognition", published at the Fifth International Workshop on Computer Vision for Physiological Measurement (CVPM), held in conjunction with CVPR 2022.

Introduction

This paper proposes a novel transformer-based method for real-world action recognition. The method involves two single-modality transformer models for action recognition on the Nurse-Activity-Recognition dataset (2019). The first single-modality transformer extracts spatio-temporal features from the subjects' skeletal joint data and tries to recognize nurse activities from that modality alone. The second single-modality transformer performs action recognition by modeling correlations in the performer's acceleration data. Both models are shown below.

<img src="https://github.com/Momilijaz96/MMT_for_NCRC/blob/main/images/single.png">
Single Modality Transformers (a) Skeletal Joints Model (b) Acceleration Model

We propose a multi-modal transformer that combines the final cls tokens of the skeletal joints and acceleration models. We also introduce an additional cross-view fusion between the layers of both models to build stronger feature vectors for the final action recognition. In the fusion layer, the spatio-temporal skeletal joint tokens attend to the self-encoded acceleration tokens, and this is repeated in all layers. Our results demonstrate that fusing acceleration and skeletal joints gives better action recognition performance than the single-modality transformers and than a simple fusion of both models without cross-view fusion.

Cross View Fusion Model (a) Cross View Fusion (b) MultiModal Transformer with CrossView Fusion
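The fusion step described above can be illustrated with a minimal, dependency-free sketch. This is not the repository's implementation: it uses plain Python lists, a single attention head with no learned projections, a residual connection, and concatenation of the two cls tokens, all of which are assumptions made only to show skeletal joint tokens attending to acceleration tokens. The function names `cross_attention` and `fuse_cls` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_tokens, context_tokens):
    """Each query token attends to every context token (single head, no learned weights)."""
    dim = len(query_tokens[0])
    fused = []
    for q in query_tokens:
        # Scaled dot-product score between this query token and each context token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in context_tokens]
        weights = softmax(scores)
        # Attention output: weighted sum of the context tokens.
        attended = [sum(w * k[d] for w, k in zip(weights, context_tokens))
                    for d in range(dim)]
        # Assumption: a residual connection adds the attended vector to the query token.
        fused.append([x + y for x, y in zip(q, attended)])
    return fused


def fuse_cls(skeleton_cls, acceleration_cls):
    """Assumption: the two final cls tokens are combined by concatenation."""
    return skeleton_cls + acceleration_cls


# Toy tokens; the first token of each sequence plays the role of the cls token.
skeleton_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
acceleration_tokens = [[0.5, 0.5], [1.0, -1.0]]

# Skeletal joint tokens act as queries, acceleration tokens as keys and values.
fused_tokens = cross_attention(skeleton_tokens, acceleration_tokens)
features = fuse_cls(fused_tokens[0], acceleration_tokens[0])
```

In a real model the queries, keys, and values would pass through learned linear projections and the fusion would be repeated in every layer, as the text describes.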

Results and Checkpoints

| Model | Accuracy | F1-score | Precision | Recall | CheckPoint |
|---|---|---|---|---|---|
| Skeleton Model | 76.7 | 67.0 | 69.1 | 70.5 | SkeletonModel.pth |
| Acceleration Model | 45.6 | 10.9 | 9.3 | 14.9 | AccModel.pth |
| Simple Fusion | 75.0 | 71.6 | 75.6 | 72.3 | SimpleFusion.pth |
| Cross View Fusion Model | 81.8 | 78.4 | 79.4 | 78.3 | CrossViewFusion.pth |
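As a rough guide to how metrics like those in the table can be computed, here is a minimal sketch. It assumes macro averaging over classes, which the source does not state, and the function name `macro_metrics` and the toy labels are hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1 as percentages."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Guard against division by zero for classes that are never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": 100.0 * sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": 100.0 * sum(precisions) / n,
        "recall": 100.0 * sum(recalls) / n,
        "f1": 100.0 * sum(f1s) / n,
    }


# Toy labels for three activity classes (illustrative only, not NCRC data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = macro_metrics(y_true, y_pred)
print(metrics)
```

Under macro averaging the F1-score is the mean of the per-class F1 values, so it can fall below both the averaged precision and the averaged recall.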

Comparison with state-of-the-art

We compare our method with all other existing solutions reported on the NCRC dataset, including the winning entry, a KNN built on hand-crafted features. The NCRC dataset provides data from three different sensors recorded while the actions are performed: acceleration, motion capture, and location.

| Sensors Used | Method | Validation Accuracy |
|---|---|---|
| Acceleration and Motion Capture (Ours) | Transformers | 81.8 |
| Motion Capture and Location | KNN | 80.2 |
| Motion Capture | ST-GCN | 64.6 |
| All Modalities | CNN | 46.5 |
| Acceleration | Random Forest | 43.1 |
| Motion Capture and Location | GRU | 29.3 |

The graphs below reflect the effectiveness of the proposed solution. The bar graph shows a class-wise F1-score comparison with the top two solutions posted for the Nurse Activity Recognition challenge dataset, ST-GCN and KNN. For almost all classes, our proposed solution outperforms both ST-GCN and the KNN method based on hand-crafted features. The other plot shows the validation accuracy of all existing solutions listed in the table above.

<img src="https://github.com/Momilijaz96/MMT_for_NCRC/blob/main/images/results.png" width="700"/>

Usage

Requirements

Create a conda environment and install the dependencies from the provided environment file, Tools/mmt_env.yml.

conda create --name myenv python=3.6
conda env create -f Tools/mmt_env.yml

Training

Download the data and set the paths to the acceleration data, the skeletal joints data, and the labels in the config file. Then run the following command to train the cross-view fusion model on the NurseCareActivityRecognition dataset.

python3 train_ncrc.py 

Note: To train another model, import the relevant model in the train_ncrc script.

Inference

For inference, load the desired checkpoint and select a model name. For example, to validate on NCRC data with the cross-view fusion model, run the command below, where [CKPT PATH] is the path to the corresponding downloaded checkpoint and --model takes a valid model name such as 'crossview_fusion_model'.

python3 validation_ncrc.py --ckpt_path [CKPT PATH] --model 'crossview_fusion_model'
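For readers who want to script around this command, the sketch below shows one way a script could accept the same two flags with `argparse`. It is an illustration only and not the contents of validation_ncrc.py; the function name `build_parser` and the help strings are assumptions, while the flag names and the example values come from this README.

```python
import argparse


def build_parser():
    """Build a parser for the two flags used in the validation command (illustrative)."""
    parser = argparse.ArgumentParser(
        description="Validate a checkpoint on NCRC data (illustrative sketch)."
    )
    parser.add_argument(
        "--ckpt_path", required=True,
        help="Path to the downloaded checkpoint file.",
    )
    parser.add_argument(
        "--model", required=True,
        help="Name of the model to evaluate, e.g. crossview_fusion_model.",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--ckpt_path", "CrossViewFusion.pth", "--model", "crossview_fusion_model"]
)
print(args.ckpt_path, args.model)
```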

Citation

If you find this useful in your work, please give a ⭐ and consider citing:

@article{momal2022multimodal_transformer,
  title={Multimodal Transformer for Nurse Activity Recognition},
  author={Momal Ijaz and Renato Diaz and Chen Chen},
  journal={arXiv preprint arXiv:2204.04564},
  year={2022}}