
Continual Online Action Detection Transformer

This repository contains the online action detection experiments from our work on Continual Transformers.

This repository is a fork of the official source code for "OadTR: Online Action Detection with Transformers" (ICCV 2021), with model variations specified in different branches.

Set-up

Package Dependencies

Install the dependencies from the original OadTR project:

pip install torch torchvision numpy tensorboard-logger

Install the Continual Transformer blocks:

pip install --upgrade git+https://github.com/LukasHedegaard/continual-transformers.git
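
To verify the installation, you can try importing the package (a quick check; the module name continual_transformers is assumed here from the repository name and may differ):

python -c "import continual_transformers"  # module name assumed; adjust if the package exposes a different name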

Pretrained features

When you have downloaded the THUMOS features and placed them under ~/data, you can select the features by appending --feature <FEATURE> to your python command (valid values are listed in the dataset sections below).
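
For example, one way to run the one-block CoOadTR model (introduced below) on the ActivityNet-pretrained features, using the feature name and dimension given in the THUMOS section:

python main.py --num_layers 1 --enc_layers 64 --cpe_factor 1 --feature anet --dim_feature 3072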

Experiments

CoOadTR

From the main branch, the CoOadTR model can be run with the following command:

python main.py --num_layers 1 --enc_layers 64 --cpe_factor 1

Here, num_layers denotes the number of transformer blocks (1 or 2), enc_layers is the sequence length, and cpe_factor is a multiplier for the number of unique circular positional embeddings (1 ≤ x ≤ 2).
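
For instance, a two-block variant with twice the number of unique circular positional embeddings would be run as:

python main.py --num_layers 2 --enc_layers 64 --cpe_factor 2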

OadTR ablations

Each conducted experiment has its own branch. An overview of the ablated components and the associated results for the TSN-Anet features is given in the table below:

| Encoder layers | Decoder | Class token | Circular encoding | mAP (%) | Branch | Command |
|---|---|---|---|---|---|---|
| 3 | ✔︎ | ✔︎ | - | 57.8 | original (baseline) | python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64 |
| 3 | - | ✔︎ | - | 56.8 | no-decoder | python main.py --num_layers 3 --enc_layers 64 |
| 2 | - | ✔︎ | - | 55.6 | no-decoder | python main.py --num_layers 2 --enc_layers 64 |
| 2 | - | - | - | 55.5 | no-decoder-no-cls-token | python main.py --num_layers 2 --enc_layers 64 |
| 1 | - | - | ✔︎ (len n) | 55.7 | no-decoder-no-cls-token-shifting-tokens | python main.py --num_layers 1 --enc_layers 64 |
| 1 | - | - | ✔︎ (len 2n) | 55.8 | no-decoder-no-cls-token-shifting-tokens-2x | python main.py --num_layers 1 --enc_layers 64 |
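
Each command assumes the corresponding branch has been checked out first, e.g. for the second row:

git checkout no-decoder
python main.py --num_layers 3 --enc_layers 64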

THUMOS

| Model | Branch | Command |
|---|---|---|
| OadTR | original | python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64 --feature <FEATURE> |
| OadTR-b2 | no-decoder-no-cls-token | python main.py --num_layers 2 --enc_layers 64 --feature <FEATURE> |
| OadTR-b1 | no-decoder-no-cls-token | python main.py --num_layers 1 --enc_layers 64 --feature <FEATURE> |
| CoOadTR-b2 | main | python main.py --num_layers 2 --enc_layers 64 --feature <FEATURE> |
| CoOadTR-b1 | main | python main.py --num_layers 1 --enc_layers 64 --feature <FEATURE> |

Here, <FEATURE> is either "anet" or "kin" for ActivityNet- and Kinetics-pretrained features, respectively. --dim_feature should be 3072 for "anet" and 4096 for "kin".
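
A complete command for the one-block continual model on the Kinetics-pretrained features would thus be:

python main.py --num_layers 1 --enc_layers 64 --feature kin --dim_feature 4096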

TVSeries

| Model | Branch | Command |
|---|---|---|
| OadTR | original-tvseries | python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64 --feature <FEATURE> |
| OadTR-b2 | no-decoder-no-cls-token-tvseries | python main.py --num_layers 2 --enc_layers 64 --feature <FEATURE> |
| OadTR-b1 | no-decoder-no-cls-token-tvseries | python main.py --num_layers 1 --enc_layers 64 --feature <FEATURE> |
| CoOadTR-b2 | main | python main.py --dataset tvseries --num_layers 2 --enc_layers 64 --feature <FEATURE> |
| CoOadTR-b1 | main | python main.py --dataset tvseries --num_layers 1 --enc_layers 64 --feature <FEATURE> |

Here, <FEATURE> is the name of your .pickle file of extracted features (either ActivityNet or Kinetics features), placed in the ~/data folder.
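
For example, with a hypothetical feature file ~/data/tvseries_anet.pickle holding ActivityNet-style features (the file name is illustrative; check how main.py resolves feature paths in your checkout):

python main.py --dataset tvseries --num_layers 1 --enc_layers 64 --feature tvseries_anet --dim_feature 3072  # "tvseries_anet" is an illustrative file name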