An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition
Kiyoon Kim, Davide Moltisanti, Oisin Mac Aodha, Laura Sevilla-Lara
In BMVC 2022. arXiv
Presentation video
Dataset downloads (labels only)
Installation
conda create -n videoai python=3.9
conda activate videoai
conda install pytorch==1.12.1 torchvision cudatoolkit=10.2 -c pytorch
### For RTX 30xx GPUs, use cudatoolkit=11.3 instead:
#conda install pytorch==1.12.1 torchvision cudatoolkit=11.3 -c pytorch
git clone --recurse-submodules https://github.com/kiyoon/verb_ambiguity
cd verb_ambiguity
git submodule update --recursive
cd submodules/video_datasets_api
pip install -e .
cd ../experiment_utils
pip install -e .
cd ../..
pip install -e .
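Optional: a quick sanity check (standard PyTorch calls, not part of the original instructions) confirms that the install can see your GPU before continuing:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"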
Optional: install Pillow-SIMD and libjpeg-turbo to improve dataloading performance.
Run this at the end of the installation:
conda uninstall -y --force pillow pil jpeg libtiff libjpeg-turbo
pip uninstall -y pillow pil jpeg libtiff libjpeg-turbo
conda install -yc conda-forge libjpeg-turbo
CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
conda install -y jpeg libtiff
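To confirm that the SIMD build is the one actually imported, you can print the Pillow version; pillow-simd version strings usually carry a .postN suffix (an assumption about pillow-simd's versioning, not stated above):
python -c "import PIL; print(PIL.__version__)"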
Running feature experiments using pre-extracted features
- Download the pre-extracted features:
  - EPIC-Kitchens-100 TSM features
  - EPIC-Kitchens-100 TSM feature neighbours (optional): using this neighbour cache shortens training preparation by skipping the neighbour search.
  - Confusing-HMDB-102 TSM features
- Extract them into data/EPIC_KITCHENS_100 or data/hmdb51.
- Run the training code below. Change the dataset and exp_name variables to select different experiments.
#!/bin/bash
exp_root="$HOME/experiments" # Experiment results will be saved here.
export CUDA_VISIBLE_DEVICES=0
num_gpus=1
export VAI_USE_NEIGHBOUR_CACHE=True # Only for EPIC-Kitchens-100-SPMV. It will bypass neighbour search if the cache is available, otherwise it will run and cache the results.
export VAI_NUM_NEIGHBOURS=15
export VAI_PSEUDOLABEL_THR=0.1
subfolder="k=$VAI_NUM_NEIGHBOURS,thr=$VAI_PSEUDOLABEL_THR" # Name subfolder as you like.
dataset=epic100_verb_features
#dataset=confusing_hmdb_102_features
exp_name="concat_RGB_flow_assume_negative"
#exp_name="concat_RGB_flow_weak_assume_negative"
#exp_name="concat_RGB_flow_binary_labelsmooth"
#exp_name="concat_RGB_flow_binary_negative_labelsmooth"
#exp_name="concat_RGB_flow_binary_focal"
#exp_name="concat_RGB_flow_entropy_maximise"
#exp_name="concat_RGB_flow_mask_binary_ce"
#exp_name="concat_RGB_flow_pseudo_single_binary_ce"
# Training script
# -S creates a subdirectory with a name of your choice. (optional)
tools/run_singlenode.sh train $num_gpus -R $exp_root -D $dataset -c:d verbambig -M ch_beta.featuremodel -E $exp_name -c:e verbambig -S "$subfolder" #--wandb_project kiyoon_kim_verbambig
# Evaluating script
# -l -2 loads the best model (with the highest heldout validation accuracy)
# -p saves the predictions. (optional)
tools/run_singlenode.sh eval $num_gpus -R $exp_root -D $dataset -c:d verbambig -M ch_beta.featuremodel -E $exp_name -c:e verbambig -S "$subfolder" -l -2 -p #--wandb
Running feature extraction or end-to-end experiments
Prepare the dataset
EPIC-Kitchens-100-SPMV
- Download rgb_frames and flow_frames (script). Extract the tar files (RGB script, flow script).
- Clone the EPIC-Kitchens-100 annotations at data/EPIC_KITCHENS_100/epic-kitchens-100-annotations.
- Gulp the dataset. First, generate flow annotations using this and use this to gulp.
- Generate dataset split files (RGB_script, flow_script).
- Get TSM pre-trained models from EPIC-Kitchens Action Models, and save them into data/pretrained/epic100.
- Download the multi-verb annotations at data/EPIC_KITCHENS_100/ek100-val-multiple-verbs-halfagree-halfconfident-include_original-20220427.csv.

The data/EPIC_KITCHENS_100 directory should have five directories and one file: epic-kitchens-100-annotations, splits_gulp_flow, splits_gulp_rgb, gulp_flow, gulp_rgb, ek100-val-multiple-verbs-halfagree-halfconfident-include_original-20220427.csv (a quick check is sketched below).
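A minimal layout check, using only the paths listed above (plain shell, not a repository script):
for d in epic-kitchens-100-annotations splits_gulp_flow splits_gulp_rgb gulp_flow gulp_rgb; do
  [ -d "data/EPIC_KITCHENS_100/$d" ] || echo "Missing directory: $d"
done
[ -f data/EPIC_KITCHENS_100/ek100-val-multiple-verbs-halfagree-halfconfident-include_original-20220427.csv ] || echo "Missing multi-verb annotation CSV"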
Confusing-HMDB-102
- Download HMDB-51 videos (script).
- Extract the videos into image frames (script).
- Generate optical flow (script).
- Gulp the dataset (script). Use the rgb and flow_onefolder modalities, and --class_folder.
- Generate dataset split files (script). Use --confusion 2. Or just download the splits.

The data/hmdb51 directory must have at least four directories: confusing102_splits_gulp_flow, confusing102_splits_gulp_rgb, gulp_flow, gulp_rgb.
Putting it all together:
# Install unrar, nvidia-docker
# Execute from the root directory of this repo.
# Don't run all of them at once. Some steps may not run.
GPU_arch=pascal # pascal / turing / ampere
conda activate videoai
submodules/video_datasets_api/tools/hmdb/download_hmdb.sh data/hmdb51
submodules/video_datasets_api/tools/hmdb/hmdb_extract_frames.sh data/hmdb51/videos data/hmdb51/frames
submodules/video_datasets_api/tools/hmdb/extract_flow_multigpu.sh data/hmdb51/frames data/hmdb51/flow $GPU_arch 0
python submodules/video_datasets_api/tools/gulp_jpeg_dir.py data/hmdb51/frames data/hmdb51/gulp_rgb rgb --class_folder
python submodules/video_datasets_api/tools/gulp_jpeg_dir.py data/hmdb51/flow data/hmdb51/gulp_flow flow_onefolder --class_folder
python tools/datasets/generate_hmdb_splits.py data/hmdb51/gulp_rgb data/hmdb51/confusing102_splits_gulp_rgb data/hmdb51/testTrainMulti_7030_splits --mode gulp --confusion 2
python tools/datasets/generate_hmdb_splits.py data/hmdb51/gulp_flow data/hmdb51/confusing102_splits_gulp_flow data/hmdb51/testTrainMulti_7030_splits --mode gulp --confusion 2
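After these steps, data/hmdb51 should contain the four directories listed earlier; a one-line check (plain shell, not a repository script):
ls -d data/hmdb51/{confusing102_splits_gulp_flow,confusing102_splits_gulp_rgb,gulp_flow,gulp_rgb}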
Run training, evaluation and feature extraction.
#!/bin/bash
exp_root="$HOME/experiments" # Experiment results will be saved here.
export CUDA_VISIBLE_DEVICES=0
num_gpus=1
export VAI_NUM_NEIGHBOURS=15
export VAI_PSEUDOLABEL_THR=0.1
## Choose dataset
#dataset=epic100_verb
dataset=confusing_hmdb_102
export VAI_SPLITNUM=1 # only for confusing_hmdb_102 dataset.
## Choose model (RGB or flow)
model="tsm_resnet50_nopartialbn"
#model="ch_epic100.tsm_resnet50_flow"
## Choose loss
## For feature extraction, use "ce"
exp_name="ce"
#exp_name="assume_negative"
#exp_name="weak_assume_negative"
#exp_name="binary_labelsmooth"
#exp_name="binary_negative_labelsmooth"
#exp_name="binary_focal"
#exp_name="entropy_maximise"
#exp_name="mask_binary_ce"
#exp_name="pseudo_single_binary_ce"
# Name subfolder as you like.
if [[ $dataset == "epic100_verb" ]]
then
subfolder="k=$VAI_NUM_NEIGHBOURS,thr=$VAI_PSEUDOLABEL_THR"
extra_args=()
else
subfolder="k=$VAI_NUM_NEIGHBOURS,thr=$VAI_PSEUDOLABEL_THR,split=$VAI_SPLITNUM"
extra_args=(-c:d verbambig)
fi
# Training script
# -S creates a subdirectory with a name of your choice. (optional)
tools/run_singlenode.sh train $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e verbambig -S "$subfolder" ${extra_args[@]} #--wandb_project kiyoon_kim_verbambig
if [[ $dataset == "epic100_verb" ]]
then
# Evaluating script
# -l -2 loads the best model (with the highest heldout validation accuracy)
# -p saves the predictions. (optional)
tools/run_singlenode.sh eval $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e verbambig -S "$subfolder" -l -2 -p ${extra_args[@]} #--wandb
else
echo "For Confusing-HMDB-102, there is no evaluation script. See summary.csv file and get the best number per metric."
fi
if [[ $exp_name == "ce" ]]
then
# Extract features
# -l -2 loads the best model (with the highest heldout validation accuracy)
tools/run_singlenode.sh feature $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e verbambig -S "$subfolder" -l -2 -s traindata_testmode ${extra_args[@]} #--wandb
tools/run_singlenode.sh feature $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e verbambig -S "$subfolder" -l -2 -s val ${extra_args[@]} #--wandb
fi
Once features are extracted, copy them to the data/ directory and edit dataset_configs/ch_verbambig/epic100_verb_features.py or dataset_configs/ch_verbambig/confusing_hmdb_102_features.py to update the corresponding feature path.
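For example, assuming a hypothetical extraction output directory under $exp_root (the actual location depends on your run; adjust both the source path and the path you set in the config file):
# Hypothetical paths, for illustration only.
cp -r "$exp_root/epic100_verb/extracted_features" data/EPIC_KITCHENS_100/features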
Refer to the Running feature experiments using pre-extracted features section for running experiments using the features.
Citing the paper
If you find our work or code useful, please cite:
@inproceedings{kim2022ambiguity,
author = {Kiyoon Kim and Davide Moltisanti and Oisin Mac Aodha and Laura Sevilla-Lara},
title = {An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year = {2022},
url = {https://bmvc2022.mpi-inf.mpg.de/0356.pdf}
}
Framework Used
This repository is a fork of the PyVideoAI framework.
Learn how to use it with the PyVideoAI-examples notebooks.