An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition
Kiyoon Kim, Davide Moltisanti, Oisin Mac Aodha, Laura Sevilla-Lara
In BMVC 2022. arXiv
Presentation video
Dataset downloads (labels only)
Installation
conda create -n videoai python=3.9
conda activate videoai
conda install pytorch==1.12.1 torchvision cudatoolkit=10.2 -c pytorch
### For RTX 30xx GPUs, use cudatoolkit=11.3 instead:
#conda install pytorch==1.12.1 torchvision cudatoolkit=11.3 -c pytorch
git clone --recurse-submodules https://github.com/kiyoon/verb_ambiguity
cd verb_ambiguity
git submodule update --recursive
cd submodules/video_datasets_api
pip install -e .
cd ../experiment_utils
pip install -e .
cd ../..
pip install -e .
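Optional: a quick sanity check (standard PyTorch calls, not part of the original instructions) confirms that the install can see your GPU before continuing:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"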
Optional: install Pillow-SIMD and libjpeg-turbo to improve dataloading performance.
Run this at the end of the installation:
conda uninstall -y --force pillow pil jpeg libtiff libjpeg-turbo
pip uninstall -y pillow pil jpeg libtiff libjpeg-turbo
conda install -yc conda-forge libjpeg-turbo
CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
conda install -y jpeg libtiff
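To confirm that the SIMD build is the one actually imported, you can print the Pillow version; pillow-simd version strings usually carry a .postN suffix (an assumption about pillow-simd's versioning, not stated above):
python -c "import PIL; print(PIL.__version__)"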
Running feature experiments using pre-extracted features
- Download the pre-extracted features:
  - EPIC-Kitchens-100 TSM features
  - EPIC-Kitchens-100 TSM feature neighbours (optional): using this neighbour cache shortens training preparation by skipping the neighbour search.
  - Confusing-HMDB-102 TSM features
- Extract them into data/EPIC_KITCHENS_100 or data/hmdb51.
- Run the training code below. Change the dataset and exp_name variables to select different experiments.
#!/bin/bash
exp_root="$HOME/experiments" # Experiment results will be saved here.
export CUDA_VISIBLE_DEVICES=0
num_gpus=1
export VAI_USE_NEIGHBOUR_CACHE=True # Only for EPIC-Kitchens-100-SPMV. It will bypass neighbour search if the cache is available, otherwise it will run and cache the results.
export VAI_NUM_NEIGHBOURS=15
export VAI_PSEUDOLABEL_THR=0.1
subfolder="k=$VAI_NUM_NEIGHBOURS,thr=$VAI_PSEUDOLABEL_THR" # Name subfolder as you like.
dataset=epic100_verb_features
#dataset=confusing_hmdb_102_features
exp_name="concat_RGB_flow_assume_negative"
#exp_name="concat_RGB_flow_weak_assume_negative"
#exp_name="concat_RGB_flow_binary_labelsmooth"
#exp_name="concat_RGB_flow_binary_negative_labelsmooth"
#exp_name="concat_RGB_flow_binary_focal"
#exp_name="concat_RGB_flow_entropy_maximise"
#exp_name="concat_RGB_flow_mask_binary_ce"
#exp_name="concat_RGB_flow_pseudo_single_binary_ce"
# Training script
# -S creates a subdirectory with a name of your choice. (optional)
tools/run_singlenode.sh train $num_gpus -R $exp_root -D $dataset -c:d verbambig -M ch_beta.featuremodel -E $exp_name -c:e verbambig -S "$subfolder" #--wandb_project kiyoon_kim_verbambig
# Evaluating script
# -l -2 loads the best model (with the highest heldout validation accuracy)
# -p saves the predictions. (optional)
tools/run_singlenode.sh eval $num_gpus -R $exp_root -D $dataset -c:d verbambig -M ch_beta.featuremodel -E $exp_name -c:e verbambig -S "$subfolder" -l -2 -p #--wandb
Running feature extraction or end-to-end experiments
Prepare the dataset
EPIC-Kitchens-100-SPMV
- Download rgb_frames and flow_frames (script). Extract the tar files (RGB script, flow script).
- Clone the EPIC-Kitchens-100 annotations at data/EPIC_KITCHENS_100/epic-kitchens-100-annotations.
- Gulp the dataset. First, generate flow annotations using this and use this to gulp.
- Generate dataset split files (RGB_script, flow_script).
- Get TSM pre-trained models from EPIC-Kitchens Action Models, and save them into data/pretrained/epic100.
- Download the multi-verb annotations at data/EPIC_KITCHENS_100/ek100-val-multiple-verbs-halfagree-halfconfident-include_original-20220427.csv.

The data/EPIC_KITCHENS_100 directory should have five directories and one file: epic-kitchens-100-annotations, splits_gulp_flow, splits_gulp_rgb, gulp_flow, gulp_rgb, ek100-val-multiple-verbs-halfagree-halfconfident-include_original-20220427.csv (a quick check is sketched below).
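A minimal layout check, using only the paths listed above (plain shell, not a repository script):
for d in epic-kitchens-100-annotations splits_gulp_flow splits_gulp_rgb gulp_flow gulp_rgb; do
  [ -d "data/EPIC_KITCHENS_100/$d" ] || echo "Missing directory: $d"
done
[ -f data/EPIC_KITCHENS_100/ek100-val-multiple-verbs-halfagree-halfconfident-include_original-20220427.csv ] || echo "Missing multi-verb annotation CSV"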
Confusing-HMDB-102
- Download HMDB-51 videos (script).
- Extract the videos into image frames (script).
- Generate optical flow (script).
- Gulp the dataset (script). Use the rgb and flow_onefolder modalities, and --class_folder.
- Generate dataset split files (script). Use --confusion 2. Or just download the splits.

The data/hmdb51 directory must have at least four directories: confusing102_splits_gulp_flow, confusing102_splits_gulp_rgb, gulp_flow, gulp_rgb.
Putting it all together:
# Install unrar, nvidia-docker
# Execute from the root directory of this repo.
# Don't run all of them at once. Some steps may not run.
GPU_arch=pascal # pascal / turing / ampere
conda activate videoai
submodules/video_datasets_api/tools/hmdb/download_hmdb.sh data/hmdb51
submodules/video_datasets_api/tools/hmdb/hmdb_extract_frames.sh data/hmdb51/videos data/hmdb51/frames
submodules/video_datasets_api/tools/hmdb/extract_flow_multigpu.sh data/hmdb51/frames data/hmdb51/flow $GPU_arch 0
python submodules/video_datasets_api/tools/gulp_jpeg_dir.py data/hmdb51/frames data/hmdb51/gulp_rgb rgb --class_folder
python submodules/video_datasets_api/tools/gulp_jpeg_dir.py data/hmdb51/flow data/hmdb51/gulp_flow flow_onefolder --class_folder
python tools/datasets/generate_hmdb_splits.py data/hmdb51/gulp_rgb data/hmdb51/confusing102_splits_gulp_rgb data/hmdb51/testTrainMulti_7030_splits --mode gulp --confusion 2
python tools/datasets/generate_hmdb_splits.py data/hmdb51/gulp_flow data/hmdb51/confusing102_splits_gulp_flow data/hmdb51/testTrainMulti_7030_splits --mode gulp --confusion 2
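After these steps, data/hmdb51 should contain the four directories listed earlier; a one-line check (plain shell, not a repository script):
ls -d data/hmdb51/{confusing102_splits_gulp_flow,confusing102_splits_gulp_rgb,gulp_flow,gulp_rgb}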
Run training, evaluation and feature extraction.
#!/bin/bash
exp_root="$HOME/experiments" # Experiment results will be saved here.
export CUDA_VISIBLE_DEVICES=0
num_gpus=1
export VAI_NUM_NEIGHBOURS=15
export VAI_PSEUDOLABEL_THR=0.1
## Choose dataset
#dataset=epic100_verb
dataset=confusing_hmdb_102
export VAI_SPLITNUM=1 # only for confusing_hmdb_102 dataset.
## Choose model (RGB or flow)
model="tsm_resnet50_nopartialbn"
#model="ch_epic100.tsm_resnet50_flow"
## Choose loss
## For feature extraction, use "ce"
exp_name="ce"
#exp_name="assume_negative"
#exp_name="weak_assume_negative"
#exp_name="binary_labelsmooth"
#exp_name="binary_negative_labelsmooth"
#exp_name="binary_focal"
#exp_name="entropy_maximise"
#exp_name="mask_binary_ce"
#exp_name="pseudo_single_binary_ce"
# Name subfolder as you like.
if [[ $dataset == "epic100_verb" ]]
then
subfolder="k=$VAI_NUM_NEIGHBOURS,thr=$VAI_PSEUDOLABEL_THR"
extra_args=()
else
subfolder="k=$VAI_NUM_NEIGHBOURS,thr=$VAI_PSEUDOLABEL_THR,split=$VAI_SPLITNUM"
extra_args=(-c:d verbambig)
fi
# Training script
# -S creates a subdirectory with a name of your choice. (optional)
tools/run_singlenode.sh train $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e verbambig -S "$subfolder" ${extra_args[@]} #--wandb_project kiyoon_kim_verbambig
if [[ $dataset == "epic100_verb" ]]
then
# Evaluating script
# -l -2 loads the best model (with the highest heldout validation accuracy)
# -p saves the predictions. (optional)
tools/run_singlenode.sh eval $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e verbambig -S "$subfolder" -l -2 -p ${extra_args[@]} #--wandb
else
echo "For Confusing-HMDB-102, there is no evaluation script. See summary.csv file and get the best number per metric."
fi
if [[ $exp_name == "ce" ]]
then
# Extract features
# -l -2 loads the best model (with the highest heldout validation accuracy)
tools/run_singlenode.sh feature $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e verbambig -S "$subfolder" -l -2 -s traindata_testmode ${extra_args[@]} #--wandb
tools/run_singlenode.sh feature $num_gpus -R $exp_root -D $dataset -M $model -E $exp_name -c:e verbambig -S "$subfolder" -l -2 -s val ${extra_args[@]} #--wandb
fi
Once features are extracted, copy them to the data/ directory and edit dataset_configs/ch_verbambig/epic100_verb_features.py or dataset_configs/ch_verbambig/confusing_hmdb_102_features.py to update the corresponding feature path.
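For example, assuming a hypothetical extraction output directory under $exp_root (the actual location depends on your run; adjust both the source path and the path you set in the config file):
# Hypothetical paths, for illustration only.
cp -r "$exp_root/epic100_verb/extracted_features" data/EPIC_KITCHENS_100/features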
Refer to the Running feature experiments using pre-extracted features section for running experiments using the features.
Citing the paper
If you find our work or code useful, please cite:
@inproceedings{kim2022ambiguity,
author = {Kiyoon Kim and Davide Moltisanti and Oisin Mac Aodha and Laura Sevilla-Lara},
title = {An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year = {2022},
url = {https://bmvc2022.mpi-inf.mpg.de/0356.pdf}
}
Framework Used
This repository is a fork of the PyVideoAI framework.
Learn how to use it with the PyVideoAI-examples notebooks.