Revisiting spatio-temporal layouts for compositional action recognition

Codebase for Revisiting spatio-temporal layouts for compositional action recognition.

Dependencies

If you use Poetry, running poetry install inside the project should suffice.

Preparing the data

Something-Something and Something-Else

You need to download the data splits and labels, the annotations, and the video sizes. Make sure that the annotations for the split you want to create datasets for are in a single directory. Then, use create_something_datasets.py to create the training and test datasets as:

python src/create_something_datasets.py --train_data_path "data/path-to-the-train-file.json" \
                                        --val_data_path "data/path-to-the-val-file.json" \
                                        --annotations_path "data/all-annotations-for-the-split/"
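
Before running the script, it can help to confirm that the split files exist and that the annotations really sit in one non-empty directory. The following is a minimal sketch of such a check; the helper name check_inputs is hypothetical and not part of this codebase, and the paths are the placeholders from the command above.

```python
from pathlib import Path


def check_inputs(train_data_path, val_data_path, annotations_path):
    """Return a list of problems found; an empty list means the inputs look fine.

    Hypothetical helper for illustration only, not part of the codebase.
    """
    problems = []
    # Both split files must exist as regular files.
    for name, path in (("train", train_data_path), ("val", val_data_path)):
        if not Path(path).is_file():
            problems.append(f"{name} split file not found: {path}")
    # The annotations must live in a single, non-empty directory.
    annotations_dir = Path(annotations_path)
    if not annotations_dir.is_dir():
        problems.append(f"annotations directory not found: {annotations_path}")
    elif not any(annotations_dir.iterdir()):
        problems.append(f"annotations directory is empty: {annotations_path}")
    return problems


if __name__ == "__main__":
    # Placeholder paths copied from the command above; replace with local paths.
    for problem in check_inputs(
        "data/path-to-the-train-file.json",
        "data/path-to-the-val-file.json",
        "data/all-annotations-for-the-split/",
    ):
        print(problem)
```

The check only looks at the file system; it does not validate the contents of the split or annotation files.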

Action-Genome

You need to download the Action Genome data and the Charades data. Then, use create_action_genome_datasets.py to create the training and test datasets as:

python src/create_action_genome_datasets.py --action_genome_path "data/path-to-action-genome" \
                                            --charades_path "data/path-to-charades" \
                                            --save_datasets_path "data/directory-where-the-data-will-be-saved"

Model Zoo

Trained models are currently available for the Something-Else and the Action Genome datasets. We are still in the process of releasing the models (including those for Something-Something V2), so if a model you need is not yet available, feel free to reach out.

| Model | Dataset | Download |
| --- | --- | --- |
| STLT | Something-Else Compositional Split Detections | Link |
| LCF | Something-Else Compositional Split Detections | Link |
| CAF | Something-Else Compositional Split Detections | Link |
| CACNF | Something-Else Compositional Split Detections | Link |
| STLT | Action Genome Oracle | Link |
| STLT | Action Genome Detections | Link |

Training and Inference

The codebase currently supports training and inference for the STLT, LCF, CAF, and CACNF models. Refer to the train.py and inference.py scripts. Additionally, you need to download the ResNet3D model, pretrained on Kinetics and similar datasets, from here, and add it to models/. To run inference with a trained model, e.g., STLT on the Something-Else Compositional split, you can do the following:

poetry run python src/inference.py --checkpoint_path "models/comp_split_detect_stlt.pt" \
                                   --test_dataset_path "data/something-something/comp_split_detect/val_dataset.json" \
                                   --labels_path "data/something-something/comp_split_detect/something-something-v2-labels.json" \
                                   --videoid2size_path "data/something-something/videoid2size.json" \
                                   --dataset_type "layout" \
                                   --model_name "stlt" \
                                   --dataset_name "something"

To run inference with a pre-trained CACNF model you can do the following:

poetry run python src/inference.py --checkpoint_path "models/something-something/comp_split_detect_cacnf.pt" \
                                   --test_dataset_path "data/something-something/comp_split_detect/val_dataset.json" \
                                   --labels_path "data/something-something/comp_split_detect/something-something-v2-labels.json" \
                                   --videoid2size_path "data/something-something/videoid2size.json" \
                                   --batch_size 4 \
                                   --dataset_type "multimodal" \
                                   --model_name "cacnf" \
                                   --dataset_name "something" \
                                   --videos_path "data/something-something/dataset.hdf5" \
                                   --resnet_model_path "models/something-something/r3d50_KMS_200ep.pth"

For both examples, make sure to provide your local paths to the dataset files and the pre-trained checkpoints.
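
When switching between models or local paths often, it may be convenient to keep the options in one place and assemble the command from them. The sketch below does this for the STLT example; the helper build_command and the dictionary stlt_options are hypothetical names introduced for illustration, while the option names and values are the ones shown in the commands above.

```python
import shlex


def build_command(script, options):
    """Turn a dict of option names and values into an argument list.

    Hypothetical helper for illustration only, not part of the codebase.
    """
    command = ["poetry", "run", "python", script]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


# Options of the STLT example above; replace the paths with local ones.
stlt_options = {
    "checkpoint_path": "models/comp_split_detect_stlt.pt",
    "test_dataset_path": "data/something-something/comp_split_detect/val_dataset.json",
    "labels_path": "data/something-something/comp_split_detect/something-something-v2-labels.json",
    "videoid2size_path": "data/something-something/videoid2size.json",
    "dataset_type": "layout",
    "model_name": "stlt",
    "dataset_name": "something",
}

command = build_command("src/inference.py", stlt_options)
# Print the command as a single shell-safe line for inspection.
print(shlex.join(command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues when a local path contains spaces.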

Citation

If you find our code useful for your own research, please cite it using the following BibTeX entry.

@article{radevski2021revisiting,
  title={Revisiting spatio-temporal layouts for compositional action recognition},
  author={Radevski, Gorjan and Moens, Marie-Francine and Tuytelaars, Tinne},
  journal={arXiv preprint arXiv:2111.01936},
  year={2021}
}