# DEVIAS: Learning Disentangled Video Representations of Action and Scene
This repository is the official implementation of the paper "DEVIAS: Learning Disentangled Video Representations of Action and Scene", accepted as an Oral presentation at ECCV 2024🔥🔥.
[Project page]
## Installation
Please prepare the environment following INSTALL.md.
## Dataset
The following datasets are used in this project. You can download them via the provided links.
- Kinetics-400, UCF-101, HVU
- SCUBA (ICCV 2023), HAT (NeurIPS 2022)
- Something-Something V2, ActivityNet, Diving48
Please download the datasets to replicate the results of this project. For detailed setup instructions, please see DATASET.md.
## Training
Instructions for training are in TRAIN.md.
We also provide checkpoints of DEVIAS trained on UCF-101 and Kinetics-400, as well as the scene model trained on Places-365 that we use to generate pseudo scene labels. Please see the Drive.
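As a quick sanity check after downloading, you can inspect a checkpoint before wiring it into the model. This is a minimal sketch assuming the released files are standard PyTorch state dicts; the filename and key layout below are illustrative, not the repo's actual API.

```python
import torch

ckpt_path = "devias_ucf101.pth"  # hypothetical filename from the Drive folder
checkpoint = torch.load(ckpt_path, map_location="cpu")

# Checkpoints are commonly wrapped in a dict such as {"model": state_dict, ...};
# fall back to treating the file itself as the state dict otherwise.
state_dict = checkpoint.get("model", checkpoint)
print(f"{len(state_dict)} tensors, first key: {next(iter(state_dict))}")

# Once the DEVIAS model is instantiated (see TRAIN.md), load the weights:
# model.load_state_dict(state_dict, strict=False)
```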
## Evaluation
We evaluate DEVIAS on both action and scene recognition performance, in both seen and unseen action-scene combination scenarios. Instructions for evaluation are in EVAL.md.
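To illustrate the idea of the disentangled evaluation, the sketch below scores action and scene predictions independently with top-1 accuracy. It assumes the model produces separate action and scene logits per clip; the function names and class counts are assumptions for illustration, not the repo's evaluation script.

```python
import torch

@torch.no_grad()
def top1_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of samples whose highest-scoring class matches the label."""
    return (logits.argmax(dim=1) == labels).float().mean().item()

# Suppose a forward pass returns (action_logits, scene_logits) for a batch;
# random tensors stand in for real model outputs here.
action_logits = torch.randn(8, 101)   # e.g. UCF-101 action classes
scene_logits = torch.randn(8, 365)    # e.g. Places-365 scene classes
action_labels = torch.randint(0, 101, (8,))
scene_labels = torch.randint(0, 365, (8,))

print("action top-1:", top1_accuracy(action_logits, action_labels))
print("scene  top-1:", top1_accuracy(scene_logits, scene_labels))
```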
## Downstream Experiments
We find that the disentangled action and scene representations of DEVIAS are beneficial on various downstream datasets. Instructions for the downstream experiments are in DOWNSTREAM.md.
## Acknowledgement
This codebase is built upon VideoMAE and DivE. We thank the authors for making their code available.
## Citation
If you find our code and work useful, please consider citing:
@inproceedings{bae2024devias,
  title={DEVIAS: Learning Disentangled Video Representations of Action and Scene for Holistic Video Understanding},
  author={Bae, Kyungho and Ahn, Geo and Kim, Youngrae and Choi, Jinwoo},
  booktitle={European Conference on Computer Vision},
  year={2024}
}