# DEVIAS: Learning Disentangled Video Representations of Action and Scene
This repository is the official implementation of the paper "DEVIAS: Learning Disentangled Video Representations of Action and Scene", accepted as an Oral presentation at ECCV 2024🔥🔥.
[Project page]
## Installation
Please prepare the environment following INSTALL.md.
## Dataset
The following datasets are used in this project. You can download them via the provided links.
- Kinetics-400, UCF-101, HVU
- SCUBA (ICCV 2023), HAT (NeurIPS 2022)
- Something-Something V2, ActivityNet, Diving48
Please download the datasets to replicate the results of this project. For detailed setup instructions, please see DATASET.md.
## Training
Instructions for training are in TRAIN.md.
We also provide checkpoints of DEVIAS trained on UCF-101 and Kinetics-400, as well as the scene model trained on Places-365 that we use to generate pseudo scene labels. Please see the Drive.
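As a quick sanity check after downloading, you can inspect a checkpoint before wiring it into the model. This is a minimal sketch assuming the released files are standard PyTorch state dicts; the filename and key layout below are illustrative, not the repo's actual API.

```python
import torch

ckpt_path = "devias_ucf101.pth"  # hypothetical filename from the Drive folder
checkpoint = torch.load(ckpt_path, map_location="cpu")

# Checkpoints are commonly wrapped in a dict such as {"model": state_dict, ...};
# fall back to treating the file itself as the state dict otherwise.
state_dict = checkpoint.get("model", checkpoint)
print(f"{len(state_dict)} tensors, first key: {next(iter(state_dict))}")

# Once the DEVIAS model is instantiated (see TRAIN.md), load the weights:
# model.load_state_dict(state_dict, strict=False)
```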
## Evaluation
We evaluate DEVIAS on both action and scene recognition performance, in both seen and unseen action-scene combination scenarios. Instructions for evaluation are in EVAL.md.
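To illustrate the idea of the disentangled evaluation, the sketch below scores action and scene predictions independently with top-1 accuracy. It assumes the model produces separate action and scene logits per clip; the function names and class counts are assumptions for illustration, not the repo's evaluation script.

```python
import torch

@torch.no_grad()
def top1_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of samples whose highest-scoring class matches the label."""
    return (logits.argmax(dim=1) == labels).float().mean().item()

# Suppose a forward pass returns (action_logits, scene_logits) for a batch;
# random tensors stand in for real model outputs here.
action_logits = torch.randn(8, 101)   # e.g. UCF-101 action classes
scene_logits = torch.randn(8, 365)    # e.g. Places-365 scene classes
action_labels = torch.randint(0, 101, (8,))
scene_labels = torch.randint(0, 365, (8,))

print("action top-1:", top1_accuracy(action_logits, action_labels))
print("scene  top-1:", top1_accuracy(scene_logits, scene_labels))
```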
## Downstream Experiments
We find that the disentangled action and scene representations of DEVIAS are beneficial on various downstream datasets. Instructions for the downstream experiments are in DOWNSTREAM.md.
## Acknowledgement
This codebase is built upon VideoMAE and DivE. We thank the authors for making their code available.
## Citation
If you find our code and work useful, please consider citing:
@inproceedings{bae2024devias,
  title={DEVIAS: Learning Disentangled Video Representations of Action and Scene for Holistic Video Understanding},
  author={Bae, Kyungho and Ahn, Geo and Kim, Youngrae and Choi, Jinwoo},
  booktitle={European Conference on Computer Vision},
  year={2024}
}