<br /> <p align="center"> <img src="figs/logo.png" align="center" width="42%"> <h3 align="center"><strong>Unsupervised Video Domain Adaptation for Action Recognition:<br>A Disentanglement Perspective</strong></h3> <p align="center"> <a href="https://scholar.google.com/citations?user=a94WthkAAAAJ" target='_blank'>Pengfei Wei</a><sup>1</sup> <a href="https://scholar.google.com/citations?user=-j1j7TkAAAAJ" target='_blank'>Lingdong Kong</a><sup>1,2</sup> <a href="https://scholar.google.com/citations?user=2PxlmU0AAAAJ" target='_blank'>Xinghua Qu</a><sup>1</sup> <a href="https://scholar.google.com/citations?user=4FA6C0AAAAAJ" target='_blank'>Yi Ren</a><sup>1</sup> <a href="https://scholar.google.com/citations?user=0R20iBMAAAAJ" target='_blank'>Zhiqiang Xu</a><sup>3</sup> <a href="https://scholar.google.com/citations?user=XFtCe08AAAAJ" target='_blank'>Jing Jiang</a><sup>4</sup> <a href="https://scholar.google.com/citations?user=e6_J-lEAAAAJ" target='_blank'>Xiang Yin</a><sup>1</sup> <br> <sup>1</sup>ByteDance AI Lab <sup>2</sup>National University of Singapore <sup>3</sup>MBZUAI <sup>4</sup>University of Technology Sydney </p> </p> <p align="center"> <a href="https://neurips.cc/" target='_blank'><b>NeurIPS 2023</b></a> </p> <p align="center"> <a href="https://arxiv.org/abs/2208.07365" target='_blank'> <img src="https://img.shields.io/badge/Paper-%F0%9F%93%83-firebrick"> </a> <a href="https://ldkong.com/TranSVAE" target='_blank'> <img src="https://img.shields.io/badge/Project-%F0%9F%94%97-red"> </a> <a href="https://huggingface.co/spaces/ldkong/TranSVAE" target='_blank'> <img src="https://img.shields.io/badge/Demo-%F0%9F%8E%AC-lightgray"> </a> <a href="https://zhuanlan.zhihu.com/p/553169112" target='_blank'> <img src="https://img.shields.io/badge/%E4%B8%AD%E8%AF%91%E7%89%88-%F0%9F%90%BC-lightblue"> </a> <a href="" target='_blank'> <img src="https://visitor-badge.laobi.icu/badge?page_id=ldkong1205.TranSVAE&left_color=gray&right_color=blue"> </a> </p>

## About
TranSVAE is a disentanglement framework designed for unsupervised video domain adaptation. It disentangles domain information from the data during adaptation by modeling the generation of cross-domain videos with two sets of latent factors: one encoding static, domain-related information, and the other encoding temporal, semantic-related information. Objectives are enforced on these two sets of factors to achieve domain disentanglement and transfer.
<br> <p align="center"> <img src="https://github.com/ldkong1205/TranSVAE/blob/main/figs/example.gif" align="center" width="60%"> <br> <strong>Col1:</strong> Original sequences ("Human" $\mathcal{D}=\mathbf{P}_1$ and "Alien" $\mathcal{D}=\mathbf{P}_2$); <strong>Col2:</strong> Sequence reconstructions; <strong>Col3:</strong> Reconstructed sequences using $z_1^{\mathcal{D}},...,z_T^{\mathcal{D}}$; <strong>Col4:</strong> Domain transferred sequences with exchanged $z_d^{\mathcal{D}}$. </p> <br>Visit our project page to explore more details. :paw_prints:
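To make the two-factor structure concrete, below is a minimal PyTorch sketch of this factorization. It is an illustration only: the module names, dimensions, and the deterministic (non-variational) simplification are our assumptions, not the actual TranSVAE architecture, and the probabilistic machinery and disentanglement objectives are omitted; see the codebase for the real model.

```python
import torch
import torch.nn as nn

class TranSVAESketch(nn.Module):
    """Simplified sketch of the two-factor latent model (illustrative only)."""

    def __init__(self, feat_dim=1024, zd_dim=128, zt_dim=128):
        super().__init__()
        # Static branch: one code per video, meant to carry domain information.
        self.enc_static = nn.GRU(feat_dim, zd_dim, batch_first=True)
        # Dynamic branch: one code per frame, meant to carry semantics/motion.
        self.enc_dynamic = nn.GRU(feat_dim, zt_dim, batch_first=True)
        # Decoder reconstructs each frame feature from [z_d, z_t].
        self.dec = nn.Sequential(
            nn.Linear(zd_dim + zt_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim),
        )

    def forward(self, x):                        # x: (B, T, feat_dim)
        B, T, _ = x.shape
        _, h = self.enc_static(x)                # h: (1, B, zd_dim)
        z_d = h[-1]                              # static code, shared by all frames
        z_t, _ = self.enc_dynamic(x)             # (B, T, zt_dim), one code per frame
        z_d_rep = z_d.unsqueeze(1).expand(-1, T, -1)
        recon = self.dec(torch.cat([z_d_rep, z_t], dim=-1))
        return recon, z_d, z_t
```

A quick smoke test: `recon, z_d, z_t = TranSVAESketch()(torch.randn(2, 8, 1024))`.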
## Updates
- [2023.10] - We provide our extracted I3D features; kindly refer to this page for more details.
- [2023.09] - TranSVAE was accepted to NeurIPS 2023! :tada:
- [2022.08] - TranSVAE achieves 1st place on the UDA leaderboards of UCF-HMDB, Jester, and Epic-Kitchens, according to Papers with Code.
- [2022.08] - Try a Gradio demo for domain disentanglement in TranSVAE at Hugging Face Spaces! :hugs:
- [2022.08] - Our paper is available on arXiv; click here to check it out!
## Outline
- [Highlights](#highlights)
- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Getting Started](#getting-started)
- [Main Results](#main-results)
- [TODO List](#todo-list)
- [License](#license)
- [Acknowledgement](#acknowledgement)
- [Citation](#citation)
## Highlights
<strong>Conceptual Comparison</strong> |
---|
<img src="figs/idea.jpg" width="70%"> |
<strong>Graphical Model</strong> |
<img src="figs/graph.png" width="60%"> |
<strong>Framework Overview</strong> |
<img src="figs/framework.png" width="96%"> |
## Installation
Please refer to INSTALL.md for the installation details.
## Data Preparation
Please refer to DATA_PREPARE.md for details on preparing the <sup>1</sup>UCF<sub>101</sub>, <sup>2</sup>HMDB<sub>51</sub>, <sup>3</sup>Jester, <sup>4</sup>Epic-Kitchens, and <sup>5</sup>Sprites datasets.
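For a rough idea of what consuming the pre-extracted I3D features can look like, here is a hypothetical loader. The file layout, names, and the `(T, 1024)` shape are placeholders of ours; the authoritative format is documented in DATA_PREPARE.md.

```python
import numpy as np
from torch.utils.data import Dataset

class I3DFeatureDataset(Dataset):
    """Hypothetical loader for pre-extracted I3D features (layout is assumed)."""

    def __init__(self, file_list, labels):
        self.file_list = file_list   # e.g. ["features/ucf101/v_0001.npy", ...]
        self.labels = labels         # one integer action label per video

    def __len__(self):
        return len(self.file_list)

    def __getitem__(self, idx):
        feat = np.load(self.file_list[idx])      # assumed shape: (T, 1024)
        return feat.astype(np.float32), self.labels[idx]
```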
## Getting Started
Please refer to GET_STARTED.md to learn more about the usage of this codebase.
## Main Results
### UCF<sub>101</sub> - HMDB<sub>51</sub>
Method | Backbone | U<sub>101</sub> → H<sub>51</sub> | H<sub>51</sub> → U<sub>101</sub> | Average |
---|---|---|---|---|
DANN (JMLR'16) | ResNet-101 | 75.28 | 76.36 | 75.82 |
JAN (ICML'17) | ResNet-101 | 74.72 | 76.69 | 75.71 |
AdaBN (PR'18) | ResNet-101 | 72.22 | 77.41 | 74.82 |
MCD (CVPR'18) | ResNet-101 | 73.89 | 79.34 | 76.62 |
TA<sup>3</sup>N (ICCV'19) | ResNet-101 | 78.33 | 81.79 | 80.06 |
ABG (MM'20) | ResNet-101 | 79.17 | 85.11 | 82.14 |
TCoN (AAAI'20) | ResNet-101 | 87.22 | 89.14 | 88.18 |
MA<sup>2</sup>L-TD (WACV'22) | ResNet-101 | 85.00 | 86.59 | 85.80 |
Source-only | I3D | 80.27 | 88.79 | 84.53 |
DANN (JMLR'16) | I3D | 80.83 | 88.09 | 84.46 |
ADDA (CVPR'17) | I3D | 79.17 | 88.44 | 83.81 |
TA<sup>3</sup>N (ICCV'19) | I3D | 81.38 | 90.54 | 85.96 |
SAVA (ECCV'20) | I3D | 82.22 | 91.24 | 86.73 |
CoMix (NeurIPS'21) | I3D | 86.66 | 93.87 | 90.22 |
CO<sup>2</sup>A (WACV'22) | I3D | 87.78 | 95.79 | 91.79 |
TranSVAE (Ours) | I3D | 87.78 | 98.95 | 93.37 |
Oracle | I3D | 95.00 | 96.85 | 95.93 |
### Jester
Task | Source-only | DANN | ADDA | TA<sup>3</sup>N | CoMix | TranSVAE (Ours) | Oracle |
---|---|---|---|---|---|---|---|
J<sub>S</sub> → J<sub>T</sub> | 51.5 | 55.4 | 52.3 | 55.5 | 64.7 | 66.1 | 95.6 |
### Epic-Kitchens
Task | Source-only | DANN | ADDA | TA<sup>3</sup>N | CoMix | TranSVAE (Ours) | Oracle |
---|---|---|---|---|---|---|---|
D<sub>1</sub> → D<sub>2</sub> | 32.8 | 37.7 | 35.4 | 34.2 | 42.9 | 50.5 | 64.0 |
D<sub>1</sub> → D<sub>3</sub> | 34.1 | 36.6 | 34.9 | 37.4 | 40.9 | 50.3 | 63.7 |
D<sub>2</sub> → D<sub>1</sub> | 35.4 | 38.3 | 36.3 | 40.9 | 38.6 | 50.3 | 57.0 |
D<sub>2</sub> → D<sub>3</sub> | 39.1 | 41.9 | 40.8 | 42.8 | 45.2 | 58.6 | 63.7 |
D<sub>3</sub> → D<sub>1</sub> | 34.6 | 38.8 | 36.1 | 39.9 | 42.3 | 48.0 | 57.0 |
D<sub>3</sub> → D<sub>2</sub> | 35.8 | 42.1 | 41.4 | 44.2 | 49.2 | 58.0 | 64.0 |
Average | 35.3 | 39.2 | 37.4 | 39.9 | 43.2 | 52.6 | 61.5 |
### Ablation Study
<strong>UCF<sub>101</sub></strong> → <strong>HMDB<sub>51</sub></strong> <br> <img src="figs/ablation-ucf2hmdb.png">
<strong>HMDB<sub>51</sub></strong> → <strong>UCF<sub>101</sub></strong> <br> <img src="figs/ablation-hmdb2ucf.png">
<strong>Domain Transfer Example</strong> <br>
Source (Original) | Target (Original) | Source (Original) | Target (Original)
---|---|---|---
Reconstruct ($\mathbf{z}_d^{\mathcal{S}}$ + $\mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}}$ + $\mathbf{z}_t^{\mathcal{T}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{S}}$ + $\mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}}$ + $\mathbf{z}_t^{\mathcal{T}}$)
Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{0}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{0}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{0}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{0}$)
Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{T}}$) | Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{T}}$)
Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{z}_t^{\mathcal{T}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{z}_t^{\mathcal{T}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{z}_t^{\mathcal{S}}$)
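The four "Reconstruct" rows above correspond to simple manipulations of the two codes before decoding: keep both, zero out one, or exchange a code across domains. The sketch below spells this out under the same caveats as the earlier snippet; the stand-in decoder, random codes, shapes, and names are all hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical shapes; the real dimensions live in the codebase.
B, T, ZD, ZT, FEAT = 2, 8, 128, 128, 1024
dec = nn.Linear(ZD + ZT, FEAT)                                    # stand-in decoder

z_d_src, z_d_tgt = torch.randn(B, ZD), torch.randn(B, ZD)         # static (domain) codes
z_t_src, z_t_tgt = torch.randn(B, T, ZT), torch.randn(B, T, ZT)   # dynamic (semantic) codes

def decode(z_d, z_t):
    """Broadcast the static code across frames and decode [z_d, z_t]."""
    z_d_rep = z_d.unsqueeze(1).expand(-1, z_t.size(1), -1)
    return dec(torch.cat([z_d_rep, z_t], dim=-1))

recon_full = decode(z_d_src, z_t_src)                     # z_d^S + z_t^S: full reconstruction
recon_stat = decode(z_d_src, torch.zeros_like(z_t_src))   # z_d^S + 0: static content only
recon_dyn  = decode(torch.zeros_like(z_d_src), z_t_src)   # 0 + z_t^S: motion/semantics only
recon_swap = decode(z_d_src, z_t_tgt)                     # z_d^S + z_t^T: domain transfer
```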
## TODO List
- Initial release. 🚀
- Add license. See here for more details.
- Add demo at Hugging Face Spaces.
- Add installation details.
- Add data preparation details.
- Add evaluation details.
- Add training details.
## License
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/80x15.png" /></a> <br /> This work is under the <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
## Acknowledgement
We acknowledge the use of the following public resources during the course of this work: <sup>1</sup>UCF<sub>101</sub>, <sup>2</sup>HMDB<sub>51</sub>, <sup>3</sup>Jester, <sup>4</sup>Epic-Kitchens, <sup>5</sup>Sprites, <sup>6</sup>I3D, and <sup>7</sup>TRN.
## Citation
If you find this work helpful, please kindly consider citing our paper:
```bibtex
@inproceedings{wei2023transvae,
  title     = {Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective},
  author    = {Wei, Pengfei and Kong, Lingdong and Qu, Xinghua and Ren, Yi and Xu, Zhiqiang and Jiang, Jing and Yin, Xiang},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2023},
}
```