Temporal Attentive Alignment for Video Domain Adaptation

[Important] Please check https://github.com/cmhungsteve/TA3N for the most up-to-date version of this repository!


This is the official PyTorch implementation of our papers: <img align="right" src="webpage/OLIVES.png" width="10%">

Temporal Attentive Alignment for Large-Scale Video Domain Adaptation
Min-Hung Chen, Zsolt Kira, Ghassan AlRegib (Advisor), Jaekwon Yoo, Ruxin Chen, Jian Zheng
International Conference on Computer Vision (ICCV), 2019 [Oral (acceptance rate: 4.6%)]
[arXiv][Oral][Poster][Open Access][Blog]

Temporal Attentive Alignment for Video Domain Adaptation
Min-Hung Chen, Zsolt Kira, Ghassan AlRegib (Advisor)
CVPR Workshop (Learning from Unlabeled Videos), 2019
[arXiv]

<p align="center"> <img src="webpage/Overview.png?raw=true" width="60%"> </p>

Although various image-based domain adaptation (DA) techniques have been proposed in recent years, domain shift in videos is still not well-explored. Most previous works only evaluate performance on small-scale datasets that are already saturated. Therefore, we first propose two large-scale video DA datasets with much larger domain discrepancy: UCF-HMDB<sub>full</sub> and Kinetics-Gameplay. Second, we investigate different DA integration methods for videos, and show that simultaneously aligning and learning temporal dynamics achieves effective alignment even without sophisticated DA methods. Finally, we propose Temporal Attentive Adversarial Adaptation Network (TA<sup>3</sup>N), which explicitly attends to the temporal dynamics using domain discrepancy for more effective domain alignment, achieving state-of-the-art performance on four video DA datasets.

<p align="center"> <img src="webpage/SOTA_small.png?raw=true" width="49%"> <img src="webpage/SOTA_large.png?raw=true" width="50%"> </p>

Contents

<!-- * [Video Demo](#video-demo) -->
* [Requirements](#requirements)
* [Dataset Preparation](#dataset-preparation)
* [Usage](#usage)
* [Options](#options)
* [Citation](#citation)
* [Acknowledgments](#acknowledgments)
* [Contact](#contact)

Requirements


Dataset Preparation

Data structure

You need to extract frame-level features for each video in order to run the code. To extract features, please check dataset_preparation/ (a rough sketch of the extraction step is shown at the end of this subsection).

Folder Structure:

DATA_PATH/
  DATASET/
    list_DATASET_SUFFIX.txt
    RGB/
      CLASS_01/
        VIDEO_0001.mp4
        VIDEO_0002.mp4
        ...
      CLASS_02/
      ...

    RGB-Feature/
      VIDEO_0001/
        img_00001.t7
        img_00002.t7
        ...
      VIDEO_0002/
      ...

RGB-Feature/ contains all the feature vectors for training/testing. RGB/ contains all the raw videos.

There should be at least two DATASET folders: the source training set and the validation set. If you want to do domain adaptation, you also need a third DATASET: the target training set.
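
For reference, below is a minimal sketch of what per-frame feature extraction could look like, assuming the video frames have already been extracted as images and using an ImageNet-pretrained ResNet-101 from torchvision; extract_video_features and the exact preprocessing are illustrative only, and the scripts in dataset_preparation/ remain the reference:

import os
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Drop the classification head to obtain 2048-d pooled features per frame.
resnet = models.resnet101(pretrained=True)
extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_video_features(frame_dir, out_dir):
    """Save one feature file per frame, mirroring the RGB-Feature/ layout above."""
    os.makedirs(out_dir, exist_ok=True)
    for i, name in enumerate(sorted(os.listdir(frame_dir)), start=1):
        img = Image.open(os.path.join(frame_dir, name)).convert('RGB')
        feat = extractor(preprocess(img).unsqueeze(0)).flatten()  # shape: (2048,)
        torch.save(feat, os.path.join(out_dir, 'img_%05d.t7' % i))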

Input data

<!-- The input pre-trained feature representations will be released soon. --> <!-- ([`Link`]()) -->

File lists for training/validation

The file list list_DATASET_SUFFIX.txt is required for data feeding. Each line in the list contains the full path of a video's feature folder, the number of frames in that video, and the video's class index. It looks like:

DATA_PATH/DATASET/RGB-Feature/VIDEO_0001/ 100 0
DATA_PATH/DATASET/RGB-Feature/VIDEO_0002/ 150 1
......

To generate the file list, please check dataset_preparation/.
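
As a rough illustration of the format above, a file list could be generated along these lines, assuming one feature folder per video under RGB-Feature/; write_file_list, video_to_class, and class_to_idx are hypothetical names, and the actual scripts in dataset_preparation/ may differ:

import os

def write_file_list(data_path, dataset, video_to_class, class_to_idx, suffix='train'):
    """Write one line per video: <feature folder path> <frame count> <class index>."""
    feature_root = os.path.join(data_path, dataset, 'RGB-Feature')
    list_path = os.path.join(data_path, dataset, 'list_%s_%s.txt' % (dataset, suffix))
    with open(list_path, 'w') as f:
        for video in sorted(os.listdir(feature_root)):
            video_dir = os.path.join(feature_root, video)
            num_frames = len(os.listdir(video_dir))          # one feature file per frame
            class_idx = class_to_idx[video_to_class[video]]  # label from your own mapping
            f.write('%s/ %d %d\n' % (video_dir, num_frames, class_idx))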


Usage

<!-- * demo video: Run `./script_demo_video.sh` -->

All the commonly used variables/parameters have comments at the end of the line. Please check Options.

Training

To train the model, run ./script_train_val.sh. All the outputs will be under the directory exp_path.

Testing

You can choose any of the saved model_weights for testing. All the outputs will be under the directory exp_path.

<!-- #### Video Demo `demo_video.py` overlays the predicted categories and confidence values on one video. Please see "Results". -->

Options

Domain Adaptation

<!-- In both `./script_train_val.sh` and `./script_demo_video.sh`, there are several options related to our Domain Adaptation approaches. -->

In ./script_train_val.sh, there are several options related to our DA approaches.

<!-- * options for the DA approaches: * discrepancy-based: DAN, JAN * adversarial-based: RevGrad * Normalization-based: AdaBN * Ensemble-based: MCD -->

More options

For more details of all the arguments, please check opts.py.

Notes

The options in the scripts have comments with the following types:


Citation

If you find this repository useful, please cite our papers:

@inproceedings{chen2019temporal,
  title={Temporal attentive alignment for large-scale video domain adaptation},
  author={Chen, Min-Hung and Kira, Zsolt and AlRegib, Ghassan and Yoo, Jaekwon and Chen, Ruxin and Zheng, Jian},
  booktitle={IEEE International Conference on Computer Vision (ICCV)},
  year={2019},
  url={https://arxiv.org/abs/1907.12743}
}

@article{chen2019taaan,
  title={Temporal Attentive Alignment for Video Domain Adaptation},
  author={Chen, Min-Hung and Kira, Zsolt and AlRegib, Ghassan},
  journal={CVPR Workshop on Learning from Unlabeled Videos},
  year={2019},
  url={https://arxiv.org/abs/1905.10861}
}

Acknowledgments

Some code is borrowed from TSN, pytorch-tsn, TRN-pytorch, and Xlearn.


Contact

Min-Hung Chen <br> cmhungsteve AT gatech DOT edu