

Representation Learning via Global Temporal Alignment and Cycle-Consistency

Isma Hadji<sup>1</sup>, Konstantinos G. Derpanis<sup>1</sup>, and Allan D. Jepson<sup>1</sup>

<sup>1</sup>Samsung AI Center (SAIC) - Toronto   

<div align="center"> <img src="demo/teaser.png" width="600px"/> </div>

This work introduces a representation learning approach based on (globally) aligning pairs of temporal sequences (e.g., video) depicting the same process (e.g., human action). Our training objective is to learn an element-wise embedding function that supports the alignment process. For example, here we illustrate the alignment (denoted by black dashed lines) in the embedding space between videos of the same human action (i.e., tennis forehand) containing significant variations in their appearances and dynamics. Empirically, we show that our learned embeddings are sensitive to both human pose and fine-grained temporal distinctions, while being invariant to appearance, camera viewpoint, and background.
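To make the notion of a global alignment between two embedded sequences concrete, here is a minimal NumPy sketch of classic dynamic time warping (DTW) with backtracking. It is an illustration only, not the code in this repo: the function name `dtw_align` and the squared Euclidean frame distance are assumptions made for this example.

```python
import numpy as np


def dtw_align(x, y):
    """Classic DTW between embedding sequences x (n, d) and y (m, d).

    Returns the accumulated alignment cost and the alignment path as a
    list of (i, j) index pairs. Hypothetical helper for illustration.
    """
    n, m = len(x), len(y)
    # Pairwise squared Euclidean distances between all frame embeddings.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

    # Accumulated cost table with an extra border row and column of inf.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # Backtrack from the last cell to the first to recover the path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    return float(acc[n, m]), path[::-1]


x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.0], [0.0], [1.0], [2.0]])
total_cost, path = dtw_align(x, y)
```

Each pair in `path` corresponds to one matched pair of frames, analogous to one dashed line in the figure above.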


The proposed alignment loss enables various downstream applications. Take a look at this video for examples.


This repo provides an implementation of:


To run the code, create a conda environment with the packages listed in requirements.txt:

If you get an error about dtw and opencv-python

Our loss

Our main code and models are released under the Attribution-NonCommercial-ShareAlike 4.0 International License.


To train the backbone architecture used in our paper with our SmoothDTW loss and Global Cycle Consistency loss, simply run:
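For intuition about what a smooth DTW cost looks like, the sketch below replaces the hard `min` in the DTW recursion with a softmin-weighted average controlled by a temperature `gamma`. This is an assumption-laden illustration in plain NumPy, not the SmoothDTW implementation in this repo: the names `smooth_min` and `smooth_dtw_cost`, the choice of smoothing operator, and the squared Euclidean frame distance are all hypothetical, and the exact operator used in the paper may differ. A training implementation would also express the same recursion in an autodiff framework so that gradients reach the embedding network.

```python
import numpy as np


def smooth_min(values, gamma):
    """Softmin-weighted average of the finite entries of `values`.

    Approaches the hard minimum as gamma -> 0 and the plain mean as
    gamma grows large. Hypothetical helper for illustration.
    """
    v = np.asarray(values, dtype=float)
    v = v[np.isfinite(v)]  # ignore the infinite border cells
    w = np.exp(-(v - v.min()) / gamma)  # shift by the min for stability
    return float((v * w).sum() / w.sum())


def smooth_dtw_cost(x, y, gamma=0.1):
    """Accumulated DTW cost between x (n, d) and y (m, d) using smooth_min."""
    n, m = len(x), len(y)
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + smooth_min(
                [acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]], gamma
            )
    return float(acc[n, m])


x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.0], [0.0], [1.0], [2.0]])
sharp = smooth_dtw_cost(x, y, gamma=1e-3)  # close to the hard DTW cost
soft = smooth_dtw_cost(x, y, gamma=1.0)    # blends in costlier paths
```

Because the weighted average is never below the hard minimum, the smooth cost is an upper bound on the classic DTW cost for non-negative frame distances, and it tightens as `gamma` shrinks.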


Here we provide a sample evaluation procedure for the synchronization task on the PennAction dataset. To get Kendall's Tau results, simply run:
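As a rough illustration of the metric, the sketch below computes Kendall's Tau for one pair of embedded videos: each frame of the first video is matched to its nearest neighbor in the second, and the score measures how well the resulting indices preserve temporal order. This is a simplified stand-in, not the evaluation script shipped with this repo. The function name `kendalls_tau` and the squared Euclidean distance are assumptions, and the sketch uses the tau-a variant, in which tied pairs contribute zero.

```python
import numpy as np


def kendalls_tau(emb_a, emb_b):
    """Kendall's Tau (tau-a) of nearest-neighbor matches from emb_a to emb_b.

    emb_a: (n, d) frame embeddings of the first video, n >= 2.
    emb_b: (m, d) frame embeddings of the second video.
    Hypothetical helper for illustration.
    """
    # Index of the nearest frame in emb_b for every frame in emb_a.
    dists = ((emb_a[:, None, :] - emb_b[None, :, :]) ** 2).sum(-1)
    nn = dists.argmin(axis=1)

    # Compare every pair of frames (i < j): +1 if the matches keep the
    # temporal order, -1 if they reverse it, 0 if they tie.
    n = len(nn)
    i, j = np.triu_indices(n, k=1)
    signs = np.sign(nn[j] - nn[i])
    return float(signs.sum()) / (n * (n - 1) / 2)


frames = np.arange(5, dtype=float)[:, None]
tau_same = kendalls_tau(frames, frames)            # perfectly ordered
tau_reversed = kendalls_tau(frames, frames[::-1])  # perfectly reversed
```

A value of 1 means the nearest-neighbor matches are perfectly ordered in time and a value of -1 means they are perfectly reversed. A dataset-level score would typically average this quantity over many video pairs.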


If you use this code or our models, please cite our paper:

  title={Representation Learning via Global Temporal Alignment and Cycle-Consistency},
  author={Hadji, Isma and Derpanis, Konstantinos G and Jepson, Allan D},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},


The data processing, loading and training setup code was modified from this very useful repo: https://github.com/google-research/google-research/tree/master/tcc