TASED-Net

TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection (ICCV 2019)

Overview

TASED-Net is a novel fully-convolutional network architecture for video saliency detection. The main idea is simple but effective: spatially decode 3D video features while jointly aggregating all the temporal information. TASED-Net significantly outperforms previous state-of-the-art approaches on all three major large-scale video saliency detection datasets: DHF1K, Hollywood2, and UCFSports. We observe that our model is especially good at attending to salient moving objects.
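As a rough intuition for that idea, the toy Python sketch below treats a clip as T frames of HxW feature activations and turns it into a single, larger saliency map. It is an illustration under loud assumptions, not TASED-Net's actual layers: per-pixel temporal max-pooling stands in for temporal aggregation, nearest-neighbour upsampling stands in for spatial decoding, and the names `temporal_max_pool`, `upsample_nearest`, and `toy_decode` are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # H x W grid of feature activations
Clip = List[Frame]         # T frames


def temporal_max_pool(clip: Clip) -> Frame:
    """Collapse the time axis: keep the per-pixel maximum over all T frames.

    Stand-in (assumption) for a learned temporal aggregation.
    """
    height, width = len(clip[0]), len(clip[0][0])
    return [[max(frame[y][x] for frame in clip) for x in range(width)]
            for y in range(height)]


def upsample_nearest(frame: Frame, factor: int) -> Frame:
    """Grow spatial resolution by repeating each value in a factor x factor block.

    Stand-in (assumption) for a learned spatial decoder.
    """
    out: Frame = []
    for row in frame:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))  # independent row copies
    return out


def toy_decode(clip: Clip, factor: int = 2) -> Frame:
    """Aggregate all T frames into one map, then decode it spatially."""
    return upsample_nearest(temporal_max_pool(clip), factor)


if __name__ == "__main__":
    clip = [
        [[0.1, 0.0], [0.0, 0.2]],  # t = 0
        [[0.0, 0.9], [0.0, 0.1]],  # t = 1: something salient appears top-right
        [[0.3, 0.0], [0.0, 0.0]],  # t = 2
    ]
    saliency = toy_decode(clip)
    print(len(clip), "frames of 2x2 ->", len(saliency), "x", len(saliency[0]), "map")
    for row in saliency:
        print(row)
```

In the real network both steps are learned convolutional layers; the sketch only shows how the time axis disappears while spatial resolution grows.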

TASED-Net currently leads the DHF1K online benchmark leaderboard.

Model                 Year   NSS↑    CC↑     SIM↑    AUC-J↑  s-AUC↑
TASED-Net (updated)   2019   2.797   0.489   0.393   0.897   0.712
TASED-Net (reported)  2019   2.667   0.470   0.361   0.895   0.712
SalEMA                2019   2.574   0.449   0.466   0.890   0.667
STRA-Net              2019   2.558   0.458   0.355   0.895   0.663
ACLNet                2018   2.354   0.434   0.315   0.890   0.601
SalGAN                2017   2.043   0.370   0.262   0.866   0.709
SALICON               2015   1.901   0.327   0.232   0.857   0.590
GBVS                  2007   1.474   0.283   0.186   0.828   0.554

(↑: higher is better)
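For readers unfamiliar with the metrics in this table, here is a minimal pure-Python sketch of three of them following their common definitions: NSS averages the z-scored prediction at fixated pixels, CC is the Pearson correlation between the prediction and a fixation density map, and SIM is the histogram intersection of the two maps after each is normalized to sum to 1. This is not the benchmark's evaluation code: details such as sample vs. population standard deviation, map resizing, and smoothing differ between implementations, the two AUC variants are omitted, and the function names `nss`, `cc`, and `sim` are hypothetical.

```python
import math
from typing import List, Tuple

Map = List[List[float]]  # H x W saliency or density map


def _flatten(m: Map) -> List[float]:
    return [v for row in m for v in row]


def _mean_std(values: List[float]) -> Tuple[float, float]:
    """Mean and population standard deviation (implementations differ here)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def nss(pred: Map, fixations: List[Tuple[int, int]]) -> float:
    """Normalized Scanpath Saliency: mean z-scored prediction at fixated (row, col) pixels.

    Assumes a non-constant prediction map (std > 0).
    """
    mean, std = _mean_std(_flatten(pred))
    return sum((pred[r][c] - mean) / std for r, c in fixations) / len(fixations)


def cc(pred: Map, density: Map) -> float:
    """Pearson correlation coefficient between prediction and fixation density map."""
    p, g = _flatten(pred), _flatten(density)
    mp, sp = _mean_std(p)
    mg, sg = _mean_std(g)
    return sum((a - mp) * (b - mg) for a, b in zip(p, g)) / (len(p) * sp * sg)


def sim(pred: Map, density: Map) -> float:
    """Similarity: sum of element-wise minima after normalizing each map to sum to 1."""
    p, g = _flatten(pred), _flatten(density)
    sp, sg = sum(p), sum(g)
    return sum(min(a / sp, b / sg) for a, b in zip(p, g))


if __name__ == "__main__":
    pred = [[0.0, 2.0], [0.0, 2.0]]     # toy predicted saliency map
    density = [[1.0, 1.0], [0.0, 2.0]]  # toy ground-truth fixation density map
    fixations = [(0, 1), (1, 1)]        # toy fixated (row, col) pixels
    print(f"NSS={nss(pred, fixations):.3f} CC={cc(pred, density):.3f} SIM={sim(pred, density):.3f}")
    # prints: NSS=1.000 CC=0.707 SIM=0.750
```

All three scores increase as the prediction better matches human fixations, which is why every column in the table is marked ↑.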

Video Saliency Detection

Video saliency detection aims to model the gaze fixation patterns of humans viewing a dynamic scene. Because the predicted saliency map can be used to prioritize video information across space and time, the task has a number of applications, such as video surveillance, video captioning, and video compression.

Examples

We compare TASED-Net to ACLNet, the previous state-of-the-art method. In our example comparisons, TASED-Net is better at attending to the salient information. TASED-Net also has a much smaller model size (82 MB vs. 252 MB).

Code Usage

First, clone this repository and download this weight file. Then run the code with

$ python run_example.py

This will generate frame-wise saliency maps. You can also specify the input and output directories as command-line arguments. For example,

$ python run_example.py ./example ./output
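To process several input directories in one go, a small wrapper can call the script once per directory using the two-argument form `run_example.py <input_dir> <output_dir>`. This is a sketch under assumptions: it assumes each input directory is laid out the way `run_example.py` expects (for example, like `./example`), and the helper names `build_commands` and `run_all`, the per-directory output naming, and the `./my_videos` path are hypothetical, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence


def build_commands(input_dirs: Sequence[str], output_root: str) -> List[List[str]]:
    """Build one `python run_example.py <input_dir> <output_dir>` call per input directory.

    Outputs go to <output_root>/<input dir name>; this naming scheme is a
    hypothetical choice for the sketch, not something run_example.py requires.
    """
    commands = []
    for input_dir in input_dirs:
        output_dir = Path(output_root) / Path(input_dir).name
        commands.append([sys.executable, "run_example.py", input_dir, str(output_dir)])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order; check=True raises CalledProcessError on failure."""
    for cmd in commands:
        # Create the output directory up front in case the script does not (assumption).
        Path(cmd[-1]).mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # ./example comes from this README; ./my_videos is a hypothetical placeholder.
    cmds = build_commands(["./example", "./my_videos"], "./output")
    for cmd in cmds:
        print("python", " ".join(cmd[1:]))
    # run_all(cmds)  # uncomment to actually invoke run_example.py
```

Building the command lists separately from running them makes it easy to print and check the planned calls before launching any inference.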

Citation

@inproceedings{min2019tased,
  title={TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection},
  author={Min, Kyle and Corso, Jason J},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={2394--2403},
  year={2019}
}