Deep Learning for Video Retrieval by Natural Language

Videos are everywhere. Video retrieval, i.e., finding videos that meet a specific user's information need, is important for a wide range of applications, including communication, education, entertainment, business, and security. Among the many ways of expressing an information need, a natural-language text is the most intuitive way to start a retrieval process: consider, for instance, finding video shots showing a person in front of a blackboard, talking or writing, in a classroom. Such a query can easily be submitted to a video retrieval system, by typing or by speech recognition. Given a video as a sequence of frames and a query as a sequence of words, a fundamental problem in video retrieval by natural language is how to properly associate the visual and linguistic information, both presented in sequential order.
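A common way to tackle this problem (used, in various forms, by the methods listed below) is to encode both modalities into a shared embedding space and rank videos by similarity to the query. The following is a minimal sketch of that idea, not any particular method: frame features and word vectors are mean-pooled over their sequences and projected by two hypothetical (here randomly initialized, in practice learned) matrices `W_vid` and `W_txt` into a joint space, where cosine similarity gives the retrieval score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: CNN frame features, word embeddings, joint space.
D_VID, D_TXT, D_JOINT = 2048, 300, 512
# Stand-ins for learned projections (random here, for illustration only).
W_vid = rng.standard_normal((D_VID, D_JOINT)) * 0.01
W_txt = rng.standard_normal((D_TXT, D_JOINT)) * 0.01

def encode_video(frames: np.ndarray) -> np.ndarray:
    """Mean-pool frame features over time, then project to the joint space."""
    return frames.mean(axis=0) @ W_vid            # (D_JOINT,)

def encode_query(words: np.ndarray) -> np.ndarray:
    """Mean-pool word embeddings over the sentence, then project."""
    return words.mean(axis=0) @ W_txt             # (D_JOINT,)

def score(query_vec: np.ndarray, video_vec: np.ndarray) -> float:
    """Cosine similarity in the joint space; higher means a better match."""
    q = query_vec / np.linalg.norm(query_vec)
    v = video_vec / np.linalg.norm(video_vec)
    return float(q @ v)

# Rank a toy collection of 3 clips (20 frames each) for a 6-word query.
query = rng.standard_normal((6, D_TXT))
videos = [rng.standard_normal((20, D_VID)) for _ in range(3)]
q = encode_query(query)
ranked = sorted(range(3), key=lambda i: score(q, encode_video(videos[i])),
                reverse=True)
```

Methods such as VSE++, W2VV++, and Dual Encoding differ mainly in how they replace the mean-pooling and linear projections above with learned sequence encoders, and in how the projections are trained (e.g., with a ranking loss over matching and non-matching pairs).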

This page maintains an (incomplete) list of state-of-the-art open-source methods and datasets, with the TRECVID Ad-hoc Video Search (AVS) benchmark evaluation as the test bed.
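The leaderboards below report infAP (inferred average precision), TRECVID's estimate of average precision computed from a sampled subset of relevance judgments rather than exhaustive ones. As a point of reference, here is a sketch of the plain (non-inferred) average precision that infAP approximates, over a ranked list of binary relevance labels; the function name and signature are illustrative, not from any TRECVID tool.

```python
def average_precision(ranked_rel, total_relevant):
    """AP over a ranked list of 0/1 relevance labels.

    ranked_rel: relevance of each retrieved item, best-ranked first.
    total_relevant: number of relevant items in the whole collection.
    """
    hits, ap_sum = 0, 0.0
    for rank, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            ap_sum += hits / rank      # precision at each relevant hit
    return ap_sum / total_relevant if total_relevant else 0.0

# Relevant items at ranks 1 and 3, out of 2 relevant in total:
# AP = (1/1 + 2/3) / 2 ≈ 0.833
ap = average_precision([1, 0, 1, 0], 2)
```

infAP replaces the exact precision terms with statistical estimates from the judged sample, so scores from the official evaluation are not directly reproducible with this sketch.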

Open-source methods

Datasets

Leaderboard

TRECVID 2016 AVS

Method | infAP
Dual Encoding (Dong et al. CVPR'19) | 0.159
W2VV++ (Li et al. MM'19) | 0.151
VSE++ (Faghri et al. BMVC'18, produced by Li et al. MM'19) | 0.123
VideoStory (Habibian et al. PAMI'16) | 0.087
Markatopoulou et al. ICMR'17 | 0.064
Le et al. TRECVID'16 | 0.054
Markatopoulou et al. TRECVID'16 | 0.051
W2VV (Dong et al. T-MM'18, produced by Li et al. MM'19) | 0.050

TRECVID 2017 AVS

Method | infAP
W2VV++ (Li et al. MM'19) | 0.213
Dual Encoding (Dong et al. CVPR'19) | 0.208
Snoek et al. TRECVID'17 | 0.206
Ueki et al. TRECVID'17 | 0.159
VSE++ (Faghri et al. BMVC'18, produced by Li et al. MM'19) | 0.154
VideoStory (Habibian et al. PAMI'17) | 0.150
Nguyen et al. TRECVID'17 | 0.120
W2VV (Dong et al. T-MM'18, produced by Li et al. MM'19) | 0.081

TRECVID 2018 AVS

Method | infAP
Dual Encoding (Dong et al. CVPR'19) | 0.126
Li et al. TRECVID'18 | 0.121
W2VV++ (Li et al. MM'19) | 0.106
Huang et al. TRECVID'18 | 0.087
Bastan et al. TRECVID'18 | 0.082
VSE++ (Faghri et al. BMVC'18, produced by Li et al. MM'19) | 0.074

TRECVID 2019 AVS

work in progress ...

References