Video-guided Machine Translation

This repo contains the starter code for the VATEX Translation Challenge for Video-guided Machine Translation (VMT). The task is to translate a source-language description into the target language, using the video as additional spatiotemporal context.

VMT is introduced in our ICCV oral paper "VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research". VATEX is a new large-scale multilingual video description dataset containing over 41,250 videos and 825,000 captions in English and Chinese; half of these captions are English-Chinese translation pairs. For more details, please check the latest version of the paper: https://arxiv.org/abs/1904.03493.

<p align="center"> <img src="vmt_demo.png" width="75%"> </p>

Prerequisites

Training

1. Download corpus files and the extracted video features

First, under the vmt/ directory, download the train/val/test JSON files:

./data/download.sh

Then download the I3D video features for the trainval and public test splits:

# replace DIR with an absolute path that has room for the large video features
mkdir -p DIR/vatex_features

wget https://vatex-feats.s3.amazonaws.com/trainval.zip -P DIR/vatex_features
# extract next to the archive rather than into the current directory
unzip DIR/vatex_features/trainval.zip -d DIR/vatex_features
wget https://vatex-feats.s3.amazonaws.com/public_test.zip -P DIR/vatex_features
unzip DIR/vatex_features/public_test.zip -d DIR/vatex_features

# link the feature directory into the repo
cd vmt/
ln -s DIR/vatex_features data/vatex_features
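
Before training, it can help to confirm that the symlink resolves and that the archives actually extracted. The following is a minimal sketch, not part of the starter code: `summarize_feature_dir` is a hypothetical helper, and since the layout inside the archives is not described here, it only counts files by extension under the given directory.

```python
import os
from collections import Counter


def summarize_feature_dir(root):
    """Count files under `root` (following symlinks), grouped by extension.

    Hypothetical helper for a quick sanity check; it makes no assumption
    about how the feature files are named or organized.
    """
    if not os.path.isdir(root):
        raise FileNotFoundError(f"feature directory not found: {root}")
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
        for name in filenames:
            # Files without an extension are grouped under "<none>".
            ext = os.path.splitext(name)[1].lower() or "<none>"
            counts[ext] += 1
    return sum(counts.values()), counts
```

Calling `summarize_feature_dir("data/vatex_features")` from the vmt/ directory returns the total file count and a per-extension breakdown. A total of zero, or only `.zip` files, suggests the archives were not extracted into the linked directory.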

2. Training

To train the baseline VMT model:

python train.py

The default hyperparameters are set in configs.yaml.

Evaluation

Run

python eval.py

Before running, specify the model name in configs.yaml. The script will generate a JSON file for submission to the VMT Challenge on CodaLab.

Results

The baseline VMT model achieves the following corpus-level BLEU scores. These numbers differ slightly from those in the paper because of different evaluation setups; for a fair comparison, please compare against the numbers reported here:

| Model  | EN -> ZH | ZH -> EN |
|--------|----------|----------|
| BLEU-4 | 31.1     | 24.6     |

On the evaluation server, we report cumulative corpus-level BLEU score (up to 4-gram) and each individual n-gram score for reference, shown as B-1, ..., B-4.

Model performance is evaluated by cumulative BLEU-4 score in the challenge.
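
For readers unfamiliar with the metric, here is a minimal sketch of cumulative corpus-level BLEU in pure Python. It follows the standard definition (clipped n-gram counts pooled over the corpus, closest-reference brevity penalty, geometric mean of precisions) and is not the evaluation server's code; the server's tokenization, in particular for Chinese, is not specified here, so scores from this sketch may not match the leaderboard. The function name `corpus_bleu` and its return layout are illustrative choices.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Return (individual precisions, brevity penalty, cumulative scores).

    hypotheses: list of token lists, one per segment.
    references: list (per segment) of lists of reference token lists.
    cumulative[k-1] is the cumulative BLEU score up to k-grams.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngram_counts(r, n)  # element-wise maximum
            clipped = hyp_counts & max_ref     # element-wise minimum
            matches[n - 1] += sum(clipped.values())
            totals[n - 1] += sum(hyp_counts.values())

    precisions = [m / t if t else 0.0 for m, t in zip(matches, totals)]

    if hyp_len == 0:
        bp = 0.0
    elif hyp_len > ref_len:
        bp = 1.0
    else:
        bp = math.exp(1.0 - ref_len / hyp_len)

    cumulative = []
    for n in range(1, max_n + 1):
        ps = precisions[:n]
        if min(ps) == 0.0:
            cumulative.append(0.0)  # no smoothing in this sketch
        else:
            cumulative.append(bp * math.exp(sum(math.log(p) for p in ps) / n))
    return precisions, bp, cumulative


# Toy usage with whitespace tokenization (an assumption for illustration).
hyps = ["a man is playing a guitar".split()]
refs = [["a man plays a guitar".split(), "a man is playing guitar".split()]]
precisions, bp, cumulative = corpus_bleu(hyps, refs)
print("cumulative BLEU-4:", cumulative[3])
```

Because counts are pooled across all segments before the precisions are computed, the result is a corpus-level score, not an average of per-sentence scores. The last element of the cumulative list corresponds to cumulative BLEU-4.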

Reference

Please cite our paper if you use our code or dataset:

@InProceedings{Wang_2019_ICCV,
author = {Wang, Xin and Wu, Jiawei and Chen, Junkun and Li, Lei and Wang, Yuan-Fang and Wang, William Yang},
title = {VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}