Cross-lingual Vision-Language Navigation

We introduce a new dataset for Cross-Lingual Vision-Language Navigation.

Cross-lingual Room-to-Room (XL-R2R) Dataset

The XL-R2R dataset is built upon the R2R dataset and extends it with Chinese instructions. XL-R2R preserves the same splits as R2R and thus consists of train, val-seen, and val-unseen splits with both English and Chinese instructions, and a test split with English instructions only.

Data is formatted as follows:

{
  "distance": float,            # length of the path in meters
  "scan": str,                  # Matterport scan id
  "path_id": int,               # unique id for this path
  "path": [str x num_steps],    # viewpoint ids along the path
  "heading": float,             # agent's initial heading in radians
  "instructions": [str x 3],    # three navigation instructions
}

For the test set, the path contains only the first viewpoint (the starting location); a test server hosted by Anderson et al. scores uploaded trajectories.
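The records above can be loaded with a few lines of Python. This is a minimal sketch: the split filename and the helper names (`load_split`, `iter_instruction_path_pairs`) are illustrative assumptions, not part of the released dataset.

```python
import json

def load_split(path):
    """Load one XL-R2R split file, assumed to be a JSON list of
    trajectory records with the schema shown above."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def iter_instruction_path_pairs(records):
    """Yield (instruction, path) pairs; each record carries three
    instructions that all describe the same viewpoint path."""
    for item in records:
        for instruction in item["instructions"]:
            yield instruction, item["path"]
```

For example, `iter_instruction_path_pairs(load_split("train.json"))` would yield three (instruction, path) pairs per record, one for each of the three instructions.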