Flickr8K-CN

Flickr8K-CN is a bilingual (English-to-Chinese) extension of the popular Flickr8K dataset, used to evaluate image captioning in a cross-lingual setting.

| Chinese sentences | Flickr8k-train | Flickr8k-val | Flickr8k-test |
| --- | --- | --- | --- |
| human written | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| human translation | :x: | :x: | :white_check_mark: |
| machine translation (baidu) | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| machine translation (google) | :white_check_mark: | :white_check_mark: | :white_check_mark: |

Data

Sentences

  1. Original English sentences
  2. Chinese sentences written by native Chinese speakers
  3. Chinese sentences generated by Baidu translation (icmr2016 version, version 20160815)
  4. Chinese sentences generated by Google translation (icmr2016 version, version 20160816)
  5. Chinese sentences generated by human translation (only the test set is covered)

Dataset split

Image features

  1. 1,024-dim GoogleNet pool5, read by bigfile.py
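
For readers who want to inspect the features without bigfile.py, the following is a minimal sketch only. It assumes the vectors are stored as a flat binary file of native-endian float32 values in row-major order, one 1,024-dim vector per image, with the image ids supplied separately in the same order. That layout, the function name `read_features`, and its parameters are assumptions made for illustration and are not confirmed by this page; bigfile.py remains the intended reader.

```python
import array

NDIMS = 1024  # feature dimensionality stated above


def read_features(bin_path, ids, ndims=NDIMS):
    """Return {image_id: list of floats}.

    Assumes (hypothetically) a flat file of native-endian float32 values,
    stored row-major, with one ndims-long vector per id, in the order of ids.
    """
    values = array.array("f")  # "f" is a 4-byte float on common platforms
    with open(bin_path, "rb") as f:
        values.frombytes(f.read())

    expected = len(ids) * ndims
    if len(values) != expected:
        raise ValueError(
            "expected %d floats for %d ids, found %d"
            % (expected, len(ids), len(values))
        )

    # Slice the flat buffer into one vector per image id.
    return {
        image_id: values[i * ndims:(i + 1) * ndims].tolist()
        for i, image_id in enumerate(ids)
    }
```

The length check is there so that a mismatch between the id list and the binary file fails loudly instead of silently pairing ids with the wrong vectors.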

Citations

  1. Xirong Li, Weiyu Lan, Jianfeng Dong, Hailong Liu, Adding Chinese Captions to Images, ACM ICMR 2016