Flickr8K-CN
Flickr8K-CN is a bilingual (English-Chinese) extension of the popular Flickr8K dataset, intended for evaluating image captioning in a cross-lingual setting.
Source of Chinese sentences | Flickr8K-train | Flickr8K-val | Flickr8K-test |
---|---|---|---|
human written | :white_check_mark: | :white_check_mark: | :white_check_mark: |
human translation | :x: | :x: | :white_check_mark: |
machine translation (Baidu) | :white_check_mark: | :white_check_mark: | :white_check_mark: |
machine translation (Google) | :white_check_mark: | :white_check_mark: | :white_check_mark: |
Data
Sentences
- Original English sentences
- Chinese sentences written by native Chinese speakers
- Chinese sentences generated by Baidu Translate (two versions: icmr2016 and 20160815)
- Chinese sentences generated by Google Translate (two versions: icmr2016 and 20160816)
- Chinese sentences produced by human translation (test set only)
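The layout of the caption files is not described above. As a minimal sketch, assuming each line holds a key of the form `<image_id>#<index>` followed by whitespace and the sentence (modeled on the original Flickr8K token file; whether the Chinese files follow it exactly is an assumption), captions can be grouped per image as follows. `load_captions` is a hypothetical helper, not part of the release.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def load_captions(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group caption lines by image ID.

    ASSUMED (hypothetical) line format, check against the released files:
        <image_id>#<suffix> <sentence>
    The image ID is everything before the first '#' in the key; the
    sentence is everything after the first run of whitespace, so
    space-tokenized Chinese text is kept intact.
    """
    captions: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) < 2:
            continue  # skip blank lines and keys without a sentence
        key, sentence = parts
        image_id = key.split("#", 1)[0]
        captions[image_id].append(sentence)
    return dict(captions)
```

Splitting only on the first whitespace matters here: segmented Chinese sentences contain spaces between words, and a plain `split()` would break them apart.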
Dataset split
- Image IDs of the 6K training, 1K validation, and 1K test images
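To sanity-check a local copy of the split, the sketch below reads one image ID per line (an assumed file format) and verifies that no ID is repeated within or across splits; with the split described above one would expect sizes of 6,000, 1,000, and 1,000. `read_ids` and `check_split` are hypothetical helpers, not part of the release.

```python
from typing import Dict, Iterable, List, Set


def read_ids(lines: Iterable[str]) -> List[str]:
    """Read one image ID per line (ASSUMED split-file format), skipping blanks."""
    return [line.strip() for line in lines if line.strip()]


def check_split(splits: Dict[str, List[str]]) -> Dict[str, int]:
    """Return the size of each split after checking that the splits are disjoint.

    Raises ValueError if an image ID is duplicated inside one split or
    appears in more than one split.
    """
    seen: Set[str] = set()
    sizes: Dict[str, int] = {}
    for name, ids in splits.items():
        unique = set(ids)
        if len(unique) != len(ids):
            raise ValueError(f"duplicate image IDs inside split {name!r}")
        overlap = seen & unique
        if overlap:
            raise ValueError(
                f"split {name!r} shares {len(overlap)} image IDs with an earlier split"
            )
        seen |= unique
        sizes[name] = len(ids)
    return sizes
```

Raising on the first problem, rather than returning a flag, makes a leaked test image fail loudly before any training run starts.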
Image features
- 1,024-dimensional GoogLeNet pool5 features, which can be read with bigfile.py
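bigfile.py itself is not reproduced here. The sketch below assumes a BigFile-style directory containing `shape.txt` (number of images and dimensionality), `id.txt` (whitespace-separated image IDs in row order), and `feature.bin` (little-endian float32 rows). Treat that layout, and the `BigFileReader` class and its method names, as hypothetical and verify them against the actual script. It uses only the standard library and seeks directly to one row, so the full feature matrix never has to be loaded into memory.

```python
import os
import sys
from array import array
from typing import Dict, List


class BigFileReader:
    """Hypothetical reader for an ASSUMED BigFile-style layout:
      shape.txt   -> "<num_images> <dim>"
      id.txt      -> image IDs separated by whitespace, in row order
      feature.bin -> num_images * dim little-endian float32 values, row-major
    Not the bigfile.py distributed with the data.
    """

    def __init__(self, datadir: str) -> None:
        with open(os.path.join(datadir, "shape.txt")) as f:
            self.num_images, self.dim = map(int, f.read().split())
        with open(os.path.join(datadir, "id.txt")) as f:
            ids = f.read().split()
        if len(ids) != self.num_images:
            raise ValueError("id.txt and shape.txt disagree on the number of images")
        # Map each image ID to its row index in feature.bin.
        self.row_of: Dict[str, int] = {name: i for i, name in enumerate(ids)}
        self.binary_path = os.path.join(datadir, "feature.bin")

    def read_vector(self, image_id: str) -> List[float]:
        """Return the feature vector of one image by seeking to its row."""
        row = self.row_of[image_id]
        vec = array("f")
        if vec.itemsize != 4:
            raise RuntimeError("this sketch needs a 4-byte C float")
        with open(self.binary_path, "rb") as f:
            f.seek(row * self.dim * vec.itemsize)
            vec.fromfile(f, self.dim)  # raises EOFError on a truncated file
        if sys.byteorder == "big":
            vec.byteswap()  # the file is assumed to be little-endian
        return vec.tolist()
```

For the 1,024-dimensional features each row occupies 1,024 × 4 = 4,096 bytes, so `read_vector` reads exactly one such block per call.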
Citations
- Xirong Li, Weiyu Lan, Jianfeng Dong, Hailong Liu. Adding Chinese Captions to Images. ACM ICMR 2016.