Case-Sensitive-Scene-Text-Recognition-Datasets

This project is part of the research work of the following paper:

UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World (CVPR 2020) [GitHubRepo]

If you find this project useful in your research, you are encouraged to cite our paper:

@inproceedings{long2020unreal,
  title={UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World},
  author={Long, Shangbang and Yao, Cong},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

Background

The annotations of four of the most popular scene text recognition datasets (IIIT5K, SVT, SVTP, and CUTE-80) are incomplete: they provide only case-insensitive labels and omit punctuation marks.

To support a more complete evaluation of scene text recognition models, we re-annotated these datasets with case-sensitive labels that include punctuation marks, and we release the new annotations here.
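To illustrate why the annotation style matters, here is a minimal sketch of how word accuracy can differ between a strict protocol (case-sensitive, punctuation kept) and a relaxed one (lowercased, punctuation stripped). The function names `normalize` and `word_accuracy` and the sample strings are hypothetical and are not part of this repository or of any dataset; the sketch also assumes ASCII punctuation as defined by Python's `string.punctuation`.

```python
import string


def normalize(text, case_sensitive=True, keep_punctuation=True):
    """Optionally strip ASCII punctuation and lowercase a label or prediction."""
    if not keep_punctuation:
        text = "".join(ch for ch in text if ch not in string.punctuation)
    if not case_sensitive:
        text = text.lower()
    return text


def word_accuracy(predictions, labels, case_sensitive=True, keep_punctuation=True):
    """Fraction of predictions that exactly match their label after normalization."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(
        normalize(p, case_sensitive, keep_punctuation)
        == normalize(g, case_sensitive, keep_punctuation)
        for p, g in zip(predictions, labels)
    )
    return correct / len(labels)


# Made-up examples, not taken from any of the datasets.
labels = ["Coffee", "SALE!", "open"]
predictions = ["coffee", "SALE", "open"]

strict = word_accuracy(predictions, labels)
relaxed = word_accuracy(
    predictions, labels, case_sensitive=False, keep_punctuation=False
)
print(f"strict: {strict:.3f}, relaxed: {relaxed:.3f}")
```

With the original case-insensitive, punctuation-free annotations only the relaxed score can be computed; the strict score requires labels such as the ones released here.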

Dataset Statistics

Dataset name	Number of images
CUTE80	288
IIIT5K test set	3000
IIIT5K training set	2000
SVT test set	647
SVT training set	257
SVTP test set	645