
IMGUR5K Handwriting Dataset

To download the images from their URLs and generate the corresponding annotations, run:

Usage: python download_imgur5k.py --dataset_info_dir <dir_with_annotation_and_hashes> --output_dir <path_to_store_images>

Requirements

IMGUR5K download code works with

Downloading images of IMGUR5K

Run the command above, setting <path_to_store_images> to the directory where the downloaded images should be stored.
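If you prefer to drive the download from Python (for example, inside a larger data-preparation script), the command can be wrapped with `subprocess`. This is a minimal sketch: the helper names `build_download_command` and `run_download` are hypothetical; only the script name and its two flags come from the usage line.

```python
import os
import subprocess
import sys
from typing import List


def build_download_command(dataset_info_dir: str, output_dir: str) -> List[str]:
    """Build the argv for download_imgur5k.py using its two documented flags."""
    return [
        sys.executable,  # run the script with the current Python interpreter
        "download_imgur5k.py",
        "--dataset_info_dir", dataset_info_dir,
        "--output_dir", output_dir,
    ]


def run_download(dataset_info_dir: str, output_dir: str) -> None:
    """Create the target image directory, then run the download script."""
    # Harmless if the directory already exists.
    os.makedirs(output_dir, exist_ok=True)
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(build_download_command(dataset_info_dir, output_dir), check=True)
```

For example, `run_download("dataset_info", "images")` (hypothetical paths) runs the script from the current working directory and stores the images under `images`.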

How IMGUR5K download works

The code checks the validity of each URL by comparing the MD5 hash of the downloaded image with the ground-truth MD5 hash. If the image is pristine (the hashes match), its annotations are added to the generated annotations file and to the respective split files.
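The hash check can be sketched with Python's standard `hashlib`. This is an illustrative sketch, not the repository's code: the function names and the dictionaries mapping image ids to bytes and hashes are hypothetical; only the idea of comparing the MD5 digest of a downloaded image against a ground-truth hash comes from the description above.

```python
import hashlib
from typing import Dict, List


def is_pristine(image_bytes: bytes, expected_md5: str) -> bool:
    """Return True if the downloaded bytes hash to the ground-truth MD5 digest."""
    return hashlib.md5(image_bytes).hexdigest() == expected_md5.lower()


def keep_pristine(
    downloads: Dict[str, bytes],      # hypothetical: image id -> downloaded bytes
    expected_hashes: Dict[str, str],  # hypothetical: image id -> ground-truth MD5
) -> List[str]:
    """Return the ids whose images match their hash; only these keep annotations."""
    kept = []
    for image_id, data in downloads.items():
        expected = expected_hashes.get(image_id)
        # Skip images with no known hash or whose content changed (e.g. removed
        # or replaced on the host), so altered images never reach the annotations.
        if expected is not None and is_pristine(data, expected):
            kept.append(image_id)
    return kept
```

For instance, `keep_pristine({"a": b"abc", "b": b"xyz"}, {"a": "900150983cd24fb0d6963f7d28e17f72", "b": "900150983cd24fb0d6963f7d28e17f72"})` returns `["a"]`, because only the bytes of `"a"` hash to the stored digest.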

Full documentation

IMGUR5K is shared as a set of image URLs with annotations. This code downloads the images and verifies the hash of each image to avoid data contamination.

REQUIRED FILES:

Output:

[All imgur5k_annotations_*.json files follow the same format as imgur5k_annotations.json.]

NOTE: In addition to the ~5K images used in the TextStyleBrush paper, ~4K more images have been added to the dataset to foster research in handwriting recognition.

Statistics

Description                    Count
# Page Images                  8,177
# Word Images                230,573
# Lexicons (case-sensitive)   49,317

The data is split into train/val/test sets in an 80%:10%:10% ratio at the level of page images; the split details are provided in the respective JSON files created as part of the output.

Disclaimer: The dataset is provided using public links to each image, and the availability of these images is controlled by IMGUR or the original users who uploaded them.
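Splitting at the level of page images means that all word images cropped from one page land in the same split, so no page is shared between train and test. Below is a minimal sketch of an 80/10/10 page-level split, assuming a hypothetical list of page ids and a word-to-page mapping; the released splits are defined by the generated JSON files, not by this code, and the function names are illustrative.

```python
import random
from typing import Dict, List, Sequence


def split_pages(
    page_ids: Sequence[str], seed: int = 0, train_frac: float = 0.8, val_frac: float = 0.1
) -> Dict[str, List[str]]:
    """Shuffle page ids and cut them into train/val/test (default 80/10/10)."""
    pages = sorted(page_ids)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(pages)
    n_train = int(len(pages) * train_frac)
    n_val = int(len(pages) * val_frac)
    return {
        "train": pages[:n_train],
        "val": pages[n_train:n_train + n_val],
        "test": pages[n_train + n_val:],  # the remainder (~10%) goes to test
    }


def split_words(
    word_to_page: Dict[str, str], page_split: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Assign each word image to the split of the page it was cropped from."""
    page_to_split = {p: name for name, pages in page_split.items() for p in pages}
    out: Dict[str, List[str]] = {name: [] for name in page_split}
    for word_id, page_id in word_to_page.items():
        out[page_to_split[page_id]].append(word_id)
    return out
```

Assigning words through their page, rather than shuffling word images directly, is what keeps the split at the page level: two words from the same page can never end up in different splits.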

Contribution

See the CONTRIBUTING file for how to help out.

License

IMGUR5K is licensed under the Creative Commons Attribution-NonCommercial 4.0 International Public License, as found in the LICENSE file.

Citation

If you find this data useful, please consider citing our paper:

Praveen Krishnan, Rama Kovvuri, Guan Pang, Boris Vassilev and Tal Hassner, TextStyleBrush: Transfer of Text Aesthetics from a Single Example, arXiv:2106.08385, 2021.

@misc{krishnan2021textstylebrush,
      title={TextStyleBrush: Transfer of Text Aesthetics from a Single Example}, 
      author={Praveen Krishnan and Rama Kovvuri and Guan Pang and Boris Vassilev and Tal Hassner},
      year={2021},
      eprint={2106.08385},
      archivePrefix={arXiv},
}