You can easily run the models with the comics-ocr package!

A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition

This work aims to enable research on comics by improving the text quality of the largest comics dataset, shared in COMICS. To generate high-quality text data, we train and select text detection and recognition models and combine them into an end-to-end state-of-the-art (SOTA) OCR pipeline for comics. The models are trained on custom-labeled data, which we also share for the text detection and recognition tasks.

COMICS vs COMICS TEXT+ Comparison

Description

This repository includes pointers to the code and data described in A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition.

Getting Started

Dependencies

Execution Information

# Text detection: in an environment with the MMOCR toolkit installed, run the commands below
# Training
python tools/train.py {config_path e.g. fcenet_r50dcnv2_fpn_1500e_ctw1500_custom} --load-from {pretrained_model_path}
# Testing
python tools/test.py {config_path} {fine_tuned_model_path} --eval hmean-iou
# Text recognition: in an environment with the MMOCR toolkit installed, run the commands below
# Training
python tools/train.py {config_path e.g. master_custom_dataset} --load-from {pretrained_model_path}
# Testing
python tools/test.py {config_path} {fine_tuned_model_path} --eval acc
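# Paths to the fine-tuned detection and recognition configs and checkpoints.
# Note: no trailing commas here, otherwise each value would become a one-element tuple.
# TextExtractor must be imported from this project's code before running the snippet.
ocr_detector_config = './mmocr/work_dirs/fcenet_r50dcnv2_fpn_1500e_ctw1500_custom/fcenet_r50dcnv2_fpn_1500e_ctw1500_custom.py'
ocr_detector_checkpoint = './mmocr/work_dirs/fcenet_r50dcnv2_fpn_1500e_ctw1500_custom/best_0_hmean-iou:hmean_epoch_5.pth'
recog_config = './mmocr/work_dirs/master_custom_dataset/master_custom_dataset.py'
ocr_recognition_checkpoint = './mmocr/work_dirs/master_custom_dataset/best_0_1-N.E.D_epoch_4.pth'
det = 'FCE_CTW_DCNv2'
recog = 'MASTER'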

text_extractor = TextExtractor(batch_mode=True,
                              det=det,
                              det_ckpt=ocr_detector_checkpoint,
                              det_config=ocr_detector_config,
                              recog=recog,
                              recog_ckpt=ocr_recognition_checkpoint,
                              recog_config=recog_config)
textbox_img_path = './imgs/sample_textbox.jpg'
ocr_text = text_extractor.extract_text(textbox_img_path)
print(ocr_text)
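
To run the extractor over a whole folder of cropped textboxes, a small loop is usually enough. The sketch below is an illustration only: `extract_text_from_dir` is a hypothetical helper that is not part of this repository, and it assumes nothing about the extractor beyond the `extract_text(path)` call shown above.

```python
from pathlib import Path
from typing import Any, Dict


def extract_text_from_dir(text_extractor: Any, image_dir: str,
                          pattern: str = "*.jpg") -> Dict[str, Any]:
    """Run `text_extractor.extract_text` on every matching image in a folder.

    `text_extractor` is any object exposing `extract_text(path)`, such as the
    TextExtractor instance built above. This helper is hypothetical.
    """
    results: Dict[str, Any] = {}
    # Sort the paths so that the output order is deterministic.
    for image_path in sorted(Path(image_dir).glob(pattern)):
        results[image_path.name] = text_extractor.extract_text(str(image_path))
    return results
```

With the extractor configured as above, `extract_text_from_dir(text_extractor, './imgs')` would return a dictionary that maps each file name to whatever `extract_text` returns for it.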

Results

Text Detection Benchmarking Results
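
The detection test command above evaluates with `--eval hmean-iou`. As a rough illustration of what that kind of metric measures, the sketch below matches predictions to ground truth by IoU and reports precision, recall, and their harmonic mean. It is a simplification under stated assumptions, not MMOCR's implementation: it uses axis-aligned boxes instead of polygons, greedy one-to-one matching, and an assumed IoU threshold of 0.5.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hmean_iou(preds: List[Box], gts: List[Box],
              iou_thr: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, hmean) using greedy one-to-one IoU matching."""
    matched_gts = set()
    hits = 0
    for pred in preds:
        for idx, gt in enumerate(gts):
            # Each ground-truth box may be matched by at most one prediction.
            if idx not in matched_gts and box_iou(pred, gt) >= iou_thr:
                matched_gts.add(idx)
                hits += 1
                break
    precision = hits / len(preds) if preds else 0.0
    recall = hits / len(gts) if gts else 0.0
    total = precision + recall
    hmean = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, hmean
```

For example, two predictions of which one overlaps the single ground-truth box give a precision of 0.5 and a recall of 1.0, so the harmonic mean is about 0.67.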

Text Recognition Benchmarking Results
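
The recognition checkpoint name above refers to `1-N.E.D`, that is, one minus the normalized edit distance. The sketch below shows one common way to compute such a score, assuming each edit distance is normalized by the length of the longer string and then averaged over all samples. The exact normalization used by the toolkit may differ, so treat this as an illustration only.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def one_minus_ned(preds: Sequence[str], gts: Sequence[str]) -> float:
    """Return 1 minus the mean normalized edit distance over paired strings."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("preds and gts must be non-empty and of equal length")
    total = 0.0
    for pred, gt in zip(preds, gts):
        longest = max(len(pred), len(gt))
        # Two empty strings are identical, so they contribute zero distance.
        total += edit_distance(pred, gt) / longest if longest else 0.0
    return 1.0 - total / len(preds)
```

A perfect transcription scores 1.0, and each character-level error lowers the score in proportion to the length of the string it occurs in.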

e2e Benchmarking Results

COMICS vs COMICS TEXT+ Comparison

We replicated the model presented in The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives to see whether improved text quality would affect the results on cloze-style tasks. With COMICS Text+, we achieve SOTA results and improve on our replication results in almost all of the cases that rely heavily on text.

Replication results of Cloze Tasks

Results of cloze tasks with COMICS Text+

Authors

Gürkan Soykan
twitter LinkedIn

License

This project is licensed under the [NAME HERE] License; see the LICENSE.md file for details.

Acknowledgments