Awesome

ODM

method

This repository is the official implementation for the following paper:

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

Chen Duan, Pei Fu, Shan Guo, Qianyi Jiang, Xiaoming Wei, CVPR 2024

Part of the code is inherited from oCLIP.

Data

Download SynthText

ODM model

Download ODM

Train

Single-GPU:

python -u src/training/main.py     \
--save-frequency 20     \
--report-to tensorboard     \
--train-data /path/to/data      \
--char-dict-pth /path/to/char    \
--gt_dir /path/to/gt \
--csv-img-key filepath     \
--csv-caption-key title     \
--warmup 10000     \
--batch-size=32  \
--lr=1e-4    \
--wd=0.1     \
--epochs=100     \
--workers=4    \
--model RN50_Seg_Clip  \
--gpu 0 \
--logs=/path/to/save \
--prefix demo \

Multi-GPU

python -u src/training/main.py     \
--save-frequency 20     \
--report-to tensorboard     \
--train-data /path/to/data      \
--char-dict-pth /path/to/char    \
--gt_dir /path/to/gt \
--csv-img-key filepath     \
--csv-caption-key title     \
--warmup 10000     \
--batch-size=32  \
--lr=1e-4    \
--wd=0.1     \
--epochs=100     \
--workers=4    \
--model RN50_Seg_Clip  \
--logs=/path/to/save \
--prefix demo \

Visualization

Citation

@inproceedings{duan2024odm,
  title={ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting},
  author={Duan, Chen and Fu, Pei and Guo, Shan and Jiang, Qianyi and Wei, Xiaoming},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15587--15597},
  year={2024}
}