Introduction

vedastr is an open-source scene text recognition toolbox based on PyTorch. It is designed to be flexible in order to support rapid implementation and evaluation of scene text recognition tasks.

Features

License

This project is released under the Apache 2.0 license.

Benchmark and model zoo

| MODEL | CASE SENSITIVE | IIIT5k_3000 | SVT | IC03_867 | IC13_1015 | IC15_2077 | SVTP | CUTE80 | AVERAGE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-CTC | False | 87.97 | 84.54 | 90.54 | 88.28 | 67.99 | 72.71 | 77.08 | 81.58 |
| ResNet-FC | False | 88.80 | 88.41 | 92.85 | 90.34 | 72.32 | 79.38 | 76.74 | 84.24 |
| TPS-ResNet-BiLSTM-Attention | False | 90.93 | 88.72 | 93.89 | 92.12 | 76.41 | 80.31 | 79.51 | 86.49 |
| Small-SATRN | False | 91.97 | 88.10 | 94.81 | 93.50 | 75.64 | 83.88 | 80.90 | 87.19 |

Note:

- TPS: spatial transformer network.
- Small-SATRN: the small version of SATRN from "On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention"; its training phase is case sensitive while its testing phase is case insensitive.
- AVERAGE: average accuracy over all test datasets.
- CASE SENSITIVE: if True, the output is case sensitive and contains common characters; if False, the output is case insensitive and contains only numbers and letters.

Installation

Requirements

We have tested the following versions of OS and software:

Install vedastr

1. Create a conda virtual environment and activate it.

```shell
conda create -n vedastr python=3.6 -y
conda activate vedastr
```

2. Install PyTorch and torchvision following the official instructions, e.g.,

```shell
conda install pytorch torchvision -c pytorch
```

3. Clone the vedastr repository.

```shell
git clone https://github.com/Media-Smart/vedastr.git
cd vedastr
vedastr_root=${PWD}
```

4. Install dependencies (to confirm the environment works, see the sketch after this list).

```shell
pip install -r requirements.txt
```
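
If you want to confirm the environment is functional before moving on, a quick check like the following (a minimal standalone sketch, not part of vedastr) verifies that PyTorch and torchvision import correctly and reports whether CUDA is visible:

```python
# check_env.py -- minimal environment sanity check (not part of vedastr)
import torch
import torchvision

print(f"PyTorch version:     {torch.__version__}")
print(f"torchvision version: {torchvision.__version__}")
print(f"CUDA available:      {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU count:           {torch.cuda.device_count()}")
    print(f"GPU 0:               {torch.cuda.get_device_name(0)}")
```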

Prepare data

1. Download the LMDB data from deep-text-recognition-benchmark, which contains training, validation and evaluation data. Note: we use the ST dataset released by ASTER.

2. Make a directory named data as follows:

```shell
cd ${vedastr_root}
mkdir ${vedastr_root}/data
```
3. Put the downloaded LMDB data into the data directory; its structure will then look as follows:

```
data
└── data_lmdb_release
    ├── evaluation
    ├── training
    │   ├── MJ
    │   │   ├── MJ_test
    │   │   ├── MJ_train
    │   │   └── MJ_valid
    │   └── ST
    └── validation
```
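
Before training, you can spot-check that an LMDB unpacked correctly by reading one sample directly. The sketch below assumes the key layout used by deep-text-recognition-benchmark (b'num-samples', b'label-%09d', b'image-%09d', with 1-based indices) and needs the lmdb and Pillow packages:

```python
# peek_lmdb.py -- spot-check one sample from an LMDB dataset
# (a standalone sketch; assumes the deep-text-recognition-benchmark
#  key layout: b'num-samples', b'label-%09d', b'image-%09d')
import io

import lmdb
from PIL import Image

env = lmdb.open('data/data_lmdb_release/training/MJ/MJ_train',
                readonly=True, lock=False)
with env.begin() as txn:
    num_samples = int(txn.get(b'num-samples'))
    print(f'num samples: {num_samples}')

    # sample indices are 1-based in this layout
    label = txn.get(b'label-000000001').decode('utf-8')
    image = Image.open(io.BytesIO(txn.get(b'image-000000001')))
    print(f'first label: {label}, image size: {image.size}')
```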

Train

1. Config

Modify the configuration files in configs/ according to your needs (e.g., configs/tps_resnet_bilstm_attn.py); see the hypothetical sketch at the end of this section.

2. Run

```shell
# train using GPUs with gpu_id 0, 1, 2, 3
python tools/train.py configs/tps_resnet_bilstm_attn.py "0, 1, 2, 3"
```

Snapshots and logs will by default be generated at ${vedastr_root}/workdir/name_of_config_file (you can specify workdir in the config files).
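
Since the config files are plain Python, modifying one usually means reassigning variables. A hypothetical sketch of such an edit follows; only workdir is documented above, and every other name here is an illustrative placeholder, so check the actual file for the keys vedastr defines:

```python
# Hypothetical edit to a config file such as configs/tps_resnet_bilstm_attn.py.
# `workdir` is the documented output directory; the remaining names are
# illustrative placeholders, not guaranteed vedastr config keys.
workdir = 'workdir/my_experiment'  # snapshots and logs land here
batch_size = 192                   # placeholder: tune to your GPU memory
max_epochs = 6                     # placeholder: total training length
```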

Test

1. Config

Modify the configuration as you wish (e.g., configs/tps_resnet_bilstm_attn.py).

2. Run

```shell
# test using GPUs with gpu_id 0, 1
./tools/dist_test.sh configs/tps_resnet_bilstm_attn.py path/to/checkpoint.pth "0, 1"
```

Inference

1. Run

```shell
# inference using GPUs with gpu_id 0
python tools/inference.py configs/tps_resnet_bilstm_attn.py checkpoint_path img_path "0"
```
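
tools/inference.py applies the preprocessing defined in the config, but if you want to feed a checkpoint by hand it helps to know the expected input: the benchmark tables above use 32x100 images. Below is a minimal, illustrative sketch with torchvision; the real transform chain lives in the config file and may differ:

```python
# Illustrative preprocessing only -- the actual transform chain is
# defined in the config file and applied by tools/inference.py.
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((32, 100)),                 # H x W used in the tables above
    transforms.ToTensor(),                        # [0, 1], shape (C, 32, 100)
    transforms.Normalize(mean=[0.5], std=[0.5]),  # illustrative normalization
])

img = Image.open('path/to/image.jpg').convert('L')  # grayscale, matching (1, 32, 100)
batch = preprocess(img).unsqueeze(0)                # add batch dim -> (1, 1, 32, 100)
print(batch.shape)
```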

Deploy

  1. Install volksdep following the official instructions

2. Benchmark (optional)

```shell
# benchmark model using GPU with gpu_id 0
CUDA_VISIBLE_DEVICES="0" python tools/benchmark.py configs/resnet_ctc.py checkpoint_path out_path --dummy_input_shape "3,32,100"
```

More available arguments are detailed in tools/deploy/benchmark.py.

The result of resnet_ctc is as follows (test device: Jetson AGX Xavier, CUDA: 10.2):

| framework | version | input shape | data type | throughput (FPS) | latency (ms) |
| --- | --- | --- | --- | --- | --- |
| PyTorch | 1.5.0 | (1, 1, 32, 100) | fp32 | 64 | 15.81 |
| TensorRT | 7.1.0.16 | (1, 1, 32, 100) | fp32 | 109 | 9.66 |
| PyTorch | 1.5.0 | (1, 1, 32, 100) | fp16 | 113 | 10.75 |
| TensorRT | 7.1.0.16 | (1, 1, 32, 100) | fp16 | 308 | 3.55 |
| TensorRT | 7.1.0.16 | (1, 1, 32, 100) | int8 (entropy_2) | 449 | 2.38 |
3. Export model to ONNX format

```shell
# export model to onnx using GPU with gpu_id 0
CUDA_VISIBLE_DEVICES="0" python tools/torch2onnx.py configs/resnet_ctc.py checkpoint_path --dummy_input_shape "3,32,100" --dynamic_shape
```

More available arguments are detailed in tools/torch2onnx.py.
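
After the export finishes, you can sanity-check the ONNX file independently of vedastr with onnxruntime (pip install onnxruntime). A minimal sketch, assuming the exported file is named model.onnx and takes a single image input matching the dummy shape above:

```python
# Verify an exported ONNX model with onnxruntime (not part of vedastr).
# Assumes a single image input; adjust the path/shape to your export.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])
inp = sess.get_inputs()[0]
print(f'input name: {inp.name}, shape: {inp.shape}')

dummy = np.random.rand(1, 3, 32, 100).astype(np.float32)  # matches "3,32,100"
outputs = sess.run(None, {inp.name: dummy})
print('output shapes:', [o.shape for o in outputs])
```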

4. Inference SDK

You can refer to FlexInfer for details.

Citation

If you use this toolbox or benchmark in your research, please cite this project.

```bibtex
@misc{2020vedastr,
    title  = {vedastr: A Toolbox for Scene Text Recognition},
    author = {Sun, Jun and Cai, Hongxiang and Xiong, Yichao},
    url    = {https://github.com/Media-Smart/vedastr},
    year   = {2020}
}
```

Contact

This repository is currently maintained by Jun Sun (@ChaseMonsterAway), Hongxiang Cai (@hxcai) and Yichao Xiong (@mileistone).