Awesome
Shape Robust Text Detection with Progressive Scale Expansion Network
A reimplement of PSENet with mxnet-gluon. Just train on ICPR.
- Support TensorboardX
- Support hybridize to depoly
- Fast, 45ms/per_image when we resize max_side to 784
Thanks for the author's (@whai362) great work!
Requirements
-
Python 2.7
-
mxnet1.4.0
-
pyclipper
-
Polygon2
-
OpenCV 4+ (for c++ version pse)
-
TensorboardX
Introduction
To reimplement PSENet by Gluon, here are some problem that I occur.
Diceloss about kernels isn't convergence.
- First, I doubt the label about kernel is not correct. However, I verify them again so that they are absolute right.
- Second, I doubt the
mx.nd.split
cannot be backwarded. However the diceloss about score map bysplit
is well. So it cannot be raise this problem. - Here the network is based on resnet50, and the output of FPN is input_size/4,so there may not be any text instance in min_kernel_map. So I set the number of kernels to 3
Maybe upsampling output to input_size is a good choice. I will try it in my spare time.
Evaluation
Dataset | Recall | Precision | F1-score | Speed |
---|---|---|---|---|
ICPR(max_side=784) | 0.56 | 0.67 | 0.61 | 45ms/image |
Usage
Pretrained-models
- gluoncv_model_zoo:resnet50_v1b, you can replace it with others,the default path of pretrained-model in
~/.mxnet/
Also you can download maskrcnn_coco from gluoncv_model_zoo
to get a warm start.
Make
cd pse
make
Here I add -Wl,-undefined,dynamic_lookup
to avoid some compile error, which is different from original PSENet.
Train
python scripts/train.py $data_path $ckpt
data_path
: path of dataset, which the prefix of image and annoation must be same, for example, a.jpg, a.txtckpt
: the filename of pretrained-mdel
Loss curve:
Text loss | Kernel loss | All_loss | Pixel_accuracy |
Some Results
Inference
python eval.py $data_path $ckpt $output_dir $gpu_or_cpu
TODO:
- Upsamping to input_size
- Train on ICDAR and evaluate