<div align="center">

MRN: Multiplexed Routing Network <br/> for Incremental Multilingual Text Recognition

ICCV 2023 ArXiv preprint Blog LICENSE

Methods | IMLTR Dataset | Getting Started | Citation

</div>

This repository started as the code for the paper:

MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition (Accepted by ICCV 2023)

This project is a toolkit for a novel scenario, Incremental Multilingual Text Recognition (IMLTR). It supports many incremental learning methods and proposes a method better suited to IMLTR, the Multiplexed Routing Network (MRN), together with the corresponding dataset. The project provides an efficient framework for developing new methods and analyzing existing ones under the IMLTR task, and we hope it will advance the IMLTR community.

<div align="center"> <img width="1066" alt="image" src="https://github.com/simplify23/MRN/assets/39580716/fc2b6e12-f511-4a55-9cb4-46ca3e03b004"> </div>

Methods

Incremental Learning Methods

You can edit the config file config/crnn_mrn.py to select a different incremental learning (IL) method or setting.

common=dict(
    il="mrn",  # joint_mix | joint_loader | base | lwf | wa | ewc | der  | mrn
    memory="random",  # None | random
    memory_num=2000,
    start_task = 0  # checkpoint start
)
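To make the `memory="random"` and `memory_num=2000` options more concrete, here is a minimal sketch of random rehearsal-memory selection. It is not taken from this repository: the function name `build_random_memory`, the `samples_by_task` layout, and the even split of the budget across previously seen tasks are all assumptions made for illustration.

```python
import random


def build_random_memory(samples_by_task, memory_num=2000, seed=111):
    """Pick a random rehearsal subset from previously seen tasks.

    samples_by_task: dict mapping task name -> list of sample ids
    (hypothetical layout). The budget is split evenly across tasks,
    which is an assumption, not necessarily what the repository does.
    """
    rng = random.Random(seed)
    tasks = list(samples_by_task)
    if not tasks or memory_num <= 0:
        return {}
    per_task = memory_num // len(tasks)
    memory = {}
    for task in tasks:
        pool = samples_by_task[task]
        # Never ask for more samples than the task actually has.
        k = min(per_task, len(pool))
        memory[task] = rng.sample(pool, k)
    return memory


# Toy example: two old tasks, one of them smaller than its share of the budget.
old = {"Chinese": list(range(5000)), "Latin": list(range(300))}
memory = build_random_memory(old, memory_num=2000)
print({task: len(ids) for task, ids in memory.items()})
```

With `memory=None` no old samples would be replayed, so a method such as `lwf` or `ewc` would rely on its regularization alone.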

Text Recognition Methods

You can edit the config file config/crnn_mrn.py to select different text recognition modules or settings.

""" Model Architecture """
common=dict(
    batch_max_length = 25,
    imgH = 32,
    imgW = 256,
)
model=dict(
    model_name="TRBA",
    Transformation = "TPS",      #None TPS
    FeatureExtraction = "ResNet",    #VGG ResNet SVTR
    SequenceModeling = "BiLSTM",  #None BiLSTM
    Prediction = "Attn",           #CTC Attn
    num_fiducial=20,
    input_channel=4,
    output_channel=512,
    hidden_size=256,
)
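Since the four stages can be combined freely, a small sanity check before launching a run can catch typos in the stage names. The sketch below only uses the option names listed in the comments of the config above; the helper `check_model_config` is hypothetical, and it assumes that a disabled stage is written as the string `"None"`.

```python
# Allowed values copied from the comments in the config above.
STAGE_OPTIONS = {
    "Transformation": {"None", "TPS"},
    "FeatureExtraction": {"VGG", "ResNet", "SVTR"},
    "SequenceModeling": {"None", "BiLSTM"},
    "Prediction": {"CTC", "Attn"},
}


def check_model_config(model):
    """Return the stage combination as a string, or raise on an unknown option."""
    parts = []
    for stage, allowed in STAGE_OPTIONS.items():
        value = model[stage]
        if value not in allowed:
            raise ValueError(f"{stage}={value!r} is not one of {sorted(allowed)}")
        parts.append(value)
    return "-".join(parts)


model = dict(
    Transformation="TPS",
    FeatureExtraction="ResNet",
    SequenceModeling="BiLSTM",
    Prediction="Attn",
)
print(check_model_config(model))
```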

IMLTR Dataset

The dataset can be downloaded from BaiduNetdisk (passwd: c07h).

dataset
├── MLT17_IL
│   ├── test_2017
│   ├── train_2017
├── MLT19_IL
│   ├── test_2019
│   ├── train_2019
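After unpacking the download, it can help to confirm that the folder layout matches the tree above. The following is a minimal sketch; the helper name `missing_dataset_dirs` is hypothetical, and the root `../dataset` is taken from the paths used in the config file.

```python
from pathlib import Path

# Sub-directories expected under the dataset root, as shown in the tree above.
EXPECTED = [
    "MLT17_IL/train_2017",
    "MLT17_IL/test_2017",
    "MLT19_IL/train_2019",
    "MLT19_IL/test_2019",
]


def missing_dataset_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_dir()]


missing = missing_dataset_dirs("../dataset")
print("missing:", missing if missing else "none")
```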

Incremental MLT17: MLT17 has 68,613 training instances and 16,255 validation instances, drawn from 6 scripts and 9 languages: Chinese, Japanese, Korean, Bangla, Arabic, Italian, English, French, and German. The last four use the Latin script. Incremental MLT17 uses the validation set for testing because the test data is unavailable. Tasks are split by script and modeled sequentially. Special symbols are discarded at the preprocessing step because they carry no linguistic meaning.

Incremental MLT19: MLT19 has 89,177 text instances from 7 scripts. Because the test set is inaccessible, Incremental MLT19 randomly splits the training instances 9:1, script by script, for model training and testing. To be consistent with the Incremental MLT17 dataset, we discard the Hindi script and the special symbols. Statistics of the two datasets are shown in the following table.

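| Dataset | Categories | Task1 (Chinese) | Task2 (Latin) | Task3 (Japanese) | Task4 (Korean) | Task5 (Arabic) | Task6 (Bangla) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MLT17<sup>1</sup> | Train Instance | 2687 | 47411 | 4609 | 5631 | 3711 | 3237 |
| | Test Instance | 529 | 11073 | 1350 | 1230 | 983 | 713 |
| | Train Class | 1895 | 325 | 1620 | 1124 | 73 | 112 |
| MLT19<sup>2</sup> | Train Instance | 2897 | 52921 | 5324 | 6107 | 4230 | 3542 |
| | Test Instance | 322 | 5882 | 590 | 679 | 470 | 393 |
| | Train Class | 2086 | 220 | 1728 | 1160 | 73 | 102 |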
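The script-by-script 9:1 split described for Incremental MLT19 can be sketched as follows. This is an illustration only, not the script used to build the released dataset: the function `split_by_script`, the `(sample_id, script)` input layout, the seed, and the rounding rule are assumptions.

```python
import random


def split_by_script(instances, test_fraction=0.1, seed=111):
    """Randomly split instances into train/test separately for each script.

    instances: list of (sample_id, script) pairs (hypothetical layout).
    Returns two dicts mapping script -> list of sample ids.
    """
    by_script = {}
    for sample_id, script in instances:
        by_script.setdefault(script, []).append(sample_id)

    rng = random.Random(seed)
    train, test = {}, {}
    for script, ids in by_script.items():
        ids = ids[:]  # copy so the caller's data is not shuffled in place
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test[script] = ids[:n_test]
        train[script] = ids[n_test:]
    return train, test


# Toy data: 50 Korean and 20 Arabic instances.
toy = [(f"kr_{i}", "Korean") for i in range(50)]
toy += [(f"ar_{i}", "Arabic") for i in range(20)]
train, test = split_by_script(toy)
print({script: (len(train[script]), len(test[script])) for script in train})
```

Splitting per script rather than over the whole pool keeps the 9:1 ratio inside every task, so small scripts such as Bangla still get a test set.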

Getting Started

Dependency

conda create -n mrn python=3.7 -y
conda activate mrn
conda install pytorch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
pip3 install lmdb pillow torchvision nltk natsort fire tensorboard tqdm opencv-python einops timm mmcv shapely scipy
pip3 install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.1/index.html
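Once the commands above have finished, a quick check that the packages are importable can save a failed training launch. The sketch below uses only the standard library; the list of module names is an assumption derived from the install commands (for example, `pillow` is imported as `PIL` and `opencv-python` as `cv2`).

```python
import importlib.util

# Import names corresponding to the packages installed above (assumed mapping).
REQUIRED_MODULES = [
    "torch", "torchvision", "lmdb", "PIL", "nltk", "natsort", "fire",
    "tensorboard", "tqdm", "cv2", "einops", "timm", "mmcv", "shapely", "scipy",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
print("missing:", ", ".join(missing) if missing else "none")
```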

Training

python3 tiny_train.py --config=config/crnn_mrn.py --exp_name CRNN_real

Arguments

tiny_train.py: by default, evaluates the trained model on the IMLTR datasets at the end of training.

Config Detail

For detailed configuration changes, edit the config file config/crnn_mrn.py.

common=dict(
    exp_name="TRBA_MRN",  # Where to store logs and models
    il="mrn",  # joint_mix | joint_loader | base | lwf | wa | ewc | der  | mrn
    memory="random",  # None | random
    memory_num=2000,
    batch_max_length = 25,
    imgH = 32,
    imgW = 256,
    manual_seed=111,
    start_task = 0
)

""" Model Architecture """
model=dict(
    model_name="TRBA",
    Transformation = "TPS",      #None TPS
    FeatureExtraction = "ResNet",    #VGG ResNet
    SequenceModeling = "BiLSTM",  #None BiLSTM
    Prediction = "Attn",           #CTC Attn
    num_fiducial=20,
    input_channel=4,
    output_channel=512,
    hidden_size=256,
)



""" Optimizer """
optimizer=dict(
    schedule="super", #default is super for super convergence, 1 for None, [0.6, 0.8] for the same setting with ASTER
    optimizer="adam",
    lr=0.0005,
    sgd_momentum=0.9,
    sgd_weight_decay=0.000001,
    milestones=[2000,4000],
    lrate_decay=0.1,
    rho=0.95,
    eps=1e-8,
    lr_drop_rate=0.1
)


""" Data processing """
train = dict(
    saved_model="",  # "path to model to continue training"
    Aug="None",  # |None|Blur|Crop|Rot|ABINet
    workers=4,
    lan_list=["Chinese","Latin","Japanese", "Korean", "Arabic", "Bangla"],
    valid_datas=[
                 "../dataset/MLT17_IL/test_2017",
                 "../dataset/MLT19_IL/test_2019"
                 ],
    select_data=[
                 "../dataset/MLT17_IL/train_2017",
                 "../dataset/MLT19_IL/train_2019"
                 ],
    batch_ratio="0.5-0.5",
    total_data_usage_ratio="1.0",
    NED=True,
    batch_size=256,
    num_iter=10000,
    val_interval=5000,
    log_multiple_test=None,
    grad_clip=5,
)
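The `batch_ratio` string pairs one ratio with each entry of `select_data`. As a rough sketch of how such a setting is commonly interpreted in text recognition codebases (an assumption, not verified against this repository's data loader), each dataset contributes `batch_size * ratio` samples to every batch. The helper `per_dataset_batch_sizes` is hypothetical.

```python
def per_dataset_batch_sizes(batch_size, batch_ratio, select_data):
    """Split a total batch size across datasets according to a 'a-b-...' ratio string."""
    ratios = [float(r) for r in batch_ratio.split("-")]
    if len(ratios) != len(select_data):
        raise ValueError("batch_ratio needs exactly one entry per select_data path")
    # Each dataset gets at least one sample per batch.
    return {
        path: max(round(batch_size * ratio), 1)
        for path, ratio in zip(select_data, ratios)
    }


sizes = per_dataset_batch_sizes(
    batch_size=256,
    batch_ratio="0.5-0.5",
    select_data=[
        "../dataset/MLT17_IL/train_2017",
        "../dataset/MLT19_IL/train_2019",
    ],
)
print(sizes)
```

Under this reading, the values in the config above would draw 128 samples from each of the two training sets per batch.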

Data Analysis

The experimental results of each task are recorded in data_any.txt, which can be used for further analysis.

Acknowledgements

This implementation has been based on these repositories:

Citation

Please consider citing this work in your publications if it helps your research.

@inproceedings{zheng2023mrn,
  title={MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition},
  author={Zheng, Tianlun and Chen, Zhineng and Huang, BingChen and Zhang, Wei and Jiang, Yu-Gang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}

License

This project is released under the Apache 2.0 license.

Footnotes

  1. Nayef, N., et al. (2017). MLT 2017.

  2. Nayef, N., et al. (2019). MLT 2019.