
# E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation

The official repository for the ICDAR 2023 conference paper *E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation*.

## 1. Introduction

Text image machine translation (TIMT) aims to translate texts embedded in images from a source language into a target language. Existing methods, both two-stage cascade and one-stage end-to-end architectures, suffer from different issues. Cascade models can benefit from large-scale optical character recognition (OCR) and MT datasets, but the two-stage architecture is redundant. End-to-end models are efficient but suffer from a shortage of training data. To this end, our paper proposes an end-to-end TIMT model that fully exploits the knowledge in existing OCR and MT datasets to pursue a framework that is both effective and efficient. Specifically, we build a novel modal adapter that effectively bridges the OCR encoder and the MT decoder (see the sketch after the figure below). An end-to-end TIMT loss and a cross-modal contrastive loss are applied jointly to align the feature distributions of the OCR and MT tasks. Extensive experiments show that the proposed method outperforms existing two-stage cascade models and one-stage end-to-end models with a lighter and faster architecture. Furthermore, ablation studies verify the generalization of our method: the proposed modal adapter is effective at bridging various OCR and MT models.

<img src="./Figures/model.jpg" style="zoom:100%;" />
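For intuition, here is a minimal PyTorch sketch of how a modal adapter might bridge a frozen OCR encoder and an MT decoder. The dimensions, layer counts, and module names are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class ModalAdapter(nn.Module):
    """Illustrative adapter that maps OCR encoder features into the
    embedding space expected by the MT decoder (dimensions assumed)."""

    def __init__(self, ocr_dim=512, mt_dim=512, num_layers=2, num_heads=8):
        super().__init__()
        # Linear projection to align the OCR and MT feature dimensions.
        self.proj = nn.Linear(ocr_dim, mt_dim)
        # A small Transformer stack to adapt the feature distribution.
        layer = nn.TransformerEncoderLayer(
            d_model=mt_dim, nhead=num_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, ocr_features):
        # ocr_features: (batch, seq_len, ocr_dim) from a frozen OCR encoder.
        return self.blocks(self.proj(ocr_features))
```

In such a setup, the adapted features would serve as the cross-attention memory for the MT decoder, e.g. `mt_decoder(tgt_tokens, memory=adapter(ocr_encoder(images)))` (all names hypothetical).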

## 2. Usage

### 2.1 Requirements

### 2.2 Train the Model

```bash
bash ./train_model_guide.sh
```
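As described in the introduction, training combines an end-to-end TIMT loss with a cross-modal contrastive loss. Below is a minimal sketch of such a joint objective, assuming mean-pooled sentence embeddings and an InfoNCE-style contrastive term; the weighting `alpha`, the `temperature`, and the pooling are assumptions, and the training script may implement the losses differently:

```python
import torch
import torch.nn.functional as F

def joint_loss(logits, tgt_tokens, img_feats, txt_feats,
               temperature=0.1, alpha=1.0, pad_id=0):
    """Sketch: end-to-end TIMT cross-entropy plus an InfoNCE-style
    cross-modal contrastive loss (weighting and pooling are assumed)."""
    # TIMT loss: predict target tokens from the adapted image features.
    timt = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        tgt_tokens.reshape(-1),
        ignore_index=pad_id)

    # Contrastive loss: pull paired image/text embeddings together and
    # push apart the other pairs in the batch.
    img = F.normalize(img_feats.mean(dim=1), dim=-1)  # (batch, dim)
    txt = F.normalize(txt_feats.mean(dim=1), dim=-1)  # (batch, dim)
    sim = img @ txt.t() / temperature                 # (batch, batch)
    labels = torch.arange(sim.size(0), device=sim.device)
    contrastive = (F.cross_entropy(sim, labels) +
                   F.cross_entropy(sim.t(), labels)) / 2

    return timt + alpha * contrastive
```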

### 2.3 Evaluate the Model

```bash
bash ./test_model_guide.sh
```

### 2.4 Datasets

We use the dataset released in E2E_TIT_With_MT.

## 3. Acknowledgement

Our implementation references code from the following repositories:

We thank all the researchers who have made their code publicly available.

## 4. Citation

If you want to cite our paper, please use the following BibTeX entry:

If you have any questions, please contact us by email.