ReS2TIM: Reconstruct Syntactic Structures from Table Images

Xue, Wenyuan, Qingyong Li, and Dacheng Tao. "ReS2TIM: Reconstruct Syntactic Structures from Table Images." 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2019.

Abstract

Tables often represent densely packed but structured data. Understanding table semantics is vital for effective information retrieval and data mining. Unlike web tables, whose semantics are readable directly from markup language and contents, the full analysis of tables published as images requires the conversion of discrete data into structured information. This paper presents a novel framework to convert a table image into its syntactic representation through the relationships between its cells. In order to reconstruct the syntactic structures of a table, we build a cell relationship network to predict the neighbors of each cell in four directions. During the training stage, a distance-based sample weight is proposed to handle the class imbalance problem. According to the detected relationships, the table is represented by a weighted graph that is then employed to infer the basic syntactic table structure. Experimental evaluation of the proposed framework using two datasets demonstrates the effectiveness of our model for cell relationship detection and table structure inference.
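The distance-based sample weight mentioned in the abstract can be illustrated with a minimal sketch. Note this is only an illustration of the idea of down-weighting far-apart (overwhelmingly non-neighbor) cell pairs; the Gaussian decay and the `sigma` parameter are assumptions, not the paper's exact formula:

```python
import math

def distance_based_weight(cell_a, cell_b, sigma=1.0):
    """Illustrative distance-based sample weight.

    Far-apart cell pairs are almost always non-neighbors, so their loss
    contribution is reduced to counter the class imbalance. cell_a and
    cell_b are (x_center, y_center) tuples in normalized coordinates;
    the Gaussian decay below is an assumption for illustration only.
    """
    dx = cell_a[0] - cell_b[0]
    dy = cell_a[1] - cell_b[1]
    dist = math.hypot(dx, dy)
    return math.exp(-dist ** 2 / (2 * sigma ** 2))
```

The weight is 1 for coincident cells and decays smoothly with distance, so nearby (hard) pairs dominate the loss.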

Getting Started

Requirements

Create the environment from the environment.yml file with conda env create --file environment.yml, or install the required packages into your own environment manually.

dependencies:
  - python=3.7
  - torchvision==0.6.0
  - pytorch==1.5.0
  - pip:
    - dominate==2.5.2
    - opencv-python==4.4.0.42
    - pandas==1.1.1
    - tqdm==4.48.2
    - scipy==1.5.2
    - visdom==0.1.8
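A quick way to confirm the environment matches the pins above is a small import check. This is a convenience sketch (not part of the repo); the package list and expected major.minor versions are taken from environment.yml:

```python
import importlib

# Core packages pinned in environment.yml, with the major.minor we expect.
EXPECTED = {
    "torch": "1.5",
    "torchvision": "0.6",
    "cv2": "4.4",
    "pandas": "1.1",
}

def check_environment(expected=EXPECTED):
    """Return a dict mapping package name -> status string.

    Each entry is 'ok', 'missing', or 'version mismatch (<found>)'.
    """
    report = {}
    for name, want in expected.items():
        try:
            mod = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        found = getattr(mod, "__version__", "unknown")
        report[name] = "ok" if found.startswith(want) else f"version mismatch ({found})"
    return report

if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

Run it inside the activated conda env; every entry should print 'ok'.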

Datasets Preparation

cd ./datasets
tar -zxvf cmdd.tar.gz
tar -zxvf icdar13table.tar.gz
## The './datasets/' folder should look like:
- datasets/
  - cmdd/
    - src_image/
    - src_set/
    - labels_src.json
    - prepare.py
  - icdar13table/
    - eu-dataset/
    - us-dataset/
    - eu-us-dataset/
    - src_page_image/
    - src_set/
    - prepare.py
cd ./datasets/cmdd
python prepare.py
cd ../icdar13table
python prepare.py
cd ..
rm cmdd.tar.gz
rm icdar13table.tar.gz
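To verify the extraction and preparation succeeded, you can check the folder layout programmatically. This helper is a convenience sketch (not part of the repo); the entry names mirror the tree shown above:

```python
import os

# Expected entries under './datasets/' after extracting both archives,
# taken from the directory tree in this README.
REQUIRED = {
    "cmdd": ["src_image", "src_set", "labels_src.json", "prepare.py"],
    "icdar13table": ["eu-dataset", "us-dataset", "eu-us-dataset",
                     "src_page_image", "src_set", "prepare.py"],
}

def missing_entries(root="./datasets", required=REQUIRED):
    """Return a list of 'dataset/entry' paths that are absent under root."""
    missing = []
    for dataset, entries in required.items():
        for entry in entries:
            if not os.path.exists(os.path.join(root, dataset, entry)):
                missing.append(f"{dataset}/{entry}")
    return missing
```

An empty return value means the layout matches the tree above.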

Training and evaluation

  1. Train and fine-tune.
# train on the cmdd dataset
python train.py --dataroot ./datasets/cmdd --gpu_ids 2 --model res2tim --dataset_mode cell_rel --lr 0.0005 --pair_batch 10000 --niter 5 --niter_decay 95 --use_mask --name res2tim_cmdd

# Copy the best model on CMDD to the icdar13table checkpoint folder for initialization
mkdir ./checkpoints/res2tim_icdar13table
cp ./checkpoints/res2tim_cmdd/best_net_Res2Tim.pth ./checkpoints/res2tim_icdar13table/prt_net_Res2Tim.pth

# train on the icdar13table dataset based on the CMDD pretrained model
python train.py --dataroot ./datasets/icdar13table --gpu_ids 2 --model res2tim --dataset_mode cell_rel --lr 0.0005 --pair_batch 10000 --niter 5 --niter_decay 95 --use_mask --name res2tim_icdar13table --continue_train --epoch prt
  2. Evaluation of neighbor relationship detection and cell location inference. Use your own trained models, or download our pretrained models and put them under './checkpoints/res2tim_cmdd/' and './checkpoints/res2tim_icdar13table/', respectively. CMDD pretrained model: Google Drive, Baidu Netdisk (code: b7pt). ICDAR13Table pretrained model: Google Drive, Baidu Netdisk (code: 2grp).
# CMDD 
python test.py --dataroot ./datasets/cmdd --gpu_ids 5 --model res2tim --dataset_mode cell_rel --pair_batch 10000 --use_mask --name res2tim_cmdd --epoch best

# ICDAR13TABLE
python test.py --dataroot ./datasets/icdar13table --gpu_ids 5 --model res2tim --dataset_mode cell_rel --pair_batch 10000 --use_mask --name res2tim_icdar13table --epoch best
  3. Key options.

Experiment Results

  1. Results of neighbor relationship detection
<table>
  <tr> <td> </td> <td colspan="2">CMDD</td> <td colspan="2">ICDAR 2013 Dataset</td> </tr>
  <tr> <td> </td> <td>Precision</td> <td>Recall</td> <td>Precision</td> <td>Recall</td> </tr>
  <tr> <td>The paper reports</td> <td>0.999</td> <td>0.997</td> <td>0.926</td> <td>0.447</td> </tr>
  <tr> <td>This implementation</td> <td>0.999</td> <td>0.996</td> <td>0.866</td> <td>0.841</td> </tr>
</table>

  2. Results of cell location inference

<table>
  <tr> <td colspan="6">CMDD</td> </tr>
  <tr> <td> </td> <td>cell_loc</td> <td>row1</td> <td>row2</td> <td>col1</td> <td>col2</td> </tr>
  <tr> <td>The paper reports</td> <td>0.999</td> <td>0.999</td> <td>0.999</td> <td>0.999</td> <td>0.999</td> </tr>
  <tr> <td>This implementation</td> <td>0.996</td> <td>0.999</td> <td>0.997</td> <td>0.999</td> <td>0.999</td> </tr>
</table>

<table>
  <tr> <td colspan="6">ICDAR 2013 Dataset</td> </tr>
  <tr> <td> </td> <td>cell_loc</td> <td>row1</td> <td>row2</td> <td>col1</td> <td>col2</td> </tr>
  <tr> <td>The paper reports</td> <td>0.015</td> <td>0.053</td> <td>0.064</td> <td>0.166</td> <td>0.163</td> </tr>
  <tr> <td>This implementation</td> <td>0.174</td> <td>0.306</td> <td>0.264</td> <td>0.576</td> <td>0.492</td> </tr>
</table>
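Cell location inference starts from the detected pairwise neighbor relations. A much-simplified sketch of the idea: given 'right neighbor' and 'down neighbor' edges, a breadth-first traversal pins every cell to an absolute (row, col). The real model works on a weighted graph over all four directions and handles spanning cells; this toy version only shows how pairwise relations determine locations:

```python
from collections import deque

def infer_cell_locations(right_edges, down_edges, seed=0):
    """Toy graph-based cell location inference.

    right_edges / down_edges are lists of (a, b) pairs meaning cell b is
    directly right of / below cell a. The seed cell is placed at (0, 0)
    and locations propagate outward by breadth-first traversal.
    Returns a dict mapping cell id -> (row, col).
    """
    right = {a: b for a, b in right_edges}
    down = {a: b for a, b in down_edges}
    loc = {seed: (0, 0)}
    queue = deque([seed])
    while queue:
        cell = queue.popleft()
        row, col = loc[cell]
        for nbr, pos in ((right.get(cell), (row, col + 1)),
                         (down.get(cell), (row + 1, col))):
            if nbr is not None and nbr not in loc:
                loc[nbr] = pos
                queue.append(nbr)
    return loc
```

For a 2x2 table with cells 0, 1 in the top row and 2, 3 below them, `infer_cell_locations([(0, 1), (2, 3)], [(0, 2), (1, 3)])` recovers the full grid from only four pairwise relations.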

Custom dataset Preparation

Refer to ./datasets/cmdd/prepare.py and ./datasets/icdar13table/prepare.py when preparing your own dataset.

Citation

Please consider citing this work in your publications if it helps your research.

@inproceedings{xue2019res2tim,  
  title={ReS2TIM: Reconstruct Syntactic Structures from Table Images},  
  author={Xue, Wenyuan and Li, Qingyong and Tao, Dacheng},  
  booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)},  
  pages={749--755},  
  year={2019},  
  organization={IEEE}  
}