
Doc2Graph




This library is the implementation of the paper Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks, accepted at TiE @ ECCV 2022.
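To make the graph formulation concrete: a document page can be represented as a graph whose nodes are detected text boxes and whose edges connect spatially related boxes. Below is a minimal, self-contained sketch of one such edge policy, a k-nearest-neighbour connection over box centers (illustrative code only, not the library's implementation; the actual policy is selected in this project via the --edge-type flag shown in the Testing section):

```python
import math

def knn_edges(centers, k=2):
    """Link each box center to its k nearest neighbours (undirected edges)."""
    edges = set()
    for i, (xi, yi) in enumerate(centers):
        # Sort the other centers by Euclidean distance to center i.
        neighbours = sorted(
            (math.hypot(xj - xi, yj - yi), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        )
        for _, j in neighbours[:k]:
            edges.add((min(i, j), max(i, j)))  # store undirected edges once
    return sorted(edges)

# Four word boxes: two on the first line of a page, two lower down.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 5.0), (1.0, 5.0)]
print(knn_edges(centers, k=1))  # → [(0, 1), (2, 3)]
```

With larger k the graph grows denser, trading sparsity for more candidate relations for the GNN to score.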

The model and pipeline aim to be task-agnostic in the domain of Document Understanding. This is an ongoing project: several steps have already been achieved, and others are planned for the future.

News!

Doc2Graph can now run in inference mode on your own documents:

python src/main.py -addG -addT -addE -addV --gpu 0 --weights e2e-funsd-best.pt --inference --docs /path/to/document

Environment Setup

After cloning the repository, set up the initial conda environment:

conda create -n doc2graph python=3.9 ipython cudatoolkit=11.3 -c anaconda &&
conda activate doc2graph &&
cd doc2graph

Then, install setuptools-git-versioning and the doc2graph package itself. The following has been tested only on Linux; for other operating systems, refer directly to the original PyTorch and DGL documentation.

pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113 &&
pip install dgl-cu113 dglgo -f https://data.dgl.ai/wheels/repo.html &&
pip install setuptools-git-versioning && pip install -e . &&
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.3.0/en_core_web_lg-3.3.0.tar.gz

Finally, create the project folder structure and download data:

python src/main.py --init

The script will download and set up:

Checkpoints: you can download our model checkpoints here. Place them into src/models/checkpoints.


Training

  1. To train our Doc2Graph model (using CPU), use:
python src/main.py [SETTINGS]
  2. Instead, to test a trained Doc2Graph model (using GPU; weights can be one or more files):
python src/main.py [SETTINGS] --gpu 0 --test --weights *.pt

The project can be customized either by editing the configs/base.yaml file directly or by providing flags when calling src/main.py.

Features

Data

Graphs

Inference (only for KiE)

Edit configs/train.yaml directly for training settings, or pass flags to src/main.py. To create your own model (changing its hyperparameters), copy one of the configs/models/*.yaml files.

Training/Testing

Testing

You can use our pretrained models on the test sets of the FUNSD¹ and Pau Riba's³ datasets.

  1. On FUNSD we were able to perform both Semantic Entity Labeling and Entity Linking:

E2E-FUNSD-GT:

python src/main.py -addG -addT -addE -addV --gpu 0 --test --weights e2e-funsd-best.pt

E2E-FUNSD-YOLO:

python src/main.py -addG -addT -addE -addV --gpu 0 --test --weights e2e-funsd-best.pt --node-granularity yolo

  2. On Pau Riba's dataset, we were able to perform both Layout Analysis and Table Detection:

E2E-PAU:

python src/main.py -addG -addT -addE -addV --gpu 0 --test --weights e2e-pau-best.pt --src-data PAU --edge-type knn
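For the entity linking task above, the model scores graph edges and keeps those predicted to be key-value links. The snippet below is a rough sketch of that final selection step, with made-up scores and a hypothetical threshold; it is not the library's API or output format:

```python
def extract_links(edges, scores, threshold=0.5):
    """Keep the candidate edges whose predicted link score clears the threshold."""
    return [edge for edge, score in zip(edges, scores) if score >= threshold]

# Hypothetical predictions over a four-node form graph:
# nodes 0 and 2 are questions, nodes 1 and 3 their candidate answers.
candidate_edges = [(0, 1), (0, 3), (2, 3)]
link_scores = [0.92, 0.10, 0.77]
print(extract_links(candidate_edges, link_scores))  # → [(0, 1), (2, 3)]
```

In the real pipeline, the surviving links pair each question entity with its answer, which is what the key information extraction output amounts to.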

Cite this project

If you want to use our code in your project(s), please cite us:

@InProceedings{10.1007/978-3-031-25069-9_22,
author="Gemelli, Andrea
and Biswas, Sanket
and Civitelli, Enrico
and Llad{\'o}s, Josep
and Marinai, Simone",
editor="Karlinsky, Leonid
and Michaeli, Tomer
and Nishino, Ko",
title="Doc2Graph: A Task Agnostic Document Understanding Framework Based on Graph Neural Networks",
booktitle="Computer Vision -- ECCV 2022 Workshops",
year="2023",
publisher="Springer Nature Switzerland",
address="Cham",
pages="329--344",
abstract="Geometric Deep Learning has recently attracted significant interest in a wide range of machine learning fields, including document analysis. The application of Graph Neural Networks (GNNs) has become crucial in various document-related tasks since they can unravel important structural patterns, fundamental in key information extraction processes. Previous works in the literature propose task-driven models and do not take into account the full power of graphs. We propose Doc2Graph, a task-agnostic document understanding framework based on a GNN model, to solve different tasks given different types of documents. We evaluated our approach on two challenging datasets for key information extraction in form understanding, invoice layout analysis and table detection. Our code is freely accessible on https://github.com/andreagemelli/doc2graph.",
isbn="978-3-031-25069-9"
}

Footnotes

  1. G. Jaume et al., FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents, ICDARW 2019

  2. Hieu M. Vu et al., Revising FUNSD Dataset for Key-Value Detection in Document Images, arXiv preprint 2020

  3. P. Riba et al., Table Detection in Invoice Documents by Graph Neural Networks, ICDAR 2019