Home

Awesome

Here I present Detectron2 object detection models trained on PubLayNet dataset, ranging from 81.139 to 86.690 in validation AP scores (possibly even better results can be achieved with longer training times).

Preview

Dataset overview

PubLayNet is a very large (over 300k images & over 90 GB in weight) dataset for document layout analysis. It contains images of research papers and articles and annotations for various elements in a page such as “text”, “list”, “figure” etc in these research paper images. The dataset was obtained by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. Originally provided by IBM here.

Data DescriptionZipped File NamePurpose
Train 0 Dataset, 13 GBtrain-0.tar.gzTraining
Train 1 Dataset, 13 GBtrain-1.tar.gzTraining
Train 2 Dataset, 13 GBtrain-2.tar.gzTraining
Train 3 Dataset, 13 GBtrain-3.tar.gzTraining
Train 4 Dataset, 13 GBtrain-4.tar.gzTraining
Train 5 Dataset, 13 GBtrain-5.tar.gzTraining
Train 6 Dataset, 13 GBtrain-6.tar.gzTraining
Evaluation Dataset, 3 GBval.tar.gzEvaluation
Labels Dataset, 314 MBlabels.tar.gz

Models overview

Models were trained on train part of the dataset, consisting of 335 703 images, and evaluated on val part of the dataset with 11 245 images. All the AP scores were obtained on the val dataset. Inference times were taken from official Detectron model zoo descriptions.

To provide you with some options, and to experiment with different models I have trained 4 versions for this dataset. 2 of them are from basic object detection, and 2 contain segmentation masks. You can choose which better suits your task by comparing accuracy scores and inference times of these models.

Models from COCO Object Detection:

NameInference time (s/im)Box APFolder nameModel zoo configTrained model
R50-FPN0.03881.139faster_rcnn_R_50_FPN_3xClick me!Click me!
R101-FPN0.05184.295faster_rcnn_R_101_FPN_3xClick me!Click me!

Models from COCO Instance Segmentation with Mask R-CNN:

NameInference time (s/im)Box APMask APFolder nameModel zoo configTrained model
R50-FPN0.04383.66682.268mask_rcnn_R_50_FPN_3xClick me!Click me!
R101-FPN0.05686.69082.105mask_rcnn_R_101_FPN_3xClick me!Click me!

Model folder

Each model's directory in git will contain these files:

File nameDescription
download.txtA text file that contains a link from where you can download the trained model
train.pyTraining script, that trains the model found in training_output sub-folder for a given number of epochs
test.pyTesting script, that runs the model found in training_output in inference mode for 6 randomly preselected images (3 from training, 3 from evaluation datasets) and displays all predictions on each image in a pop-up window
eval.pyTesting script, that runs the model found in training_output in inference mode for all images found in dataset's val folder and val.json. Evaluation is performed through Detectron's COCOEvaluator
evaluation.txtAs eval.py takes some time to execute (from 10 to 20 minutes) I've recorded last output of evaluation in this separate text file

Generally all trained models .pth files should be available through here or here, but if not - refer to individual download.txt links

Using the models

In usage_example module I've provided a sample script example.py that builds Detectron objects (config and predictor) for my trained model and runs inference on it, with a sample interpretation of Detectron's outputs. Please note that test.py and eval.py in model folders use common functions shared between models in this project. While example.py provides a completely clean setup, assuming you only want to download the models and use them for inference directly. It requires only Pillow, OpenCV, numpy and Detectron2 to run.

Hardware used

Believe it or not but training of these models was performed on a regular consumer-grade personal gaming PC with one NVIDIA 2070 SUPER (8GB) GPU, Intel Core i5-10600K CPU and 32 GB RAM.

Links

In case you’d like to check my other work or contact me: