PDF Library Benchmarks

This benchmark is about reading pure PDF files, not scanned documents and not documents to which OCR has been applied.

Benchmarking machine

Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz

Input Documents

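| # | Name | File Size | Pages |
|---|------|-----------|-------|
| 1 | 2201.00214 | 2.4MiB | 22 |
| 2 | GeoTopo-book | 5.1MiB | 117 |
| 3 | 2201.00151 | 1.5MiB | 12 |
| 4 | 1707.09725 | 7.0MiB | 134 |
| 5 | 2201.00021 | 2.6MiB | 10 |
| 6 | 2201.00037 | 2.9MiB | 33 |
| 7 | 2201.00069 | 14.7MiB | 15 |
| 8 | 2201.00178 | 2.3MiB | 16 |
| 9 | 2201.00201 | 1.3MiB | 9 |
| 10 | 1602.06541 | 2.9MiB | 16 |
| 11 | 2201.00200 | 284.8KiB | 7 |
| 12 | 2201.00022 | 1.1MiB | 11 |
| 13 | 2201.00029 | 797.6KiB | 12 |
| 14 | 1601.03642 | 1004.9KiB | 8 |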

Libraries

| Name | Last PyPI Release | License | Version | Dependencies |
|------|-------------------|---------|---------|--------------|
| Borb | 2023-06-23 | AGPL/Commercial | 2.1.16 | |
| pypdfium2 | 2023-07-04 | Apache-2.0 or BSD-3-Clause | 4.18.0 | PDFium (Foxit/Google) |
| pdfminer.six | 2022-11-05 | MIT/X | 20221105 | |
| pdfplumber | 2023-07-29 | MIT | 0.10.2 | pdfminer.six |
| pdfrw | 2017-09-18 | MIT | 0.4 | |
| pdftotext | - | GPL | 0.86.1 | build-essential libpoppler-cpp-dev pkg-config python3-dev |
| PyMuPDF | 2023-08-24 | GNU Affero GPL 3.0 / Commercial | 1.23.1 | MuPDF |
| pypdf | 2023-08-26 | BSD 3-Clause | 3.15.4 | |
| Tika | 2023-01-01 | Apache v2 | 2.6.0 | Apache Tika |

Text Extraction Speed

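| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---------|---------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|
| 1 | PyMuPDF | 0.1s | 0.4s | 0.2s | 0.2s | 0.2s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s |
| 2 | pypdfium2 | 0.2s | 1.9s | 0.2s | 0.2s | 0.2s | 0.0s | 0.1s | 0.1s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.0s |
| 3 | pdftotext | 0.3s | 0.8s | 1.0s | 0.3s | 0.8s | 0.1s | 0.2s | 0.2s | 0.1s | 0.0s | 0.1s | 0.1s | 0.1s | 0.0s | 0.0s |
| 4 | Tika | 1.1s | 12.9s | 0.9s | 0.6s | 0.4s | 0.1s | 0.3s | 0.2s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.0s | 0.0s |
| 5 | pypdf | 2.6s | 18.7s | 4.8s | 5.3s | 2.3s | 0.7s | 0.9s | 0.4s | 0.5s | 0.3s | 0.6s | 0.5s | 0.4s | 0.4s | 0.2s |
| 6 | pdfminer.six | 4.5s | 26.0s | 12.9s | 8.0s | 4.6s | 1.3s | 2.1s | 1.0s | 1.2s | 0.8s | 1.5s | 0.9s | 0.9s | 0.6s | 0.6s |
| 7 | pdfplumber | 6.7s | 41.7s | 10.9s | 11.5s | 8.4s | 2.4s | 4.3s | 2.0s | 1.9s | 1.9s | 2.7s | 1.8s | 1.7s | 1.0s | 1.2s |
| 8 | Borb | 34.7s | 111.2s | 105.0s | 1.4s | 87.2s | 21.1s | 7.4s | 83.5s | 16.4s | 20.3s | 5.4s | 3.4s | 18.8s | 3.2s | 2.1s |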
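
The source does not describe its timing harness, so the following is only a minimal sketch of how a per-document timing and the Average column could be produced. It assumes wall-clock timing with `time.perf_counter`; the helper names `time_extraction`, `extract_with_pypdf`, and `average_seconds` are hypothetical. The Average column does appear to be the arithmetic mean of the 14 per-document values: for pypdf they sum to 36.0 s, and 36.0 / 14 rounds to 2.6 s.

```python
import statistics
import time
from typing import Callable, Iterable


def time_extraction(extract: Callable[[str], str], path: str) -> float:
    """Return the wall-clock seconds taken by one call to extract(path)."""
    start = time.perf_counter()
    extract(path)
    return time.perf_counter() - start


def extract_with_pypdf(path: str) -> str:
    """Extract the text of every page with pypdf.

    pypdf is imported lazily so the rest of this sketch runs without it;
    install it with `pip install pypdf` before calling this function.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    return "\n".join(page.extract_text() for page in reader.pages)


def average_seconds(per_document: Iterable[float]) -> float:
    """Arithmetic mean, rounded to one decimal like the table values."""
    return round(statistics.mean(per_document), 1)


# Per-document pypdf timings copied from the table (documents 1 to 14).
PYPDF_SECONDS = [18.7, 4.8, 5.3, 2.3, 0.7, 0.9, 0.4,
                 0.5, 0.3, 0.6, 0.5, 0.4, 0.4, 0.2]

print(average_seconds(PYPDF_SECONDS))  # 2.6, matching the Average column
```

A single timing would be taken with `time_extraction(extract_with_pypdf, "some.pdf")`, where the file path is a placeholder. One run per document is noisy; repeating each measurement and taking the minimum or median is a common refinement.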

Image Extraction Speed

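| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---------|---------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|
| 1 | PyMuPDF | 0.5s | 0.3s | 0.5s | 0.0s | 1.7s | 0.4s | 0.0s | 3.2s | 0.4s | 0.4s | 0.1s | 0.0s | 0.3s | 0.2s | 0.0s |
| 2 | pypdf | 2.8s | 16.4s | 2.1s | 0.8s | 9.2s | 1.1s | 0.0s | 6.7s | 0.9s | 0.9s | 0.4s | 0.0s | 0.7s | 0.2s | 0.1s |
| 3 | pdfminer.six | 6.5s | 31.8s | 13.7s | 9.2s | 24.0s | 1.5s | 2.3s | 1.5s | 1.4s | 0.9s | 1.5s | 0.9s | 1.0s | 0.6s | 0.5s |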

Watermarking Speed

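| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---------|---------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|
| 1 | PyMuPDF | 0.0s | 0.0s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s |
| 2 | pdfrw | 0.1s | 0.0s | 0.4s | 0.0s | 0.3s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s |
| 3 | pypdf | 0.4s | 0.6s | 1.7s | 0.4s | 0.9s | 0.2s | 0.3s | 0.4s | 0.3s | 0.2s | 0.3s | 0.1s | 0.2s | 0.0s | 0.2s |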

Watermarking File Size

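| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---------|---------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|
| 1 | pdfrw | 3.4MB | 2.5MB | 5.7MB | 1.6MB | 7.3MB | 2.7MB | 3.1MB | 15.4MB | 2.4MB | 1.3MB | 3.0MB | 0.3MB | 1.1MB | 0.8MB | 1.0MB |
| 2 | pypdf | 3.5MB | 2.5MB | 5.7MB | 1.6MB | 7.3MB | 2.7MB | 3.1MB | 15.4MB | 2.4MB | 1.3MB | 3.0MB | 0.3MB | 1.1MB | 0.8MB | 1.0MB |
| 3 | PyMuPDF | 3.7MB | 2.7MB | 6.8MB | 1.7MB | 8.5MB | 2.8MB | 3.4MB | 15.5MB | 2.5MB | 1.4MB | 3.2MB | 0.3MB | 1.2MB | 0.9MB | 1.1MB |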

Text Extraction Quality

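| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---------|---------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|
| 1 | pypdfium2 | 98% | 99% | 97% | 94% | 99% | 98% | 96% | 99% | 98% | 99% | 99% | 98% | 98% | 99% | 99% |
| 2 | pypdf | 97% | 98% | 93% | 94% | 98% | 98% | 96% | 97% | 98% | 99% | 99% | 98% | 98% | 98% | 99% |
| 3 | PyMuPDF | 97% | 98% | 96% | 93% | 97% | 98% | 96% | 98% | 98% | 98% | 98% | 97% | 97% | 98% | 99% |
| 4 | Tika | 96% | 99% | 98% | 92% | 97% | 98% | 96% | 93% | 97% | 98% | 93% | 98% | 93% | 98% | 96% |
| 5 | pdftotext | 93% | 96% | 93% | 91% | 94% | 92% | 96% | 96% | 96% | 97% | 83% | 94% | 96% | 96% | 79% |
| 6 | pdfminer.six | 90% | 95% | 79% | 86% | 92% | 86% | 93% | 95% | 93% | 92% | 92% | 93% | 86% | 98% | 86% |
| 7 | pdfplumber | 75% | 94% | 84% | 61% | 97% | 61% | 93% | 61% | 89% | 57% | 59% | 67% | 59% | 98% | 67% |
| 8 | Borb | 45% | 70% | 79% | 0% | 40% | 48% | 92% | 0% | 64% | 51% | 41% | 55% | 43% | 0% | 53% |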
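
The source does not state how the quality percentages are computed. Purely as an illustration of what such a score could look like, the sketch below compares extracted text with a reference text using the standard library's `difflib.SequenceMatcher`. Both the metric and the function name `similarity_percent` are assumptions, not the benchmark's actual method.

```python
from difflib import SequenceMatcher


def similarity_percent(extracted: str, reference: str) -> int:
    """Score extracted text against a reference as a whole-number percentage."""
    # Collapse runs of whitespace so layout differences alone are not penalised.
    a = " ".join(extracted.split())
    b = " ".join(reference.split())
    if not a and not b:
        return 100  # two empty texts are treated as identical
    # autojunk=False keeps the ratio stable on long inputs.
    ratio = SequenceMatcher(None, a, b, autojunk=False).ratio()
    return round(100 * ratio)
```

Under this assumed metric, an extraction that returns no text at all scores 0 against any non-empty reference, and identical texts score 100.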