PDF Library Benchmarks
This benchmark covers reading born-digital PDF files only. It does not cover scanned documents or documents to which OCR has been applied.
Benchmarking machine
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Input Documents
Libraries
Name | Last PyPI Release | License | Version | Dependencies |
---|---|---|---|---|
Borb | 2023-06-23 | AGPL/Commercial | 2.1.16 | |
pypdfium2 | 2023-07-04 | Apache-2.0 or BSD-3-Clause | 4.18.0 | PDFium (Foxit/Google) |
pdfminer.six | 2022-11-05 | MIT/X | 20221105 | |
pdfplumber | 2023-07-29 | MIT | 0.10.2 | pdfminer.six |
pdfrw | 2017-09-18 | MIT | 0.4 | |
pdftotext | - | GPL | 0.86.1 | build-essential libpoppler-cpp-dev pkg-config python3-dev |
PyMuPDF | 2023-08-24 | GNU Affero GPL 3.0 / Commercial | 1.23.1 | MuPDF |
pypdf | 2023-08-26 | BSD 3-Clause | 3.15.4 | |
Tika | 2023-01-01 | Apache v2 | 2.6.0 | Apache Tika |
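To compare a local environment against the Version column above, the installed versions can be read from package metadata. This is a minimal sketch using the standard library's `importlib.metadata`. The distribution names in `DISTRIBUTIONS` are assumed PyPI spellings of the libraries in the table, and `installed_versions` is a hypothetical helper, not part of the benchmark itself.

```python
from importlib import metadata

# Assumed PyPI distribution names for the libraries listed in the table.
DISTRIBUTIONS = [
    "borb",
    "pypdfium2",
    "pdfminer.six",
    "pdfplumber",
    "pdfrw",
    "pdftotext",
    "PyMuPDF",
    "pypdf",
    "tika",
]


def installed_versions(names=DISTRIBUTIONS):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The library is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Libraries that are not installed are reported as `None` rather than raising, so the check can run in a partial environment.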
Text Extraction Speed
# | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | PyMuPDF | 0.1s | 0.4s | 0.2s | 0.2s | 0.2s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s |
2 | pypdfium2 | 0.2s | 1.9s | 0.2s | 0.2s | 0.2s | 0.0s | 0.1s | 0.1s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.0s |
3 | pdftotext | 0.3s | 0.8s | 1.0s | 0.3s | 0.8s | 0.1s | 0.2s | 0.2s | 0.1s | 0.0s | 0.1s | 0.1s | 0.1s | 0.0s | 0.0s |
4 | Tika | 1.1s | 12.9s | 0.9s | 0.6s | 0.4s | 0.1s | 0.3s | 0.2s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.0s | 0.0s |
5 | pypdf | 2.6s | 18.7s | 4.8s | 5.3s | 2.3s | 0.7s | 0.9s | 0.4s | 0.5s | 0.3s | 0.6s | 0.5s | 0.4s | 0.4s | 0.2s |
6 | pdfminer.six | 4.5s | 26.0s | 12.9s | 8.0s | 4.6s | 1.3s | 2.1s | 1.0s | 1.2s | 0.8s | 1.5s | 0.9s | 0.9s | 0.6s | 0.6s |
7 | pdfplumber | 6.7s | 41.7s | 10.9s | 11.5s | 8.4s | 2.4s | 4.3s | 2.0s | 1.9s | 1.9s | 2.7s | 1.8s | 1.7s | 1.0s | 1.2s |
8 | Borb | 34.7s | 111.2s | 105.0s | 1.4s | 87.2s | 21.1s | 7.4s | 83.5s | 16.4s | 20.3s | 5.4s | 3.4s | 18.8s | 3.2s | 2.1s |
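The Average column is consistent with the arithmetic mean of the 14 per-document times, rounded to one decimal. The sketch below shows one way such timings could be collected and averaged. It is not the benchmark's own harness: `time_call`, `average_seconds`, and the stand-in `fake_extract` are hypothetical names, and a real run would pass a library's extraction function instead.

```python
import time
from statistics import mean


def time_call(func, *args):
    """Run func(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def average_seconds(per_document_seconds):
    """Arithmetic mean rounded to one decimal, like the Average column."""
    return round(mean(per_document_seconds), 1)


def fake_extract(data: bytes) -> str:
    """Hypothetical stand-in for a real text extractor."""
    return data.decode("latin-1")


text, elapsed = time_call(fake_extract, b"%PDF-1.7 example")

# Per-document times copied from the Tika row of the table above.
tika = [12.9, 0.9, 0.6, 0.4, 0.1, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.0, 0.0]
print(average_seconds(tika))  # 1.1
```

Because the mean is dominated by the largest document, a single slow file (document 1 for most libraries) accounts for much of each average.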
Image Extraction Speed
# | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | PyMuPDF | 0.5s | 0.3s | 0.5s | 0.0s | 1.7s | 0.4s | 0.0s | 3.2s | 0.4s | 0.4s | 0.1s | 0.0s | 0.3s | 0.2s | 0.0s |
2 | pypdf | 2.8s | 16.4s | 2.1s | 0.8s | 9.2s | 1.1s | 0.0s | 6.7s | 0.9s | 0.9s | 0.4s | 0.0s | 0.7s | 0.2s | 0.1s |
3 | pdfminer.six | 6.5s | 31.8s | 13.7s | 9.2s | 24.0s | 1.5s | 2.3s | 1.5s | 1.4s | 0.9s | 1.5s | 0.9s | 1.0s | 0.6s | 0.5s |
Watermarking Speed
# | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | PyMuPDF | 0.0s | 0.0s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s |
2 | pdfrw | 0.1s | 0.0s | 0.4s | 0.0s | 0.3s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s |
3 | pypdf | 0.4s | 0.6s | 1.7s | 0.4s | 0.9s | 0.2s | 0.3s | 0.4s | 0.3s | 0.2s | 0.3s | 0.1s | 0.2s | 0.0s | 0.2s |
Watermarking File Size
# | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | pdfrw | 3.4MB | 2.5MB | 5.7MB | 1.6MB | 7.3MB | 2.7MB | 3.1MB | 15.4MB | 2.4MB | 1.3MB | 3.0MB | 0.3MB | 1.1MB | 0.8MB | 1.0MB |
2 | pypdf | 3.5MB | 2.5MB | 5.7MB | 1.6MB | 7.3MB | 2.7MB | 3.1MB | 15.4MB | 2.4MB | 1.3MB | 3.0MB | 0.3MB | 1.1MB | 0.8MB | 1.0MB |
3 | PyMuPDF | 3.7MB | 2.7MB | 6.8MB | 1.7MB | 8.5MB | 2.8MB | 3.4MB | 15.5MB | 2.5MB | 1.4MB | 3.2MB | 0.3MB | 1.2MB | 0.9MB | 1.1MB |
Text Extraction Quality
# | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | pypdfium2 | 98% | 99% | 97% | 94% | 99% | 98% | 96% | 99% | 98% | 99% | 99% | 98% | 98% | 99% | 99% |
2 | pypdf | 97% | 98% | 93% | 94% | 98% | 98% | 96% | 97% | 98% | 99% | 99% | 98% | 98% | 98% | 99% |
3 | PyMuPDF | 97% | 98% | 96% | 93% | 97% | 98% | 96% | 98% | 98% | 98% | 98% | 97% | 97% | 98% | 99% |
4 | Tika | 96% | 99% | 98% | 92% | 97% | 98% | 96% | 93% | 97% | 98% | 93% | 98% | 93% | 98% | 96% |
5 | pdftotext | 93% | 96% | 93% | 91% | 94% | 92% | 96% | 96% | 96% | 97% | 83% | 94% | 96% | 96% | 79% |
6 | pdfminer.six | 90% | 95% | 79% | 86% | 92% | 86% | 93% | 95% | 93% | 92% | 92% | 93% | 86% | 98% | 86% |
7 | pdfplumber | 75% | 94% | 84% | 61% | 97% | 61% | 93% | 61% | 89% | 57% | 59% | 67% | 59% | 98% | 67% |
8 | Borb | 45% | 70% | 79% | 0% | 40% | 48% | 92% | 0% | 64% | 51% | 41% | 55% | 43% | 0% | 53% |
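The source does not state how the quality percentages are computed. As an illustration only, the sketch below scores extracted text against a reference transcript with the standard library's `difflib.SequenceMatcher`. The metric, the whitespace normalization, and the name `similarity_percent` are all assumptions and may differ from what the benchmark actually uses.

```python
from difflib import SequenceMatcher


def similarity_percent(extracted: str, reference: str) -> int:
    """Similarity of extracted text to a reference, as a whole percentage.

    Assumption: whitespace runs are collapsed first so that layout
    differences (line breaks, double spaces) do not lower the score.
    """
    a = " ".join(extracted.split())
    b = " ".join(reference.split())
    return round(100 * SequenceMatcher(None, a, b).ratio())


reference = "Hello PDF world"
print(similarity_percent("Hello  PDF\nworld", reference))  # 100
print(similarity_percent("", reference))  # 0
```

Under a metric like this, a score of 0% (as in some Borb columns) would correspond to output that shares nothing with the reference, for example an empty extraction.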