<div align="center"> <img src="./docs/images/banner.png" width="320px" alt="PDF2ZH"/> <h2 id="title">PDFMathTranslate</h2> <p> <!-- PyPI --> <a href="https://pypi.org/project/pdf2zh/"> <img src="https://img.shields.io/pypi/v/pdf2zh"></a> <a href="https://pepy.tech/projects/pdf2zh"> <img src="https://static.pepy.tech/badge/pdf2zh"></a> <a href="https://hub.docker.com/repository/docker/byaidu/pdf2zh"> <img src="https://img.shields.io/docker/pulls/byaidu/pdf2zh"></a> <a href="https://gitcode.com/Byaidu/PDFMathTranslate/overview"> <img src="https://gitcode.com/Byaidu/PDFMathTranslate/star/badge.svg"></a> <a href="https://huggingface.co/spaces/reycn/PDFMathTranslate-Docker"> <img src="https://img.shields.io/badge/%F0%9F%A4%97-Online%20Demo-FF9E0D"></a> <a href="https://www.modelscope.cn/studios/AI-ModelScope/PDFMathTranslate"> <img src="https://img.shields.io/badge/ModelScope-Demo-blue"></a> <a href="https://github.com/Byaidu/PDFMathTranslate/pulls"> <img src="https://img.shields.io/badge/contributions-welcome-green"></a> <a href="https://t.me/+Z9_SgnxmsmA5NzBl"> <img src="https://img.shields.io/badge/Telegram-2CA5E0?style=flat-squeare&logo=telegram&logoColor=white"></a> <!-- License --> <a href="./LICENSE"> <img src="https://img.shields.io/github/license/Byaidu/PDFMathTranslate"></a> </p><a href="https://trendshift.io/repositories/12424" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12424" alt="Byaidu%2FPDFMathTranslate | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>

PDF scientific paper translation and bilingual comparison.

- 📊 Preserve formulas, charts, table of contents, and annotations (preview).
- 🌐 Support multiple languages and diverse translation services.
- 🤖 Provide a command-line tool, an interactive user interface, and Docker deployment.

Feel free to provide feedback in GitHub Issues, Telegram Group or QQ Group.
<h2 id="updates">Updates</h2>

- [Dec. 19 2024] Non-PDF/A documents are now supported using `-cp` (by @reycn)
- [Dec. 13 2024] Additional support for backend by (by @YadominJinta)
- [Dec. 10 2024] The translator now supports OpenAI models on Azure (by @yidasanqian)

You can try our application out using any of the following demos:
- Public free service online without installation (recommended).
- Demo hosted on HuggingFace
- Demo hosted on ModelScope without installation.
Note that the computing resources of the demo are limited, so please avoid abusing them.
<h2 id="install">Installation and Usage</h2>

<h3>Methods</h3>

For different use cases, we provide four distinct methods to use our program:

<details open> <summary>1. Commandline</summary>

- Python installed (3.8 <= version <= 3.12)
- Install our package: `pip install pdf2zh`
- Execute translation; the output files are generated in the current working directory: `pdf2zh document.pdf`

</details> <details> <summary>2. Script (setup.bat)</summary>

- Download setup.bat
- Double-click to run.

</details> <details> <summary>3. GUI</summary>

- Install our package: `pip install pdf2zh`
- Start using in browser: `pdf2zh -i`
- If your browser has not been started automatically, go to http://localhost:7860/

<img src="./docs/images/gui.gif" width="500"/>

See documentation for GUI for more details.

</details> <details> <summary>4. Docker</summary>

- Pull and run: `docker pull byaidu/pdf2zh`, then `docker run -d -p 7860:7860 byaidu/pdf2zh`
- Open in browser: http://localhost:7860/

For Docker deployment on cloud services:

<div> <a href="https://www.heroku.com/deploy?template=https://github.com/Byaidu/PDFMathTranslate"> <img src="https://www.herokucdn.com/deploy/button.svg" alt="Deploy" height="26"></a> <a href="https://render.com/deploy"> <img src="https://render.com/images/deploy-to-render-button.svg" alt="Deploy to Render" height="26"></a> <a href="https://zeabur.com/templates/5FQIGX?referralCode=reycn"> <img src="https://zeabur.com/button.svg" alt="Deploy on Zeabur" height="26"></a> <a href="https://app.koyeb.com/deploy?type=git&builder=buildpack&repository=github.com/Byaidu/PDFMathTranslate&branch=main&name=pdf-math-translate"> <img src="https://www.koyeb.com/static/images/deploy/button.svg" alt="Deploy to Koyeb" height="26"></a> </div> </details>

<h3>Unable to install?</h3>

The program needs an AI model (`wybxc/DocLayout-YOLO-DocStructBench-onnx`) before it can work, and some users are unable to download it due to network issues. If you have a problem downloading this model, we provide a workaround using the following environment variable:

`set HF_ENDPOINT=https://hf-mirror.com`

If this solution does not work for you, or you encounter other issues, please refer to the frequently asked questions.
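
The `set` syntax above is specific to the Windows command prompt (on Linux or macOS shells the usual equivalent is `export HF_ENDPOINT=https://hf-mirror.com`). As a shell-independent alternative, the variable can be passed to the tool from Python. The following is a minimal sketch, assuming `pdf2zh` is installed and on `PATH`; the helper names `mirror_env` and `translate_with_mirror` are hypothetical and not part of the package.

```python
import os
import subprocess

# Mirror endpoint quoted in the workaround above.
HF_MIRROR = "https://hf-mirror.com"


def mirror_env(base=None):
    """Return a copy of the environment with HF_ENDPOINT set to the mirror."""
    env = dict(os.environ if base is None else base)
    env["HF_ENDPOINT"] = HF_MIRROR
    return env


def translate_with_mirror(pdf_path):
    """Run `pdf2zh <pdf_path>` in a child process that sees HF_ENDPOINT.

    Assumes the `pdf2zh` command is available on PATH.
    """
    return subprocess.run(["pdf2zh", pdf_path], env=mirror_env(), check=True)
```

Calling `translate_with_mirror("document.pdf")` only changes the environment of the child process, so the setting does not leak into the rest of your session.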
<h2 id="usage">Advanced Options</h2>

Execute the translation command in the command line to generate the translated document `example-mono.pdf` and the bilingual document `example-dual.pdf` in the current working directory. Google is used as the default translation service.

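
For scripts that post-process the results, it can help to compute the expected output paths up front. The sketch below assumes that the `-mono` / `-dual` suffix pattern shown for `example.pdf` generalizes to any input file name; that generalization, and the helper name `expected_outputs`, are assumptions rather than documented behavior.

```python
from pathlib import Path


def expected_outputs(source, output_dir="."):
    """Guess the translated and bilingual output paths for a source PDF.

    Assumption: outputs are named <stem>-mono.pdf and <stem>-dual.pdf,
    as in the example.pdf case described above.
    """
    stem = Path(source).stem
    out = Path(output_dir)
    return out / f"{stem}-mono.pdf", out / f"{stem}-dual.pdf"


mono, dual = expected_outputs("example.pdf")
print(mono, dual)
```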
In the following table, we list all advanced options for reference:
| Option | Function | Example |
|---|---|---|
| `files` | Local files | `pdf2zh ~/local.pdf` |
| `links` | Online files | `pdf2zh http://arxiv.org/paper.pdf` |
| `-i` | Enter GUI | `pdf2zh -i` |
| `-p` | Partial document translation | `pdf2zh example.pdf -p 1` |
| `-li` | Source language | `pdf2zh example.pdf -li en` |
| `-lo` | Target language | `pdf2zh example.pdf -lo zh` |
| `-s` | Translation service | `pdf2zh example.pdf -s deepl` |
| `-t` | Multi-threads | `pdf2zh example.pdf -t 1` |
| `-o` | Output dir | `pdf2zh example.pdf -o output` |
| `-f`, `-c` | Exceptions | `pdf2zh example.pdf -f "(MS.*)"` |
| `-cp` | Compatibility Mode | `pdf2zh example.pdf --compatible` |
| `--share` | Public link | `pdf2zh -i --share` |
| `--authorized` | Authorization | `pdf2zh -i --authorized users.txt [auth.html]` |
| `--prompt` | Custom Prompt | `pdf2zh --prompt [prompt.txt]` |

For detailed explanations of each option, please refer to our document about Advanced Usage.
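
Several of the options in the table can be combined in a single invocation. The sketch below assembles such a command line from Python using only flags that appear in the table; the function name `build_pdf2zh_command` and its parameter names are hypothetical, and the example values are the ones from the table rather than recommendations.

```python
import shlex


def build_pdf2zh_command(source, pages=None, lang_in=None, lang_out=None,
                         service=None, threads=None, output_dir=None):
    """Assemble a pdf2zh argument list from the flags listed in the table."""
    cmd = ["pdf2zh", source]
    # (flag, value) pairs; None means "omit the flag and keep the default".
    options = [
        ("-p", pages),
        ("-li", lang_in),
        ("-lo", lang_out),
        ("-s", service),
        ("-t", threads),
        ("-o", output_dir),
    ]
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_pdf2zh_command("example.pdf", pages=1, lang_in="en", lang_out="zh",
                           service="deepl", threads=1, output_dir="output")
print(shlex.join(cmd))
# prints: pdf2zh example.pdf -p 1 -li en -lo zh -s deepl -t 1 -o output
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` on a machine where `pdf2zh` is installed.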
<h2 id="downstream">Secondary Development (APIs)</h2>

For downstream applications, please refer to our document about API Details for further information about:

- Python API, how to use the program in other Python programs
- HTTP API, how to communicate with a server with the program installed

<h2 id="todo">TODOs</h2>

- Parse layout with DocLayNet-based models, PaddleX, PaperMage, SAM2
- Fix page rotation, table of contents, format of lists
- Fix pixel formula in old papers
- Async retry except KeyboardInterrupt
- Knuth–Plass algorithm for western languages
- Support non-PDF/A files

<h2 id="acknowledgement">Acknowledgements</h2>

- Document merging: PyMuPDF
- Document parsing: Pdfminer.six
- Document extraction: MinerU
- Document preview: Gradio PDF
- Multi-threaded translation: MathTranslate
- Layout parsing: DocLayout-YOLO
- Document standard: PDF Explained, PDF Cheat Sheets
- Multilingual font: Go Noto Universal