pdfannots

This program extracts annotations (highlights, comments, etc.) from a PDF file, and formats them as Markdown or exports them to JSON. It is primarily intended for use in reviewing submissions to scientific conferences/journals.

Sample/demo of pdfannots extracting Markdown from an annotated PDF

For the default Markdown format, the output is as follows:

For each annotation, the page number is given, along with the associated (highlighted/underlined) text, if any. Additionally, if the document embeds outlines (aka bookmarks), such as those generated by the LaTeX hyperref package, they are printed to help identify to which section in the document the annotation refers.
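The per-annotation layout described above can be illustrated with a small sketch. This is not pdfannots' own code or its exact output format: the `Annotation` class, its field names, and the `format_annotation` helper are all hypothetical, chosen only to show how a page number, the associated text, and an optional outline title might be combined into one Markdown bullet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    """Hypothetical stand-in for one extracted annotation (not pdfannots' own class)."""
    page: int                      # 1-based page number
    text: Optional[str] = None     # highlighted/underlined text, if any
    comment: Optional[str] = None  # reviewer's note, if any
    outline: Optional[str] = None  # bookmark title for the enclosing section, if known


def format_annotation(a: Annotation) -> str:
    """Render one annotation as a Markdown bullet (illustrative layout only)."""
    where = f"Page {a.page}"
    if a.outline:
        # Outline titles help the reader locate the section being discussed.
        where += f" ({a.outline})"
    parts: List[str] = []
    if a.text:
        parts.append(f'"{a.text}"')
    if a.comment:
        parts.append(a.comment)
    body = " -- ".join(parts) if parts else "(no text)"
    return f" * {where}: {body}"


print(format_annotation(Annotation(
    page=3,
    text="a novel approach",
    comment="Compared to what?",
    outline="Introduction",
)))
```

When the PDF has no outlines, the `outline` field would simply stay `None` and the section title is omitted from the bullet.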

Installation

To install the latest released version from PyPI, use a command such as:

python3 -m pip install pdfannots

Usage

See pdfannots --help (in a source tree: pdfannots.py --help) for options and invocation.
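As a minimal sketch, the help text can also be retrieved from a Python script, for example to check that the tool is installed before using it in a review workflow. This assumes the `pdfannots` command is on the `PATH`; the helper name `pdfannots_help` is hypothetical, and only the documented `--help` option is used.

```python
import shutil
import subprocess
from typing import Optional


def pdfannots_help(executable: str = "pdfannots") -> Optional[str]:
    """Return the output of `<executable> --help`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # check=False: return whatever the tool printed instead of raising on a nonzero exit.
    result = subprocess.run(
        [path, "--help"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    text = pdfannots_help()
    print(text if text is not None else "pdfannots not found on PATH")
```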

Dependencies

Known issues and limitations

FAQ

  1. I'd like to change how the output is formatted.

    Some minor tweaks (e.g., word wrap, skipping or reordering output sections) can be made via command-line arguments.

    All of the output comes from the relevant Printer subclass; more elaborate changes can be made there. Pull requests that introduce new output formats or variants as printers are welcome.

  2. I think I got a review generated by this tool...

    I hope that it was a constructive review, and that the annotations helped the reviewer give you more detailed feedback so you can improve your paper. This is, after all, just a tool, and it should not be an excuse for reviewer sloppiness.