page-to-text
Extracts the text from a PAGE file and writes it to stdout.
Note that this tool does not consider the ReadingOrder, even if one is available in the PAGE-XML. Instead, it writes output based on the order of the elements in the XML tree.
Use like:
python page_to_text.py <page-xml-file>
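
To illustrate what "order in the XML tree" means, here is a minimal sketch of tree-order extraction. It is not the code of page_to_text.py. It assumes the text of interest sits in the TextEquiv/Unicode child of each TextRegion, and the helper names (local_name, region_texts, page_to_text) are hypothetical. Namespaces are stripped so the sketch does not depend on a particular PAGE schema version.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def region_texts(root):
    """Yield the text of each TextRegion in XML tree order (ReadingOrder is ignored)."""
    for elem in root.iter():
        if local_name(elem.tag) != "TextRegion":
            continue
        # Assumption: use only the region's own TextEquiv, not those of
        # nested TextLine or Word elements.
        for child in elem:
            if local_name(child.tag) != "TextEquiv":
                continue
            for node in child:
                if local_name(node.tag) == "Unicode" and node.text:
                    yield node.text


def page_to_text(xml_string):
    """Return the region texts of a PAGE-XML string, one region per line."""
    root = ET.fromstring(xml_string)
    return "\n".join(region_texts(root))
```

With this approach, a file whose ReadingOrder lists region r2 before r1 still produces the text of r1 first if r1 appears first in the XML tree.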