Awesome
page2tei
PAGE2TEI was created and is maintained by Dario Kampkaspar and is licensed under the MIT license.
How to use
Apply page2tei-0.xsl to the METS File:
java -jar saxon9he.jar -xsl:page2tei-0.xsl -s:mets.xml -o:[your tei file].xml
Additional stylesheets can be applied to the output created by the basic transformation:
combine-continued.xsl
(or set parametercombine=true()
) — try to combine entities that are split over a line break into one elementsimplify-coordinates.xsl
(parameterbounding-rectangles=true()
by default) — convert polygons into bounding rectanglestokenize.xsl
(or set parametertokenize=true()
) — perform (very basic!) whitespace tokenization
Parameters
You can set the following parameters when calling page2tei-0.xsl
(via command line or via an oXygen scenario; in oXygen, the parameters should be marked as “XPath“):
- rs (default:
true()
): creaters type="..."
for person/place/org (default) orpersName
etc. - tokenize (default:
false()
): Whether to run white space tokenization - combine (default:
false()
): Whether to combine entities over line breaks - ab (default:
false()
): If false(), region types that correspond to valid TEI elements will be returned as this element; types that do not correspond to a TEI element will be returned as tei:ab[@type]. If set to true(), all region types (except for paragraph, heading) will be returned as tei:ab. - word-coordinates (default:
false()
): If true(), export the (estimated) word coordinates to the facsimile section. - bounding-rectangles (default:
true()
): Whether to create bounding rectangles from polygons (default: true()) - withoutBaseline (default:
false()
): Whether to export lines without baseline or not - withoutTextline (default:
false()
): Whether to export regions without text lines
Contributors
- @tboenig
- @peterstadler
- @tillgrallert
Some contributions to this software were created within the scope of a project funded by the German BMBF, project ID 16TOA015A.