Description
This is a Python module, written in modern C++, that converts XDXF-formatted dictionary text into HTML.
It has no dependencies beyond a C++11-compliant compiler. To minimise overhead, the parser performs no error checking.
Installation
pip install xdxf2html
Building
python3 setup.py build
Usage
>>> import xdxf2html
>>> xdxf2html.convert('''<k>Liverpool</k>
... <blockquote><dtrn>a large city and port in north-west England, on the River Mersey. It first became important during the <kref>Industrial Revolution</kref>, producing and exporting cotton goods. It was also a major port for the slave trade, receiving profits from the sale of slaves in America. In the 20th century the city became famous as the home of the <kref>Beatles</kref> and for Liverpool and Everton football clubs. Among its many famous buildings are the Royal Liver Building with its two towers, the Anglican and Roman Catholic cathedrals, and the <kref>Walker Art Gallery</kref>.</dtrn> <rref>portlpool.jpg</rref></blockquote>
... <blockquote>See also <kref>Mersey beat</kref>.</blockquote>''', 'test_dict')
'<h3 class="headword">Liverpool</h3><div class="xdxf-definition" style="margin-left: 0em;">a large city and port in north-west England, on the River Mersey. It first became important during the <a href="/api/lookup/test_dict/Industrial Revolution">Industrial Revolution</a>, producing and exporting cotton goods. It was also a major port for the slave trade, receiving profits from the sale of slaves in America. In the 20th century the city became famous as the home of the <a href="/api/lookup/test_dict/Beatles">Beatles</a>and for Liverpool and Everton football clubs. Among its many famous buildings are the Royal Liver Building with its two towers, the Anglican and Roman Catholic cathedrals, and the <a href="/api/lookup/test_dict/Walker Art Gallery">Walker Art Gallery</a>.<img src="/api/cache/test_dict/portlpool.jpg" alt="portlpool.jpg"/></div><div class="xdxf-definition" style="margin-left: 0em;">See also <a href="/api/lookup/test_dict/Mersey beat">Mersey beat</a>.</div>'
The module exposes a single function, convert, which takes two arguments: the XDXF text and the name of the dictionary. It returns the HTML as a string.
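Since the return value is an HTML fragment rather than a full document, a caller will usually embed it in a page. The sketch below shows one way to do that. The helper name render_page and its convert parameter are hypothetical and not part of the module; the converter is passed in as an argument so the sketch also runs where xdxf2html is not installed.

```python
import html
from typing import Callable


def render_page(xdxf_text: str, dict_name: str,
                convert: Callable[[str, str], str]) -> str:
    """Wrap the HTML fragment returned by `convert` in a minimal page.

    `convert` is assumed to behave like xdxf2html.convert:
    (XDXF text, dictionary name) -> HTML fragment.
    """
    fragment = convert(xdxf_text, dict_name)
    # Only the title is escaped; the fragment is already HTML.
    title = html.escape(dict_name)
    return (
        "<!DOCTYPE html>"
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>'
        f"<body>{fragment}</body></html>"
    )
```

With the module installed, this would be called as render_page(text, 'test_dict', xdxf2html.convert).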
Appendix: a hopefully complete listing of XDXF tags, both standard and non-standard
This section will only include tags found within the dictionary 'body', i.e. <lexicon>.
Representational tags
<b>, <i>, <u>, <sub>, <sup>, <tt>, <br>: Could be directly translated into HTML.
<c>: Colour, indicated by the attribute c. Converted to a <span> with the style color: c. The default colour is darkgreen.
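The <c> rule can be illustrated with a small Python sketch. This is not the module's C++ code: the function name colour_span is hypothetical, and escaping the attribute value is an assumption added here (the actual parser is described as doing no error checking).

```python
import html
from typing import Optional

DEFAULT_COLOUR = "darkgreen"  # default colour stated for <c>


def colour_span(inner_html: str, c: Optional[str] = None) -> str:
    """Render the content of a <c> tag as a coloured <span>.

    `inner_html` is the already-converted content of the tag;
    `c` is the value of the tag's c attribute, if any.
    """
    colour = c if c else DEFAULT_COLOUR
    # Escaping the colour value is an assumption of this sketch.
    safe = html.escape(colour, quote=True)
    return f'<span style="color: {safe}">{inner_html}</span>'
```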
<div>-like tags
<ar>: Article, ignored in this project.
<def>: Definition, converted to a <div> with the class xdxf-definition. If the attribute cmt is present, write it as a <span> with the class comment inside the <div>; if the attribute freq is present, write it as a <span> with the class frequency inside the <div>.
<ex>: Example, converted to a <div> with the class example and the style margin-left: 1em; color: grey;.
<co>: Comment, converted to a <div> with the class comment.
<sr>: 'Semantic relations', converted to a <div> with the class semantic-relations.
<etm>: Etymology, converted to a <div> with the style color: grey;.
<blockquote>: Not in the specification, but I've seen it. Converted to a <p>.
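The handling of cmt and freq on <def> might look roughly like the following Python sketch. The function name def_to_div is hypothetical, and placing the two <span> elements before the definition content is an assumption; the description only says they go inside the <div>. The margin-left style visible in the usage example output is left out here.

```python
from typing import Optional


def def_to_div(inner_html: str, cmt: Optional[str] = None,
               freq: Optional[str] = None) -> str:
    """Render a <def> tag as a <div class="xdxf-definition">.

    `inner_html` is the already-converted content of the tag.
    """
    parts = ['<div class="xdxf-definition">']
    if cmt is not None:
        # Attribute cmt becomes a comment span (position assumed).
        parts.append(f'<span class="comment">{cmt}</span>')
    if freq is not None:
        # Attribute freq becomes a frequency span (position assumed).
        parts.append(f'<span class="frequency">{freq}</span>')
    parts.append(inner_html)
    parts.append("</div>")
    return "".join(parts)
```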
<span>-like tags
<k>: Keyword, namely the headword, converted to an <h3> with the class headword.
<opt>: Optional part of the keyword, converted to a <span> with the class optional.
<deftext>: Definition text, ignored in this project.
<gr>: Grammatical information, converted to a <span> with the style font-style: italic; color: darkgreen;.
<pos>, <tense>: Ignored.
<tr>: Transcription, converted to a <span> with the class transcription.
<kref>: Keyword reference, converted to an <a> with the href attribute properly set (in this project, to /api/lookup/name of dictionary/keyword). If the attribute type is set, prepend the value of the attribute to the keyword with a colon and a space.
<iref>: External reference, could be directly converted to an <a>.
<dtrn>: Definition translation, in practice used as an equivalent of <def>. So we'd better ignore it.
<abbr>, <abr>: Abbreviation, converted to a <span> with the class abbreviation and the style color: darkgreen; font-style: italic;.
<ex_orig>, <ex_trn>: Example original and translation, converted to a <span> with the class example-original and example-translation respectively.
<exm>, <prv>, <oth>: Used inside <ex>, ignored in this project.
<mrkd>: Marked text, converted to a <span> with the style background-color: yellow;.
<nu>: Not-used, not explained in the specification. Ignored in this project.
<di>: 'Don't index', ignored in this project.
<syn>, <ant>, <hpr>, <hpn>, <par>, <spv>, <mer>, <hol>, <ent>, <rel>, <phr>: Phrasemes (don't know really what they are). <syn> and <ant> are converted to <span> with the class synonym and antonym respectively, and the tag name is prepended to the text content with a colon. The rest are ignored.
<categ>: Category, ignored in this project.
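As an illustration of the <kref> rule, here is a minimal Python sketch of the link construction. The function name kref_to_anchor is hypothetical. Applying the type prefix to the visible link text rather than to the href is an assumption, since the description does not say which is meant. The keyword is left unescaped in the URL, as in the usage example output.

```python
from typing import Optional


def kref_to_anchor(keyword: str, dict_name: str,
                   ref_type: Optional[str] = None) -> str:
    """Render a <kref> as a link to /api/lookup/<dictionary>/<keyword>."""
    href = f"/api/lookup/{dict_name}/{keyword}"
    # Assumption: the type attribute prefixes the link text only.
    label = f"{ref_type}: {keyword}" if ref_type else keyword
    return f'<a href="{href}">{label}</a>'
```

For the entry in the usage example, kref_to_anchor('Beatles', 'test_dict') gives the same anchor that appears in the converted output.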
Media files
Always in an <rref> tag. The standard prescribes that the filename should be specified in the lctn attribute, but in practice, the filename is just the text child of the tag. Converted to <img>, <audio>, <video> or <a> depending on the file extension.
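The dispatch on file extension could be sketched in Python as follows. The function name rref_to_html is hypothetical. The extension sets and the exact <audio>, <video> and <a> markup are assumptions. Only the <img> form and the /api/cache/<dictionary>/<filename> path are taken from the usage example output.

```python
import os

# Assumed extension sets; the module's actual lists may differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".svg", ".webp"}
AUDIO_EXTS = {".mp3", ".ogg", ".wav", ".flac"}
VIDEO_EXTS = {".mp4", ".webm", ".ogv"}


def rref_to_html(filename: str, dict_name: str) -> str:
    """Render the text child of an <rref> tag as a media element."""
    url = f"/api/cache/{dict_name}/{filename}"
    ext = os.path.splitext(filename)[1].lower()
    if ext in IMAGE_EXTS:
        return f'<img src="{url}" alt="{filename}"/>'
    if ext in AUDIO_EXTS:
        return f'<audio controls src="{url}"></audio>'
    if ext in VIDEO_EXTS:
        return f'<video controls src="{url}"></video>'
    # Anything else falls back to a plain download link.
    return f'<a href="{url}">{filename}</a>'
```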