xmllpegparser
`xmllpegparser` is a fast XML parser that uses the LPeg library.
Installation
luarocks install --local https://raw.githubusercontent.com/jonathanpoelen/lua-xmllpegparser/master/xmllpegparser-2.2-0.rockspec
# or in your local directory lua-xmllpegparser
luarocks make --local xmllpegparser-2.2-0.rockspec
Test
Run `./example.lua`.
./example.lua xmlfile [replaceentities]
`replaceentities`: any value; its mere presence enables the replacement of entities.
xmllpegparser API
Parsing
- `parse(xmlstring[, visitorOrsubEntities[, visitorInitArgs...]])`: Returns a tuple `document table, (string error or nil)` (see `visitor.finish`). If `subEntities` is `true`, the entities are replaced and a `tentities` member is added to the document table.
- `parseFile(filename[, visitorOrsubEntities[, visitorInitArgs...]])`: Returns a tuple `document table, error`, where the error comes either from reading the file or from parsing the document.
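A minimal usage sketch of `parse`, assuming the rock is installed and loadable under the module name `xmllpegparser`. The sample XML is made up for illustration, and the guarded `require` is only there so the snippet does nothing harmful when the module is missing.

```lua
-- Guarded require: assumes the module is installed as 'xmllpegparser'.
local ok, xmllpegparser = pcall(require, 'xmllpegparser')
if not ok then
  print('xmllpegparser is not installed; nothing to demonstrate')
  return
end

-- Hypothetical sample input.
local xml = '<root><item id="1">a &lt; b</item></root>'

-- Second argument `true`: ask for entities to be replaced.
local doc, err = xmllpegparser.parse(xml, true)
if err then
  print('parse error: ' .. tostring(err))
else
  -- Top-level nodes are stored in doc.children.
  local root = doc.children[1]
  print(root.tag, #root.children)
end
```

`parseFile` is documented with the same optional arguments, with a filename in place of the XML string.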
Entity
- `defaultEntityTable()`: Returns the default entity table (`{ quot='"', ... }`).
- `createEntityTable(docEntities[, resultEntities])`: Creates an entity table from the document entity table. Returns `resultEntities`.
- `mkReplaceEntities(entityTable_or_func)`: Returns an LPeg expression that can replace entities.
- `replaceEntities(s, entityTable_or_func)`: Returns a `string`.
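To illustrate what replacing entities from a table or a function means, here is a pure-Lua sketch. It is not the library's implementation (which builds an LPeg expression); `replaceEntitiesSketch` and the `entities` table are hypothetical names, and the table only lists the five entities predefined by XML.

```lua
-- Illustrative table: the five predefined XML entities.
local entities = { quot = '"', apos = "'", lt = '<', gt = '>', amp = '&' }

-- `tableOrFunc` is either a table (name -> replacement) or a function(name).
-- string.gsub keeps the original match when the lookup yields nil or false,
-- so unknown entities are left untouched in this sketch.
local function replaceEntitiesSketch(s, tableOrFunc)
  return (s:gsub('&([%w_]+);', tableOrFunc))
end

print(replaceEntitiesSketch('a &lt; b &amp; c &unknown;', entities))
print(replaceEntitiesSketch('&x;', function(name) return '[' .. name .. ']' end))
```

Numeric character references such as `&#60;` are deliberately out of scope here, since the pattern only matches named entities.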
Parsers
- `parser(visitor[, safeVisitor: bool])`: Returns a parser. If all visitor functions return `nil` (except `accuattr`, `init` and `finish`), then `safeVisitor` may be `true` and the parser will optimize the visitor's calls.
- `lazyParser(visitorCreator)`: Returns a parser. `parser(visitorCreator())` is used on the first call of `myparser.parse(...)`.
- `mkVisitor(evalEntities: bool, defaultEntities: table | function | nil, withoutPosition)`: If `not defaultEntities` and `evalEntities`, then `defaultEntities = defaultEntityTable()`. If `withoutPosition`, then the `pos` parameter does not exist for the visitor functions, except for `finish`.
- `treeParser`: The default parser used by `parse(str, false)`.
- `treeParserWithReplacedEntities`: The default parser used by `parse(str, true)`.
- `treeParserWithoutPos`: Parser without the `pos` parameter.
- `treeParserWithoutPosWithReplacedEntities`: Parser without the `pos` parameter and with replaced entities.
Global parser options
- `enableWithoutPosParser([bool])`: Uses the `treeParserWithoutPos*` versions as default parsers. `enableWithoutPosParser(false)` is the same as `setDefaultParsers()`. Returns the previous parsers.
- `setDefaultParsers(parser, parserWithReplacedEntities | bool | nil)`: If `parserWithReplacedEntities == true`, then `parserWithReplacedEntities = parser`. A `nil` or `false` value restores the default parser. Returns the previous parsers.
Utility
- `toString(doc: table, indentationText: nil | string, params: nil | table)`:
  - `indentationText` corresponds to the text used at each indentation level. If `nil`, there is no formatting.
  - `params` is a table with
    - `shortEmptyElements: bool = true`: whether empty tags are self-closed.
    - `stableAttributes: bool | function = true`: If `true`, attributes are sorted by name. If a function, it takes the attribute table and should return an iterator function that gives the attribute name and its value.
    - `inlineTextLengthMax: number = 9999999`: a node that contains only one text is formatted on one line. When the text exceeds this value, it is indented.
    - `escape: table`: table of `function(string):string`
      - `attr`: text in double quotes
      - `text`: text node
      - `cdata`: text between `<![CDATA[` and `]]>`
      - `comment`: text between `<!--` and `-->`
- `escapeFunctions(escapeAmp: bool = false)`: Utility function for the `params.escape` parameter of `toString`
  - `escapeAmp`: escape the `&` char in text and attributes
- `escapeComment(string):string`: replace `--` with `—`
- `escapeAttribute(string):string`: replace `<` with `&lt;` and `"` with `&quot;`
- `escapeAttributeAndAmp(string):string`: like `escapeAttribute` + replace `&` with `&amp;`
- `escapeCDATA(string):string`: replace `]]>` with `]]>]]><![CDATA[`
- `escapeText(string):string`: replace `<` with `&lt;`
- `escapeTextAndAmp(string):string`: replace `<` with `&lt;` and `&` with `&amp;`
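The text and attribute escapes described above can be approximated in pure Lua as follows. This is only a sketch of the documented replacements, not the library's code; the `...Sketch` function names are hypothetical. Doing all substitutions in a single `gsub` pass avoids escaping the `&` of an entity that was just produced.

```lua
-- Sketch of escapeTextAndAmp: '<' -> '&lt;' and '&' -> '&amp;'.
local function escapeTextAndAmpSketch(s)
  return (s:gsub('[<&]', { ['<'] = '&lt;', ['&'] = '&amp;' }))
end

-- Sketch of escapeAttributeAndAmp: additionally '"' -> '&quot;'.
local function escapeAttributeAndAmpSketch(s)
  return (s:gsub('[<"&]', { ['<'] = '&lt;', ['"'] = '&quot;', ['&'] = '&amp;' }))
end

print(escapeTextAndAmpSketch('a < b & c'))
print(escapeAttributeAndAmpSketch('say "hi" & <go>'))
```

Functions of this shape (`function(string):string`) are what the `escape` table of `toString` is documented to accept for its `attr`, `text`, `cdata` and `comment` members.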
Document structure (default parser)
-- pos member = index in the parsed string
document = {
  children = {
    { pos=number, parent=table or nil, text=string[, cdata=true] } or
    { pos=number, parent=table or nil, tag=string, attrs={ { name=string, value=string }, ... }, children={ ... } },
    ...
  },
  bad = { children={ ... } }, -- when a closing tag has no matching opening tag
  preprocessor = { { pos=number, tag=string, attrs={ { name=string, value=string }, ... } }, ... },
  doctype = { pos=number, name=string, ident=string or nil, pubident=string or nil, dtd=string or nil }, -- if there is a doctype
  error = string, -- if error
  lastpos = number, -- last known position of parse()
  entities = { { pos=number, name=string, value=string }, ... },
  tentities = { name=value, ... } -- only if subEntities = true
}
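As an illustration of how such a tree can be traversed, the following sketch walks a hand-built table that mimics the `children` / `tag` / `text` members described above. No parsing is involved; the `pos` values and the helper name `collectText` are made up for the example.

```lua
-- Hand-built table in the documented shape (values are illustrative).
local doc = {
  children = {
    { pos = 1, tag = 'root', attrs = {}, children = {
      { pos = 7, text = 'hello ' },
      { pos = 13, tag = 'b', attrs = {}, children = {
        { pos = 16, text = 'world' },
      } },
    } },
  },
}

-- Collects the text of every text node below `node`, depth-first.
local function collectText(node, out)
  out = out or {}
  for _, child in ipairs(node.children or {}) do
    if child.text then
      out[#out + 1] = child.text   -- text (or CDATA) node
    else
      collectText(child, out)      -- element node: recurse into children
    end
  end
  return out
end

print(table.concat(collectText(doc)))
```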
Parser structure
{
  parse = function(xmlstring, visitorInitArgs...) ... end,
  parseFile = function(filename, visitorInitArgs...) ... end,
  __call = function(xmlstring, visitorInitArgs...) ... end,
}
Visitor structure
Each member is optional.
{
  withPos = bool, -- indicates whether the `pos` parameter is passed to the functions (except `finish`)
  init = function(...), -- called before parsing, returns the position of the beginning of match or nil
  finish = function(err, pos, xmlstring), -- called after parsing, returns (doc, err) or nil
  proc = function(pos, name, attrs), -- for `<?...?>`
  entity = function(pos, name, value),
  doctype = function(pos, name, ident, pubident, dtd), -- called after all entity()
  accuattr = function(table, name, value), -- `table` is an accumulator that will be transmitted to tag.attrs.
  -- If `nil` and `tag` is not `nil`, a default accumulator is used.
  -- If `false`, the accumulator is disabled.
  -- (`tag(pos, name, accuattr(accuattr({}, attr1, value1), attr2, value2))`)
  tag = function(pos, name, attrs), -- for a new tag (`<a>` or `<a/>`)
  open = function(), -- only for an open node (`<a>`, not `<a/>`), called after `tag`.
  close = function(name),
  text = function(pos, text),
  cdata = function(pos, text), -- or `text` if nil
  comment = function(str)
}
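A small custom visitor might look like the sketch below, which counts how often each tag name occurs. It assumes the module is loadable as `xmllpegparser` and relies only on the `parser(visitor)` call and the `tag` / `finish` members described in this document; the sample XML and the `counts` table are made up, and `withPos` is set explicitly so the `pos` argument is expected.

```lua
-- Guarded require: assumes the module is installed as 'xmllpegparser'.
local ok, xmllpegparser = pcall(require, 'xmllpegparser')
if not ok then
  print('xmllpegparser is not installed; nothing to demonstrate')
  return
end

local counts = {}

-- Hypothetical visitor: every member it omits is simply not called.
local visitor = {
  withPos = true,
  tag = function(pos, name, attrs)
    counts[name] = (counts[name] or 0) + 1
  end,
  -- Whatever finish returns is what parse() hands back to the caller.
  finish = function(err, pos, xmlstring)
    return counts, err
  end,
}

local p = xmllpegparser.parser(visitor)
local result, err = p.parse('<a><b/><b/><c>text</c></a>')

if err then
  print('parse error: ' .. tostring(err))
else
  for name, n in pairs(result) do
    print(name, n)
  end
end
```

Because none of the callbacks other than `finish` returns a value, this visitor looks like a candidate for the `safeVisitor` flag of `parser`; the sketch leaves it off to stay on the conservative side.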
Default parser limitations
- Non-validating
- No DTD support
- Ignore processing instructions