zek
Zek is a prototype for creating a Go struct from an XML document. The resulting struct works best for reading XML (see also #14); to create XML, you might want to use something else.
It was developed at Leipzig University Library to shorten the time it takes to go from raw XML to a struct that gives access to XML data in Go programs.
Skip the fluff, just the code.
Given some XML, run:
$ curl -s https://raw.githubusercontent.com/miku/zek/master/fixtures/e.xml | zek -e
// Rss was generated 2018-08-30 20:24:14 by tir on sol.
type Rss struct {
XMLName xml.Name `xml:"rss"`
Text string `xml:",chardata"`
Rdf string `xml:"rdf,attr"`
Dc string `xml:"dc,attr"`
Geoscan string `xml:"geoscan,attr"`
Media string `xml:"media,attr"`
Gml string `xml:"gml,attr"`
Taxo string `xml:"taxo,attr"`
Georss string `xml:"georss,attr"`
Content string `xml:"content,attr"`
Geo string `xml:"geo,attr"`
Version string `xml:"version,attr"`
Channel struct {
Text string `xml:",chardata"`
Title string `xml:"title"` // ESS New Releases (Display...
Link string `xml:"link"` // http://tinyurl.com/ESSNew...
Description string `xml:"description"` // New releases from the Ear...
LastBuildDate string `xml:"lastBuildDate"` // Mon, 27 Nov 2017 00:06:35...
Item []struct {
Text string `xml:",chardata"`
Title string `xml:"title"` // Surficial geology, Aberde...
Link string `xml:"link"` // https://geoscan.nrcan.gc....
Description string `xml:"description"` // Geological Survey of Cana...
Guid struct {
Text string `xml:",chardata"` // 304279, 306212, 306175, 3...
IsPermaLink string `xml:"isPermaLink,attr"`
} `xml:"guid"`
PubDate string `xml:"pubDate"` // Fri, 24 Nov 2017 00:00:00...
Polygon []string `xml:"polygon"` // 64.0000 -98.0000 64.0000 ...
Download string `xml:"download"` // https://geoscan.nrcan.gc....
License string `xml:"license"` // http://data.gc.ca/eng/ope...
Author string `xml:"author"` // Geological Survey of Cana...
Source string `xml:"source"` // Geological Survey of Cana...
SndSeries string `xml:"SndSeries"` // Bedford Institute of Ocea...
Publisher string `xml:"publisher"` // Natural Resources Canada,...
Edition string `xml:"edition"` // prelim., surficial data m...
Meeting string `xml:"meeting"` // Geological Association of...
Documenttype string `xml:"documenttype"` // serial, open file, serial...
Language string `xml:"language"` // English, English, English...
Maps string `xml:"maps"` // 1 map, 5 maps, Publicatio...
Mapinfo string `xml:"mapinfo"` // surficial geology, surfic...
Medium string `xml:"medium"` // on-line; digital, digital...
Province string `xml:"province"` // Nunavut, Northwest Territ...
Nts string `xml:"nts"` // 066B, 095J; 095N; 095O; 0...
Area string `xml:"area"` // Aberdeen Lake, Mackenzie ...
Subjects string `xml:"subjects"`
Program string `xml:"program"` // GEM2: Geo-mapping for Ene...
Project string `xml:"project"` // Rae Province Project Mana...
Projectnumber string `xml:"projectnumber"` // 340521, 343202, 340557, 3...
Abstract string `xml:"abstract"` // This new surficial geolog...
Links string `xml:"links"` // Online - En ligne (PDF, 9...
Readme string `xml:"readme"` // readme | https://geoscan....
PPIid string `xml:"PPIid"` // 34532, 35096, 35438, 2563...
} `xml:"item"`
} `xml:"channel"`
}
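A generated struct is meant to be used with encoding/xml from the standard library. As a minimal sketch, assuming only a few fields are of interest, the struct can be trimmed and passed to xml.Unmarshal. The feed below is made up for illustration and is not the fixture file.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Rss is a trimmed-down copy of the generated struct above. Elements and
// attributes without a matching field are skipped by encoding/xml.
type Rss struct {
	XMLName xml.Name `xml:"rss"`
	Version string   `xml:"version,attr"`
	Channel struct {
		Title string `xml:"title"`
		Item  []struct {
			Title string `xml:"title"`
			Guid  struct {
				Text        string `xml:",chardata"`
				IsPermaLink string `xml:"isPermaLink,attr"`
			} `xml:"guid"`
		} `xml:"item"`
	} `xml:"channel"`
}

// sample is a made-up feed used only for illustration.
const sample = `<rss version="2.0"><channel><title>Example feed</title>
<item><title>First</title><guid isPermaLink="false">1</guid></item>
<item><title>Second</title><guid isPermaLink="false">2</guid></item>
</channel></rss>`

// parseFeed decodes an RSS document into the trimmed struct.
func parseFeed(data []byte) (Rss, error) {
	var feed Rss
	err := xml.Unmarshal(data, &feed)
	return feed, err
}

func main() {
	feed, err := parseFeed([]byte(sample))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(feed.Channel.Title)
	for _, item := range feed.Channel.Item {
		fmt.Println(item.Guid.Text, item.Title)
	}
}
```

Trimming keeps the struct readable; the full generated struct works the same way.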
Online
- try online via WASM: https://xml-to-go.github.io/, thanks YaroslavPodorvanov!
- try it online at https://blog.kowalczyk.info/tools/xmltogo/ -- thanks, kjk!
About
Upsides:
- it works fine for non-recursive structures,
- does not need XSD or DTD,
- it is relatively convenient to access attributes, children and text,
- will generate a single struct, which makes for a fairly compact representation,
- simple user interface,
- comments with examples,
- schema inference across multiple files.
Downsides:
- experimental, early, buggy, unstable prototype,
- no support for recursive types (similar to Russian Doll strategy, [1])
- no type inference, everything is accessible as string (without a schema, type inference may fail if the type guess is wrong)
Bugs:
Mapping between XML elements and data structures is inherently flawed: an XML element is an order-dependent collection of anonymous values, while a data structure is an order-independent collection of named values.
https://golang.org/pkg/encoding/xml/#pkg-note-BUG
Related projects:
- https://github.com/bemasher/JSONGen
- https://github.com/dutchcoders/XMLGen
- https://github.com/gnewton/chidley
- https://github.com/twpayne/go-xmlstruct
And other awesome XML utilities.
Presentations:
Install
$ go install github.com/miku/zek/cmd/zek@latest
Debian and RPM packages:
It's in AUR, too.
Usage
$ zek -h
-B use a fixed banner string (e.g. for CI)
-C emit less compact struct
-F skip formatting
-I use verbatim innerxml instead of chardata
-P string
if set, write out struct within a package with the given name
-S int
read at most this many tags, approximately (0=unlimited)
-c emit more compact struct (noop, as this is the default since 0.1.7)
-d debug output
-e add comments with example
-j add JSON tags
-m omit empty Text fields
-max-examples int
limit number of examples (default 10)
-n string
use a different name for the top-level struct
-o string
if set, write to output file, not stdout
-p write out an example program
-s strict parsing and writing
-t string
emit struct for tag matching this name
-u filter out duplicated examples
-version
show version
-x int
max chars for example (default 25)
Examples:
$ cat fixtures/a.xml
<a></a>
$ zek -C < fixtures/a.xml
type A struct {
XMLName xml.Name `xml:"a"`
Text string `xml:",chardata"`
}
Debug output dumps the internal tree as JSON to stdout.
$ zek -d < fixtures/a.xml
{"name":{"Space":"","Local":"a"}}
Example program:
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"log"
	"os"
)

// A was generated 2017-12-05 17:35:21 by tir on apollo.
type A struct {
	XMLName xml.Name `xml:"a"`
	Text    string   `xml:",chardata"`
}

func main() {
	dec := xml.NewDecoder(os.Stdin)
	var doc A
	if err := dec.Decode(&doc); err != nil {
		log.Fatal(err)
	}
	b, err := json.Marshal(doc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(b))
}
$ zek -C -p < fixtures/a.xml > sample.go && go run sample.go < fixtures/a.xml | jq . && rm sample.go
{
"XMLName": {
"Space": "",
"Local": "a"
},
"Text": ""
}
More complex example:
$ zek < fixtures/d.xml
// Root was generated 2019-06-11 16:27:04 by tir on hayiti.
type Root struct {
XMLName xml.Name `xml:"root"`
Text string `xml:",chardata"`
A []struct {
Text string `xml:",chardata"`
B []struct {
Text string `xml:",chardata"`
C string `xml:"c"`
D string `xml:"d"`
} `xml:"b"`
} `xml:"a"`
}
$ zek -p < fixtures/d.xml > sample.go && go run sample.go < fixtures/d.xml | jq . && rm sample.go
{
"XMLName": {
"Space": "",
"Local": "root"
},
"Text": "\n\n\n\n",
"A": [
{
"Text": "\n \n \n",
"B": [
{
"Text": "\n \n ",
"C": "Hi",
"D": ""
},
{
"Text": "\n \n \n ",
"C": "World",
"D": ""
}
]
},
{
"Text": "\n \n",
"B": [
{
"Text": "\n \n ",
"C": "Hello",
"D": ""
}
]
},
{
"Text": "\n \n",
"B": [
{
"Text": "\n \n ",
"C": "",
"D": "World"
}
]
}
]
}
Annotate with comments:
$ zek -e < fixtures/l.xml
// Records was generated 2019-06-11 16:29:35 by tir on hayiti.
type Records struct {
XMLName xml.Name `xml:"Records"`
Text string `xml:",chardata"` // \n
Xsi string `xml:"xsi,attr"`
Record []struct {
Text string `xml:",chardata"`
Header struct {
Text string `xml:",chardata"`
Status string `xml:"status,attr"`
Identifier string `xml:"identifier"` // oai:ojs.localhost:article...
Datestamp string `xml:"datestamp"` // 2009-06-24T14:48:23Z, 200...
SetSpec string `xml:"setSpec"` // eppp:ART, eppp:ART, eppp:...
} `xml:"header"`
Metadata struct {
Text string `xml:",chardata"`
Rfc1807 struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
Xsi string `xml:"xsi,attr"`
SchemaLocation string `xml:"schemaLocation,attr"`
BibVersion string `xml:"bib-version"` // v2, v2, v2...
ID string `xml:"id"` // http://jou...
Entry string `xml:"entry"` // 2009-06-24...
Organization []string `xml:"organization"` // Proceeding...
Title string `xml:"title"` // Introducti...
Type string `xml:"type"`
Author []string `xml:"author"` // KRAMPEN, G..
Copyright string `xml:"copyright"` // Das Urhebe...
OtherAccess string `xml:"other_access"` // url:http:/...
Keyword string `xml:"keyword"`
Period []string `xml:"period"`
Monitoring string `xml:"monitoring"`
Language string `xml:"language"` // en, en, en, e...
Abstract string `xml:"abstract"` // After a short...
Date string `xml:"date"` // 2009-06-22 12...
} `xml:"rfc1807"`
} `xml:"metadata"`
About string `xml:"about"`
} `xml:"Record"`
}
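The comments are built from values seen in the input. The following is a hypothetical reconstruction, not zek's actual code: it assumes examples are joined with ", " and cut after -x characters (default 25) with "..." appended, which matches comments such as "304279, 306212, 306175, 3..." above. The cap models -max-examples (default 10), and -u is modelled as dropping repeated values.

```go
package main

import (
	"fmt"
	"strings"
)

// exampleComment joins up to maxExamples values, optionally skipping
// duplicates, and truncates the result to maxChars runes.
func exampleComment(values []string, maxExamples, maxChars int, unique bool) string {
	seen := make(map[string]bool)
	var kept []string
	for _, v := range values {
		if len(kept) == maxExamples {
			break
		}
		if unique && seen[v] {
			continue
		}
		seen[v] = true
		kept = append(kept, v)
	}
	s := strings.Join(kept, ", ")
	if r := []rune(s); len(r) > maxChars {
		return string(r[:maxChars]) + "..."
	}
	return s
}

func main() {
	vals := []string{"alpha", "alpha", "beta", "gamma", "delta", "epsilon"}
	fmt.Println(exampleComment(vals, 10, 25, false)) // alpha, alpha, beta, gamma...
	fmt.Println(exampleComment(vals, 10, 25, true))  // alpha, beta, gamma, delta...
}
```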
Only consider a nested element
$ zek -t metadata fixtures/z.xml
// Metadata was generated 2019-06-11 16:33:26 by tir on hayiti.
type Metadata struct {
XMLName xml.Name `xml:"metadata"`
Text string `xml:",chardata"`
Dc struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
Title struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
} `xml:"title"`
Identifier struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
} `xml:"identifier"`
Rights struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
Lang string `xml:"lang,attr"`
} `xml:"rights"`
AccessRights struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
} `xml:"accessRights"`
} `xml:"dc"`
}
Inference across files
$ zek fixtures/a.xml fixtures/b.xml fixtures/c.xml
// A was generated 2017-12-05 17:40:14 by tir on apollo.
type A struct {
XMLName xml.Name `xml:"a"`
Text string `xml:",chardata"`
B []struct {
Text string `xml:",chardata"`
} `xml:"b"`
}
This is also useful if you deal with archives containing XML files:
$ unzip -p 4082359.zip '*.xml' | zek -e
Given a directory full of zip files, you can combine find, unzip and zek:
$ for i in $(find ftp/b571 -type f -name "*zip"); do unzip -p $i '*xml'; done | zek -e
Another example (tarball with thousands of XML files, seemingly MARC):
$ tar -xOzf /tmp/20180725.125255.tar.gz | zek -e
// OAIPMH was generated 2018-09-26 15:03:29 by tir on sol.
type OAIPMH struct {
XMLName xml.Name `xml:"OAI-PMH"`
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
Xsi string `xml:"xsi,attr"`
SchemaLocation string `xml:"schemaLocation,attr"`
ListRecords struct {
Text string `xml:",chardata"`
Record struct {
Text string `xml:",chardata"`
Header struct {
Text string `xml:",chardata"`
Identifier struct {
Text string `xml:",chardata"` // aleph-pub:000000001, ...
} `xml:"identifier"`
} `xml:"header"`
Metadata struct {
Text string `xml:",chardata"`
Record struct {
Text string `xml:",chardata"`
Xmlns string `xml:"xmlns,attr"`
Xsi string `xml:"xsi,attr"`
SchemaLocation string `xml:"schemaLocation,attr"`
Leader struct {
Text string `xml:",chardata"` // 00001nM2.01200024
} `xml:"leader"`
Controlfield []struct {
Text string `xml:",chardata"` // 00001nM2.01200024
Tag string `xml:"tag,attr"`
} `xml:"controlfield"`
Datafield []struct {
Text string `xml:",chardata"`
Tag string `xml:"tag,attr"`
Ind1 string `xml:"ind1,attr"`
Ind2 string `xml:"ind2,attr"`
Subfield []struct {
Text string `xml:",chardata"` // KM0000002
Code string `xml:"code,attr"`
} `xml:"subfield"`
} `xml:"datafield"`
} `xml:"record"`
} `xml:"metadata"`
} `xml:"record"`
} `xml:"ListRecords"`
}
Generate a package
If you want to include a generated file in the build process, e.g. with go generate, you may find -P and -o helpful.
$ cat fixtures/b.xml
<a><b></b></a>
Run on the command line or via go generate:
$ zek -P mypkg -o data.go < fixtures/b.xml
This would write the following to the data.go file:
// Code generated by zek; DO NOT EDIT.
package mypkg
import "encoding/xml"
// A was generated 2021-09-16 11:23:06 by tir on trieste.
type A struct {
XMLName xml.Name `xml:"a"`
Text string `xml:",chardata"`
B string `xml:"b"`
}
Note that any existing file will be overwritten, without any warning.
Use innerxml instead of chardata
You may want either a chardata or an innerxml tag. The default is chardata; to use innerxml, use the -I flag.
$ zek -B -I fixtures/d.xml
// Root was generated automatically by zek 0.1.24. DO NOT EDIT.
type Root struct {
XMLName xml.Name `xml:"root"`
Text string `xml:",innerxml"`
A []struct {
Text string `xml:",innerxml"`
B []struct {
Text string `xml:",innerxml"`
C string `xml:"c"`
D string `xml:"d"`
} `xml:"b"`
} `xml:"a"`
}
Misc
As a side effect, zek seems to be useful for debugging. Example:
This record is emitted from a typical OAI server (OJS, not even uncommon), yet one can quickly spot the flaw in the structure.
Over 30 different structs generated manually in the course of a few hours (around five minutes per source): https://git.io/vbTDo.
Current extent leader: a struct of 1532 lines.