Home

jump to a detailed explanation

jump straight to a simple try-it-yourself-example

jump to the Darwin Core translator page

Guid-O-Matic

Software to convert fielded text (CSV) files to RDF serialized as XML, Turtle, or JSON-LD

"You can write better software than this..."

Note: 2018-08-29. There is a problem with the JSON-LD generated by Guid-O-Matic. If a property is repeated, Guid-O-Matic creates two triples by repeating the property/value pair each time. This is valid JSON-LD, but consuming applications recognize only the last instance of the property for that subject. The appropriate syntax is to include the multiple values as an array under a single instance of the property. So this is only a problem when a subject has duplicate predicates.
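
The difference can be shown in a few lines of Python (the subject and the dc:creator property and values below are made up for illustration; this is not Guid-O-Matic's actual output):

```python
import json

# Problematic form: the key is repeated, one key/value pair per value.
# This is legal JSON-LD, but most JSON parsers keep only the last value.
problematic = '''{
  "@id": "http://example.org/doc1",
  "dc:creator": "Alice",
  "dc:creator": "Bob"
}'''
print(json.loads(problematic)["dc:creator"])  # only "Bob" survives

# Appropriate form: one key whose value is an array of all the values.
correct = {
    "@id": "http://example.org/doc1",
    "dc:creator": ["Alice", "Bob"]
}
print(json.dumps(correct))
```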

What is the purpose of Guid-O-Matic?

Best practices in the biodiversity informatics community, as embodied in the TDWG GUID Applicability Statement, dictate that globally unique identifiers (GUIDs, rhymes with "squids") should be resolvable (i.e. dereferenceable, Recommendation R7) and that the default metadata response format should be RDF serialized as XML (Recommendation R10). In practice, machine-readable metadata is rarely provided when the requested content-type is some flavor of RDF. I think the reason is that people believe it is "too hard" to generate the necessary RDF.

The purpose of Guid-O-Matic is mostly to show that it is not really that hard to create RDF. Anybody who can create a spreadsheet or a Darwin Core Archive (DwCa) can generate RDF with little additional effort. In production, providers would probably not use spreadsheets as a data source, but the point of Guid-O-Matic is to demonstrate a general strategy and allow users to experiment with different graph structures and play with the generated serializations.
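
To illustrate how little effort is involved, here is a minimal sketch of the general idea (not Guid-O-Matic's actual code, which is XQuery): a made-up two-column CSV is turned into Turtle by mapping one column to a subject IRI and the other to a literal. The column names, IRIs, and mapping are invented for illustration:

```python
import csv, io

# A hypothetical two-column CSV: an identifier and a label.
csv_text = """siteID,siteName
t001,Foguang Temple
t002,Nanchan Temple
"""

# A hand-written column-to-predicate mapping; Guid-O-Matic reads
# mappings like this from its own configuration CSV files.
lines = ["@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ."]
for row in csv.DictReader(io.StringIO(csv_text)):
    subject = f"<http://example.org/site/{row['siteID']}>"
    label = row["siteName"].replace('"', '\\"')  # escape quotes in literals
    lines.append(f'{subject} rdfs:label "{label}" .')
turtle = "\n".join(lines)
print(turtle)
```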

Why is it called "Guid-O-Matic" and not something like "RDF-Generator-O-Matic"?

Because I already had the cute squid picture and "RDF Generator" doesn't rhyme with "squid".

Why did you write this script in XQuery and not something like Python or PHP?

I am not a very good Python programmer and I don't know PHP. Once you understand what Guid-O-Matic does, you can write your own (better) code to do the same thing.

I used XQuery because I'm in a working group that includes a lot of Digital Humanists, and they love XML. Also, the awesome XQuery processor, BaseX, is free and easily downloaded and installed, so anybody can easily run the Guid-O-Matic scripts. In addition, BaseX can run as a web server, so in theory one could call the RDF-generating functions in response to an HTTP request and actually use the scripts to provide RDF online.

What did Guid-O-Matic 1.1 do?

I wrote Guid-O-Matic 1 in about 2010. Version 1.1 had a very limited scope:

Version 1.1 also was written in an old version of Visual Basic, which had the advantage that it could run as an executable, but had the disadvantage that you couldn't hack it unless you had a copy of Visual Basic and knew how to use it. Even I don't have a functioning copy of that version of Visual Basic any more, so I can't even look at the source code now. But it doesn't really matter because I don't advise that anyone try to mess with version 1.1 anyway. I'm only posting it here for historical reasons (and so that you can try running it to see the great squid graphic on the UI!).

What does Guid-O-Matic 2 do?

Version 2 is intended to be as general as is practical considering that the source data are being pulled from a CSV file. It:

Version 2 is written in XQuery (a W3C Recommendation). It can be run using BaseX, a free XQuery processor. Instructions for setting everything up are elsewhere.

In addition to the main script that generates the RDF, there is an additional script that processes a Darwin Core Archive so that it can be used as source data. It pulls information from the meta.xml file to generate hackable mappings from the CSV files to the RDF.
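
The kind of information that script pulls from meta.xml can be sketched in a few lines. This Python fragment (not the actual XQuery script) parses a trimmed example meta.xml whose structure follows the Darwin Core text guidelines; real archives carry more attributes (field delimiters, encoding, extensions, etc.):

```python
import xml.etree.ElementTree as ET

# A trimmed meta.xml in the Darwin Core text guidelines namespace.
meta_xml = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>"""

ns = {"dwc": "http://rs.tdwg.org/dwc/text/"}
core = ET.fromstring(meta_xml).find("dwc:core", ns)
csv_file = core.find("dwc:files/dwc:location", ns).text
# Column index -> term IRI: the raw material for a hackable
# mapping from CSV columns to RDF predicates.
mapping = {int(f.get("index")): f.get("term")
           for f in core.findall("dwc:field", ns)}
print(csv_file, mapping)
```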

Can I try it?

Yes, please do! If all you want to do is see what happens, do the following:

You can play around with changing the identifier for the focal resource (the first parameter of the function) to generate RDF for other temple sites, and the serialization (the second parameter). Suggested values are given in the comments above the function.

If you want to try more complicated things like changing the properties or graph model, or if you want to set up mappings for your own data, you will need to read more detailed instructions. To take it a step further and try using a Darwin Core archive as input also requires reading more instructions.

* Tang-Song temple data provided by Dr. Tracy Miller of the Vanderbilt University Department of History of Art, who graciously let us use her data as a guinea pig in our Semantic Web working group. Please contact her for more information about the data.