Corpus Query Language Parser

Build Status Java 8+ License

This library implements a Corpus Query Language parser in Java 1.6, using ANTLR v4.

As no official Corpus Query Language specification is available, the grammar definition for this parser was derived by running jjdoc against the JavaCC CQL grammar from the Institute of Dutch Lexicology's BlackLab project.

The parser generates an AST (Abstract Syntax Tree), which you can then use in your own application however you wish. The class Main shows how the parser can be used. You can also run Main as an application if you want to inspect the node tree produced by the parser.

Obtaining

The compiled artifact can be obtained from Maven Central by adding the following to the <dependencies> section of your pom.xml:

<dependency>
    <groupId>com.evolvedbinary.cql</groupId>
    <artifactId>corpusql-parser</artifactId>
    <version>1.2.0-SNAPSHOT</version>
</dependency>

If you are a Scala, Groovy or Clojure person, then you can still use the artifact from Maven Central with your favourite build tool; however, I will assume you know what you're doing ;-)

Example

An example of this parser being used in another application is the Corpus Query Language Module for eXist-db, which shows how to traverse the AST for a custom application; in that specific case, the traversal generates an XML vocabulary of Corpus Query Language.
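
To give a rough idea of what such a traversal involves, here is a minimal, self-contained sketch of a depth-first walk that turns a tree into XML. Everything in it is an assumption made for illustration: the `Node` class, the `toXml` method and the labels `query`, `sequence` and `token` are hypothetical stand-ins, not the types or names produced by this parser or used by the eXist-db module. See the class Main for the parser's real AST.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types come from corpusql-parser.
class AstToXmlSketch {

    /** A stand-in AST node: a label plus zero or more children. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /**
     * Depth-first traversal that emits one XML element per node.
     * Assumes every label is already a valid XML element name.
     */
    static String toXml(Node node) {
        StringBuilder sb = new StringBuilder();
        write(node, sb);
        return sb.toString();
    }

    private static void write(Node node, StringBuilder sb) {
        if (node.children.isEmpty()) {
            // Leaf nodes become self-closing elements.
            sb.append('<').append(node.label).append("/>");
            return;
        }
        sb.append('<').append(node.label).append('>');
        for (Node child : node.children) {
            write(child, sb);
        }
        sb.append("</").append(node.label).append('>');
    }

    public static void main(String[] args) {
        // An invented tree shape, roughly "a query made of a sequence of two tokens".
        Node tree = new Node("query",
                new Node("sequence", new Node("token"), new Node("token")));
        System.out.println(toXml(tree));
    }
}
```

A real application would walk the tree produced by the parser in the same recursive fashion, substituting its own output for the XML strings built here.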

Future Work