Corpus Query Language Parser
This library implements a Corpus Query Language Parser in Java 1.6, using Antlr v4.
As no official Corpus Query Language specification is available, the grammar definition for this parser was derived by running jjdoc against the JavaCC CQL grammar from the Institute of Dutch Lexicology's BlackLab project.
The parser generates an AST (Abstract Syntax Tree), which you can then use in your own application however you wish. The class Main shows how the parser can be used. You can also run Main as an application to inspect the node tree produced by the parser.
Obtaining
The compiled artifact can be obtained from Maven Central by adding the following to the <dependencies> section of your pom.xml:
<dependency>
<groupId>com.evolvedbinary.cql</groupId>
<artifactId>corpusql-parser</artifactId>
<version>1.2.0-SNAPSHOT</version>
</dependency>
If you are a Scala, Groovy or Clojure person, you can still use the artifact from Maven Central with your favourite build tool; I will assume you know what you're doing ;-)
Example
An example of this parser being used in another application is the Corpus Query Language Module for eXist-db, which shows how to traverse the AST for a custom application; in that specific case, it generates an XML vocabulary of Corpus Query Language.
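To give a rough feel for what "traversing the AST to generate XML" can look like, here is a minimal sketch. It does not use this library's API: the Node class, the element names and the hand-built tree are all hypothetical stand-ins chosen for illustration. See the class Main and the eXist-db module for the real node types.

```java
import java.util.ArrayList;
import java.util.List;

public class AstToXmlSketch {

    /** Hypothetical stand-in for an AST node; NOT a class from this library. */
    static final class Node {
        final String name;                 // used as the XML element name
        final String text;                 // leaf text, or null for inner nodes
        final List<Node> children = new ArrayList<Node>();

        Node(String name, String text) {
            this.name = name;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Escapes the characters that are unsafe in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Depth-first traversal: open the element, emit text and children, close it. */
    static void toXml(Node node, StringBuilder out) {
        out.append('<').append(node.name).append('>');
        if (node.text != null) {
            out.append(escape(node.text));
        }
        for (Node child : node.children) {
            toXml(child, out);
        }
        out.append("</").append(node.name).append('>');
    }

    static String toXml(Node root) {
        StringBuilder out = new StringBuilder();
        toXml(root, out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hand-built tree standing in for the parse of a query such as [word="cat"].
        // The shape and the names are assumptions, not the parser's actual output.
        Node query = new Node("query", null)
            .add(new Node("token", null)
                .add(new Node("attribute", "word"))
                .add(new Node("value", "cat")));
        System.out.println(toXml(query));
    }
}
```

In a real application the recursion would walk the tree returned by the parser rather than a hand-built one, and the mapping from node type to element name would follow whatever XML vocabulary you need.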
Future Work
- Provide some tutorials or better documentation