Home

Awesome

RDF Processing Toolkit

News

Previous entries

Overview

RDF/SPARQL Workflows on the Command Line made easy. The RDF Processing Toolkit (RPT) integrates several of our tools into a single CLI frontend: It features commands for running SPARQL-statements on triple and quad based data both streaming and static. SPARQL extensions for working with CSV, JSON and XML are included. So is an RML toolkit that allows one to convert RML to SPARQL (or TARQL). Ships with Jena's ARQ and TDB SPARQL engines as well as one based on Apache Spark.

RPT is Java tool which comes with debian and rpm packaging. It is invoked using rpt <command> where the following commands are supported:

Check this documentation for the supported SPARQL extensions with many examples

Example Usage

rpt integrate data.nt update.ru more-data.ttl query.rq

rpt integrate --jq file.ttl '?s { ?s a foaf:Person }' | jq '.[].s'
# Group RDF into graph based on consecutive subjects and for each named graph count the number of triples
cat file.ttl | ngs subjects | ngs map --sparql 'CONSTRUCT { ?s eg:triples ?c} { SELECT ?s COUNT(*) { ?s ?p ?o } GROUP ?s }

# Count number of named graphs
rpt ngs wc file.trig

# Output the first 3 graphs produced by another command
./produce-graphs.sh | ngs head -n 3

Canned Queries

RPT ships with several useful queries on its classpath. Classpath resources can be printed out using cpcat. The following snippet shows examples of invocations and their output:

Overview

$ rpt cpcat spo.rq
CONSTRUCT WHERE { ?s ?p ?o }

$ rpt cpcat gspo.rq
CONSTRUCT WHERE { GRAPH ?g { ?s ?p ?o } }

Any resource (query or data) on the classpath can be used as an argument to the integrate command:

rpt integrate yourdata.nt spo.rq
# When spo.rq is executed then the data is queried and printed out on STDOUT

Reference

The exact definitions can be viewed with rpt cpcat resource.rq.

Example Use Cases

Building

The build requires maven.

For convenience, this Makefile which defines essential goals for common tasks. To build a "jar-with-dependencies" use the distjar goal. The path to the created jar bundle is shown when the build finishes. In order to build and and install a deb or rpm package use the deb-rere or rpm-rere goals, respectively.

$ make

make help                # Show these help instructions
make distjar             # Create only the standalone jar-with-dependencies of rpt
make rpm-rebuild         # Rebuild the rpm package (minimal build of only required modules)
make rpm-reinstall       # Reinstall rpm (requires prior build)
make rpm-rere            # Rebuild and reinstall rpm package
make deb-rebuild         # Rebuild the deb package (minimal build of only required modules)
make deb-reinstall       # Reinstall deb (requires prior build)
make deb-rere            # Rebuild and reinstall deb package
make docker              # Build Docker image
make release-bundle      # Create files for Github upload

A docker image is available at https://registry.hub.docker.com/r/aksw/rpt

License

The source code of this repo is published under the Apache License Version 2.0. Dependencies may be licensed under different terms. When in doubt please refer to the licenses of the dependencies declared in the pom.xml files. The dependency tree can be viewed with Maven using mvn dependency:tree.

Acknowledgements

History