rdftools

rdftools is a Python wrapper around a number of RDF-related tools.

Important Notes

This software is the product of research carried out at the University of Zurich and comes with no warranty whatsoever. Have fun!

TODO's

How to Compile/Install the Project

Ensure that libraptor2 v2.0.13+ and cityhash are installed on your system (either using the package manager of the OS or compiled from source).
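A quick way to see whether the two native libraries are visible on the system is sketched below. It relies only on the standard library's `ctypes.util.find_library`; the short names `raptor2` and `cityhash` are assumptions about how the shared libraries are named on your platform, and the lookup does not verify the libraptor2 version (2.0.13+), only that something by that name can be found.

```python
import ctypes.util


def find_prerequisites(names=("raptor2", "cityhash")):
    """Map each library name to what the system lookup reports, or None if missing."""
    # The default names are assumptions; adjust them if your platform differs.
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, location in find_prerequisites().items():
        status = location if location else "NOT FOUND"
        print(f"lib{name}: {status}")
```

A `NOT FOUND` result does not always mean the library is absent (it may live in a non-standard prefix), so treat this as a first check rather than a definitive one.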

To install rdftools you have two options: 1) manual installation (install the requirements first) or 2) automatic installation with pip.

Manual installation:

$ git clone https://github.com/cosminbasca/rdftools
$ cd rdftools
$ python setup.py install

Install the project with pip:

$ pip install git+https://github.com/cosminbasca/rdftools

Also have a look at the build.sh, clean.sh and test.sh scripts included in the codebase.

To include the latest JVM RDF tools, update to the latest version of jvmrdftools and create an assembly:

$ sbt compile assembly

Copy the resulting jar from the target folder to the lib folder inside the rdftools.tools.jvmrdftools module and reinstall the Python package.
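The copy step above could be scripted roughly as follows. This is a minimal sketch: the function name `copy_assembly_jar` is hypothetical, and both directory arguments are left to the caller because the exact location of the assembly jar and of the lib folder depends on your checkout.

```python
import shutil
from pathlib import Path


def copy_assembly_jar(target_dir, lib_dir):
    """Copy the most recently modified *.jar under target_dir into lib_dir."""
    # Search recursively, since build tools often nest the jar in a subfolder.
    jars = sorted(Path(target_dir).rglob("*.jar"), key=lambda p: p.stat().st_mtime)
    if not jars:
        raise FileNotFoundError(f"no jar found under {target_dir}")
    lib = Path(lib_dir)
    lib.mkdir(parents=True, exist_ok=True)
    # copy2 returns the path of the copied file inside lib_dir.
    return Path(shutil.copy2(jars[-1], lib))
```

After the copy, reinstall the package (for example with `python setup.py install`) so the new jar is picked up.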

The tools

To find out what a tool does, supply the --help command line argument to any of the tools.

Available tools:

usage: rdfconvert [-h] [--clear] [--dst_format DST_FORMAT]
                  [--buffer_size BUFFER_SIZE] [--version]
                  SOURCE

rdftools v0.9.2, rdf converter, based on libraptor2

positional arguments:
  SOURCE                the source file or location (of files) to be converted

optional arguments:
  -h, --help            show this help message and exit
  --clear               clear the original files (delete) - this action is
                        permanent, use with caution!
  --dst_format DST_FORMAT
                        the destination format to convert to. Supported
                        parsers: ['rdfxml', 'ntriples', 'turtle', 'trig',
                        'guess', 'rss-tag-soup', 'rdfa', 'nquads', 'grddl'].
                        Supported serializers ['rdfxml', 'rdfxml-abbrev',
                        'turtle', 'ntriples', 'rss-1.0', 'dot', 'html',
                        'json', 'atom', 'nquads'].
  --buffer_size BUFFER_SIZE
                        the buffer size in Mb of the input buffer (the parser
                        will only parse XX Mb at a time)
  --version             the current version

usage: rdfconvert2 [-h] [--clear] [--dst_format DST_FORMAT]
                   [--workers WORKERS] [--version]
                   SOURCE

rdftools v0.9.2, rdf converter (2), makes use of rdf2rdf bundled - requires
java

positional arguments:
  SOURCE                the source file or location (of files) to be converted

optional arguments:
  -h, --help            show this help message and exit
  --clear               clear the original files (delete) - this action is
                        permanent, use with caution!
  --dst_format DST_FORMAT
                        the destination format to convert to
  --workers WORKERS     the number of workers (default -1 : all cpus)
  --version             the current version

usage: rdfencode [-h] [--version] SOURCE

rdftools v0.9.2, encode the RDF file(s)

positional arguments:
  SOURCE      the source file or location (of files) to be encoded

optional arguments:
  -h, --help  show this help message and exit
  --version   the current version

usage: genlubm [-h] [--univ UNIV] [--index INDEX] [--seed SEED]
               [--ontology ONTOLOGY] [--workers WORKERS] [--version]
               OUTPUT

rdftools v0.9.2, lubm dataset generator wrapper (bundled) - requires java

positional arguments:
  OUTPUT               the location in which to save the generated
                       distributions

optional arguments:
  -h, --help           show this help message and exit
  --univ UNIV          number of universities to generate
  --index INDEX        start university
  --seed SEED          the seed
  --ontology ONTOLOGY  the lubm ontology
  --workers WORKERS    the number of workers (default -1 : all cpus)
  --version            the current version

usage: genlubmdistro [-h] [--distro DISTRO] [--univ UNIV] [--index INDEX]
                     [--seed SEED] [--ontology ONTOLOGY] [--pdist PDIST]
                     [--sites SITES] [--clean] [--workers WORKERS] [--version]
                     OUTPUT

rdftools v0.9.4, lubm dataset generator wrapper (bundled) - requires java

positional arguments:
  OUTPUT               the location in which to save the generated
                       distributions

optional arguments:
  -h, --help           show this help message and exit
  --distro DISTRO      the distibution to use, valid values are ['seedprop',
                       'uni2many', 'horizontal', 'uni2one']
  --univ UNIV          number of universities to generate
  --index INDEX        start university
  --seed SEED          the seed
  --ontology ONTOLOGY  the lubm ontology
  --pdist PDIST        the probabilities used for the uni2many distribution,
                       valid choices are ['3S', '7S', '5S'] or file with
                       probabilities split by line
  --sites SITES        the number of sites
  --clean              delete the generated universities
  --workers WORKERS    the number of workers (default -1 : all cpus)
  --version            the current version

usage: genvoid [-h] [--version] SOURCE

rdftools v0.9.2, generate void statistics for RDF source file

positional arguments:
  SOURCE      the source file to be analized

optional arguments:
  -h, --help  show this help message and exit
  --version   the current version

usage: genvoid2 [-h] [--dataset_id DATASET_ID] [--use_nx] [--version] SOURCE

rdftools v0.9.2, generate a VoiD descriptor using the nxparser java package

positional arguments:
  SOURCE                the source file to be analized

optional arguments:
  -h, --help            show this help message and exit
  --dataset_id DATASET_ID
                        dataset id
  --use_nx              if true (default false) use the nx parser builtin void
                        generator
  --version             the current version

usage: ntround [-h] [--prefix PREFIX] [--precision PRECISION] [--version] PATH

rdftools v0.9.2, rounds ntriple files in a folder, (rounds the floating point literals)

positional arguments:
  PATH                  location of the indexes

optional arguments:
  -h, --help            show this help message and exit
  --prefix PREFIX       the prefix used for files that are transformed, cannot
                        be the enpty string!
  --precision PRECISION
                        the precision to round to, if 0, floating point
                        numbers are rounded to long
  --version             the current version
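To illustrate what rounding floating point literals in an N-Triples file can look like, here is a small sketch. It is not the ntround implementation: the function `round_literals`, the regular expression, and the decision to keep the original datatype when the precision is 0 are all assumptions made for illustration. It only handles typed literals written with a decimal point.

```python
import re

# Matches a typed literal such as "3.14159"^^<http://www.w3.org/2001/XMLSchema#double>
FLOAT_LITERAL = re.compile(
    r'"(?P<value>[+-]?\d+\.\d+(?:[eE][+-]?\d+)?)"'
    r'\^\^<http://www\.w3\.org/2001/XMLSchema#(?P<type>double|float|decimal)>'
)


def round_literals(line, precision=2):
    """Round every floating point literal found in one N-Triples line."""
    def _round(match):
        number = float(match.group("value"))
        if precision == 0:
            # Precision 0 rounds to a whole number (assumed reading of "long").
            text = str(int(round(number)))
        else:
            text = f"{number:.{precision}f}"
        datatype = match.group("type")
        return f'"{text}"^^<http://www.w3.org/2001/XMLSchema#{datatype}>'

    return FLOAT_LITERAL.sub(_round, line)
```

Lines without a matching literal pass through unchanged, so the function can be applied to every line of a file.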

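Any of the tools listed above can also be driven from a Python script through `subprocess`. The sketch below is one possible way to do that; `build_command` and `run_tool` are hypothetical helper names, not part of rdftools, and the option names are taken directly from the usage text (for example `dst_format` becomes `--dst_format`).

```python
import subprocess


def build_command(tool, positional, **options):
    """Turn keyword options into the --flag form shown in the usage text."""
    cmd = [tool]
    for name, value in options.items():
        if value is None or value is False:
            continue  # option not requested
        cmd.append(f"--{name}")
        if value is not True:  # True marks a bare switch such as --clear
            cmd.append(str(value))
    cmd.append(str(positional))
    return cmd


def run_tool(tool, positional, **options):
    """Run the tool and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(tool, positional, **options), check=True)
```

For example, `run_tool("rdfconvert", "data/", dst_format="ntriples")` would execute `rdfconvert --dst_format ntriples data/`, assuming the rdfconvert script is on the PATH.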
Thanks a lot to