rdftools
rdftools is a Python wrapper around a number of RDF-related tools:
- rdf parsers / serializers
- void utilities
- lubm generator
- etc
Important Notes
This software is the product of research carried out at the University of Zurich and comes with no warranty whatsoever. Have fun!
TODOs
- The project is not documented (yet)
How to Compile/Install the Project
Ensure that libraptor2 v2.0.13+ and cityhash are installed on your system (either using the package manager of the OS or compiled from source).
There are two ways to install rdftools: 1) manually (install the requirements first) or 2) automatically with pip.
Manual installation:
$ git clone https://github.com/cosminbasca/rdftools
$ cd rdftools
$ python setup.py install
Install the project with pip:
$ pip install git+https://github.com/cosminbasca/rdftools
Also have a look at the build.sh, clean.sh and test.sh scripts included in the codebase.
To include the latest JVM RDF tools, update to the latest version of jvmrdftools and create an assembly:
$ sbt compile assembly
Copy the resulting jar from the target folder to the lib folder inside the rdftools.tools.jvmrdftools module and reinstall the Python package.
The tools
To find out what a tool does, supply the --help command line argument to any of the tools.
Available tools:
- rdfconvert, convert RDF files from source format to a destination format using the libraptor2 C RDF parser
usage: rdfconvert [-h] [--clear] [--dst_format DST_FORMAT]
[--buffer_size BUFFER_SIZE] [--version]
SOURCE
rdftools v0.9.2, rdf converter, based on libraptor2
positional arguments:
SOURCE the source file or location (of files) to be converted
optional arguments:
-h, --help show this help message and exit
--clear clear the original files (delete) - this action is
permanent, use with caution!
--dst_format DST_FORMAT
the destination format to convert to. Supported
parsers: ['rdfxml', 'ntriples', 'turtle', 'trig',
'guess', 'rss-tag-soup', 'rdfa', 'nquads', 'grddl'].
Supported serializers ['rdfxml', 'rdfxml-abbrev',
'turtle', 'ntriples', 'rss-1.0', 'dot', 'html',
'json', 'atom', 'nquads'].
--buffer_size BUFFER_SIZE
the buffer size in Mb of the input buffer (the parser
will only parse XX Mb at a time)
--version the current version
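When converting many files from a script, it can help to assemble the rdfconvert command line programmatically. The following is a minimal sketch that uses only the flags listed in the help output above; the helper name `rdfconvert_command` and the path `data/dump.rdf` are hypothetical and not part of rdftools.

```python
import shlex


def rdfconvert_command(source, dst_format=None, buffer_size=None, clear=False):
    """Build an argv list for rdfconvert (hypothetical helper, not part of rdftools)."""
    cmd = ["rdfconvert"]
    if dst_format is not None:
        cmd += ["--dst_format", dst_format]
    if buffer_size is not None:
        # --buffer_size is given in Mb according to the help text
        cmd += ["--buffer_size", str(buffer_size)]
    if clear:
        # --clear deletes the original files permanently, so it is opt-in here
        cmd.append("--clear")
    cmd.append(source)
    return cmd


cmd = rdfconvert_command("data/dump.rdf", dst_format="ntriples", buffer_size=64)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once rdftools is installed and rdfconvert is on the PATH.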
- rdfconvert2, convert RDF files from source format to a destination format using the rdf2rdf java RDF parser
usage: rdfconvert2 [-h] [--clear] [--dst_format DST_FORMAT]
[--workers WORKERS] [--version]
SOURCE
rdftools v0.9.2, rdf converter (2), makes use of rdf2rdf bundled - requires
java
positional arguments:
SOURCE the source file or location (of files) to be converted
optional arguments:
-h, --help show this help message and exit
--clear clear the original files (delete) - this action is
permanent, use with caution!
--dst_format DST_FORMAT
the destination format to convert to
--workers WORKERS the number of workers (default -1 : all cpus)
--version the current version
- rdfencode, encode an ntriples file to a binary format (each S, P, O string is hashed with cityhash 64 bit)
usage: rdfencode [-h] [--version] SOURCE
rdftools v0.9.2, encode the RDF file(s)
positional arguments:
SOURCE the source file or location (of files) to be encoded
optional arguments:
-h, --help show this help message and exit
--version the current version
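To illustrate the idea behind rdfencode, the sketch below hashes each term of a triple to a 64-bit integer and packs the three values into 24 bytes. It is not the tool's implementation: rdfencode uses cityhash, whereas this example substitutes `hashlib.blake2b` with an 8-byte digest so that it runs with the standard library alone. The byte layout (three little-endian unsigned 64-bit integers) and the names `hash64` and `encode_triple` are assumptions made for illustration.

```python
import hashlib
import struct


def hash64(term):
    """Hash a term to an unsigned 64-bit integer (blake2b stand-in for cityhash)."""
    digest = hashlib.blake2b(term.encode("utf-8"), digest_size=8).digest()
    return struct.unpack("<Q", digest)[0]


def encode_triple(s, p, o):
    """Pack the three term hashes into 24 bytes (assumed layout, little-endian)."""
    return struct.pack("<3Q", hash64(s), hash64(p), hash64(o))


record = encode_triple(
    "<http://example.org/alice>",
    "<http://xmlns.com/foaf/0.1/knows>",
    "<http://example.org/bob>",
)
print(len(record))  # 24 bytes: 3 terms x 8 bytes each
```

Fixed-width records like this trade the original strings for compactness, so a separate dictionary would be needed to map hashes back to terms.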
- genlubm, generate a LUBM dataset (in parallel)
usage: genlubm [-h] [--univ UNIV] [--index INDEX] [--seed SEED]
[--ontology ONTOLOGY] [--workers WORKERS] [--version]
OUTPUT
rdftools v0.9.2, lubm dataset generator wrapper (bundled) - requires java
positional arguments:
OUTPUT the location in which to save the generated
distributions
optional arguments:
-h, --help show this help message and exit
--univ UNIV number of universities to generate
--index INDEX start university
--seed SEED the seed
--ontology ONTOLOGY the lubm ontology
--workers WORKERS the number of workers (default -1 : all cpus)
--version the current version
- genlubmdistro, generate a LUBM dataset (in parallel) and mix the universities to N sites with the specified distribution
usage: genlubmdistro [-h] [--distro DISTRO] [--univ UNIV] [--index INDEX]
[--seed SEED] [--ontology ONTOLOGY] [--pdist PDIST]
[--sites SITES] [--clean] [--workers WORKERS] [--version]
OUTPUT
rdftools v0.9.4, lubm dataset generator wrapper (bundled) - requires java
positional arguments:
OUTPUT the location in which to save the generated
distributions
optional arguments:
-h, --help show this help message and exit
--distro DISTRO the distibution to use, valid values are ['seedprop',
'uni2many', 'horizontal', 'uni2one']
--univ UNIV number of universities to generate
--index INDEX start university
--seed SEED the seed
--ontology ONTOLOGY the lubm ontology
--pdist PDIST the probabilities used for the uni2many distribution,
valid choices are ['3S', '7S', '5S'] or file with
probabilities split by line
--sites SITES the number of sites
--clean delete the generated universities
--workers WORKERS the number of workers (default -1 : all cpus)
--version the current version
- genvoid, generate VoID statistics from the source file
usage: genvoid [-h] [--version] SOURCE
rdftools v0.9.2, generate void statistics for RDF source file
positional arguments:
SOURCE the source file to be analized
optional arguments:
-h, --help show this help message and exit
--version the current version
- genvoid2, generate VoID statistics from the RDF source file, using the nxparser VoID exporter
usage: genvoid2 [-h] [--dataset_id DATASET_ID] [--use_nx] [--version] SOURCE
rdftools v0.9.2, generate a VoiD descriptor using the nxparser java package
positional arguments:
SOURCE the source file to be analized
optional arguments:
-h, --help show this help message and exit
--dataset_id DATASET_ID
dataset id
--use_nx if true (default false) use the nx parser builtin void
generator
--version the current version
- ntround, round all numeric literals (typed or untyped) in ntriples files with the given precision
usage: ntround [-h] [--prefix PREFIX] [--precision PRECISION] [--version] PATH
rdftools v0.9.2, rounds ntriple files in a folder, (rounds the floating point literals)
positional arguments:
PATH location of the indexes
optional arguments:
-h, --help show this help message and exit
--prefix PREFIX the prefix used for files that are transformed, cannot
be the enpty string!
--precision PRECISION
the precision to round to, if 0, floating point
numbers are rounded to long
--version the current version
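The rounding that ntround describes can be sketched roughly as follows. This is an illustrative approximation, not the tool's code: the regular expression, the function name `round_literals`, and the choice to leave the datatype IRI untouched are all assumptions. It handles quoted floating point literals with or without a datatype, and with precision 0 it rounds to an integer, mirroring the "rounded to long" note in the help text.

```python
import re

# Quoted floating point literal, optionally followed by ^^<datatype IRI> (assumed pattern)
NUMERIC_LITERAL = re.compile(r'"(-?\d+\.\d+(?:[eE][-+]?\d+)?)"(\^\^<[^>]+>)?')


def round_literals(line, precision=2):
    """Round every floating point literal found in one N-Triples line."""

    def _round(match):
        value = float(match.group(1))
        datatype = match.group(2) or ""
        if precision == 0:
            # precision 0: emit an integer instead of a decimal
            text = str(int(round(value)))
        else:
            text = f"{value:.{precision}f}"
        return f'"{text}"{datatype}'

    return NUMERIC_LITERAL.sub(_round, line)


line = '<http://example.org/s> <http://example.org/p> "3.14159"^^<http://www.w3.org/2001/XMLSchema#double> .'
print(round_literals(line, precision=2))
```

Lines without floating point literals pass through unchanged, so the function can be applied to every line of a file.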
Thanks a lot to
- University of Zurich and the Swiss National Science Foundation for generously funding the research that led to this software.